A simple trick for MSB

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

A simple trick for MSB

Post by stegemma »

Searching a way to find the most significant bit, I've found this simple trick, that works on bitboard where h1=bit 0, a8=bit 63 and only for direction Sud East (from upper right to bottom left) of bishop moves:

Code: Select all


--------
--------
--------
--------
---B----
--/-----
-*------
o-------

multiply by h column (0x0101010101010101)

o*------
||------
||------
||------
||-B----
||/-----
|*------
o-------

shift by 56

--------
--------
--------
--------
--------
--------
--------
o*------

keep LSB only

--------
--------
--------
--------
--------
--------
--------
-*------

multiply by h column

-|------
-|------
-|------
-|------
-|------
-|------
-|------
-*------

intersect (&) with original bishop path

-|------
-|------
-|------
-|------
-|-B----
-|------
-*------
o|------

here's the MSB = captured piece on bishop path:

--------
--------
--------
--------
--------
--------
-*------
--------

I know: rotated/magic bitboards are better... but I share this little trick to anybody can find useful.

For other bishop directions, the same trick can be used adding a 256 entries array, to handle separately the empty squares and capture square. I've done this on Satana and it gives a little speed-up:

Code: Select all


for&#40;uint64_t i=0; i<0xffULL; i++)
&#123;
	if &#40;i == boEmpty&#41;
	&#123;
		VertMSBE&#91;i&#93; = VertEmptyE&#91;i&#93; = VertEmptyW&#91;i&#93; = boEmpty;
	&#125;else
	&#123;
		uint64_t posE = LastBit&#40;i&#41;;
		uint64_t takenE = LastBit&#40;i ^ posE&#41;;
		VertMSBE&#91;i&#93; = posE * BOARD_H_COL;
		VertEmptyE&#91;i&#93; = (&#40;posE - &#40;takenE == boEmpty ? 1 &#58; takenE&#41;) & ~takenE&#41; * BOARD_H_COL;
		uint64_t posW = FirstBit&#40;i&#41;;
		uint64_t takenW = FirstBit&#40;i ^ posW&#41;;
		VertEmptyW&#91;i&#93; = ((&#40;takenW == boEmpty ? 0x100ULL &#58; takenW&#41; - posW&#41; & ~posW&#41; *BOARD_H_COL;
	&#125;
&#125;
The array VertMSBE substitutes part of the previous trick with precomputed values. The array VertEmptyE is an array of the vertical empty squares in East directions (W for West directions), with the trick to use as index the starting square too. Notes that without using the starting square (those of the bishop) in the index, we would need to have 8 arrays instead of one. The idea is the same of the previous trick: moving the diagonal bits of the slide of the bishop to the first 8 bits (row 1) and then use the array to get all the vertical squares where there are empty squares:

Code: Select all


--------
--------
--------
--------
---B----
--------
------*-
--------

--------
--------
--------
--------
--------
--------
--------
---B--*-

----||--
----||--
----||--
----||--
----||--
----||--
----||--
----||--

Then you simply intersect that bitboard with original bishop slide intersected by all empty squares and you get bishop moves in that direction.

With a little extra work, this could be done for two directions at a time.

For rook moves, you can multiply by a diagonal, instead of the h column.

This idea avoid using MSB function, that is expensive, without intrinsics.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A simple trick for MSB

Post by bob »

stegemma wrote:Searching a way to find the most significant bit, I've found this simple trick, that works on bitboard where h1=bit 0, a8=bit 63 and only for direction Sud East (from upper right to bottom left) of bishop moves:

Code: Select all


--------
--------
--------
--------
---B----
--/-----
-*------
o-------

multiply by h column &#40;0x0101010101010101&#41;

o*------
||------
||------
||------
||-B----
||/-----
|*------
o-------

shift by 56

--------
--------
--------
--------
--------
--------
--------
o*------

keep LSB only

--------
--------
--------
--------
--------
--------
--------
-*------

multiply by h column

-|------
-|------
-|------
-|------
-|------
-|------
-|------
-*------

intersect (&) with original bishop path

-|------
-|------
-|------
-|------
-|-B----
-|------
-*------
o|------

here's the MSB = captured piece on bishop path&#58;

--------
--------
--------
--------
--------
--------
-*------
--------

I know: rotated/magic bitboards are better... but I share this little trick to anybody can find useful.

For other bishop directions, the same trick can be used adding a 256 entries array, to handle separately the empty squares and capture square. I've done this on Satana and it gives a little speed-up:

Code: Select all


for&#40;uint64_t i=0; i<0xffULL; i++)
&#123;
	if &#40;i == boEmpty&#41;
	&#123;
		VertMSBE&#91;i&#93; = VertEmptyE&#91;i&#93; = VertEmptyW&#91;i&#93; = boEmpty;
	&#125;else
	&#123;
		uint64_t posE = LastBit&#40;i&#41;;
		uint64_t takenE = LastBit&#40;i ^ posE&#41;;
		VertMSBE&#91;i&#93; = posE * BOARD_H_COL;
		VertEmptyE&#91;i&#93; = (&#40;posE - &#40;takenE == boEmpty ? 1 &#58; takenE&#41;) & ~takenE&#41; * BOARD_H_COL;
		uint64_t posW = FirstBit&#40;i&#41;;
		uint64_t takenW = FirstBit&#40;i ^ posW&#41;;
		VertEmptyW&#91;i&#93; = ((&#40;takenW == boEmpty ? 0x100ULL &#58; takenW&#41; - posW&#41; & ~posW&#41; *BOARD_H_COL;
	&#125;
&#125;
The array VertMSBE substitutes part of the previous trick with precomputed values. The array VertEmptyE is an array of the vertical empty squares in East directions (W for West directions), with the trick to use as index the starting square too. Notes that without using the starting square (those of the bishop) in the index, we would need to have 8 arrays instead of one. The idea is the same of the previous trick: moving the diagonal bits of the slide of the bishop to the first 8 bits (row 1) and then use the array to get all the vertical squares where there are empty squares:

Code: Select all


--------
--------
--------
--------
---B----
--------
------*-
--------

--------
--------
--------
--------
--------
--------
--------
---B--*-

----||--
----||--
----||--
----||--
----||--
----||--
----||--
----||--

Then you simply intersect that bitboard with original bishop slide intersected by all empty squares and you get bishop moves in that direction.

With a little extra work, this could be done for two directions at a time.

For rook moves, you can multiply by a diagonal, instead of the h column.

This idea avoid using MSB function, that is expensive, without intrinsics.
The multiplies have to be slower than BSF/BSR. Have you tested the speed carefully?
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

bob wrote:[...]
The multiplies have to be slower than BSF/BSR. Have you tested the speed carefully?
No, I simply don't want to use intrinsic (for portability) so I haven't compared with BSR.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

bob wrote:[...]

The multiplies have to be slower than BSF/BSR. Have you tested the speed carefully?
I've done some measures comparing the old function with the new. The perft from starting position is about 6% faster. An alfabeta search from starting position is almost equal. An alfabeta search from this position:

r1bqk2r/pppp1ppp/2n2n2/2b1p3/2B1P3/2N2N2/PPPP1PPP/R1BQK2R w KQkq - 6 5

is about 4% faster with the new function: 3776 Knps against 3630 Knps (medium after 5 moves in about 7 seconds each, Intel Pentium i7 4790K 4 GHz 8 Gb RAM - no hash tables).

Of course this could only means that previous function was bad... but still in my program I've found a little improvement with this trick and that's good.

Now I'll try to compare with BSR but I think that it will be a lot faster, of course.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

Here's the test with intrinsic (_BitScanForward64) both with and without the trick:

Code: Select all

With BSF64 + trick&#58; 3909 Knps
With BSF64 no trick&#58; 3778 Knps
Without BSF64 + trick&#58; 3808 Knps
Without BSF64 no trick&#58; 3648 Knps
The trick has been apply only to MakeBishopMoves (and queen diagonal moves) while Get/PopLastBit are used for Rook moves, check testing and somewhere in evaluation.

I realize that using intrinsic I can speed up the engine almost the same that making better make moves functions (the "trick"). using both gives better results.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: A simple trick for MSB

Post by Gerd Isenberg »

Nice reflection trick to reverse the bit order.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

Gerd Isenberg wrote:Nice reflection trick to reverse the bit order.
This morning I'm trying to extend the trick to all bishop moves, using something that I can call "the billiard trick"... I will post the detail if it works.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

This function seems to works but it is very complex. It use a 256 array and doesn't use if statements. The logic is to merge all bits to 'a' column then move it and reflect to '8' row, so that we can extract the highest bit as the lowest one; then we move and reflect again until we can intersect the most significant byte with the columns of the most significant bit. If you look at the binary debug of the temporary values, you find that the bits moves around as they were bounce on the edge of a billiard:

Code: Select all

#define CHDEBUG&#40;a&#41; /* your debug routine */

#define BOARD_A_COL   0x8080808080808080ULL
#define BOARD_B_COL   0x4040404040404040ULL
#define BOARD_C_COL   0x2020202020202020ULL
#define BOARD_D_COL   0x1010101010101010ULL
#define BOARD_E_COL   0x0808080808080808ULL
#define BOARD_F_COL   0x0404040404040404ULL
#define BOARD_G_COL   0x0202020202020202ULL
#define BOARD_H_COL   0x0101010101010101ULL
#define BOARD_1_ROW   0x00000000000000FFULL
#define BOARD_2_ROW   0x000000000000FF00ULL
#define BOARD_3_ROW   0x0000000000FF0000ULL
#define BOARD_4_ROW   0x00000000FF000000ULL
#define BOARD_5_ROW   0x000000FF00000000ULL
#define BOARD_6_ROW   0x0000FF0000000000ULL
#define BOARD_7_ROW   0x00FF000000000000ULL
#define BOARD_8_ROW   0xFF00000000000000ULL

uint64_t VertMSBE&#91;256&#93;;

void Fill&#40;)
&#123;
  for&#40;uint64_t i=0; i<0xffULL; i++)
  &#123;
    if &#40;i == 0&#41;
    &#123;
      VertMSBE&#91;i&#93; = 0;
    &#125;else
    &#123;
      uint64_t posE = LastBit&#40;i&#41;; // call your MSB function
			VertMSBE&#91;i&#93; = posE * BOARD_H_COL;
    &#125;
  &#125;
&#125;

uint64_t MSB&#40;uint64_t bo&#41;
&#123;
  uint64_t ca;
  ca = &#40;bo | bo << 4&#41; & &#40;BOARD_A_COL | BOARD_B_COL | BOARD_C_COL | BOARD_D_COL&#41;;
  ca = &#40;ca | ca << 2&#41; & &#40;BOARD_A_COL | BOARD_B_COL&#41;;
  ca = &#40;ca | ca << 1&#41; & &#40;BOARD_A_COL&#41;; CHDEBUG&#40;ca&#41;;
  uint64_t ch = ca >> 7; CHDEBUG&#40;ch&#41;;
  uint64_t r8 = &#40;ch * H1A8&#41; & BOARD_8_ROW; CHDEBUG&#40;r8&#41;;
  uint64_t mbit = FirstBit&#40;r8&#41;; CHDEBUG&#40;mbit&#41;;
  uint64_t r1a = mbit >> 56; CHDEBUG&#40;r1a&#41;;
  uint64_t ca2 = &#40;r1a * H1A8&#41; & BOARD_A_COL; CHDEBUG&#40;ca2&#41;;
  uint64_t ch2 = ca2 >> 7; CHDEBUG&#40;ch2&#41;;
  uint64_t msbyte = &#40;ch2 * BOARD_1_ROW&#41; & bo; CHDEBUG&#40;msbyte&#41;;
  uint64_t r8b = &#40;msbyte * BOARD_H_COL&#41;; CHDEBUG&#40;r8b&#41;;
  uint64_t r1b = r8b >> 56; CHDEBUG&#40;r1b&#41;;
  uint64_t mask = VertMSBE&#91;r1b&#93;; CHDEBUG&#40;mask&#41;;
  uint64_t msbit = msbyte & mask; CHDEBUG&#40;msbit&#41;;
  return msbit;
&#125;
PS: in this form in effect it is not very useful
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

A complete test with 32/64 version of the same software, running ont he same machine (i7 4790K), perft 6 from start:

Code: Select all

VS 2013 64 bit - Intrinsic - Nodes/s&#58; 49983343 
VS 2013 64 bit - No intrinsic - Nodes/s&#58; 42582376
VS 2013 32 bit - Intrisic - Nodes/s&#58; 23432458
VS 2013 32 bit - No intrinsic - Nodes/s&#58; 19206375
BCB++ 6.0 32 bit - No Intrinsic - Nodes/s&#58; 13727697
The recent Visual Studio compiles faster code even at 32 bit and without intrinsic, than the old Borland C++ Builder (and that was expected, of course, because there are about 20 years of evolution in between). The difference between 64/32 bits is more than 2 times faster for the 64 bit version.

I would like to compare with the xCode on OS X version but the CPU is different (i3 at 3.06 GHz versus i7 at 4.0 GHz). Maybe more interesting would be to compare with gcc/intel compiler on windows 64 bit but I have not so many spare time to do it.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A simple trick for MSB

Post by stegemma »

Here's my code for first/last bit. Changing USE_INTRINSIC and QIDX you can test various version of the same functions. The one choose are those faster, on my environment. Some function of QIDX are just copy from the web but I don't use it, they seems to be slower than other solutions:

Code: Select all

inline int GetSquareCol&#40;const uint64_t &boSquare&#41;
&#123;
	uint64_t bo = &#40;boSquare * BOARD_H_COL&#41; >> 56;
	return log2ff&#91;&#40;int&#41;bo&#93; - 1;
&#125;

inline int GetSquareRow&#40;const uint64_t &boSquare&#41;
&#123;
#if 0
	return &#40;GetSquareCol&#40;((&#40;boSquare * BOARD_1_ROW&#41; & BOARD_A_COL&#41; * H1A8&#41; >> 56&#41; - 1&#41; & 0x07;
#else
	int row = 0;
	uint64_t col = boSquare;
	if (&#40;col & 0xffffffffULL&#41; == EMPTY&#41;
	&#123;
		col >>= 32;
		row = 4;
	&#125;
	if (&#40;col & 0xffffULL&#41; == EMPTY&#41;
	&#123;
		col >>= 16;
		row += 2;
	&#125;
	if (&#40;col & 0xffULL&#41; == EMPTY&#41;
	&#123;
		row += 1;
	&#125;
	return row;
#endif
&#125;

inline uint64_t FirstBit&#40;const uint64_t &bo&#41;
&#123;
	// const uint64_t k = bo - 1;		
	// 10011100 bo
	// 10011011 k = bo-1
	// 10011111 | bo
	// 00000100 ^ k
	//return &#40;k | bo&#41; ^ k;

	// 10011100 bo
	// 10011011 k = bo-1
	// 01100100 ~ bo - 1
	// 00000100 & k
	return bo & ~&#40;bo - 1&#41;;
&#125;

inline uint64_t PopFirstBit&#40;uint64_t &boBits&#41;
&#123;
	uint64_t boFirst = FirstBit&#40;boBits&#41;;
	boBits ^= boFirst;
	return boFirst;
&#125;

inline uint64_t LastBit&#40;const uint64_t &bo&#41;
&#123;
#if USE_INTRINSIC
	unsigned long index;
#ifdef _M_X64
	if (_BitScanReverse64&#40;&index, bo&#41;) return squares&#91;index&#93;; // index ? &#40;1ULL << index&#41; &#58; 1;
#else
	if (_BitScanReverse&#40;&index, bo >> 32&#41;) return squares&#91;index + 32&#93;;
	if (_BitScanReverse&#40;&index, &#40;uint32_t&#41;bo&#41;) return squares&#91;index&#93;;
#endif
	return 0;
#else
	uint64_t k = bo;
	uint64_t boLast;
	for &#40;boLast = 0ULL; k != 0ULL; boLast = PopFirstBit&#40;k&#41;);
	return boLast;
#endif
&#125;

inline uint64_t PopLastBit&#40;uint64_t &boBits&#41;
&#123;
	uint64_t boLast = LastBit&#40;boBits&#41;;
	boBits ^= boLast;
	return boLast;
&#125;

#if USE_INTRINSIC
inline int GetSquareIndex&#40;const uint64_t &boSquare&#41;
&#123;
	unsigned long idx;
#ifdef _M_X64
	return _BitScanForward64&#40;&idx, boSquare&#41; == 0 ? 0 &#58; idx;
#else
	return _BitScanForward&#40;&idx, boSquare >> 32&#41; == 0 ? (_BitScanForward&#40;&idx, uint32_t&#40;boSquare&#41;) == 0 ? 0 &#58; idx&#41; &#58; idx + 32;
#endif
&#125;
#else
#define QIDX 3
#if QIDX==1
inline int GetSquareIndex&#40;const uint64_t &boSquare&#41;
&#123;
	/**
	* trailingZeroCount
	* @param bb bitboard to scan
	* @return index &#40;0..63&#41; of least significant one bit, 64 if bb is zero
	*/
	static const int lookup67&#91;67 + 1&#93; = &#123;
		64, 0, 1, 39, 2, 15, 40, 23,
		3, 12, 16, 59, 41, 19, 24, 54,
		4, -1, 13, 10, 17, 62, 60, 28,
		42, 30, 20, 51, 25, 44, 55, 47,
		5, 32, -1, 38, 14, 22, 11, 58,
		18, 53, 63, 9, 61, 27, 29, 50,
		43, 46, 31, 37, 21, 57, 52, 8,
		26, 49, 45, 36, 56, 7, 48, 35,
		6, 34, 33, -1 &#125;;
	return lookup67&#91;&#40;boSquare & -boSquare&#41; % 67&#93;;
&#125;
#elif QIDX==2
#include <intrin.h>
#ifdef __INTRIN_H_x
inline int GetSquareIndex&#40;const uint64_t &boSquare&#41;
&#123;
	//unsigned char _BitScanForward64&#40;unsigned long * Index,  unsigned __int64 Mask&#41;;
	//unsigned char _BitScanReverse64&#40;unsigned long * Index, unsigned __int64 Mask&#41;; 1
	unsigned long index /* = 0 */;
	/* if&#40;boSquare!=0ull&#41; */ _BitScanForward64&#40;&index, boSquare&#41;;
	return index;
&#125;
#endif
#elif QIDX==3
inline int GetSquareIndex&#40;const uint64_t &boSquare&#41;
&#123;
	return GetSquareRow&#40;boSquare&#41; * 8 + GetSquareCol&#40;boSquare&#41;;
&#125;
#elif QIDX==4
inline int GetSquareIndex&#40;const uint64_t &boSquare&#41;
&#123;
	const int deb&#91;64&#93; =
	&#123;
		0, 1, 2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48, 28, 62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49, 18, 29, 11,
		63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43, 21, 23, 58, 17, 10, 51, 25, 36, 32, 60, 20, 57, 16, 50, 31, 19, 15, 30, 14, 13, 12,
	&#125;;
	return deb&#91;(&#40;boSquare & -boSquare&#41; * 0x022fdd63cc95386d&#41; >> 58&#93;;
&#125;
#endif
#endif // USE_INTRINSIC
PS: GetSquareRow is not optimal, it can be faster but I don't use it with intrinsic.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com