More uses for magics?

Jacob · Post by **Jacob** » Mon Jul 30, 2007 6:15 am

I was looking through Pradu's magic move generator and it got me thinking; what if instead of returning a bitboard for a given square and occupancy, you return a score with a smaller data-type based on such factors as mobility and piece-placement? For rooks you could even pre-compute scores for open/half-open files and the like. With a one-byte score, the bishop database is only 32kb, and 256kb for the rook database.

Michael Sherwin · Post by **Michael Sherwin** » Mon Jul 30, 2007 7:02 am

AFAIK, there is nothing new in what you propose.

Nice thinking though!

Jacob · Post by **Jacob** » Mon Jul 30, 2007 7:08 am

Oops :-/, I'm still fairly new to all of this, so it'd probably be best to keep quiet, hehe,... Thanks for letting me know.

Michael Sherwin · Post by **Michael Sherwin** » Tue Jul 31, 2007 1:19 am

Jacob wrote:Oops :-/, I'm still fairly new to all of this, so it'd probably be best to keep quiet, hehe,... Thanks for letting me know.

Don't be quiet. What if it really was new, then we wouldn't find out about it!

Jacob · Post by **Jacob** » Sat Aug 04, 2007 8:49 pm

Don't be quiet. What if it really was new, then we wouldn't find out about it!

Then if one day a new beta program comes out that smokes Rybka, you'll know why =). ... I feel a bit intimidated too since I'm just a high school graduate among all you professional programmers :-/.

Another idea I had was a way to generate pawn moves without separate cases for black and white. The basic principle is simple enough; for white pawns, the pawn bitboard is shifted and compared to the occupancy. For black pawns, the occupancy is shifted by the same amount and compared to the pawn bitboard. I used a 17 element array to store my piece bitboards, which allows me to use some simple algebra to generate the pawn moves.

Here's some code:

Code: Select all

U32* GenNonCaptures&#40;U32* list&#41;
&#123;
	U64 pc, moves;
	int sqr&#91;2&#93;; // from, to

	// ...

	// - Pawns forward one -
	pieces&#91;TEMP&#93; = &#40;pieces&#91;xoccu_wp&#91;side&#93;&#93; << 8&#41; 
		& pieces&#91;xoccu_bp&#91;xside&#93;&#93;;

	moves = pieces&#91;TEMP&#93; & xprom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 8;
		AddMove&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, pos&#91;sqr&#91;0&#93;&#93;);
	&#125;

	moves = pieces&#91;TEMP&#93; & prom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 8;
		AddUnderProm&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PROM|side, 0&#41;;
	&#125;

	// - Pawns forward two -
	moves = (&#40;pieces&#91;xoccu_tmp&#91;side&#93;&#93; & mask_5_3&#91;side&#93;) << &#40;8 << xside&#41;) 
		& pieces&#91;xoccu_tmp&#91;xside&#93;&#93;;

	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 16;
		AddMove&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, pos&#91;sqr&#91;0&#93;&#93;);
	&#125;

	return list;
&#125;

Code: Select all

U32* GenCaptures&#40;U32* list&#41;
&#123;
	U64 pc, moves;
	int sqr&#91;2&#93;; // from, to

	// ...

	// - Pawns forward one -
	pieces&#91;TEMP&#93; = &#40;pieces&#91;xoccu_wp&#91;side&#93;&#93; << 8&#41; 
		& pieces&#91;xoccu_bp&#91;xside&#93;&#93;;

	moves = pieces&#91;TEMP&#93; & prom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 8;
		AddProm&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PROM|side, 0&#41;;
	&#125;

	// - Pawns seven-shift -
	pc = (&#40;pieces&#91;woccu_wp&#91;side&#93;&#93; & 0xFEFEFEFEFEFEFEFEULL&#41; << 7&#41;
		& pieces&#91;boccu_bp&#91;xside&#93;&#93;;

	moves = pc & xprom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 7;
		AddCapt&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PAWN|side, pos&#91;sqr&#91;1&#93;&#93;);
	&#125;

	moves = pc & prom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 7;
		AddProm&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PROM|side, pos&#91;sqr&#91;1&#93;&#93;);
	&#125;

	// - Pawns nine-shift -
	pc = (&#40;pieces&#91;woccu_wp&#91;side&#93;&#93; & 0x7F7F7F7F7F7F7F7FULL&#41; << 9&#41;
		& pieces&#91;boccu_bp&#91;xside&#93;&#93;;

	moves = pc & xprom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 9;
		AddCapt&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PAWN|side, pos&#91;sqr&#91;1&#93;&#93;);
	&#125;

	moves = pc & prom_mask&#91;side&#93;;
	while &#40;moves&#41; &#123;
		sqr&#91;side&#93; = PopLSB&#40;&moves&#41;;
		sqr&#91;xside&#93; = sqr&#91;side&#93; - 9;
		AddProm&#40;sqr&#91;0&#93;, sqr&#91;1&#93;, PROM|side, pos&#91;sqr&#91;1&#93;&#93;);
	&#125;

	// - En Passant -
	if &#40;status&#91;ply&#93;.ep_t&#41; &#123;
		pc = pawn_attacks&#91;xside&#93;&#91;status&#91;ply&#93;.ep_t&#93; & pieces&#91;PAWN|side&#93;;

		while &#40;pc&#41; &#123;
			sqr&#91;0&#93; = PopLSB&#40;&pc&#41;;
			AddCapt&#40;sqr&#91;0&#93;, status&#91;ply&#93;.ep_t, PAWN|side, EP_CAPT|side&#41;;
		&#125;
	&#125;

	return list;
&#125;

I threw together a perft program using this concept, I could post some results/link to my source if anyone's interested.

Edsel Apostol · Post by **Edsel Apostol** » Sun Aug 05, 2007 12:35 pm

Hi Jacob,

Welcome to the club.

Hope you would be successful in your chess engine programming.

I am a chess engine author too and I'm looking for better ways to implement faster move and attack generations in my program.

There are a lot of pioneering work here regarding magic moves by Pradu Kannan, Michael Sherwin, etc..

I don't know which implementation is the best. I am currently using now the Kindergarten bitboard ideas for sliding pieces by Gerd Isenberg.

I am interested in your idea. Would you mind sharing your code and test results?

Edsel Apostol

Michael Sherwin · Post by **Michael Sherwin** » Sun Aug 05, 2007 10:05 pm

Hi Jacob,

This is interesting! It may be very similar to the way Crafty does it, but I have not looked at Crafty's code for a couple of years now, so I can not be sure. I have saved this page and will study it more closely when I have some time.

OBTW, I am not a professional programmer. I am just a self taught amateur. And not even all that bright. I just wish that I had got started at your age!

Jacob · Post by **Jacob** » Tue Aug 07, 2007 7:23 pm

It may be very similar to the way Crafty does it, but I have not looked at Crafty's code for a couple of years now, so I can not be sure.

Yes, it is similar to Crafty, except that Crafty generates white and black moves separately, and it doesn't use a single array for all the pieces.

Here's the source to "Bitboard Perft". The magic move files are not my original source and were written by Pradyumna Kannan.

main.c
magicmoves.h
magicmoves.c

It can accept an fen-string (without quotes) as an argument or defaults to the start position and begins perfting infinitely. The fen-parsing is very primitive and should break on invalid fens

.

Here are some results in 32-bit (Visual Studio 2006) and 64-bit mode (GCC under amd64 Fedora). My processor is a dual-core Conroe (not sure of the exact type) running at 3.4ghz. I'm comparing this to my chess engine, which are close to identical except that Étude generates white/black moves separately, is written in C++ with some classes, and doesn't use an array to store the piece boards. Both move generators generate legal moves, detect checks in leaf nodes, and generate a partial list of pins in the leaf nodes (pins against the side to move). They also use the debruijn hashing method for bitscans.

Test positions:
A: (depth 5) r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
B: (depth 7) r3k2r/p6p/8/8/8/8/P6P/R3K2R w - - 0 1 (which can be compared to engines in another post).
C: (depth 6) rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Étude (64 bit):
A: 5. 193690690 (7.86s)
B: 7. 528388659 (20.59s)
C: 6. 119060324 (4.49s)

Bitboard Perft (64 bit):
A: 5. 193690690 (7.06s)
B: 7. 528388659 (18.41s)
C: 6. 119060324 (4.19s)

Étude (32 bit):
A: 5. 193690690 (13.39s)
B: 7. 528388659 (36.56s)
C: 6. 119060324 (8.11s)

Bitboard Perft (32 bit):
A: 5. 193690690 (14.08s)
B: (misplaced

)
C: 6. 119060324 (8.63s)

I'm surprised that Bitboard Perft ran faster in 64-bit mode, I wonder if it has something to do with the compiler,... Or the increase in available registers?

Pradu · Post by **Pradu** » Tue Aug 07, 2007 11:34 pm

Hi Jacob

Jacob wrote:
It may be very similar to the way Crafty does it, but I have not looked at Crafty's code for a couple of years now, so I can not be sure.
Yes, it is similar to Crafty, except that Crafty generates white and black moves separately, and it doesn't use a single array for all the pieces.

Here's the source to "Bitboard Perft". The magic move files are not my original source and were written by Pradyumna Kannan.

main.c
magicmoves.h
magicmoves.c

It can accept an fen-string (without quotes) as an argument or defaults to the start position and begins perfting infinitely. The fen-parsing is very primitive and should break on invalid fens .

Here are some results in 32-bit (Visual Studio 2006) and 64-bit mode (GCC under amd64 Fedora). My processor is a dual-core Conroe (not sure of the exact type) running at 3.4ghz. I'm comparing this to my chess engine, which are close to identical except that Étude generates white/black moves separately, is written in C++ with some classes, and doesn't use an array to store the piece boards. Both move generators generate legal moves, detect checks in leaf nodes, and generate a partial list of pins in the leaf nodes (pins against the side to move). They also use the debruijn hashing method for bitscans.

For 64-bits you could try using intrinsics or processor instructions for the bitscan. It should be quite a bit faster on your processor:

Code: Select all

#ifndef INLINE
	#ifdef _MSC_VER
		#define INLINE __forceinline
	#elif defined&#40;__GNUC__)
		#define INLINE __inline__ __attribute__(&#40;always_inline&#41;)
	#else
		#define INLINE inline
	#endif
#endif

#ifndef __64_BIT_INTEGER_DEFINED__
	#define __64_BIT_INTEGER_DEFINED__
	#if defined&#40;_MSC_VER&#41; && _MSC_VER<1300
		typedef unsigned __int64 U64; //For the old microsoft compilers
	#else
		typedef unsigned long long  U64; //Supported by MSC 13.00+ and GCC
	#endif //defined&#40;_MSC_VER&#41; && _MSC_VER<1300
#endif //__64_BIT_INTEGER_DEFINED__


/*Note that MSC interprets long as a 32-bit type and GCC interprets long as a 64-bit type. However, both interpret int as a 32-bit type.  This is why the prototype definition for the GCC implementation of the intrinsics use "int" instead of the "long" used in Microsoft's documentation.*/
#ifdef _MSC_VER
	#include <intrin.h>
	#pragma message&#40;"MSC compatible compiler detected -- turning off warning 4146")
	#pragma warning&#40; disable &#58; 4146&#41;
	#if defined&#40;_WIN64&#41; || defined&#40;__LP64__) 
		#pragma intrinsic&#40;_BitScanForward64&#41;
		#pragma intrinsic&#40;_BitScanReverse64&#41;
		#define USING_INTRINSICS
	#endif
#elif defined&#40;__GNUC__) && defined&#40;__LP64__)
	static INLINE unsigned char _BitScanForward64&#40;unsigned int* const Index, const U64 Mask&#41;
	&#123;
		U64 Ret;
		__asm__
		(
			"bsfq %&#91;Mask&#93;, %&#91;Ret&#93;"
			&#58;&#91;Ret&#93; "=r" &#40;Ret&#41;
			&#58;&#91;Mask&#93; "mr" &#40;Mask&#41;
		);
		*Index = &#40;unsigned int&#41;Ret;
		return Mask?1&#58;0;
	&#125;
	static INLINE unsigned char _BitScanReverse64&#40;unsigned int* const Index, const U64 Mask&#41;
	&#123;
		U64 Ret;
		__asm__
		(
			"bsrq %&#91;Mask&#93;, %&#91;Ret&#93;"
			&#58;&#91;Ret&#93; "=r" &#40;Ret&#41;
			&#58;&#91;Mask&#93; "mr" &#40;Mask&#41;
		);
		*Index = &#40;unsigned int&#41;Ret;
		return Mask?1&#58;0;
	&#125;
	#define USING_INTRINSICS
#endif

Also you could try the Intel compiler for Linux, which is free. It seems to optimize better than GCC--atleast for my program.

Test positions:
A: (depth 5) r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
B: (depth 7) r3k2r/p6p/8/8/8/8/P6P/R3K2R w - - 0 1 (which can be compared to engines in another post).
C: (depth 6) rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Étude (64 bit):
A: 5. 193690690 (7.86s)
B: 7. 528388659 (20.59s)
C: 6. 119060324 (4.49s)

Bitboard Perft (64 bit):
A: 5. 193690690 (7.06s)
B: 7. 528388659 (18.41s)
C: 6. 119060324 (4.19s)

Étude (32 bit):
A: 5. 193690690 (13.39s)
B: 7. 528388659 (36.56s)
C: 6. 119060324 (8.11s)

Bitboard Perft (32 bit):
A: 5. 193690690 (14.08s)
B: (misplaced )
C: 6. 119060324 (8.63s)

I'm surprised that Bitboard Perft ran faster in 64-bit mode, I wonder if it has something to do with the compiler,... Or the increase in available registers?

Bitboard engines should naturally be faster on 64-bit processors because like you said you have 64-bit registers and operations. However, I guess 32-bit to 64-bit speedup can vary significantly between bitboard programs so it probably won't do much good trying to compare an engine's 32-bit to 64-bit speedup to other bitboard engines.

Jacob · Post by **Jacob** » Wed Aug 08, 2007 3:14 am

For 64-bits you could try using intrinsics or processor instructions for the bitscan. It should be quite a bit faster on your processor:

Code: Select all

&#40;omitted code...)
#if defined&#40;__GNUC__) && defined&#40;__LP64__)
   static INLINE unsigned char _BitScanForward64&#40;unsigned int* const Index, const U64 Mask&#41;
   &#123;
      U64 Ret;
      __asm__
      (
         "bsfq %&#91;Mask&#93;, %&#91;Ret&#93;"
         &#58;&#91;Ret&#93; "=r" &#40;Ret&#41;
         &#58;&#91;Mask&#93; "mr" &#40;Mask&#41;
      );
      *Index = &#40;unsigned int&#41;Ret;
      return Mask?1&#58;0;
   &#125;
   #define USING_INTRINSICS
#endif

Thanks, that's exactly what I've been looking for

. I only have 32-bit windows, but using the assembly under Fedora yields around a 5-10% speed increase in perft with make/unmake in leaf nodes, but up to 50% without the last make/unmake.

Bitboard engines should naturally be faster on 64-bit processors because like you said you have 64-bit registers and operations.

I apologize, I wasn't very clear. I was surprised that Bitboard Perft ran faster than Etude in 64-bit mode but slower in 32-bit mode, considering that they're very, very similar, and Bitboard Perft seems to be doing more work in the move generation because of having to swap occupancy/piece boards for the generalized pawn move generation/etc. I'll try in the Intel compiler when I get a chance, thanks for the suggestion.

With the code updates (I'll get them uploaded tonight):

Code: Select all

r3k2r/p6p/8/8/8/8/P6P/R3K2R w - - 0 1
&#40;compare to http&#58;//64.68.157.89/forum/viewtopic.php?t=13829&#41;
&#40;64 bit mode&#41;
 1.             14    0.00s
 2.            192    0.00s
 3.           3466    0.00s
 4.          60120    0.00s
 5.        1223784    0.04s
 6.       24161970    0.80s
 7.      528388659   16.87s

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
No make/undo in leaf nodes
 1.             20    0.00s
 2.            400    0.00s
 3.           8902    0.00s
 4.         197281    0.00s
 5.        4865609    0.05s
 6.      119060324    1.02s
 7.     3195901860   26.52s

HG Muller's QuickPerft results for comparison&#58;
perft&#40;1&#41;=                  20 ( 0.000 sec&#41;
perft&#40;2&#41;=                 400 ( 0.000 sec&#41;
perft&#40;3&#41;=                8902 ( 0.000 sec&#41;
perft&#40;4&#41;=              197281 ( 0.000 sec&#41;
perft&#40;5&#41;=             4865609 ( 0.046 sec&#41;
perft&#40;6&#41;=           119060324 ( 0.844 sec&#41;
perft&#40;7&#41;=          3195901860 &#40;22.828 sec&#41;
- Show quoted text -

Bitboard Perft is only 15% slower than QuickPerft under these conditions, I wonder if I can make it faster,...

More uses for magics?

More uses for magics?

Re: More uses for magics?

Re: More uses for magics?

Re: More uses for magics?

Generalized bitboard pawn moves

Re: Generalized bitboard pawn moves

Re: Generalized bitboard pawn moves

Re: Generalized bitboard pawn moves

Re: Generalized bitboard pawn moves

Re: Generalized bitboard pawn moves