Magic BitBoards and memory

ericlangedijk · Post by **ericlangedijk** » Wed Jun 01, 2011 12:00 pm

Finally I understand wha'ts going on in the magic bitboards / magic multipliers. When contemplating it, it's more simple than rotated bitboards.

Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?

micron · Post by **micron** » Wed Jun 01, 2011 1:20 pm

ericlangedijk wrote: Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?

Magic arrays tend to be indexed with square numbers 0..63. The extra dimension from your idea would slightly slow the address calculation.

The plainest and clearest magic is a fixed-shift large-arrays form, summarised by Gerd Isenberg in the thread
http://www.talkchess.com/forum/viewtopi ... =0&t=35858

Code: Select all

U64 mBishopAttacks[64] [512]; // 256 K 
U64 mRookAttacks  [64][4096]; // 2048K 
  
struct SMagic { 
   U64 mask;  // to mask relevant squares of both lines (no outer squares) 
   U64 magic; // magic 64-bit factor 
}; 

SMagic mBishopTbl[64]; 
SMagic mRookTbl  [64]; 

U64 bishopAttacks(U64 occ, enumSquare sq) { 
   occ &= mBishopTbl[sq].mask; 
   occ *= mBishopTbl[sq].magic; 
   occ >>= 64-9; 
   return mBishopAttack[sq][occ]; 
}

On modern hardware with a big L3 cache, it performs as well as more complicated schemes.

Robert P.

bob · Post by **bob** » Wed Jun 01, 2011 9:09 pm

ericlangedijk wrote:Finally I understand wha'ts going on in the magic bitboards / magic multipliers. When contemplating it, it's more simple than rotated bitboards.

Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?

3D arrays should be avoided when possible. If you have an array:

int a[100][200][300]

and you want to access element a[x][y][z]

You need to compute this:

offset = (x * 200 + y) * 300 + z

2 multiplies. That's not very fast overall.

sje · Post by **sje** » Thu Jun 02, 2011 2:26 am

Accessing arrays which have more than one dimension is more efficient when:

1) The bottom N-1 (i.e., most rapidly changing stride) dimensions are each a power of two, and

2) The element size is a power of two.

With both of the above constraints, only shifting and addition is needed for indexing; not multiplication.

micron · Post by **micron** » Thu Jun 02, 2011 7:59 am

The general case indeed requires 2 multiplications to calculate the address

Code: Select all

UInt64    gArray[100][200][300]; 
 
void Foo( int i, int j, int k ) 
{ 
  gArray[i][j][k] = 0; 
}  

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	imulq	$2400, %rax, %rax
	movslq	%edi, %rcx
	imulq	$480000, %rcx, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	addq	%rax, %rcx
	movslq	%edx, %rax
	movq	$0, (%rcx,%rax,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

But modern compilers work hard to minimise or avoid explicit multiplications, if shifts and adds can be deployed instead. Clang's address calculation is custom-tailored to the array structure.

Code: Select all

UInt64    gArray[3][5][3]

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	leaq	(%rax,%rax,2), %rax
	movslq	%edi, %rcx
	imulq	$120, %rcx, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	leaq	(%rcx,%rax,8), %rax
	movslq	%edx, %rcx
	movq	$0, (%rax,%rcx,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

Code: Select all

 UInt64    gArray[3][3][3]; 

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	leaq	(%rax,%rax,2), %rax
	movslq	%edi, %rcx
	leaq	(%rcx,%rcx,8), %rcx
	shlq	$3, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	leaq	(%rcx,%rax,8), %rax
	movslq	%edx, %rcx
	movq	$0, (%rax,%rcx,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

Code: Select all

UInt64    gArray[3][4][4]; 

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	shlq	$5, %rax
	movslq	%edi, %rcx
	shlq	$7, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	addq	%rax, %rcx
	movslq	%edx, %rax
	movq	$0, (%rcx,%rax,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

The last case has the most efficient-looking assembler, because (as Steven Edwards said) the last N-1 dimensions are powers of 2.
Robert P.

Magic BitBoards and memory

Magic BitBoards and memory

Re: Magic BitBoards and memory

Re: Magic BitBoards and memory

Re: Magic BitBoards and memory

Re: Magic BitBoards and memory