Magic BitBoards and memory

Discussion of chess software programming and technical issues.

Moderator: Ras

ericlangedijk

Magic BitBoards and memory

Post by ericlangedijk »

Finally I understand wha'ts going on in the magic bitboards / magic multipliers. When contemplating it, it's more simple than rotated bitboards.

Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?
micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

Re: Magic BitBoards and memory

Post by micron »

ericlangedijk wrote: Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?
Magic arrays tend to be indexed with square numbers 0..63. The extra dimension from your idea would slightly slow the address calculation.

The plainest and clearest magic is a fixed-shift large-arrays form, summarised by Gerd Isenberg in the thread
http://www.talkchess.com/forum/viewtopi ... =0&t=35858

Code: Select all

U64 mBishopAttacks[64] [512]; // 256 K 
U64 mRookAttacks  [64][4096]; // 2048K 
  
struct SMagic { 
   U64 mask;  // to mask relevant squares of both lines (no outer squares) 
   U64 magic; // magic 64-bit factor 
}; 

SMagic mBishopTbl[64]; 
SMagic mRookTbl  [64]; 

U64 bishopAttacks(U64 occ, enumSquare sq) { 
   occ &= mBishopTbl[sq].mask; 
   occ *= mBishopTbl[sq].magic; 
   occ >>= 64-9; 
   return mBishopAttack[sq][occ]; 
}
On modern hardware with a big L3 cache, it performs as well as more complicated schemes.

Robert P.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Magic BitBoards and memory

Post by bob »

ericlangedijk wrote:Finally I understand wha'ts going on in the magic bitboards / magic multipliers. When contemplating it, it's more simple than rotated bitboards.

Is it useful to create a 3-dimensional array 2MB table for rookmoves and bishopmoves, combining the directions (ranks+files and a1-h8+a8-h1)?
Is there any speed gain? Does anyone do that?
3D arrays should be avoided when possible. If you have an array:

int a[100][200][300]

and you want to access element a[x][y][z]

You need to compute this:

offset = (x * 200 + y) * 300 + z

2 multiplies. That's not very fast overall.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Magic BitBoards and memory

Post by sje »

Accessing arrays which have more than one dimension is more efficient when:

1) The bottom N-1 (i.e., most rapidly changing stride) dimensions are each a power of two, and

2) The element size is a power of two.

With both of the above constraints, only shifting and addition is needed for indexing; not multiplication.
micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

Re: Magic BitBoards and memory

Post by micron »

The general case indeed requires 2 multiplications to calculate the address

Code: Select all

UInt64    gArray[100][200][300]; 
 
void Foo( int i, int j, int k ) 
{ 
  gArray[i][j][k] = 0; 
}  

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	imulq	$2400, %rax, %rax
	movslq	%edi, %rcx
	imulq	$480000, %rcx, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	addq	%rax, %rcx
	movslq	%edx, %rax
	movq	$0, (%rcx,%rax,8)
	xorl	%eax, %eax
	popq	%rbp
	ret
But modern compilers work hard to minimise or avoid explicit multiplications, if shifts and adds can be deployed instead. Clang's address calculation is custom-tailored to the array structure.

Code: Select all

UInt64    gArray[3][5][3]

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	leaq	(%rax,%rax,2), %rax
	movslq	%edi, %rcx
	imulq	$120, %rcx, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	leaq	(%rcx,%rax,8), %rax
	movslq	%edx, %rcx
	movq	$0, (%rax,%rcx,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

Code: Select all

 UInt64    gArray[3][3][3]; 

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	leaq	(%rax,%rax,2), %rax
	movslq	%edi, %rcx
	leaq	(%rcx,%rcx,8), %rcx
	shlq	$3, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	leaq	(%rcx,%rax,8), %rax
	movslq	%edx, %rcx
	movq	$0, (%rax,%rcx,8)
	xorl	%eax, %eax
	popq	%rbp
	ret

Code: Select all

UInt64    gArray[3][4][4]; 

_Foo:
	pushq	%rbp
	movq	%rsp, %rbp
	movslq	%esi, %rax
	shlq	$5, %rax
	movslq	%edi, %rcx
	shlq	$7, %rcx
	addq	_gArray@GOTPCREL(%rip), %rcx
	addq	%rax, %rcx
	movslq	%edx, %rax
	movq	$0, (%rcx,%rax,8)
	xorl	%eax, %eax
	popq	%rbp
	ret
The last case has the most efficient-looking assembler, because (as Steven Edwards said) the last N-1 dimensions are powers of 2.
Robert P.