Code: Select all
If anyone is interested in trying these 32 bit friendly sliding-piece 'folded' bitboard move generators then I suggest this as a starting point. The reason that they only return an index is so, it can be used to lookup mobility as well as an attack set. Also the index can be saved if it might be needed later.
Please note that there is not one single 64 bit access in either routine. It is optimized/targeted for 32 bit machines. People with 64 bit machines that do not mind having two sets of routines for both 32 bit and 64 bit machines are welcome to create the 64 bit counter parts.
u32 BishopIndex(square *sq, u64*occ) { // b1
u32 index = (u32)*occ | (*(occ + 4) >> 1); // b2
index |= (index & 0xffff0000) >> sq->bishopShift; // b3
return sq->bPerfect1[index & 255] | sq->rPerfect2[(index >> 8) & 255]; // b4
}
U32 RookIndex(square *sq, u64 *occ) { // r1
u32 occ1 = (u32)*(occ + sq->off1); // r2
u32 occ2 = (u32)*(occ + sq->off2); // r3
u32 index = (occ1 & sq->rankMask) >> sq->rankShift; // r4
index |= (occ1 & sq->colMask1) >> sq->colShift1; // r5
index |= (occ2 & sq->colMask2) >> sq->colShift2; // r6
index |= (index & 0xffff0000) >> sq->rookShift; // r7
return sq->rPerfect1[index & 255] | sq->rPerfect2[(index >> 8) & 255]; // r8
}
b1) pointers are only 32 bit, so it saves one instruction to do it this way.
b2) using 32 bit instructions only retrieve both parts of occ and combine them.
b3) fold the upper half of index into lower two rows, now you have a dual index.
b4) return the combined minimal perfect index to the bishp attack/mobility table.
r1) same as b1.
r2) load occ1 with the half of the occupied bitboard that contains the rank.
r3) load occ2 with the other half.
r4) load the rank bits shifted to bit 0 of index.
r5) load the column bits of occ1 shifted into place in the index.
r6) load the column bits of occ2 shifted into place in the index.
r7) same as b3.
r8) same as b4, except it is for rooks.
* ?Perfect1 for each square is initialized with the address of that squares subtable.
* ?Perfect1 then is further initialized with the row1 bits for a subtable within the subtable.
* ?Perfect2 is initialized with the remaining bits.
* when ?Perfect1 and ?Perfect2 are or'd together they form the complete index.
I am going to use these routines in my new chess program regardless of speed, because I believe that they are more than fast enough. And the bishop is probably the fastest yet developed in 32 bit or even 64 bit. The rook may or may not be fastest in 32 bit, but it is definitly very competitive. For 64 bit I reccommend Gerd's 'Kindergarden' RookAttacks(). Or just use 'kindergarden' for rooks and 'folded' for bishops.
I will now complete the initialization code and make a follow-up post.
I hope that this code proves useful!
Mike