Introduction and (hopefully) contribution - bitboard methods
Posted: Fri Jun 03, 2011 5:07 pm
I've been lurking here for awhile and first of all I'd like to say thank you to all for posting all this good stuff, I've learned a lot here and now i'd like to maybe contribute to the chess programming community.
I am involved professionally with computer programming for 25 years. And some of my hobbies are programming and chess. So why not combine both . Well I'm working on my own chess engine and now I'm just reviewing the code in order to make available soon. Hope it will be helpfull to other fellow cc programmers.
Here I want to share what I did with bitboards, that got me some performance improvement over traditional methods. I'm talking about bit_count, first_index and last_index methods.
I was using the usual methods:
Those methods take in consideration that 0 is the leftmost bit and 63 is the rightmost bit. I know I'm using the opposite of most engines.
What I did was to create a structure to split the bitboard in 4 parts and them precompute a table with necessary information for each part. When I need the first/last index I locate which of the 4 parts has the first/last bit and them just lookup the precomputed table.
For bit count I just summ the 4 parts together. Well a code is worth thousand words. See the code bellow.
This is the structure to access each part of the bitboard. The bitboard value will be at BBIX.val64. And each part is at the val16[4]. Remember that this is a union.
The initialization part consists in creating 3 tables for first_index, last_index and bit_count. Note that I still use the tradicional methods to help the initialization. Of course we can get rid completely of the tradional methods by initializing using a more manual method. This on my TODO list.
And then here are the new methods using the BBIX and the precomputed tables.
Pretty much the approach is divide and conquer for first/last. I need to know which part has the first/last bit and then lookup the corresponding table. For each bitboard I need 2 if's and one lookup.
I'm using microsoft, so the methods/tables are adjusted for the LSM, MSB microsoft stuff (I hate this part). So it may require some configuration for other plataforms.
On my computer I got better performance that resulted a couple of thousand nodes searched.
It is not worth to simply move a bitboard to BBIX.val64 and then use the new methods, otherwise you have the overhead of moving data. You need to have the data on the BBIX structure.
Here is part of my genmove using the new method:
The biggest gain was in the first_index and bit_count.
In my tests I got those numbers on my 32bit windows. Didn't had a chance to test 64bit yet.
first_index new 13,937ms old: 41,793ms
last_index new 14,118ms old: 18,283ms
bit_count new 1,856ms old: 35,833ms
As I said, hope this is useful, and please point out anything you find if you do any tests.
Once again, thanks everyone and hope this is usefull.
Alcides Schulz.
I am involved professionally with computer programming for 25 years. And some of my hobbies are programming and chess. So why not combine both . Well I'm working on my own chess engine and now I'm just reviewing the code in order to make available soon. Hope it will be helpfull to other fellow cc programmers.
Here I want to share what I did with bitboards, that got me some performance improvement over traditional methods. I'm talking about bit_count, first_index and last_index methods.
I was using the usual methods:
Code: Select all
int bb_bit_count(U64 bb)
{
int count = 0;
while (bb)
{
count++;
bb &= bb - 1;
}
return count;
}
int bb_first_index(U64 bb)
{
if (bb >> 48)
return 63 - (bb_msb_table[bb >> 48] + 48);
if ((bb >> 32) & 65535)
return 63 - (bb_msb_table[(bb >> 32) & 65535] + 32);
if ((bb >> 16) & 65535)
return 63 - (bb_msb_table[(bb >> 16) & 65535] + 16);
return 63 - (bb_msb_table[bb & 65535]);
}
int bb_last_index(U64 bb)
{
unsigned int folded;
if (!bb)
return -1;
bb ^= bb - 1;
folded = (int) bb ^ (bb >> 32);
return 63 - bb_lsb_table[folded * 0x78291ACF >> 26];
}
What I did was to create a structure to split the bitboard in 4 parts and them precompute a table with necessary information for each part. When I need the first/last index I locate which of the 4 parts has the first/last bit and them just lookup the precomputed table.
For bit count I just summ the 4 parts together. Well a code is worth thousand words. See the code bellow.
Code: Select all
typedef union u_bitboard_index
{
U64 val64;
unsigned long val32[2];
unsigned short val16[4];
} BBIX;
The initialization part consists in creating 3 tables for first_index, last_index and bit_count. Note that I still use the tradicional methods to help the initialization. Of course we can get rid completely of the tradional methods by initializing using a more manual method. This on my TODO list.
Code: Select all
// Bit tables
char first_index_table[4][65536];
char last_index_table[4][65536];
char bit_count_table[65536];
void bb_init()
{
// ommited code to init the usual bitboards first/last/count
// tables for faster first/last/count
first_index_table[0][0] = -1;
first_index_table[1][0] = -1;
first_index_table[2][0] = -1;
first_index_table[3][0] = -1;
last_index_table[0][0] = -1;
last_index_table[1][0] = -1;
last_index_table[2][0] = -1;
last_index_table[3][0] = -1;
for (j = 1; j < 65536; j++)
{
first_index_table[0][j] = (char)bb_first_index((U64)j) - 48;
first_index_table[1][j] = (char)bb_first_index((U64)j) - 32;
first_index_table[2][j] = (char)bb_first_index((U64)j) - 16;
first_index_table[3][j] = (char)bb_first_index((U64)j);
last_index_table[0][j] = (char)bb_last_index((U64)j) - 48;
last_index_table[1][j] = (char)bb_last_index((U64)j) - 32;
last_index_table[2][j] = (char)bb_last_index((U64)j) - 16;
last_index_table[3][j] = (char)bb_last_index((U64)j);
}
for (j = 0; j < 65536; j++)
{
bit_count_table[j] = (char)bb_bit_count((U64)j);
}
}
Code: Select all
//-------------------------------------------------------------------------------------------------
// Return index of the first "1" bit of BBIX: 0 to 63, or -1 when bitboard is 0
//-------------------------------------------------------------------------------------------------
int first_index(BBIX bbix)
{
if (bbix.val32[1])
{
if (bbix.val16[3])
return first_index_table[0][bbix.val16[3]];
else
return first_index_table[1][bbix.val16[2]];
}
else
{
if (bbix.val16[1])
return first_index_table[2][bbix.val16[1]];
else
return first_index_table[3][bbix.val16[0]];
}
}
//-------------------------------------------------------------------------------------------------
// Return index of the last "1" bit of BBIX: 0 to 63, or -1 when bitboard is 0
//-------------------------------------------------------------------------------------------------
int last_index(BBIX bbix)
{
if (bbix.val32[0])
{
if (bbix.val16[0])
return last_index_table[3][bbix.val16[0]];
else
return last_index_table[2][bbix.val16[1]];
}
else
{
if (bbix.val16[2])
return last_index_table[1][bbix.val16[2]];
else
return last_index_table[0][bbix.val16[3]];
}
}
//-------------------------------------------------------------------------------------------------
// Number of "1" bits in the BBIX
//-------------------------------------------------------------------------------------------------
int bit_count(BBIX bbix)
{
return bit_count_table[bbix.val16[0]] +
bit_count_table[bbix.val16[1]] +
bit_count_table[bbix.val16[2]] +
bit_count_table[bbix.val16[3]];
}
I'm using microsoft, so the methods/tables are adjusted for the LSM, MSB microsoft stuff (I hate this part). So it may require some configuration for other plataforms.
On my computer I got better performance that resulted a couple of thousand nodes searched.
It is not worth to simply move a bitboard to BBIX.val64 and then use the new methods, otherwise you have the overhead of moving data. You need to have the data on the BBIX structure.
Here is part of my genmove using the new method:
Code: Select all
BBIX piece, attacks, moves;
piece.val64 = STM_STATE.piece[ROOK] | STM_STATE.piece[QUEEN];
while(piece.val64)
{
from = (char)last_index(piece);
// rankfile_east
attacks.val64 = rankfile_east[from] & board.occupied_squares;
if (attacks.val64)
{
to = (char)first_index(attacks);
if (OPP_STATE.square[to])
add_capture(from, to);
moves.val64 = from_to_path[from][to];
}
else
moves.val64 = rankfile_east[from];
while(moves.val64)
{
to = (char)first_index(moves);
add_move(from, to);
CLRBIT_IX(moves.val64, to);
}
//....
In my tests I got those numbers on my 32bit windows. Didn't had a chance to test 64bit yet.
first_index new 13,937ms old: 41,793ms
last_index new 14,118ms old: 18,283ms
bit_count new 1,856ms old: 35,833ms
As I said, hope this is useful, and please point out anything you find if you do any tests.
Once again, thanks everyone and hope this is usefull.
Alcides Schulz.