I am involved professionally with computer programming for 25 years. And some of my hobbies are programming and chess. So why not combine both . Well I'm working on my own chess engine and now I'm just reviewing the code in order to make available soon. Hope it will be helpfull to other fellow cc programmers.
Here I want to share what I did with bitboards, that got me some performance improvement over traditional methods. I'm talking about bit_count, first_index and last_index methods.
I was using the usual methods:
Code: Select all
int bb_bit_count(U64 bb)
{
int count = 0;
while (bb)
{
count++;
bb &= bb - 1;
}
return count;
}
int bb_first_index(U64 bb)
{
if (bb >> 48)
return 63 - (bb_msb_table[bb >> 48] + 48);
if ((bb >> 32) & 65535)
return 63 - (bb_msb_table[(bb >> 32) & 65535] + 32);
if ((bb >> 16) & 65535)
return 63 - (bb_msb_table[(bb >> 16) & 65535] + 16);
return 63 - (bb_msb_table[bb & 65535]);
}
int bb_last_index(U64 bb)
{
unsigned int folded;
if (!bb)
return -1;
bb ^= bb - 1;
folded = (int) bb ^ (bb >> 32);
return 63 - bb_lsb_table[folded * 0x78291ACF >> 26];
}
What I did was to create a structure to split the bitboard in 4 parts and them precompute a table with necessary information for each part. When I need the first/last index I locate which of the 4 parts has the first/last bit and them just lookup the precomputed table.
For bit count I just summ the 4 parts together. Well a code is worth thousand words. See the code bellow.
Code: Select all
typedef union u_bitboard_index
{
U64 val64;
unsigned long val32[2];
unsigned short val16[4];
} BBIX;
The initialization part consists in creating 3 tables for first_index, last_index and bit_count. Note that I still use the tradicional methods to help the initialization. Of course we can get rid completely of the tradional methods by initializing using a more manual method. This on my TODO list.
Code: Select all
// Bit tables
char first_index_table[4][65536];
char last_index_table[4][65536];
char bit_count_table[65536];
void bb_init()
{
// ommited code to init the usual bitboards first/last/count
// tables for faster first/last/count
first_index_table[0][0] = -1;
first_index_table[1][0] = -1;
first_index_table[2][0] = -1;
first_index_table[3][0] = -1;
last_index_table[0][0] = -1;
last_index_table[1][0] = -1;
last_index_table[2][0] = -1;
last_index_table[3][0] = -1;
for (j = 1; j < 65536; j++)
{
first_index_table[0][j] = (char)bb_first_index((U64)j) - 48;
first_index_table[1][j] = (char)bb_first_index((U64)j) - 32;
first_index_table[2][j] = (char)bb_first_index((U64)j) - 16;
first_index_table[3][j] = (char)bb_first_index((U64)j);
last_index_table[0][j] = (char)bb_last_index((U64)j) - 48;
last_index_table[1][j] = (char)bb_last_index((U64)j) - 32;
last_index_table[2][j] = (char)bb_last_index((U64)j) - 16;
last_index_table[3][j] = (char)bb_last_index((U64)j);
}
for (j = 0; j < 65536; j++)
{
bit_count_table[j] = (char)bb_bit_count((U64)j);
}
}
Code: Select all
//-------------------------------------------------------------------------------------------------
// Return index of the first "1" bit of BBIX: 0 to 63, or -1 when bitboard is 0
//-------------------------------------------------------------------------------------------------
int first_index(BBIX bbix)
{
if (bbix.val32[1])
{
if (bbix.val16[3])
return first_index_table[0][bbix.val16[3]];
else
return first_index_table[1][bbix.val16[2]];
}
else
{
if (bbix.val16[1])
return first_index_table[2][bbix.val16[1]];
else
return first_index_table[3][bbix.val16[0]];
}
}
//-------------------------------------------------------------------------------------------------
// Return index of the last "1" bit of BBIX: 0 to 63, or -1 when bitboard is 0
//-------------------------------------------------------------------------------------------------
int last_index(BBIX bbix)
{
if (bbix.val32[0])
{
if (bbix.val16[0])
return last_index_table[3][bbix.val16[0]];
else
return last_index_table[2][bbix.val16[1]];
}
else
{
if (bbix.val16[2])
return last_index_table[1][bbix.val16[2]];
else
return last_index_table[0][bbix.val16[3]];
}
}
//-------------------------------------------------------------------------------------------------
// Number of "1" bits in the BBIX
//-------------------------------------------------------------------------------------------------
int bit_count(BBIX bbix)
{
return bit_count_table[bbix.val16[0]] +
bit_count_table[bbix.val16[1]] +
bit_count_table[bbix.val16[2]] +
bit_count_table[bbix.val16[3]];
}
I'm using microsoft, so the methods/tables are adjusted for the LSM, MSB microsoft stuff (I hate this part). So it may require some configuration for other plataforms.
On my computer I got better performance that resulted a couple of thousand nodes searched.
It is not worth to simply move a bitboard to BBIX.val64 and then use the new methods, otherwise you have the overhead of moving data. You need to have the data on the BBIX structure.
Here is part of my genmove using the new method:
Code: Select all
BBIX piece, attacks, moves;
piece.val64 = STM_STATE.piece[ROOK] | STM_STATE.piece[QUEEN];
while(piece.val64)
{
from = (char)last_index(piece);
// rankfile_east
attacks.val64 = rankfile_east[from] & board.occupied_squares;
if (attacks.val64)
{
to = (char)first_index(attacks);
if (OPP_STATE.square[to])
add_capture(from, to);
moves.val64 = from_to_path[from][to];
}
else
moves.val64 = rankfile_east[from];
while(moves.val64)
{
to = (char)first_index(moves);
add_move(from, to);
CLRBIT_IX(moves.val64, to);
}
//....
In my tests I got those numbers on my 32bit windows. Didn't had a chance to test 64bit yet.
first_index new 13,937ms old: 41,793ms
last_index new 14,118ms old: 18,283ms
bit_count new 1,856ms old: 35,833ms
As I said, hope this is useful, and please point out anything you find if you do any tests.
Once again, thanks everyone and hope this is usefull.
Alcides Schulz.