New De Bruijn BitScan

Discussion of chess software programming and technical issues.

Moderator: Ras

Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

New De Bruijn BitScan

Post by Gerd Isenberg »

Kim Walisch, a software engineer from Luxembourg, proposed a new BitScan in cpw based on De Bruijn multiplication. Instead of the isolated least significant one bit, he uses a 0-1 mask separated by the LS1B of a bitboard (as already base of Matt Taylor's folded bitscan, which then xors upper and lower 32-bit word) to multiply it with a 64-bit De Bruijn sequence. Separation by x ^ (x-1) is faster than isolation by x & -x due to x86 lea. Now the factor of a bitboard becomes two times the isolated LS1B minus one. Interestingly, only the higher 1/16 of all 2^26 odd De Bruin sequences qualify for a collision free factor which leave unique upper six bit indices in the 0..63 range for all possible 64 LS1Bs.

LS1B*DeBruijn + LS1B*DeBruijn - DeBruijn -> unique upper six bits only for the 2^22 upper odd De Bruins sequences

Now, the question remains, why?
Any mathematical insights appreciated.

Code: Select all

const int index64[64] = {
    0, 47,  1, 56, 48, 27,  2, 60,
   57, 49, 41, 37, 28, 16,  3, 61,
   54, 58, 35, 52, 50, 42, 21, 44,
   38, 32, 29, 23, 17, 11,  4, 62,
   46, 55, 26, 59, 40, 36, 15, 53,
   34, 51, 20, 43, 31, 22, 10, 45,
   25, 39, 14, 33, 19, 30,  9, 24,
   13, 18,  8, 12,  7,  6,  5, 63
};
 
/**
 * bitScanForward
 * @author Kim Walisch (2012)
 * @param bb bitboard to scan
 * @precondition bb != 0
 * @return index (0..63) of least significant one bit
 */
int bitScanForward(U64 bb) {
   const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
   assert (bb != 0);
   return index64[((bb ^ (bb-1)) * debruijn64) >> 58];
}
Gerd
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: New De Bruijn BitScan

Post by mcostalba »

Thanks Gerd for reporting this.

Nowdays this is more of theoretical interest then real one because 64 bits architectures have hardware bitscan, and on other architectures than Intel speed-up is less clear due to missing lea instruction. Nevertheless it allows to sync implementation of 64 and 32 bit case (where in the latter Matt's trick is used) so is nice to have I guess.

I have pushed a patch to SF to support it.