Bit twiddling question, part 2: arbitrary bitscan order

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

For part 2 in our 438 part series, I ask another hypothetical question of possible interest to bit twiddlers everywhere: how to run a bitscan over a bitboard, pulling off bits with an arbitrarily defined priority? For instance, a normal BSF has priority of bit X = 63-X, and BSR has priority =X. More importantly, how to do this fast?

The two ideas I have:

One other possibility is, if the priorities are all grouped in contiguous 16-bit chunks, then you can do a Nalimov-style bitscan. As an example, here's a normal BSF, but with the priorities of the 4 groups reversed (hope this makes sense):

Code: Select all

if (bb & 0xffff000000000000)
  sq = lsb[bb>>48];
else if (bb & 0x0000ffff00000000)
  sq = lsb[bb>>32 & 0xffff];
else if (bb & 0x00000000ffff0000)
  sq = lsb[bb>>16 & 0xffff];
else if (bb & 0x000000000000ffff)
  sq = lsb[bb & 0xffff];
This would scan bits in the order 48-63, 32-47, 16-31, and then 0-15. This works, but having the orders be restricted to those 16 bit chunks isn't very useful at all. Of course this could be extended to N bit chunks with suitable lookup tables, or arbitrary sets of bits that are transformed into a contiguous set before lookup. Too slow though!

The other is inspired by the second bitscan here (created by Gerd), which does 6 tests, one for each bit of the final result: http://chessprogramming.wikispaces.com/BitScan#toc8
(btw, be sure to check out the FZ video at the bottom, coincidentally right now I am listening to an audience tape of the same song performed about 8 months later :))

Simply take the priorities of the squares, and convert them to a "horizontal" array of 6 bitboards. Bitboard 0 will be the set of all squares with bit 0 set in their priority index, and so on. Then we loop through all the bitboards, except we need some special handling. We have to test the 32 highest priority bits first. If there is a bit in there, then we mask out all other bits and continue (I do this below with a simple -1/0 mask). Otherwise we do nothing, and we know that only bits with priorities <= 31 remain.

Code: Select all

bit5 = &#40;bb & mask&#91;5&#93;) != 0;
bb &= mask&#91;5&#93; | &#40;bit5-1&#41;; // assuming bit5 is a bitboard, probably a cast is smarter
result |= bit5; result <<= 1;
...then do the same for bits 4-0. At the end we have the highest existing priority 0<=p<=63 in result, which we use in a reverse lookup table. This should have about the same instructions of cycles as Gerd's, but with an added sub, or, and and for each bit.
steffan
Posts: 28
Joined: Sun Mar 12, 2006 12:08 am
Location: Midlands, England

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by steffan »

Zach Wegner wrote:... At the end we have the highest existing priority 0<=p<=63 in result, which we use in a reverse lookup table.
Why do the lookup? By the end of your code, isn't the answer in the variable bb? If so, you dont need the 'result' variable at all.

Cheers,
Steffan
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by mcostalba »

Just a very quick idea, probably could be further optimized. It compiles, but I have no time now for a complete test.

Code: Select all

int bitScanReverse&#40;uint64_t b&#41; &#123;

    const uint64_t P32 = 0xFFFFFFFF00000000UL;
    const uint64_t P16 = 0xFFFF0000FFFF0000UL;
    const uint64_t P08 = 0xFF00FF00FF00FF00UL;
    const uint64_t P04 = 0xF0F0F0F0F0F0F0F0UL;
    const uint64_t P02 = 0xCCCCCCCCCCCCCCCCUL;
    const uint64_t P01 = 0x8888888888888888UL;

    const uint64_t M32&#91;2&#93; = &#123; P32, ~P32 &#125;;
    const uint64_t M16&#91;2&#93; = &#123; P16, ~P16 &#125;;
    const uint64_t M08&#91;2&#93; = &#123; P08, ~P08 &#125;;
    const uint64_t M04&#91;2&#93; = &#123; P04, ~P04 &#125;;
    const uint64_t M02&#91;2&#93; = &#123; P02, ~P02 &#125;;

    uint64_t mask;
    int ret, n;

    ret = n = b > P32;
    mask = M32&#91;n&#93;;

    n = b > &#40;P16 & mask&#41;;
    mask &= M16&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P08 & mask&#41;;
    mask &= M08&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P04 & mask&#41;;
    mask &= M04&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P02 & mask&#41;;
    mask &= M02&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    return &#40;ret << 1&#41; + &#40;b > &#40;P01 & mask&#41;);
&#125;


Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Gerd Isenberg »

I don't actually understand what "arbitrary bitscan order" means ;-(
Is it file/rank order related? F.i. to implement a "symmetrical" white versus black bitscan?

Code: Select all

int symmetricalBsf &#40;U64 bb, enumColor color&#41; &#123;
    if ( color == ECWhite )
       return bsf&#40;bb&#41;;          // from a1..h1, a2..., h8
    return bsf&#40;bswap&#40;bb&#41;) ^ 56; // from a8..h8, a7..., h1
&#125;
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Aleks Peshkov »

I see the use of arbitrary ordered bitscan for PST-evaluation move generation ordering.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

steffan wrote:
Zach Wegner wrote:... At the end we have the highest existing priority 0<=p<=63 in result, which we use in a reverse lookup table.
Why do the lookup? By the end of your code, isn't the answer in the variable bb? If so, you dont need the 'result' variable at all.

Cheers,
Steffan
True, if you didn't want to get the index of the bit, but merely isolate it, that would be faster. It might even be competitive to just do a bsf at the end after isolating the bit.

The result accumulation is just 6 lea's, but they should be pipelined. OTOH you still need a lookup table, and the code size is increased a bit. These solutions should be quite similar in speed.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

mcostalba wrote:Just a very quick idea, probably could be further optimized. It compiles, but I have no time now for a complete test.

Code: Select all

int bitScanReverse&#40;uint64_t b&#41; &#123;

    const uint64_t P32 = 0xFFFFFFFF00000000UL;
    const uint64_t P16 = 0xFFFF0000FFFF0000UL;
    const uint64_t P08 = 0xFF00FF00FF00FF00UL;
    const uint64_t P04 = 0xF0F0F0F0F0F0F0F0UL;
    const uint64_t P02 = 0xCCCCCCCCCCCCCCCCUL;
    const uint64_t P01 = 0x8888888888888888UL;

    const uint64_t M32&#91;2&#93; = &#123; P32, ~P32 &#125;;
    const uint64_t M16&#91;2&#93; = &#123; P16, ~P16 &#125;;
    const uint64_t M08&#91;2&#93; = &#123; P08, ~P08 &#125;;
    const uint64_t M04&#91;2&#93; = &#123; P04, ~P04 &#125;;
    const uint64_t M02&#91;2&#93; = &#123; P02, ~P02 &#125;;

    uint64_t mask;
    int ret, n;

    ret = n = b > P32;
    mask = M32&#91;n&#93;;

    n = b > &#40;P16 & mask&#41;;
    mask &= M16&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P08 & mask&#41;;
    mask &= M08&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P04 & mask&#41;;
    mask &= M04&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = b > &#40;P02 & mask&#41;;
    mask &= M02&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    return &#40;ret << 1&#41; + &#40;b > &#40;P01 & mask&#41;);
&#125;


I think this should work, but with an inversion of the P* masks. Testing for a bit in the top 32 bits is equivalent to > 0xffffffff. I think the (bb&mask)!=0 and bb>~mask should be the same speed though, a test and setnz in the first case and a cmp and seta in the second. I like your array idea though, perhaps it could be faster than a calculated mask.

Also, just scrapping the mask and doing b &= M*[n] should be faster.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

Gerd Isenberg wrote:I don't actually understand what "arbitrary bitscan order" means ;-(
Is it file/rank order related? F.i. to implement a "symmetrical" white versus black bitscan?

Code: Select all

int symmetricalBsf &#40;U64 bb, enumColor color&#41; &#123;
    if ( color == ECWhite )
       return bsf&#40;bb&#41;;          // from a1..h1, a2..., h8
    return bsf&#40;bswap&#40;bb&#41;) ^ 56; // from a8..h8, a7..., h1
&#125;
Yes, that could be one possible order. The priority table in this case, for black and a1=0, would be:

Code: Select all

int table&#91;&#93; = &#123;
  56, 57, 58, 59, 60, 61, 62, 63,
  48, 49, 50, 51, 52, 53, 54, 55,
  ...
  0, 1, 2, 3, 4, 5, 6, 7
&#125;;
Think about it like quad bitboards--the 4 bitboards there vertically encode a piece code 0-16 (I guess I should have said "vertical" originally instead of "horizontally"). Here 6 bitboards vertically encode a priority value 0-63 for each square. The priority is just the order the bits should be pulled off in. If you pass in -1 as a bitboard, it would first pull off the bit with priority 63, then 62...0.

As Aleks says, this could be used to order in a piece-square order. For kings you'd prioritize the back rank, with the two wings being preferable to the center (maybe with a penalty for d1/f1 though). Knights go to the center first, etc.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Gerd Isenberg »

Zach Wegner wrote: Yes, that could be one possible order. The priority table in this case, for black and a1=0, would be:

Code: Select all

int table&#91;&#93; = &#123;
  56, 57, 58, 59, 60, 61, 62, 63,
  48, 49, 50, 51, 52, 53, 54, 55,
  ...
  0, 1, 2, 3, 4, 5, 6, 7
&#125;;
Think about it like quad bitboards--the 4 bitboards there vertically encode a piece code 0-16 (I guess I should have said "vertical" originally instead of "horizontally"). Here 6 bitboards vertically encode a priority value 0-63 for each square. The priority is just the order the bits should be pulled off in. If you pass in -1 as a bitboard, it would first pull off the bit with priority 63, then 62...0.

As Aleks says, this could be used to order in a piece-square order. For kings you'd prioritize the back rank, with the two wings being preferable to the center (maybe with a penalty for d1/f1 though). Knights go to the center first, etc.
I see. How about brute force simd approach (8 xmm-regs) to expand the bitboard to a wordboard alá SSE2-dot-product, to mask the expanded words (0xffff for a binary one, 0x0000 for binary zero) with a word[64]-vector, containing order bytes ( > 0) in the upper half and the bitindex (0..63) in the lower half. Then sort the vector by a small SSE2 bubble sort ;-)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by mcostalba »

Zach Wegner wrote:I think this should work, but with an inversion of the P* masks. Testing for a bit in the top 32 bits is equivalent to > 0xffffffff. I think the (bb&mask)!=0 and bb>~mask should be the same speed though, a test and setnz in the first case and a cmp and seta in the second. I like your array idea though, perhaps it could be faster than a calculated mask.

Also, just scrapping the mask and doing b &= M*[n] should be faster.
Thanks for your suggestions here is the new version.

Code: Select all

int bitScanReverse&#40;uint64_t b&#41; &#123;

    const uint64_t P32 = 0xFFFFFFFF00000000UL;
    const uint64_t P16 = 0xFFFF0000FFFF0000UL;
    const uint64_t P08 = 0xFF00FF00FF00FF00UL;
    const uint64_t P04 = 0xF0F0F0F0F0F0F0F0UL;
    const uint64_t P02 = 0xCCCCCCCCCCCCCCCCUL;
    const uint64_t P01 = 0x8888888888888888UL;

    const uint64_t M32&#91;2&#93; = &#123; ~P32, P32 &#125;;
    const uint64_t M16&#91;2&#93; = &#123; ~P16, P16 &#125;;
    const uint64_t M08&#91;2&#93; = &#123; ~P08, P08 &#125;;
    const uint64_t M04&#91;2&#93; = &#123; ~P04, P04 &#125;;
    const uint64_t M02&#91;2&#93; = &#123; ~P02, P02 &#125;;

    int ret, n;

    ret = n = &#40;b & P32&#41; != 0;
    b &= M32&#91;n&#93;;

    n = &#40;b & P16&#41; != 0;
    b &= M16&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = &#40;b & P08&#41; != 0;
    b &= M08&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = &#40;b & P04&#41; != 0;
    b &= M04&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    n = &#40;b & P02&#41; != 0;
    b &= M02&#91;n&#93;;
    ret = &#40;ret << 1&#41; + n;

    return &#40;ret << 1&#41; + (&#40;b & P01&#41; != 0&#41;;
&#125;