Bit twiddling question, part 2: arbitrary bitscan order

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

Gerd Isenberg wrote:I see. How about brute force simd approach (8 xmm-regs) to expand the bitboard to a wordboard alá SSE2-dot-product, to mask the expanded words (0xffff for a binary one, 0x0000 for binary zero) with a word[64]-vector, containing order bytes ( > 0) in the upper half and the bitindex (0..63) in the lower half. Then sort the vector by a small SSE2 bubble sort ;-)
Fascinating! Not sure if it's exactly optimal, for sparsely populated bitboards. ;)
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Zach Wegner »

Speed comparison of five routines: my original proposal, Steffan's proposal (but with a bsf at the end to get the index), Marco's proposal, and a combination of all three. I also tested Steffan's without the bitscan, and with a bb &= ~s; in the loop.

Code: Select all

int bitscan_pri_z(bb_t bb)
{
    int bit, res = 0;
    bb_t r;

    /* Unroll this! */
    for (bit = 5; bit >= 0; bit--) {
        r = (bb & pri_mask[bit]) != 0;
        bb &= pri_mask[bit] | (r - 1);
        res = res << 1 | r;
    &#125;

    return rev_priority&#91;res&#93;;
&#125;

bb_t bitscan_pri_so&#40;bb_t bb&#41;
&#123;
    int bit;
    bb_t r;

    /* Unroll this! */
    for &#40;bit = 5; bit >= 0; bit--) &#123;
        r = &#40;bb & pri_mask&#91;bit&#93;) != 0;
        bb &= pri_mask&#91;bit&#93; | &#40;r - 1&#41;;
    &#125;

    return &#40;bb&#41;;
&#125;

int bitscan_pri_s&#40;bb_t bb&#41;
&#123;
    int bit;
    bb_t r;

    /* Unroll this! */
    for &#40;bit = 5; bit >= 0; bit--) &#123;
        r = &#40;bb & pri_mask&#91;bit&#93;) != 0;
        bb &= pri_mask&#91;bit&#93; | &#40;r - 1&#41;;
    &#125;

    return bsfq&#40;bb&#41;;
&#125;

int bitscan_pri_m&#40;bb_t bb&#41;
&#123;
    int bit, res = 0, r;

    /* Unroll this! */
    for &#40;bit = 5; bit >= 0; bit--) &#123;
        r = &#40;bb & pri_mask_a&#91;bit&#93;&#91;1&#93;) != 0;
        bb &= pri_mask_a&#91;bit&#93;&#91;r&#93;;
        res = res << 1 | r;
    &#125;

    return rev_priority&#91;res&#93;;
&#125;

int bitscan_pri_sm&#40;bb_t bb&#41;
&#123;
    int bit, r;

    /* Unroll this! */
    for &#40;bit = 5; bit >= 0; bit--) &#123;
        r = &#40;bb & pri_mask_a&#91;bit&#93;&#91;1&#93;) != 0;
        bb &= pri_mask_a&#91;bit&#93;&#91;r&#93;;
    &#125;

    return bsfq&#40;bb&#41;;
&#125;
Each one was measured in a dumb loop:

Code: Select all

t = clock&#40;);
    for &#40;i = 0; i < 10000000; i++) &#123;
        bb_t b = -1;
        x = 0;
        while &#40;b&#41; &#123;
            b &= ~&#40;1ull << x&#41;;
            x++;
        &#125;
    &#125;
    int overhead = clock&#40;) - t;

    int time;

    t = clock&#40;);
    for &#40;i = 0; i < 10000000; i++) &#123;
        bb_t b = -1;
        while &#40;b&#41; &#123;
            int s = bitscan_pri_z&#40;b&#41;;
            b &= ~&#40;1ull << s&#41;;
        &#125;
    &#125;
    time = clock&#40;) - t - overhead;
    printf&#40;"zach&#58; overhead=%i, time=%i\n",overhead, time&#41;;
And the results (in centiseconds):

Code: Select all

zach&#58; overhead=54, time=1037
steffan orig&#58; overhead=54, time=873
steffan&#58; overhead=54, time=903
marco&#58; overhead=54, time=1389
s/m&#58; overhead=54, time=1769
Surprisingly, the bsfq in Steffan's is very cheap, and the combination of Steffan and Marco's proposals was much slower than the original. Steffan looks like the winner here though.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by Gerd Isenberg »

Zach Wegner wrote:
Gerd Isenberg wrote:I see. How about brute force simd approach (8 xmm-regs) to expand the bitboard to a wordboard alá SSE2-dot-product, to mask the expanded words (0xffff for a binary one, 0x0000 for binary zero) with a word[64]-vector, containing order bytes ( > 0) in the upper half and the bitindex (0..63) in the lower half. Then sort the vector by a small SSE2 bubble sort ;-)
Fascinating! Not sure if it's exactly optimal, for sparsely populated bitboards. ;)
Propably the bubble sort of 64 shorts is a bit too slow ;-)
Expanding bits to bytes (or shorts) is quite effiecient as the dot-product is nowadays for a weighted popcount.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by mcostalba »

Zach Wegner wrote: Surprisingly, the bsfq in Steffan's is very cheap, and the combination of Steffan and Marco's proposals was much slower than the original. Steffan looks like the winner here though.
Thanks for testing !

Have you tested on a 64 bit system or on a 32bit ?

The little reverse routine of mine could be rewritten in a much faster way for 32 bit, namely all the 'and' operation but the first could be 32 bit.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Bit twiddling question, part 2: arbitrary bitscan order

Post by mcostalba »

This is another version, probably slower but fancier ;-)

Code: Select all

void rol32&#40;uint64_t& bb&#41; &#123;

    bb = &#40;bb << 32&#41; | &#40;bb >> 32&#41;;
&#125;

int bitScanReverse32&#40;uint32_t b&#41; &#123;
    
    const uint32_t U16 = 0xFFFF0000;
    const uint32_t U08 = 0xFF00FF00;
    const uint32_t U04 = 0xF0F0F0F0;
    const uint32_t U02 = 0xCCCCCCCC;
    const uint32_t U01 = 0x88888888;

    const uint64_t L16 = &#40;uint64_t&#40;~U16&#41; << 32&#41; + U16;
    const uint64_t L08 = &#40;uint64_t&#40;~U08&#41; << 32&#41; + U08;
    const uint64_t L04 = &#40;uint64_t&#40;~U04&#41; << 32&#41; + U04;
    const uint64_t L02 = &#40;uint64_t&#40;~U02&#41; << 32&#41; + U02;
    const uint64_t L01 = &#40;uint64_t&#40;~U01&#41; << 32&#41; + U01;

    int ret = 0;

    uint64_t bb = &#40;uint64_t&#40;b&#41; << 32&#41; | b;

    bb &= L16;
    if &#40;unsigned&#40;bb&#41;) ret += 1; else rol32&#40;bb&#41;;
    bb = &#40;bb << 32&#41; | unsigned&#40;bb&#41;;

    ret <<= 1;
    bb &= L08;
    if &#40;unsigned&#40;bb&#41;) ret += 1; else rol32&#40;bb&#41;;
    bb = &#40;bb << 32&#41; | unsigned&#40;bb&#41;;

    ret <<= 1;
    bb &= L04;    
    if &#40;unsigned&#40;bb&#41;) ret += 1; else rol32&#40;bb&#41;;
    bb = &#40;bb << 32&#41; | unsigned&#40;bb&#41;;

    ret <<= 1;
    bb &= L02;    
    if &#40;unsigned&#40;bb&#41;) ret += 1; else rol32&#40;bb&#41;;
    bb = &#40;bb << 32&#41; | unsigned&#40;bb&#41;;

    return &#40;ret << 1&#41; + &#40;unsigned&#40;bb & L01&#41; != 0&#41;;
&#125;

int bitScanReverse&#40;uint64_t b&#41; &#123;

    const uint32_t hb = b >> 32;
    return hb ? 32 + bitScanReverse32&#40;hb&#41; &#58; bitScanReverse32&#40;uint32_t&#40;b&#41;);

&#125;