Fascinating! Not sure if it's exactly optimal, for sparsely populated bitboards.Gerd Isenberg wrote:I see. How about brute force simd approach (8 xmm-regs) to expand the bitboard to a wordboard alá SSE2-dot-product, to mask the expanded words (0xffff for a binary one, 0x0000 for binary zero) with a word[64]-vector, containing order bytes ( > 0) in the upper half and the bitindex (0..63) in the lower half. Then sort the vector by a small SSE2 bubble sort
Bit twiddling question, part 2: arbitrary bitscan order
Moderators: hgm, Rebel, chrisw
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Bit twiddling question, part 2: arbitrary bitscan order
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Bit twiddling question, part 2: arbitrary bitscan order
Speed comparison of five routines: my original proposal, Steffan's proposal (but with a bsf at the end to get the index), Marco's proposal, and a combination of all three. I also tested Steffan's without the bitscan, and with a bb &= ~s; in the loop.
Each one was measured in a dumb loop:
And the results (in centiseconds):
Surprisingly, the bsfq in Steffan's is very cheap, and the combination of Steffan and Marco's proposals was much slower than the original. Steffan looks like the winner here though.
Code: Select all
int bitscan_pri_z(bb_t bb)
{
int bit, res = 0;
bb_t r;
/* Unroll this! */
for (bit = 5; bit >= 0; bit--) {
r = (bb & pri_mask[bit]) != 0;
bb &= pri_mask[bit] | (r - 1);
res = res << 1 | r;
}
return rev_priority[res];
}
bb_t bitscan_pri_so(bb_t bb)
{
int bit;
bb_t r;
/* Unroll this! */
for (bit = 5; bit >= 0; bit--) {
r = (bb & pri_mask[bit]) != 0;
bb &= pri_mask[bit] | (r - 1);
}
return (bb);
}
int bitscan_pri_s(bb_t bb)
{
int bit;
bb_t r;
/* Unroll this! */
for (bit = 5; bit >= 0; bit--) {
r = (bb & pri_mask[bit]) != 0;
bb &= pri_mask[bit] | (r - 1);
}
return bsfq(bb);
}
int bitscan_pri_m(bb_t bb)
{
int bit, res = 0, r;
/* Unroll this! */
for (bit = 5; bit >= 0; bit--) {
r = (bb & pri_mask_a[bit][1]) != 0;
bb &= pri_mask_a[bit][r];
res = res << 1 | r;
}
return rev_priority[res];
}
int bitscan_pri_sm(bb_t bb)
{
int bit, r;
/* Unroll this! */
for (bit = 5; bit >= 0; bit--) {
r = (bb & pri_mask_a[bit][1]) != 0;
bb &= pri_mask_a[bit][r];
}
return bsfq(bb);
}
Code: Select all
t = clock();
for (i = 0; i < 10000000; i++) {
bb_t b = -1;
x = 0;
while (b) {
b &= ~(1ull << x);
x++;
}
}
int overhead = clock() - t;
int time;
t = clock();
for (i = 0; i < 10000000; i++) {
bb_t b = -1;
while (b) {
int s = bitscan_pri_z(b);
b &= ~(1ull << s);
}
}
time = clock() - t - overhead;
printf("zach: overhead=%i, time=%i\n",overhead, time);
Code: Select all
zach: overhead=54, time=1037
steffan orig: overhead=54, time=873
steffan: overhead=54, time=903
marco: overhead=54, time=1389
s/m: overhead=54, time=1769
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Bit twiddling question, part 2: arbitrary bitscan order
Propably the bubble sort of 64 shorts is a bit too slowZach Wegner wrote:Fascinating! Not sure if it's exactly optimal, for sparsely populated bitboards.Gerd Isenberg wrote:I see. How about brute force simd approach (8 xmm-regs) to expand the bitboard to a wordboard alá SSE2-dot-product, to mask the expanded words (0xffff for a binary one, 0x0000 for binary zero) with a word[64]-vector, containing order bytes ( > 0) in the upper half and the bitindex (0..63) in the lower half. Then sort the vector by a small SSE2 bubble sort
Expanding bits to bytes (or shorts) is quite effiecient as the dot-product is nowadays for a weighted popcount.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Bit twiddling question, part 2: arbitrary bitscan order
Thanks for testing !Zach Wegner wrote: Surprisingly, the bsfq in Steffan's is very cheap, and the combination of Steffan and Marco's proposals was much slower than the original. Steffan looks like the winner here though.
Have you tested on a 64 bit system or on a 32bit ?
The little reverse routine of mine could be rewritten in a much faster way for 32 bit, namely all the 'and' operation but the first could be 32 bit.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Bit twiddling question, part 2: arbitrary bitscan order
This is another version, probably slower but fancier
Code: Select all
void rol32(uint64_t& bb) {
bb = (bb << 32) | (bb >> 32);
}
int bitScanReverse32(uint32_t b) {
const uint32_t U16 = 0xFFFF0000;
const uint32_t U08 = 0xFF00FF00;
const uint32_t U04 = 0xF0F0F0F0;
const uint32_t U02 = 0xCCCCCCCC;
const uint32_t U01 = 0x88888888;
const uint64_t L16 = (uint64_t(~U16) << 32) + U16;
const uint64_t L08 = (uint64_t(~U08) << 32) + U08;
const uint64_t L04 = (uint64_t(~U04) << 32) + U04;
const uint64_t L02 = (uint64_t(~U02) << 32) + U02;
const uint64_t L01 = (uint64_t(~U01) << 32) + U01;
int ret = 0;
uint64_t bb = (uint64_t(b) << 32) | b;
bb &= L16;
if (unsigned(bb)) ret += 1; else rol32(bb);
bb = (bb << 32) | unsigned(bb);
ret <<= 1;
bb &= L08;
if (unsigned(bb)) ret += 1; else rol32(bb);
bb = (bb << 32) | unsigned(bb);
ret <<= 1;
bb &= L04;
if (unsigned(bb)) ret += 1; else rol32(bb);
bb = (bb << 32) | unsigned(bb);
ret <<= 1;
bb &= L02;
if (unsigned(bb)) ret += 1; else rol32(bb);
bb = (bb << 32) | unsigned(bb);
return (ret << 1) + (unsigned(bb & L01) != 0);
}
int bitScanReverse(uint64_t b) {
const uint32_t hb = b >> 32;
return hb ? 32 + bitScanReverse32(hb) : bitScanReverse32(uint32_t(b));
}