Good suggestions everyone, interesting stuff.
mcostalba wrote:If popcnt is less then 4 for each bitboard then multiply for 0x5555... Should do the trick:
return a*0x555... > b*0x555...
Well, as you and Ricardo noticed, that doesn't quite work, but it's a very nice idea. I think it might be able to be used though with a bit more trickery (semi ripped off from CPW):
Code: Select all
bool popcnt_a_ge_b(a,b)
{
const U64 k1 = C64(0x5555555555555555);
a = a - ((a >> 1) & k1);
b = b - ((b >> 1) & k1);
a *= k1;
b *= k1;
return (a >= b);
}
I think that works. It can of course give errors if either popcnt is >= 4, but I think for a quick SEE it might be worth it. That function is nicely parallel, and branchless. Looks like it should be about 6-7 cycles total. An error here (for popcount>=4) should be pretty rare and can be ignored.
Unfortunately I'm not sure if I need this function at all any more; I might need to just use full popcounts. I'll see what I come up with for this SEE, it's rather tricky.