hi everyone again.
today i tested some ideas. Let me start up with what i have done
to compute the msb. i order from slow to fast.
1. fasttest() -> performs -97%
testing bits, starting from sq to msb, obstructed by end of line
(in simple loop)
2. divconq() -> performs -78%
check always upper half of line, splits until msb is found
3. BitScanReverse64 intrinsic -> performs -10%
4. bsr (double conversion, from Gerd). my original one +-0
5. msb-lookup +30%
Code: Select all
//******************************************************************************
//...
for(BTB k=1;k<256;k++)
{
msk.msb_rank[k] = bsr64(k); //UI_08 msb_rank[256]; maybe 128 is enough
msk.msb_file[k] = bsf64(k); // ""
}
//...
//******************************************************************************
BTB msb_rank(BTB bb,SQR_ID sq64)
{
UI_08 shift = (sq64>>3) << 3;
return((BTB) 1 << (msk.msb_rank[bb>>shift] + shift));
}
//******************************************************************************
BTB msb_file(BTB bb,SQR_ID sq64)
{
//obstruction to rank 1 has to be set...
BTB x;
x = bb >> (sq64 & 7);
x *= diag;
x >>= 56;
x = bR8 >> (msk.msb_file[x]*8);
return(x & bb);
}
//******************************************************************************
So what conclusion has to be done here, with respect to Kindergarten BB ?
The 2 msb lookups are faster than computing by bitscan-reverse !?
should someone now use KG-BB ? Where are the differences ?
Because i am not (of course not) expert for KG-BB. What s about the memory usage ?
Or which kind of bsr should be used ? (Gerd's double conversion is the fastest i used until now. it s faster than the intrinsic ? or am i doing sth totally wrong ?)
note:
- in the case how the msb is used for the obstruction difference, it
is enough to isolate the corresponding file _OR_ rank, to build the ms1b.
That may lead to some other ideas.
- maybe a fast magic bsr-function without lookup can be created, due
to the simpleness of a masked line ?? any idea for that ? (would be enough to have it for one line,all other lines maybe shifted...)
well, now with the kind of msb-lookup the performance comes closer to
the magic approach, some more 20 % speedup and it will beat it...