Gerd Isenberg wrote:
For intels with fast bsr but slower bswap, this approach might be not that bad as I initially thought. Alternatively one can save 5 ops by upper/lower lookups from the very same cacheline to end up with 9 ops per line (file, rank, dia, antidia) and two independent instruction chains until the final lea/and:
the thing i worked out with the less time i have left at the moment.
would be happy again for any idea. if anyone would like to the the
performance, feel free .
Any improvements, explanations what may be wrong here, just
report...thx
BTB file(const BTB occ,const SQR_ID sq64)
{
//locals
BTB fle,lo,hi;
//hi: occupancy bits above sq (rnk8 bit is set independent)
//lo: occupancy bits below sq (rnk1 bit is set independent)
//msk_hi: precalculated bits above sq on file
//msk_bit_on_rnkx: corresponding bit for file on rnk x
hi = (occ & msk.hi[sq64]) | msk.msk_bit_on_rnk8[sq64];
lo = (occ & msk.lo[sq64]) | msk.msk_bit_on_rnk1[sq64];
hi &= -hi; //lsb of bits above sq
lo = BB(bsr64(lo)); //msb of bits below sq
fle = (((hi<<1)-lo) & msk.msk_file[sq64]); //connecting bits
return(fle);
}
Your routine is fine, except three optimizations already posted elsewhere in this thread. "High" don't needs the upper bit set, since you may borrow from hidden 2^64 anyway. Only "low" needs to be at least one in case of empty set, so just or "low" by one.
A possible micro optimization is to use -low = -1 << bsr, for x86 lea-instruction with 2*high+(-low) instead of (high<<1)-low. ) Nine operations. Few registers, independent instruction chains. Your idea has become hot now. Needs a name
well, if i am thinking for a name, what do you think about this?
"Significant Bit Solution (SBS)".
So, thank you Gerd for your improvements and explanations.
At the moment i dont know how to get it better.
Maybe the next days (when time is left), i will compare with my magic
bitboard scheme.
Gerd Isenberg wrote:Your idea has become hot now. Needs a name
If idea remains hot after speed test vs magic bitboards I would be very glad to try to integrate in Stockfish and test there in real games with a complete chess engine around it.
In any case it is very interesting. The only little issue that I note is the use of bsr64 that is not usable on 32 bit machines, as the one that I have and on which I can test BTW
One of the advantage of magic bitboards is that there exsist a 32 bit friendly version, and this is very important for a candidate general solution to sliding attacks calculation as I think could be this new one.
I would vote "Obstruction Difference" (instead of obstructed) or "Hoffman's Obstruction Difference" (omit "action", but we can leave the smiley in the name if you want ).
"Obstructed difference" sounds like it's obstructing the difference somehow, but "obstruction difference" sounds like what it's doing: taking the difference of the obstructions.
By the way, thanks to both of you for working this out. Oddly, just before I read this the other day, I was thinking there had to be a way to generate moves similar to this, but couldn't quite work it out in my head.