First I'm going to try to optimize the move generator, which takes most of the time.
It's written in Delphi for Win32.
Is this a reasonably fast way to do the BSF?
ericlangedijk wrote: First I'm going to try to optimize the move generator, which takes most of the time.
It's written in Delphi for Win32.
Is this a reasonably fast way to do the BSF?
On most (all?) x86 processors, btc with an indirect memory operand may be a bottleneck. In that form the instruction treats memory as a bit string and can address indices far greater than 31, so it has (much) worse latency even for indices <= 31. It may be better to spend an extra register on the bsf-btc pair. I suggest making a non-empty bitboard a precondition and using a bsf without the reset and without the not-found case, always called with non-empty bitboards; otherwise the same condition is likely tested twice, inside and outside BSF. Instead of btc or btr, X "and" (X-1) might be faster (but it only clears the lowest set bit, so it does not work with bsr instead of bsf).
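
To illustrate the suggestion, here is a minimal sketch in C rather than Delphi inline assembly. It assumes a 64-bit bitboard type and the GCC/Clang builtin __builtin_ctzll as a stand-in for bsf; the names bitscan_forward and serialize_squares are hypothetical and not taken from the original code. The caller tests for emptiness once in the loop condition, the scan itself has no not-found path, and the lowest bit is cleared with X and (X-1) instead of btc or btr.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the least significant set bit.
 * Precondition: bb != 0. There is deliberately no "not found" result;
 * the caller is responsible for never passing an empty bitboard.
 * Assumption: GCC/Clang builtin, used here in place of a raw bsf. */
static inline int bitscan_forward(uint64_t bb)
{
    assert(bb != 0);
    return __builtin_ctzll(bb);
}

/* Hypothetical helper: writes the indices of all set bits of bb into
 * squares[] in ascending order and returns how many were written. */
static int serialize_squares(uint64_t bb, int squares[64])
{
    int count = 0;
    while (bb != 0) {                 /* the only emptiness test */
        squares[count++] = bitscan_forward(bb);
        bb &= bb - 1;                 /* clear the lowest set bit */
    }
    return count;
}
```

In a move generator the loop body would emit a move for each index instead of storing it. The same reset trick cannot be reused for a reverse scan, because X and (X-1) always removes the lowest set bit, not the highest.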