Branch Prediction Efficiency

Sesse · Post by **Sesse** » Sat Dec 31, 2022 11:08 am

poplsb() is presumably just finding the lowest bit (__builtin_ctzll(b) or similar intrinsic) and then removing it (b &= b-1).

dangi12012 · Post by **dangi12012** » Tue Jan 03, 2023 1:15 pm

Yes for optimal perfect prediction (essentially eliminating all branches)
you can introduce an integer template called config into movegen which will range from 0 to 3^8+1.

The template config when 6561 is for cases when there are more than 2 of a piece on the board. (so a promotion to 3rd queen) in which case you need the while loops.
All others are just the Knights, Rooks, Bishops, Queens in base 3 to represent 0,1,2 pieces. This will cover close to 100% of all positions.

You get the number of pawns for free. With this knowledge at compiletime you get very much information of the current board for efficient code. The templates will call towards 0 with less and less pieces.

Fun fact: all possible pawn configurations fit into a single integer as well (but range is too big for templating but small enough for memory). With the number of pawns + a single integer you can just lookup the bb for enemy pawns making the board representation a tad smaller.

syzygy · Post by **syzygy** » Thu Jan 12, 2023 3:45 am

Mike Sherwin wrote: ↑Wed Dec 28, 2022 2:55 pmSo everything positive over the last decade or so that I've read about how fast jump tables are compared to conditional jumps is bogus? Absolutely amazing!

If you MUST execute a switch on, say, 64 possible values it will be faster to implement the switch with a direct jump than to do 6 conditional branches. If all 64 values are equally probable, the probabilty of predicting everything right is 1 in 64 in both cases, but the direct jump will mispredict only once the other 63 out of 64 times.

However, if you can avoid the switch, then you win.

Your first method requires a switch for each piece. (OK, just on 6 values, not 64.)
Your second method does not require any switch.

The second method may mispredict a few more times on exiting the loops, but this is still better than a switch for each piece.
The first method might be slightly faster if you know there are only very few pieces.

Branch Prediction Efficiency

Re: Branch Prediction Efficiency

Re: Branch Prediction Efficiency

Re: Branch Prediction Efficiency