bitboards like mailbox

tcusr · Post by **tcusr** » Tue Nov 23, 2021 4:18 pm

hgm wrote: ↑Tue Nov 23, 2021 2:39 pm All this still seems very slow compared to a neighbor table, from which you read the square numbers of the move targets directly. Then it would just be a matter of loading the from-square, and using it as an index register for 4 load instructions to fetch the to-squares, 5 instructions in total. Of course you would have the overhead of updating the neigbor table as a prequel to move generation, but for these 4 directions that would just require loading the square that the latest move evacuated, using it as an index to load the square numbers of its 4 neighbors, and store those 4 square numbers indexed by each other. That is only 9 instructions. (But it would also take 9 instructions to undo that during UnMake().)

yeah, it feels clunky. i'm officially over with this hybrid idea

i'll try again koggestone simd though because i don't understand how it can be compared to magic bitboards like this guy says https://www.talkchess.com/forum3/viewtopic.php?t=46974.
without using native instructions the assembly is really really long, not everyone knows how to compile a program, a lot of normal users will have a slower version because of this.

hgm · Post by **hgm** » Tue Nov 23, 2021 4:21 pm

'This guy' programs on a GPU, right?

tcusr · Post by **tcusr** » Tue Nov 23, 2021 5:06 pm

hgm wrote: ↑Tue Nov 23, 2021 4:21 pm 'This guy' programs on a GPU, right?

yes.
i should inform myself better before asking questions

hgm · Post by **hgm** » Tue Nov 23, 2021 8:19 pm

Perhaps there can also be substantial gains in the way you process the move targets once you got them. The conventional method is to join these with the from-square and write those in a move list. And then you have to assign sort keys, and sort that list later.

There is one difference between extracting targets from a bitboard and reading them from the neighbor table: with the bitboard you would by definition only have on-board targets, and can limit those through a single AND operation to squares occupied by enemies. This has an advantage as well as a disadvantage: the disadvantage is that the number of iterations in the extraction loop is variable, leading to a frequent mispredict for terminating a loop that on average has few iterations. With the neighbor table you would always have 4 neighbors, (so a fixed number of iterations), but you would have to discriminate the actual captures from the off-board and friendly pieces for each of those separately. Something like

Code: Select all

int victim = board[toSqr]; // occupant of the to-square
moveList[lastMove] = toSqr + 256*fromSqr; // pack move & store at end of move list
lastMove += (victim & opponent) != 0; // expand move list by one if victim is opponent

But you would still have to loop through the move list afterwards (but only the accepted moves) to assign sort keys, and then again several times to sort it.

So how about doing away with the move list (for captures) and instead use an attack map for a little speedup?

Use two static tables for mapping pieces on bits, victimBit[] and attackerBit[]. (You could also do those mappings on the fly, through something like 1<<piece, but tabulating is more flexible.) In each node, use an array attackers[] in stead of a move list. Each element of that array corresponds to a victim. Each bit in such an element corresponds to a piece, and gets set when that piece attacks the victim in question. So after your move generator produces a move (fromSqr, toSqr) of a given 'piece', you do

Code: Select all

int victim = board[toSqr];
attackers[victim] |= attackerBit[piece]; // mark the victim as attacked by piece

This can be done unconditionally, whether victim is a friend or foe, or even when it is empty or off board (assuming the board has some boundary guard there). The attackerBit[piece] is the same for all toSqr of a piece, and is loaded in a register before the loop over to-squares. So after production of toSqr this is just two loads, an OR and a store.

After the entire move generation, instead of messing with a move list, you can then loop through attackers[victim] in MVV order, and for every victim then over the attackers in LVA order. That latter loop can be done through bit extraction, if you assign the bits in the proper way. No assignment of sort keys, no sorting, just two nested loops.

There is a practical problem, though: with the given code you would have to clear the entire attackers[] array before you start move generation. (Or at least the elements you are going to use, for the opponent victims.) To avoid that, you can use a scalar variable 'attacked' where each bit corresponds to a victim, and flags whether the attackers[] element for that victim was already used before, or is still undefined. You can clear that variable in a single instruction, declaring all attackers[] elements invalid. And if an invalid element is then used later, it is cleared first, and marked as valid.

Code: Select all

int victim = board[toSqr];
int b = victimBit[victim];
int a = (b & attacked ? attackers[victim] : 0); // assume 0 if not used before
attacked |= b; // mark as used
attackers[victim] = a | attackerBit[piece];

(3 loads, a bit test, a conditioal move, 2 OR and a store = 8 instructions. For 4 targets that makes 32 instructions.) As a side effect the variable 'attacked' will now indicate which pieces are attacked after all move generation is done. If we assign the bits such that they would extract in MVV order, we can use it to restrict looping over victims to looping over attacked victims only, by using bit extraction. For each attacked victim we then loop through the attackers, and for each attacker we have a move ready to be searched.

tcusr · Post by **tcusr** » Thu Nov 25, 2021 9:29 pm

hgm wrote: ↑Tue Nov 23, 2021 8:19 pm Perhaps there can also be substantial gains in the way you process the move targets once you got them. The conventional method is to join these with the from-square and write those in a move list. And then you have to assign sort keys, and sort that list later.

There is one difference between extracting targets from a bitboard and reading them from the neighbor table: with the bitboard you would by definition only have on-board targets, and can limit those through a single AND operation to squares occupied by enemies. This has an advantage as well as a disadvantage: the disadvantage is that the number of iterations in the extraction loop is variable, leading to a frequent mispredict for terminating a loop that on average has few iterations. With the neighbor table you would always have 4 neighbors, (so a fixed number of iterations), but you would have to discriminate the actual captures from the off-board and friendly pieces for each of those separately. Something like
Code: Select all
int victim = board[toSqr]; // occupant of the to-square
moveList[lastMove] = toSqr + 256*fromSqr; // pack move & store at end of move list
lastMove += (victim & opponent) != 0; // expand move list by one if victim is opponent
But you would still have to loop through the move list afterwards (but only the accepted moves) to assign sort keys, and then again several times to sort it.

So how about doing away with the move list (for captures) and instead use an attack map for a little speedup?

Use two static tables for mapping pieces on bits, victimBit[] and attackerBit[]. (You could also do those mappings on the fly, through something like 1<<piece, but tabulating is more flexible.) In each node, use an array attackers[] in stead of a move list. Each element of that array corresponds to a victim. Each bit in such an element corresponds to a piece, and gets set when that piece attacks the victim in question. So after your move generator produces a move (fromSqr, toSqr) of a given 'piece', you do
Code: Select all
int victim = board[toSqr];
attackers[victim] |= attackerBit[piece]; // mark the victim as attacked by piece
This can be done unconditionally, whether victim is a friend or foe, or even when it is empty or off board (assuming the board has some boundary guard there). The attackerBit[piece] is the same for all toSqr of a piece, and is loaded in a register before the loop over to-squares. So after production of toSqr this is just two loads, an OR and a store.

After the entire move generation, instead of messing with a move list, you can then loop through attackers[victim] in MVV order, and for every victim then over the attackers in LVA order. That latter loop can be done through bit extraction, if you assign the bits in the proper way. No assignment of sort keys, no sorting, just two nested loops.

There is a practical problem, though: with the given code you would have to clear the entire attackers[] array before you start move generation. (Or at least the elements you are going to use, for the opponent victims.) To avoid that, you can use a scalar variable 'attacked' where each bit corresponds to a victim, and flags whether the attackers[] element for that victim was already used before, or is still undefined. You can clear that variable in a single instruction, declaring all attackers[] elements invalid. And if an invalid element is then used later, it is cleared first, and marked as valid.
Code: Select all
int victim = board[toSqr];
int b = victimBit[victim];
int a = (b & attacked ? attackers[victim] : 0); // assume 0 if not used before
attacked |= b; // mark as used
attackers[victim] = a | attackerBit[piece];
(3 loads, a bit test, a conditioal move, 2 OR and a store = 8 instructions. For 4 targets that makes 32 instructions.) As a side effect the variable 'attacked' will now indicate which pieces are attacked after all move generation is done. If we assign the bits such that they would extract in MVV order, we can use it to restrict looping over victims to looping over attacked victims only, by using bit extraction. For each attacked victim we then loop through the attackers, and for each attacker we have a move ready to be searched.

thank you very much hgm, you made me realize that i actually want to write a mailbox engine. i've been using bitboards from the start and always snobbed mailbox for no reason. i have a few questions on how to proceed if you don't mind
1) you seem to prefer 16x12 over pure 0x88, why? to the only thing that changes is that we do board[square] != RIM instead o !(square & 0x88)
2) should i use a piece list?
3) will the the mailbox approach interfere with search techniques (that are maybe better suited for bitboards)?
4) what mailbox tricks should that you use should i implement? i'll for sure use the neighbor table for example

thanks

hgm · Post by **hgm** » Thu Nov 25, 2021 10:48 pm

I prefer to have the rim of guards because it makes testing for a valid move more homogeneous, so the various cases can be combined. E.g. if you always generate 8 knight moves irrespective of where the knight stands, you want to discard both moves that stray off board and moves that hit a friendly piece. By clever encoding of the pieces you can combine these tests to one. E.g. use the '16' bit in the piece number as a 'white' flag, and '32' as a 'black' flag, and set both those bits for the edge guards, you can just test whether the bit for your own color is set to reject both capture of friends and edge guards. And if a square in the neighbor table always has neighbors in all directions, some of those neigbors can be edge guards. You then cannot simply do victim = board[toSqr] on all toSqr you get from the neighbor table, (as in the examples above) but would first have to test whether toSqr is on board (introducing another if statement). (An additional advantage for me is that I often deal with boards of deviating sizes, for which there is no eqivalent to the sqr & 0x88 test.)

A piece list is highly recommended. This makes it far easier to loop through all your pieces, (e.g. for generating the non-captures of each), as the length of the list is only 16 (even if you leave holes for these that are captured). While getting the pieces from the board would have to go through 64 squares. And you would have to identify each piece you bump into, while when looping through the list you would know what pieces you will encounter. (E.g. you could have a separate loop over the Pawn section of the list, so that you know you will only encounter Pawns, and automatically generate Pawn moves for them.) The board in that case should contain piece numbers rather than piece types.

I think that search techniques are in general independent of the game-state representation.

Mailbox board with rim + piece list + neighbor table should already give you a very competitive implementation. The neighbor table is not a standard feature. The attack map as alternative for a move list w.r.t. the handling of captures is not limited to a particular board representation, but I think it would be an improvement over the conventional method for sorting the captures, so I would recommend that too.

Identifying passers is a bit cumbersome with mailbox. Therefore you should definitely use a pawn hash table, in which you store the pawn evaluation score, which files are half-open, which pawns are passers, perhaps the value of the pawn shields on the K-side and Q-side (to facilitate king-safety evaluation, and evaluation of castling rights), perhaps the number of pawns on each square shade (for evaluating good and bad Bishops). The hit rate on a pawn hash table is so high that you would only rarely have to calculate that information from scratch. So that you really do not care how fast such a calculation would be. The algorithm for identifying passers that I prefer requires two passes through all pawns: one to determine the most backward pawn in every file, and then a second one to check which pawn has passed the opponent's most backward pawn in all 3 adjacent files.

tcusr · Post by **tcusr** » Thu Nov 25, 2021 11:05 pm

thanks very much
hope you don't mind if i adopt your piece encoding idea

btw just a question, is it better to use a legal generator for mailbox?
EDIT:
i also don't understand why your piece lists are one-dimensional? do you index them by piece type + color?

dangi12012 · Post by **dangi12012** » Fri Nov 26, 2021 1:25 am

hgm wrote: ↑Thu Nov 25, 2021 10:48 pm .I prefer to have the rim of guards because it makes testing for a valid move more homogeneous, so the various cases can be combined. E.g. if you always generate 8 knight moves irrespective of where the knight stands, you want to discard both moves that stray off board and moves that hit a friendly piece.

Again with bitboards you dont need mailslot 0x88 or 10x12 Boards. You lookup the knights move & it with allowed mask (enemy empty or forced check prevention)
Which is 1 branchless lookup and one AND.

With bitboards you need a looped if statement for pruning the squares which is 8-10x slower!

hgm · Post by **hgm** » Fri Nov 26, 2021 8:19 am

But that is just where the work starts. Then you have to extract the moves one by one, and store them somehow. And since there is a variable number of moves to extract, you will likely have a mis-predicted branch for terminating the extraction loop. If you always loop over 8 moves, the loop can be unrolled, and would be truly branchless. This proved important. In the mailbox trials I tried to speed things up by only looping over on-board targets, by having different lists of pre-computed target squares for each from-square. But it was slower!

I admit that the advantage of mailbox is not so clear for knights and kings. But you have only 3 of those, so it is not a decisive factor. In the mailbox trials I was of course doing things incrementally, so that you would hardly ever generate moves for a piece from scratch. Knights cannot be blocked, so the only way a knight's moves change is when it moves itself.

You raise the interesting point of a check-evasion generator. Now this is not a primary concern, as only few positions are in check. But you can indeed save a lot of work in those. For mailbox you could do this by tabulating possible interposition steps as a function of the piece type, relative location of the piece to king, and the direction of the (slider) checker w.r.t. king. The latter would presumably be known from the check detection even before you start move generation. Each entry in the table could point to a -1 terminated list of step vectors that indicate how the piece would need to step to land on the check ray. For a knight there would be at most 2 such moves, so you would not have to loop over 8 moves and test those for interposition. In fact most of the time you would not have to loop over anything at all, because the knight cannot land on the check ray. Only when it can you would have to test whether it lands on this ray closer to the king than the checker, by distance[toSqr-king], the distance of checker to king presumably already known from the check test. When it does the square must of course be empty (or it would not have been a check), or capture the checker, so you don't have to test the board for the occupant.

gaard · Post by **gaard** » Fri Nov 26, 2021 8:35 am

hgm wrote: ↑Thu Nov 25, 2021 10:48 pm Identifying passers is a bit cumbersome with mailbox. Therefore you should definitely use a pawn hash table, in which you store the pawn evaluation score, which files are half-open, which pawns are passers, perhaps the value of the pawn shields on the K-side and Q-side (to facilitate king-safety evaluation, and evaluation of castling rights), perhaps the number of pawns on each square shade (for evaluating good and bad Bishops). The hit rate on a pawn hash table is so high that you would only rarely have to calculate that information from scratch. So that you really do not care how fast such a calculation would be. The algorithm for identifying passers that I prefer requires two passes through all pawns: one to determine the most backward pawn in every file, and then a second one to check which pawn has passed the opponent's most backward pawn in all 3 adjacent files.

Cheating should be allowed. If you add a member to you board structure like uint8_t pawn_on_files[ColorCount][1 + 8 + 1], where pawn_on_files[0/1][0/9] are off-board, and assuming the lsb of pawn_on_files[0/1][1-8] are rank 1 and the msb is rank 8, you can combine the benefits of mailbox with pseudo-bitboards if you update it incrementally. With a hash table the difference is probably a wash though, since the hit rate will be above 90%.

bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox