Sliding-path for incremental move generation

hgm · Post by **hgm** » Wed Apr 08, 2015 11:13 am

I did such a thing in my Chess program Usurpator, in the days where it was still difficult to reach a depth of 4 ply. I extended all 'new' captures there, i.e. capture of the piece that just moved, but also slider captures that went over the square evacuated in the last ply ('pins') and on the ply before ('discovered threats'), and captures with the piece yu moved two ply ago. That worked pretty well. But some of the moves that get attention this way are really stupid moves, which was OK when the depth was just enough to refute them, but becomes counter-productive at large search depth, as extra depth on them is just a waste of time. Close to the leaves this could still be useful, though.

Incremental methods can be very tricky, because you can easily fall into the trap of calculating information pre-emptively that in the end will never be used. E.g. updating your entire move list incrementally, just to find out that the first move you try already provides a beta cutoff, so you have to 'decrementally revert' the move list again. It would then probably have been much faster to just generate the cut-move from scratch, e.g. in staged move generation, starting to generate the captures of the MVV by just looking in all directions for something that attacked it.

So the most important aspect of any incremental scheme is how to decide whether an update is really useful. E.g. if you keep a list of captures that is incrementally updated, it would be a waste of time to update your own captures as a consequence of your own move, when the opponent would not play a move to make it your turn again. (E.g. because he stands pat, or because all his captures are futile.) So you would want to first update the move list for the opponent, after evaluation and possible stand-pat exit. And only if that update shows that the opponent has moves to play, also update your own move list.

In fact it might be even better to do a 'pilot investigation' to figure out if the opponent would have moves, before updating anything that you would have to revert when he has none. So check if the existing move list already contains a non-futile capture with a piece that was not captured in the current move, or, if not, whether evacuation of the from-square of that move discovered a non-futile slider attack. And only when any of this is the case, update the capture sets for both sides.

This would sort of implement the 'attention' idea: you would only be spending calculational effort when new moves are created that deserve your attention.

stegemma · Post by **stegemma** » Wed Apr 08, 2015 11:57 am

hgm wrote:[...]E.g. updating your entire move list incrementally, just to find out that the first move you try already provides a beta cutoff, so you have to 'decrementally revert' the move list again.[...]
In fact it might be even better to do a 'pilot investigation' to figure out if the opponent would have moves, before updating anything that you would have to revert when he has none.[...] This would sort of implement the 'attention' idea: you would only be spending calculational effort when new moves are created that deserve your attention.

I don't have to revert moves, because i use two separate arrays: one for moves and one for pointers to moves. In any node, I take a pointer to the last generated move and to the last pointer to playable moves. This way, I only add pointers to valid moves, without deleting invalid moves, when i go to the next ply. When i come back to previous ply, the moves has not been affected by following plies. When I generate new moves, do to the moved piece or sliding-path extension, I simply add to the end of the move array, eventually overwriting those already generated in the previous move of the same ply.

For sample, if at ply 0 i have generated moves:

[e3 e4 d3 d4... Nh3]

i have pointers to those moves too:

[*e3 *e4 *d3 *d4... *Nh3]

if I play e4, I only add new moves (Queen, King, Knight) at the end of the array, keeping the pointer to previous end:

0:[e3 e4 d3 d4... Nh3] 1:[e5 Ke2... Ne2]
0:[*e3 *e4 *d3 *d4... *Nh3] 1:[*d3 *d4... *Nh3 *e5 *Ke2... *Ne2]

when i come back to ply 0 and play for sample d4:

0:[e3 e4 d3 d4... Nh3] 1:[d5 Qd2...]
0:[*e3 *e4 *d3 *d4... *Nh3] 1:[*e3 *e4... *Nh3 *d5 *Qd2...]

At the moment I have an unique array for moves of both colors but I plan to use an interleaved array (to optimize cache use, i hope):

[white move, black move, white move...]

"For" loops should increment by 2 to search for the moves of one color and by 1 for both colors. maybe I can interleave even pointers to moves... but this is a further optimization to try, if needed.

For the idea to take care of the last move and its squares involved, i was thinking only to add a very little bonus to moves on that square, maybe it could be analyzed in a next step, when the program gets playing all moves in a perfect way.

As you said in the last part of your message, there could be a lot of tricks to optimize incremental generation, more than standard one.

hgm · Post by **hgm** » Wed Apr 08, 2015 12:56 pm

OK, I see. But the copy-modify approach for the pointers to the moves does seem a lot more expensive than update-restore, when the move lists get long. (Having pointers to moves rather than the moves itself would not offer any advantage for me anyway, as the moves in the move list are just 32-bit machine words, and are manipulated just as easily as any pointers.)

The problem with update-restore is that you really want the move list to be sorted, so that the data structure used to represent it should allow easy insertion and deletion of moves, rather than just appending them as a stack. That makes the restore a lot more expensive than just decreasing a stack pointer.

This is what attracts me to the use of bitmaps to represent the move list. If each bits represents a potential move, sorted in the required order, adding or deleting moves is just a matter of setting/clearing their bit. But this requires the potential moves to be sorted in MVV/LVA order, meaning that moves should not be associated with squares, (from, to), but rather with pieces (mover, victim). Even in orthodox Chess that would cause 16x16 = 256 potential captures for which a bit has to be reserved, too many for efficient bit extraction. Hence the idea to make it a two-level bitmap, with a 'directory' bitmap where each bit corresponds to a victim (in MVV order), and an array of bitmaps, one bitmap for each victim, where each bit corresponds to an attacker (in LVA order).

This would make obtaining the (from,to) representation of the move from the 'move set' a bit more expensive than usual, but you would only need that representation when you actually want to perform that move. The decision of whether you want to perform the move would be taken based on the victim and attacker/defender information which now has become the primary data of the storage format. In such a way that you would not even have to check whether it is what you need, but can just selectively access only those moves that you do need. Bitboards also benefit from this principle, where you can do bulk-selection of moves by bitwise Boolean operations on the bitboards, and then only extract the moves that satisfy your criteria. I think this principle is in general a winner: minimize work spend on concluding what you don't need, even if that makes it a bit more expensive to process the data that you do need. The move-set method would just take this a few steps further.

hgm · Post by **hgm** » Wed Apr 08, 2015 5:59 pm

I guess for Chess you could even hold the entire 'capture set' in a YMM register. Interpreted as a 16x16 bit matrix where the rows correspond to victims and the columns to attackers, you could clear all old attacks of the piece that moved by ANDing the set with a mask for the moved piece. Only adding the new moves of the moved piece would require complex calculation, finding the target squares, looking on the board what they contain, and then to what bits in the set these correspond. So that you can OR the capture set with the AND of the victim mask and the moved-piece mask to add the new capture to it. And when you do the update in a YMM register, it might not cost you anything extra to store it back in another location than the original, so that restoring the map on UnMake is not necessary.

I am not sure if there is an easy way to extract the lowest set bit from a YMM register, though (to get the next MVV/LVA capture).

Of course you would not really want to search the moves in MVV/LVA order, but postpone (or prune) dubious calptures (High x protected Low). But you would also keep a set of 'pseudo-captures', i.e. attacks on your own pieces. You need that for updating the capture set, as when a piece is captured the pseudo-captures of it now become real (re-)captures, so you would have to swap the corresponding rows of the captures set and pseudo-captures set. But that pseudo-captures row would tell you if the piece is protected or hanging. (In fact the rows provide the set of attackers and defenders, which you could feed to SEE for step-by-step extraction of the attackers and defenders, low to high by bit extraction.) So you could decide to postpone search of captures that look fishy, and keep a bitmap in capture-set format of the moves you postponed that way. Then you could do a second pass for doing all the postponed captures, using that capture set as the basis for bit extraction.

stegemma · Post by **stegemma** » Wed Apr 08, 2015 7:02 pm

hgm wrote:I guess for Chess you could even hold the entire 'capture set' in a YMM register. Interpreted as a 16x16 bit matrix where the rows correspond to victims and the columns to attackers, you could clear all old attacks of the piece that moved by ANDing the set with a mask for the moved piece. [...]

This could be interesting, in fact you can get the extension of sliding path just because you know the pieces that attacked the moved one. Instead of use the YMM registers (i would use it in assembly, if really i want to use it) you can use multiple 64 bit registers, maybe dividing the matrix between piece types. Four 4x16 matrixes could give similar performance and are portable.

hgm · Post by **hgm** » Wed Apr 08, 2015 7:36 pm

stegemma wrote:Instead of use the YMM registers (i would use it in assembly, if really i want to use it) you can use multiple 64 bit registers, maybe dividing the matrix between piece types. Four 4x16 matrixes could give similar performance and are portable.

True. But for a game where each side has 96 pieces the matrix would get too big to scan through it anyway. In that case a two-level approach would be faster: first extract the next-highest attacked victim from a victim set in MVV order, (which in itself would already span multiple 64-bit words, especially if you also have to reserve bits for the promoted versions of most of the pieces, which would need to be in another position than the unpromoted piece, when you want to maintain the sorting by value) and then extract attackers from a set only dedicated to that victim.

Removing a bit corresponding to some attacker (because it moved away) from the attackers set of all victims would become very expensive, though. As well as making a copy of the captures set. So one really should generate all (pseudo-)captures of the moved piece, and delete them from the attacker sets of the actual victims in that case. Almost no piece moves in more than 8 directions, and clearing the bit in 8 attackers set would be a lot cheaper than non-selectively clearing all bits in a 96x96 bit array 64-bits at the time. And on restore you would have to selectively set them again.

stegemma · Post by **stegemma** » Thu Apr 09, 2015 3:43 pm

hgm wrote:[...]But for a game where each side has 96 pieces the matrix would get too big to scan through it anyway. In that case a two-level approach would be faster: first extract the next-highest attacked victim from a victim set in MVV order[...]

96 pieces are well over my knowledge... i would be happy if i could make it works for standard chess. For now any attempt seems to give worst performance, in my engine. The last version gives 18 M nodes/second with incremental and 21 Mn/s without, using the same IsInCheck routine for both. It seems that looping on many moves would destroy the advantage of creating only a few new moves.

Maybe a bit-approach as you say could be better but i suspect that, at the end, I could fall in the same problems and performance.

There must be some other solution but it's not clear to me what could be.

hgm · Post by **hgm** » Thu Apr 09, 2015 6:36 pm

The capture-set approach would need a bit of refinement to do true MVV/LVA, because pieces occur in pairs. So PxR2 would have to be sorted before BxR1. That means that a 2-level approach really should not extract the next victim in line, and then all attackers of that victim, but get the set of attackers of all victims of equal value at once, and extract the LVA on any of of those victims. This does not really have to be a problem.

Like you mentioned, when you have a bit-set of all attackers on a piece, (which is the capture set ANDed with the victim mask of that piece), you know exactly which attacks have to be extended due to evacuation of its from-square, when that piece is moved.

To make such extension efficient it would be very useful to have a table that for each square and direction contains the distance to the nearest neighbor in that direction (which could be an off-board boundary square, when there is no such neighbor). Such a table is very easy to update incrementally, as each square points to all its eight nearest neighbors, which point back to it in the opposite direction, and have to be made pointing to each other when the square is evacuated.

So you would have to AND the masked-out attacks on the moved piece with a slider-attacks mask, extract the attacks, relate them to a piece number from a bit-to-attacker table, relate them to a board square from a piece list that contains the location of each piece. From the attacker location and the evacuated square would follow the attack direction. The neighbor table would then tell you the distance to the new target of that slider, you would look on the board to see which piece is there, fetch its victim-mask from the piece list, AND it with the attackers-mask of the slider that you extended the move from (also from the piece list), to be left with the single bit corresponding to the capture of the new victim by the extended slider. This would then be ORed into the capture set.

That is a pretty-long-latency data flow. But it could be done for each of the moves to be extended in parallel, so that is not too bad. And it would of course not be so bad as it looks, as very often there would not be any slider attacks at all on the moved piece.

All attacks on the moved piece could be removed by ANDing the capture-set with the complement the piece's victim-mask, and its own attacks (and that of the piece it captures) would likewise be removed by ANDing with the complements the respective attackers mask. You would then only have to add the (pseudo-)captures of the piece from its new location, and the incremental update is done.

stegemma · Post by **stegemma** » Thu Apr 09, 2015 8:28 pm

hgm wrote:[...]So you would have to AND the masked-out attacks on the moved piece with a slider-attacks mask, extract the attacks, relate them to a piece number from a bit-to-attacker table, relate them to a board square from a piece list that contains the location of each piece. From the attacker location and the evacuated square would follow the attack direction. The neighbor table would then tell you the distance to the new target of that slider, you would look on the board to see which piece is there, fetch its victim-mask from the piece list, AND it with the attackers-mask[...]

Ops... and after all that mask'ing and and'ing i'll get... time's over!

I think that a working solution should need a complete re-design of the generator, not just some adjustment. You'll right when you speak about a bit approach, that handle the "concept" of attacker/victim some-how. Maybe even speaking in terms of "moves" has to be changed: not "moves" but "path" from a source to a destination or from a piece (attacker) to another piece (the victim), as you say.

Now i'm mixing the best of two last move generators i've wrote, keeping some idea from the incremental one and applying to the simpler generator, the faster one.

I would like to complete the next version of Satana this way, because it's time to work hard on alfa-beta search, that was minimal since now, in my engines.

hgm · Post by **hgm** » Thu Apr 09, 2015 11:23 pm

I am trying to write this up as pseudo-code, but I ran into a problem:

I want to keep track of both captures and pseudo-captures (i.e. of own piece), so that both enemy and friendly slider moves to an evacuated square are known, and can be easily extended to the downstream victim. But on a capture the new occupant of a square must 'inherit' the attacks on the previous occupant as pseudo-attacks on it, and the pseudo-attacks as attacks. This would be easy if the 256-bit captures set would be a simple 16x16 matrix that for each victim row had the 16 attackers in the same order: it would be a matter of copying one row to the other, implemented as masking out 16 contiguous bits, shifting those to the position of the other row, and ORing them in there. But it would be very cumbersome if the 16 attacker bits of a row are not in the same relative position, because in one case they were interleaved with the attacks on a second piece of the same type, but not in the other.

This is especially bad because there are more Pawns than copies of any other piece. For all other piece types the set of attackers on them could be interleaved in pairs. (You could interleave attackers of King with those of Queen without any ill effects, as you would just want to test captures of the King for presence, and not extract them in any particular order, while you would only start to extract captures of the Queen in nodes where there are no captures on the King.) But for Pawns you would want to interleave the attackers sets of all 8 Pawns, to make sure all PxP are extracted before all NxP, etc. That means the bits representing the attackers of a particular Pawn would be spread out with a spacing of 8 in the captures set, while those of other pieces would be spread out with a spacing of 2. So it would be very hard to make a Knight inherit the attacks on the Pawn that it captured.

To patch this up we could abandon the strict LVA ordering of the captures of Pawns. This might be justified by the observation that captures of Pawns by non-Pawns are rarely good captures when the Pawn is protected. So they are likely to be filtered out as 'bad captures' in the first MVV/LVA pass through the captures set, to be searched in a later pass (if at all). And if they are captures of an unprotected Pawn there really is little reason to think that QxP would be any worse than PxP. If you filter out the negative-SEE captures, you will only be left with SEE=0 and SEE=1 captures when you capture a Pawn. That is just as true for PxP as for other x P. So if you want to do smart ordering of capturing Pawns, LVA ordering is pretty pointless, and only ordering by SEE would help. If you want to do the latter, you could just make an extra pass through the Pawn-victim section of the captures set: first play and search only the SEE=1 moves, not caring whether Q x unprotected P is tried before or after P x P, then play only the SEE=0 moves. (And forget about the SEE < 0 moves, or postpone those until after non-captures.)

So then the captures set would be structured as 8 sub-sets of 2x16 attackers on a pair of pieces. The pairs would be (K,Q), (R1,R2), (B1,B2), (N1,N2), (P1,P2), (P3,P4), (p5,P6) and (P7,P8). (LSB to MSB) Within each pair the attackers would be ordered similarly, e.g. for the (K,Q) pair as

KxK, KxQ, QxK, QxQ, R1xK, R1xQ, R2xK, R2xQ, B1xK, B1xQ, B2xK, B2xQ, N1xK, ...

(MSB to LSB). Then the set of attackers of a particular piece could be extracted by ANDing the 256-bit captures set with 0x55555555<<S, where S=8*N or 8*N+1 (depending on which piece of the pair you need), and could then simply be shifted to the location of another victim to make that inherit the captures on the piece it replaced.

Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation

Re: Sliding-path for incremental move generation