Color-Independent Code
So far I had assumed it was unavoidable to have color-dependent code, as the merging of white and black attackers sets need shifing in opposit directions for interleaving their bits. The white attackers are in the even bits, and the black attackers are in the odd bits. So after masking away the protectors of a piece, to combine set1 and set2 requires set1 + 2*set2 for white (left shift), and set1>>1 | set2 for black attackers, i.e. execution of different instructions, raher than use of different data. Specifying a left-shift of -1 doesn't give you a right-shift, in i386 architecure, but a lef-shift of 31.
But there seems to be i386 instruction ROL and ROR ('rotate' left/right). These act like (x >> N) | (x << 32-N). There are no C operators directly specifying those, but I cannot exclude compiler optimizers are smar enough to recognize these expressions, and use a ROR instruction to implement them. Since (like with the plain shifts) the shift will be interpreted modulo 32 (or 64, in a 64-bit register), negative numbers here actually work. As unlike plain shifting, rotating 31 bits left or 1 bit rights is actually the same. Now rotating and shifing is no precisely the same. But in the current application he bit shifted out of the word is always a bit that has just been cleared by masking away the protectors, rotate is the same as a (logical) shift. So I can write color-independent code after all, if needed forcing the use of ROR shough an embedded assembler instruction, just as I do for BSF.
The 'Final' design
Since the design now seems to converge to a near-optimal state, and since in the process of this blog has changed several times, let me reiterate what we arrived at. The engine uses several data structures to represent the game state, with a high degree of redundancy:
The data structures
1) A 16x12
mailbox board with a 2-wide rim of 'edge guards', indexed by square number in 0x88 format (or perhaps offsetted by 0x22 to avoid negative square numbers in the rims). The 8x8 playing area of the board[] array contains (unique) piece numbers, not piece types.
2) A
piece list, indexed by piece number. his is implemented as a number of separate arrays, one for each aspect of the piece we want to tabulate / keep track of. Most important are location[], which contains the number of the square the piece is on. Other important (but static) data about the piece is a pointer to a list of move steps in can make (for leapers, as a difference of square numbers) or directions it can move in (numbered 1-8). This basically defines the piece type. There also are pointers to the Piece-Square Table for the piece, and the table of Zobrist keys. And the piece values.
3) An
attack map. As this is also indexed by piece number it formally counts as a part of the piece list. But the whole design revolves around it, so I mention it separately. Each piece has a 32-bit int associated with it, recording the pieces that attack it. These elements of the array attackers[] are thus piece
sets, where each of the 32 pieces is represented by a single bit. The assignment of pieces to bits is critical: white and black pieces alternate, and going from least-significant bit to mostsignificant bit, the pieces increase in value. This means that extracting bits with the BSF instruction will retrieve the pieces in the set in order of increasing value, Lowest-Valued Attacker first. To be specific, the bit assignment is:
Code: Select all
(MSB) kKqQrRrRbBbBnNnNpPpPpPpPpPpPpPpP (LSB)
Since the set contains pieces of both colors, only half the pieces are true attackers, the other half are protectors.
4) A
neighbor table. This is an auxiliary data structure that for each
occupied square holds the number of the nearest occupied square in each of the 8 directions. So it is actually a two-dimensional array, indexed by square number and direction. Rather than neighbor[sqr][direction] the 'row' of 8 directions is put in a struct, which makes it easy to copy it in its enirety (as the eight unsigned char square numbers fit in a 64-bit int). The neighbor can be a rim square. If you hink about it, you will recognize the neighbor table as a collecion of doubly linked lists, (where the cells are the squares), one list for each rank, file or diagonal. That makes it very easy to delete or re-insert squares, when these get evacuated or reoccupied.
5) A
capture stack in each node. This is the logical equivalent of a move list for the captures. But it is not an array where the elements each contain a description of a single move. The elements of the stack are also
sets, where each bit represents a capture, and the location of that bit in the set determines what capture that is. In this sense it is similar to the attack map, and it basically is a copy of the half of the attack map that holds the attacks on the pieces of the side that is not on move, leaves out the protectors and has reshuffled the bits into MVV/LVA order. Each element of the capture stack is a 'capture set', where each bit represents a capture. The assignment of the bits is reminiscent of that in attack-map elements. Except that it doesn't contain attackers of different color on the same piece, but attackers of the same color on two different pieces. And they concatenate two such patterns, as they use 64-bit word length. So one capture set holds all captures on a group of 4 pieces. So in theory 4 such capture sets can hold all conceivable captures.
6) A conventional
move list for the non-captures. A this point we will not pay any attention to that, as there are so few nodes where non-captures are generated that the details for this have no significant impact on speed.
The process of capture generation in this engine consists of two steps: update the attack map of the parent node (through copy-make), for the effect of the latest ply, and then a sorting steps, transforming the attack map to a collection of capture sets on the capture stack. After this the captures will be searched in the order they are extracted from the capture stack, as if the latter was a 512-bit machine word.
The sorting step
Filling the capture stack from the attack map always starts by loading pairs of attack-map elements in a 64-bit access, and masking away the protectors and the already captured true attackers. (Since we have to do a masking anyway to get rid of the protectors, getting rid of the captured pieces is completely free, and allows us to leave the attacks these pieces were making just before they were captured in the map, saving us a lot of work.) Two such masked pairs will then be interleaved to get a capture set. We need to make 4 such sets (as there are 16 opponent pieces to capture). Two of those will only contain captures on Pawns, a third captures on minors, and the fourth captures on majors.(There will not be any captures on a King at this stage.) The piece numbers are assigned to the minors and majors such that Q and both B end up in the least-significant half of the capture sets, while both R and both N end up in the upper half. The merging requires a shift and a + (or |) to interleave the pairs.
The capture set for the majors can be pushed onto the capture stack 'as is'; the captures in those would extract in MVV/LVA order. But for Pawns and minors we have to re-order them. This is done by splitting the captures on minors into three (P x minor, minor x minor, major x minor), and pushing those on the capture stack separately, such that they would be treated in the listed order. (In the original capture set major x B would preceed PxN, which violates MVV/LVA.) Likewise, we split the captures on Pawns (2 sets) each into two (P x P and other x P), and push the four resulting sets onto the capture stack such that the P x P would be treated before the other x P. So we finally end up with up to eight capture sets on the capture stack.
We avoid pushing of empty capture sets to the stack, though, by suppressing the increment of the stack pointer after copying such a sets to the stack through a conditional move. So the entire procedure of filling the capture stack from the attack map is branch free. The downside of that is that we always have to do the entire job: Even if we already know in advance tha capture of a Pawn is futile, or we will almost certainly get a cutoff by capturing a Queen, we will still have been splitting the Pawn sets. But the splitting is very little work, and a single branch misprediction would already be more costly.
The capture search
Extraction of the captures, and searching them, will be done in a single loop, which only has branches to stay looping (a likely mispredict when the loop terminates), and a branch to break out of the loop for a beta (or alpha) cutoff. (But the routine it calls for searching the captures could of course have its own mispredicts.) As the captures extract from the capture stack in MVV/LVA order, the only task of the loop is to 'decode' the bit & word numbers to piece numbers for mover and victim. This can be done through a lookup table (of which an element specifies both attacker and victim). All capture sets on the stack map their bits in the same way to pieces, except that the victim needs an offset that depends on the set (0 or 4 for the Pawn sets, 8 for the minors and 12 for the majors). This offset will be pushed on a stack together with the capture set.
The map update
Logically, the update a map needs after a capture consists of two parts: attacks by pieces of the player that moved, and attacks by pieces of his opponent. The attacks by the opponent are affected much less than his own (as the opponent did not displace any of his pieces). One of the opponent's pieces disappears in the capture, but we take care of that by removing it from the presence mask and setting its own attack-map element (for the attacks
on it) to zero. That is just two alterations. We leave the attacks
by it in the map, (where they are scattered over many elements), as whenever such an element is used, the masking with the presence mask (from which the piece was removed), which had to be done anyway to select pieces of the right color, causes these attacks to be ignored.
And then there is the discovery of slider moves, which for opponent sliders is 'pin enforcement'. The map helps is own updating here, as it already lists all attacks on the moved piece before the move. So we just select the sliders from that with a mask, and then for each slider we extract from the remaining set, identify it, get its location, get the direction of its attack, get the neighbor in the opposit direction, and apply the slider's attack to the attack-map element of that new downstream target. That is a bit cumbersome, but for enemy sliders there are on average much fewer than one for which you have to do all that. Most pieces are not attacked by enemy sliders.
The other half of the update involves more. Here we really have to generate moves of the moved piece, both from its old location (to remove those from the map), and its new one. But that is still a lot less work than generating moves for all pieces, which would be needed with move generation from scratch. And then we also have to discover friendly slider moves ('discovered attacks'). There tend to be a lot more of those than enemy sliders. Still, you would only have to generate a capture of each slider that hits the piece in one direction. And not every slider will hit the piece. So again this should be much less work than move generation from scratch. To save on loop overhead, discovery of friendly and enemy sliders can be done in a single loop.
Finally we have to account for the attacks on the moved piece. The old attacks now hit thin air. In the new location it inherits the attacks of the piece it captured (minus its own; he must have been attacking it when he captures it!). We have to be careful to do things in the right order, to make sure discovered slider attacks that also hit the piece in its new location get assigned to the right piece.
The move generation
The few capture generations we have to do will be done as follows: for sliders we look up their location in the piece list, as well as a pointer to the list of the 4 or 8 directions they move in. We then loop through that list, and for each direction in it consult the neighbor table at the from-square, to see where the nearest obstacle in that direcion is. And on the board we then find what stands there. We then record (or remove, as the case may be) the attack of that slider in the attack-map element of the hit piece. It cannot be an empty square (the neighbor table guarantees that), and if it is an edge guard we apply the attack to the (dummy) attack-map element for edge guards. (This to avoid having to test and branch. No one will ever look at that dummy element, so it doesn't matter how much we maltreat it.)
If the moved piece is a leaper, we again look up its location in the piece list, as well as the pointer to its list of board steps. We then loop through the 2 or 8 elements in that list, and for each step, calculate where it ends, look on the board what is there, and apply or remove the leaper's attack to the attackers set of what it hit. In this case it can hit empty squares or edge guards as well as pieces, so empty squares have a dummy attack-map element too.
Note that we don't care about the color of the pieces we hit; the attack map contains both true attacks and protections.