On-demand tablebase generation

mvk · Post by **mvk** » Sat Jan 25, 2014 1:16 am

Evert wrote:
hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.

I dug up my ancient (1996) 'pretty fast' kbnk and kqkr generators. It turns out they still work:
KBNK takes 76 ms.
KQKR takes 342 ms.
Single core. Machine+compiler speed has improved by a factor 22000 over the years.

hgm · Post by **hgm** » Sat Jan 25, 2014 10:12 am

mvk wrote:I dug up my ancient (1996) 'pretty fast' kbnk and kqkr generators. It turns out they still work:
KBNK takes 76 ms.
KQKR takes 342 ms.
Single core. Machine+compiler speed has improved by a factor 22000 over the years.

Wow, that is more like it. Indeed with the knowledge I have now I consider FairyGen a slow algorithm, and was hoping it could be done ~10 times as fast.

Evert · Post by **Evert** » Sat Jan 25, 2014 10:34 am

mvk wrote:
Evert wrote:
hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.
I dug up my ancient (1996) 'pretty fast' kbnk and kqkr generators. It turns out they still work:
KBNK takes 76 ms.
KQKR takes 342 ms.
Single core. Machine+compiler speed has improved by a factor 22000 over the years.

Cool, thanks for those!
The code is pretty neat and clean, I'll be looking at it a bit for ideas.

Mine is never going to be as fast because it is supposed to be generic, but I don't think that is the main cause of the slowdown. I tried getting rid of the recursive function (for building all positions) by writing a number of explicit loops, and this only improved performance by 10% - which is good because I'd like to hang on to it.

Probably the next thing I'll try is using a bitbase to keep track of positions that need work.

hgm · Post by **hgm** » Sat Jan 25, 2014 1:24 pm

OK, some comments to your posted code:

It seems to me that the code would spend most of its time in probe_tb (of which you did not post the source), as it seems from the arguments you pass to it that it has to calculate the index from the piece list every time you probe. My generators never calculate an index from scratch. The index either just increments (when you scan through the tablebase to find positions that need treatment), or is generated by moves. Now when on the real board piece i makes a move to = from + step, the probe index is simply newIndex = oldIndex + step*stride.

Note that an N-men tablebase is nothing but an N-dimensional (or 2N-dimensional, if you want) mailbox board for a game where you have only one piece (whose coordinates are the 2-D pieces) with different moves along each dimension (or pair of dimensions). To know where you have to probe, you can use regular mailbox move generation for generating the indexes.

With symmetry it is sometimes needed to transform indexes to map them back in the canonical sector. I do this by direct transformation of the index, rather than on the underlying board position, and then recalculating the index for it. My only generator that uses symmetry is 8x8, where the transformations are obvious (e.g. ^= 07070707, or for the diagonal reflection swapping even and odd groups of 3 bits). But it can be done almost as easily for linear mappings that do not use strides that are powers of 2, by keeping a separate rankIndex and fileIndex, which sum to the total index.

What also seems a significant source of slowdown (or at least would be, once you have optimized propagation) is that you do a mate test on all positions. It is much better to selectively generate mates. That way you save a large factor on generation of positions, and an infinite factor on testing for mate. A mate has to be a check, so when you have generated the attack bitboard of the white pieces before you started placing the black King (as you say, to take care of the X-ray checks), you now have an inCheck bitboard for black, and the mates must be a subset of that. In fact a position can only be a checkmate if all surrounding squares are also checked. So you could do

Code: Select all

inaccessible = inCheck | occupiedBlack;
mate = inCheck & ~king_attacks[whiteKing]; // no mate if next to king!
mate &= inaccessible << 1 | edge1;
mate &= inaccessible >> 1 | edge2;
mate &= inaccessible << 8 | edge3;
...

you would eliminate all checks where the king can simply step out of it as mate candidates, and typically this would be all checks. In fact typically you would already have eliminated all checks after the first 3 "mate &=", so it would pay to already test if(mate == 0) there, to take an early exit. In the rare case where anything is left in the bitboard, you could mark it as 'potentially lost' in the tablebase. The usual verification will then automatically discard it if their happen to be evasions by non-King moves.

There isn't really any need to test for move legality, as long as you start marking all wtm positions where black is in check as 'won'. There shouldn't be any reason then to test if a wtm position you are treating is legal or not, as illegal positions should never be 'newly won'.

hgm · Post by **hgm** » Sun Jan 26, 2014 2:50 pm

I have now also started writing a general (symmetry-less) 4-men generator, based on 1-bit representation. It keeps a single won[] bitmap for the wtm positions, and an array of lost[Nmax][] bitmaps for the btm lost-in-N positions. A symmetry-less 4-men has 16M positions, so each bitmap takes 2MB. It only calculates white wins.

The 6 least-significant bits of the index are the black King square. That makes that the bitmaps can also be interpreted as 256K bitboards for the black King, each for a different constellation of the other 3 pieces. The white attack set on such a bitboard (with these 3 pieces) would mark the positions where black is in check. It can be used to reduce the number of mate candidates to very few by shift and AND operations, treating 64 positions (differing in black King location) at once.

So the mate-detection step involves generating these attack sets. Of course standard bitboard techniques could be used for that, but these are often memory intensive, and not particularly fast (if they have to be applied from scratch to every 3-piece constellation, i.e. 256K times). So I wanted to use another method: For the first two pieces I prepare a table attacks[piece][square][blocker], which contains bitboards for the moves of the 'piece' on the 'square', in the presence of only one other piece, at square number 'blocker'. To prepare such a table for a leaper is trivial, as its attack set will be independent of the blocker, and would just be copied to 64 elements of the attacks[piece][square][] array. For sliders you would start doing that too, but you can then start clearing some bits of the bitboards where blocking took place. Anyway, no matter how you generate them, we are only dealing with 4K bitboards here, so it should always be fast. The effect of their mutual blocking can now be calculated during the scan through the EGT after placement of the second piece (i.e. 4K times per scan), through:

Code: Select all

table0 = attacks[0][pos[0]]; // tables to look up effects of blocking
table1 = attacks[1][pos[1]];
at0 = table0[pos[1]];
at1 = table1[pos[0]];

(where pos[n] the square number of piece n), i.e. piece 0 blocks 1 and piece 1 blocks 0. The third piece can be either black (2+2-men, in which case it does contribute to the black King's no-go area only by its presence), or a white leaper (3+1-men, if all else fails, we use the white King). So it can block piece 0 and 1, but will never be blocked itself. When assigning it a location in the course of the EGT scan, (256K times per scan) we calculate the white attack set for the 3-piece constellation as

Code: Select all

nogo = at0 & table0[pos[2]] | at1 & table1[pos[2]] | attacks_2[pos[2]];

with attacks_2[64] a table with attack sets for the third (leaper) piece. With 3 table lookups, (from 512-byte tables), 2 ANDs and 2 ORs that is about as cheap as you can get. The nogo board would go into the won[] bitmap to mark the black-in-check positions as won to white (because illegal). It can then be pre-screened by, say, calculating

Code: Select all

nogo & (nogo<<1 | edge1) & (nogo>>7 | edge2)

which in general would already eliminate all checks as possible mates, because there is either a step left or one diagonally backward that the black King could take as a legal evasion. Only if some set bits were left you would AND with all the other King steps, and what is left would be mate candidates, to be stored in the lost[0][] bitmap for later verification (to see if there are evasion through moves with the third piece, in case this was black). This verification (in needed, in 2+2-men) can take place as soon as we have treated all locations of the third piece, and would be ready to go to the next constellation of the first two (white) pieces. At that time the won[] bitmap would be complete (all illegal wtm positions being marked there) for the current placement of these two pieces, so that we can start moving the third and probe if such a move would lead us to a legal position (i.e. out of check). Note that the data set to be verified at this point is only 512 bytes, and thus certainly still present in L1.

After identifying the mates I use the leap-frog algorithm for the retrograde solving: the won[], lost[N][] and partially complete lost[N+1][] bitmaps are used to complete lost[N+1][] and generate a partially complete lost[N+2]. This will be done alternately with either the first or the second piece kept static. That means all moving pieces (three of them) access a range of 256K positions, 32KB worth of bitmap. So with 4 active bitmaps the working set will be 128KB, which is too big for L1, but easily fits L2 even in Nehalem architecture. The lost[N][] bitmaps will be accompanied by a 64 times smaller directory, which has 1 bit for each bitboard in the bitmap to indicate if there are any bits set in it, to speed up the scan over all positions.

I will report speeds once I have it running.

hgm · Post by **hgm** » Mon Jan 27, 2014 10:53 am

Using bitmap representations does have some important spin-offs. My earlier generators all treated lost-in-N positions one at the time, starting with white retro-moves to find the 'newly won' wtm positions, and from there on black retro-moves to 'potentially lost' btm positions, which are then marked as lost-in-(N+1) when verified by black forward moves, probing into the won bitmap. When 64 different black King locations reside in the same memory word, however, it becomes quite easy to do this in parallel. This can significantly speed up the work in the stage where the bulk of the wins/losses is assigned. E.g. in KBNK at the peak 2M losses and 1.5M wins are assigned in one iteration, from 16M positions in total. So at that stage the lost-in-N bitmap would have 12.5% of its bits set, and in a 64-bit word you would on average find 8 lost-in-N positions.

These positions can be subjected in parallel to the white retro-moving. Some positions in the 'king bundle' might have the black King on a square where it would block the white retro-move, but you can clear those bits as soon as the retro-move hits the corresponding square, when radiating out from the moving piece. This is especially useful when you collect the newly-won positions resulting from all retro-moves of a given white piece in a temporary 'newly won' bitmap, before continuing with black retro-moves. This requires only little storage. E.g. in KQKR, when working on the Q retro-moves, white K would be static, and QKR have 256K positions, held by a 32KB bitmap. This bitmap first collects all parents of lost-in-N (i.e. the 'won-in-at-most-(N+1)'), by merging moves into the same Q location coming from different Q locations. So that you don't have to compare them separately with the won bitmap to weed out those that were already won before, but just do it once for the merged set, to be left with the 'newly won' positions for that white-K location. These can then be used as the starting point for the black retro-moving, after which they have outlasted their purpose, and can be cleared to use the same 32KB buffer for the next white-K location.

The black retro-moves of any non-King can be similarly done with entire King bundles. (And for the black King different moves can be treated in parallel through standard bitboard techniques.)

tpetzke · Post by **tpetzke** » Tue Jan 28, 2014 2:15 pm

Hi,

currently I just try to get mine to work again at all, so I have a simple HowTo question

I have the following position

[d]8/8/8/8/1Q6/8/8/k2K4 b - - 0 1

I just calculated that this is a "Mated in 2". if I now generate the retro moves for White that lead to that position and assign each of those source positions a "Mate in 3" score, I end up scoring some positions as Mate in 3 which in fact are "Mate in 2" (eg. WQ comes from D2 is a Mate in 2 and not Mate in 3).

Using my current approach I get all positions correctly classified as Won or Lost, but the Distance to Mate is inflated.

I can think of expensive ways to solve that, but there must be an elegant way I think. How are others doing that ?

Thomas...

hgm · Post by **hgm** » Tue Jan 28, 2014 2:25 pm

Indeed, predecessors of 'newly lost' btm positions are all won (wtm) positions, but they are not necessarily newly won. You would have to test if they don't already contain a lower or equal DTx. Or, in the case you use just a single-bit flag to indicate 'won' (as I am doing), clear the 'candidate newly wons' wit 'won' to be left with the newly-won positions. Only newly-won positions need to be propagated further.

This is different with lost positions: the predecessor of a newly-won position can never be a lost position before (because upto then the move to the newly-won position was a move to an undecided position, providing the escape).

Evert · Post by **Evert** » Tue Jan 28, 2014 2:25 pm

tpetzke wrote:I just calculated that this is a "Mated in 2". if I now generate the retro moves for White that lead to that position and assign each of those source positions a "Mate in 3" score, I end up scoring some positions as Mate in 3 which in fact are "Mate in 2" (eg. WQ comes from D2 is a Mate in 2 and not Mate in 3).

Using my current approach I get all positions correctly classified as Won or Lost, but the Distance to Mate is inflated.

I can think of expensive ways to solve that, but there must be an elegant way I think. How are others doing that ?

I test if a position already has a "mate-in-n" score (as opposed to a "don't know"/"draw" score) and if it does, I only update the score if the new score is better (which I think never happens if you iterate in cycles and generate "mate-in-1" from "mate" positions and "mate-in-2" from "mate-in-1" positions).

In other words, the position is not a "new" won position, so you don't assign it a score and you don't search it again. You can keep track of that with a bitmap if you want, or just probe the table you're building up.

tpetzke · Post by **tpetzke** » Tue Jan 28, 2014 2:50 pm

I test if a position already has a "mate-in-n" score (as opposed to a "don't know"/"draw" score) and if it does, I only update the score if the new score is better (which I think never happens if you iterate in cycles and generate "mate-in-1" from "mate" positions and "mate-in-2" from "mate-in-1" positions).

I do it in cycles and it happens already in the first iteration (after the initial run where I assign the Illegals, Unknown, Stalemate, Mated and Mate in 1 scores).

As soon as I find a Mated in X score I immediately assign a Mate in X+1 to all predecessors. And here it happens that I assign a position that is still "unknown" a higher than necessary "Mate in X" score.

I guess if I finish to find all the Mated in X positions and only then process their predecessors it will work, but I thought it is sub-optimal.

Thomas...

On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation