endgame table generation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Dave Gomboc
Posts: 2
Joined: Sun Aug 15, 2021 12:22 am
Full name: Dave Gomboc

endgame table generation

Post by Dave Gomboc »

Hello. (Long time, no login/read/post!)

I am interested in learning what readable code is publicly available today for endgame table generation.

I am, of course, well aware of Syzygy. https://www.chessprogramming.org/Syzygy_Bases provides links to many posts in this forum that I have followed and read. The README for Syzygy's repository, https://github.com/syzygy1/tb, indicates that the "rtbgen" and "rtbgenp" programs are used for generation. (That README does seem out of date in some respects, though: for example, it focuses upon 6-piece support, though I see that several commits were made to support 7-piece table construction, and also that Syzygy 7-piece tables have been available since 2018.)

I would be particularly interested to learn about discussions that explain its generation/construction code. I did find extensive discussion of the probing code (specifically, related to Stockfish's implementation of Syzygy probing), and I also found some interesting discussion re: Re-Pair compression. However, I didn't yet come across postings that discussed or elaborated upon the part of the code that performs the retrograde analysis.

From time to time, other table formats are also mentioned. I recall that Steven Edwards made much of his code available, though those tables were not terribly compact. I recall that Eugene Nalimov's tables were quite popular for a while, but to the best of my recollection, no table generation code for those was ever released. What other open-source implementations of endgame table generation are available today? Generation code for simpler variations such as theoretic WDL are also of some interest, though it seems to me that code that handles issues regarding 50-move progress would be most useful.

Dave
User avatar
hgm
Posts: 27829
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: endgame table generation

Post by hgm »

There are two issues here: generating the tables, and compressing them for more efficient storage. I never bothered with the latter.

For the generation there are many algorithms. For fast generation you would want to do it in memory. Whether that is possible depends on the number of men, and how many of those are pawns. There are algorithms that use one byte per position, to store the DTx, and algorithms that use a bit per position, and crank out a new bitmap for each new DTx. On my old website ( http://home.hccnet.nl/h.g.muller/chess.html ) I published the source of a generator that uses bytes, for pawnless end-games up to 5 men. This generator was the basis for FairyGen. Marcel van Kervinck once posted a 'Pretty Fast' generator here for KBNK, using bits, which was indeed extremely fast. End-games with pawns are much simpler, as these have no symmetry. But they require more memory.

I also implemented 3- and 4-men generators in Javascript for a checkmating applet with fairy pieces ( https://www.chessvariants.com/page/MScheckmating-applet ). It uses bytes, which initially contain an 'escape count' (the number of non-losing moves), and replace this by the DTM code when the count hits zero.
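The escape-count scheme described above can be sketched as follows. This is a toy illustration, not hgm's actual applet code: the 6-node game graph is invented (sides alternate along edges), and the count and DTM live in separate arrays for clarity, whereas the applet packs both into the same byte.

```c
#include <string.h>

#define NPOS 6
enum { UNKNOWN, WIN, LOSS };            /* result for the side to move */

/* invented successor lists; node 0 is a terminal loss (mate) */
static const int nsucc[NPOS]   = {0, 1, 1, 1, 2, 1};
static const int succ[NPOS][2] = {{0}, {0}, {1}, {2}, {1, 5}, {4}};

static unsigned char state[NPOS];       /* UNKNOWN until proven */
static unsigned char dtm[NPOS];         /* valid once state != UNKNOWN */
static unsigned char esc[NPOS];         /* remaining non-losing moves */

void generate(void)
{
    int queue[NPOS], head = 0, tail = 0, p, q, m;

    memset(state, UNKNOWN, sizeof state);   /* UNKNOWN == 0 */
    for (p = 0; p < NPOS; p++)
        esc[p] = (unsigned char)nsucc[p];   /* initial escape counts */

    state[0] = LOSS;                    /* seed: terminal loss, DTM 0 */
    dtm[0] = 0;
    queue[tail++] = 0;

    while (head < tail) {
        p = queue[head++];
        for (q = 0; q < NPOS; q++) {    /* scan for predecessors of p */
            int hits = 0;
            for (m = 0; m < nsucc[q]; m++)
                if (succ[q][m] == p)
                    hits++;
            if (hits == 0 || state[q] != UNKNOWN)
                continue;
            if (state[p] == LOSS) {     /* q wins by moving into p */
                state[q] = WIN;
                dtm[q] = (unsigned char)(dtm[p] + 1);
                queue[tail++] = q;
            } else {                    /* moves into a won position are refuted */
                esc[q] = (unsigned char)(esc[q] - hits);
                if (esc[q] == 0) {      /* count hit zero: q is lost */
                    state[q] = LOSS;
                    dtm[q] = (unsigned char)(dtm[p] + 1);
                    queue[tail++] = q;
                }
            }
        }
    }
}
```

Nodes 4 and 5 form a cycle whose escape counts never reach zero, so they stay UNKNOWN, i.e. drawn.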
expositor
Posts: 59
Joined: Sat Dec 11, 2021 5:03 am
Full name: Kade

Re: endgame table generation

Post by expositor »

It's not especially readable, but you might be interested in Expositor's internal 3-man tablebase, which is generated on the fly when the engine starts up.
Dave Gomboc
Posts: 2
Joined: Sun Aug 15, 2021 12:22 am
Full name: Dave Gomboc

Re: endgame table generation

Post by Dave Gomboc »

hgm wrote: Sat Sep 17, 2022 4:01 pm There are two issues here: generating the tables, and compressing them for more efficient storage. I never bothered with the latter.
Right. I'm inquiring about the first category, the generation (or construction).
hgm wrote: Sat Sep 17, 2022 4:01 pm For the generation there are many algorithms. For fast generation you would want to do it in memory. Whether that is possible depends on the number of men, and how many of those are pawns.
Sure... and also how much RAM you have at your disposal. :-) I am aware of a general approach called the "grandfather algorithm", though I don't recall at the moment if that is actually specifically defined in a publication somewhere. It seems to me that there may also be other reasonable general approaches that I'm unaware of, though.
hgm wrote: Sat Sep 17, 2022 4:01 pm There are algorithms that use one byte per position, to store the DTx, and algorithms that use a bit per position, and crank out a new bitmap for each new DTx.
That's interesting. I am familiar with using a counter (might be larger than a byte, if using DTM or if tracking accurate DTC past 100 ply) per position, but I wasn't aware that some folks were instead using a separate bit plane per DTx. Has anyone done a resource (time, memory) analysis of this choice?
hgm wrote: Sat Sep 17, 2022 4:01 pm On my old website ( http://home.hccnet.nl/h.g.muller/chess.html ) I published the source of a generator that uses bytes, for pawnless end-games up to 5 men. This generator was the basis for FairyGen.
Aha, I see that online at https://home.hccnet.nl/h.g.muller/EGTB.html. Thank you for the pointer! That's the only generic (not specific to a single material balance) open-source code other than Syzygy that I'm aware of. (Are there any others out there?)
hgm wrote: Sat Sep 17, 2022 4:01 pm Marcel van Kervinck once posted a 'Pretty Fast' generator here for KBNK, using bits, which was indeed extremely fast.
I do see https://github.com/kervinck/pfkpk has KPK (announced at http://www.talkchess.com/forum3/viewtopic.php?t=57517), but I don't see KBNK there. Aha, there's a posting that you made with some KNBK code in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=77249.
hgm wrote: Sat Sep 17, 2022 4:01 pm End-games with pawns are much simpler, as these have no symmetry. But they require more memory.
This seems incorrect to me: there are fewer symmetries, but not none. If nothing else, there's always the "invert colours, flip the side to move, and interchange ranks: 1<->8, 2<->7, 3<->6, 4<->5" symmetry. Also, unless castling is legal in a position (and the tables actually handle positions where castling is possible rather than bailing with "don't know" -- are there even any such tables that support castling built and used with open-source code?), there would also be an "interchange files: a<->h, b<->g, c<->f, d<->e" symmetry available.
hgm wrote: Sat Sep 17, 2022 4:01 pm I also implemented 3- and 4-men generators in Javascript for a checkmating applet with fairy pieces ( https://www.chessvariants.com/page/MScheckmating-applet ). It uses bytes, which initially contain an 'escape count' (the number of non-losing moves), and replace this by the DTM code when the count hits zero.
I'm going to limit my search to regular Chess. (Actually, an endgame table generator for Xiangqi might be fine too, though Shogi play doesn't generally converge.)
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: endgame table generation

Post by syzygy »

Dave Gomboc wrote: Sat Sep 17, 2022 2:15 pmI would be particularly interested to learn about discussions that explain its generation/construction code. I did find extensive discussion of the probing code (specifically, related to Stockfish's implementation of Syzygy probing), and I also found some interesting discussion re: Re-Pair compression. However, I didn't yet come across postings that discussed or elaborated upon the part of the code that performs the retrograde analysis.
The retrograde analysis code encodes each position as an index. Let's assume a pawnless table.

The KK position is encoded as a number from 0 to 461. One of the kings is restricted to the A1-D1-D4 triangle. If it is on the diagonal, the other king is restricted to A1-H1-H8.
Each other piece position is a number from 0 to 63.
These numbers are concatenated with KK as the most significant bits.
So a KBKN position could be (KK << 12) | (B << 6) | N.
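The packing just described can be sketched like this for a 4-man pawnless table such as KBvKN. The function names are illustrative, not taken from the actual rtbgen source; only the bit layout follows the post.

```c
/* KK code (0..461) in the most significant bits, then 6 bits per
 * remaining piece for its square (0..63). */

typedef unsigned long long tb_index;

static tb_index encode_kbkn(unsigned kk, unsigned bsq, unsigned nsq)
{
    /* preconditions: kk < 462, bsq < 64, nsq < 64 */
    return ((tb_index)kk << 12) | (bsq << 6) | nsq;
}

static void decode_kbkn(tb_index idx, unsigned *kk, unsigned *bsq, unsigned *nsq)
{
    *nsq = (unsigned)(idx & 63);
    *bsq = (unsigned)((idx >> 6) & 63);
    *kk  = (unsigned)(idx >> 12);
}
```

With this layout each side-to-move array for a 4-man table has 462*64*64 = 1,892,352 entries, matching the 462*64^(n-2) figure below.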

The in-memory tables are two arrays of 462*64^(n-2) bytes, one for white to move, one for black to move.

This is not the most memory-efficient approach since it makes no use of the symmetry of alike pieces (e.g. for KRRvK), but it simplifies generation, and for 6 pieces RAM is not an issue. For 7 pieces it makes more sense to make use of these symmetries, but you'd still need to deal with KQRBvKQR etc.

So 1 byte per position.
In principle this is not sufficient for tables with high max DTZ, but this is worked around by "reducing" the table after a certain number of iterations. A reduction step basically saves the tables to disk and keeps enough information to be able to continue with further iterations. [Later the DTZ table is reconstructed but without the WDL and cursed information and (in most cases) counting moves instead of plies, so a byte is sufficient (except for some 7-piece tables, for which the reconstructed table has 2 bytes per position before it is compressed).]

A further complication is that the generation code distinguishes between positions that can be won or drawn by an immediate capture and other wins and draws, and similar for "cursed" wins and losses. This extra information is used by the compression algorithm.

So possible values are (for pawnless tables):

Code: Select all

#define ILLEGAL 0
#define BROKEN 0xff
#define UNKNOWN 0xfe
#define CHANGED 0xfd
#define CAPT_CLOSS 0xfc
#define CAPT_DRAW 0xfb
#define MATE 0xfa
#define LOSS_IN_ONE 0xf9
#define CAPT_WIN 1
#define WIN_IN_ONE 2
BROKEN means two pieces on the same square.
ILLEGAL means that the side to move can take the enemy king. (So the corresponding position in the other table is in check -> potentially mate.)
UNKNOWN means not yet known.
CHANGED is used to mark positions which are potentially lost (it is the ancestor of a position which was found to be a win in the last iteration).
CAPT_CLOSS means the position has a capture into a cursed loss.
CAPT_DRAW means the position has a capture into a draw.
MATE means mate.
LOSS_IN_ONE-k means loss with DTZ=k+1 for k >= 0 (loss codes count downward from 0xf9, so they do not collide with MATE).
CAPT_WIN means the position has a capture into a win.
WIN_IN_ONE means the position is won with DTZ=1 but not CAPT_WIN. So mate in 1.
WIN_IN_ONE+k means won with DTZ=k+1 for k >= 1.

The tables are initialised by calc_broken(), calc_captures_w(), calc_captures_b() and calc_mates(), which are called from main().

The rest of the generation is done in iterate().

For each iteration, iterate() calls iter(). (Via run_iter() to do it multi-threaded and to take care of wtm and btm.)

The iterate() function "programs" the work that has to be done by iter() by setting up the tbl[], win_loss[] and loss_win[] arrays.

The idea is that for ply = n, iter() finds all wins in n and n+1 and all potential losses in n and n+1.
ply = 0, 1, 2 have special setup to deal with MATE, LOSS_IN_ONE, WIN_IN_ONE
ply = 3, ..., 99 look the same.
ply = 100, 101, 102 again need extra care.
ply >= 103 again look the same, but at some point (or points) a reduction is needed.

The iter() function loops through the table for the side to move (the other table being table_opp).

case 1:
The position is CHANGED. It is checked with check_loss() whether it is a proven loss. This also determines whether it is lost in n-1 or n.
If the position is indeed lost, then its ancestors in table_opp are marked as win (in n or n+1).
If the position is not lost, it is marked as UNKNOWN.

case 2:
The position was found to be a win in the previous iteration.
Its ancestors in table_opp are marked as potential losses (CHANGED) in n or n+1.

case 3,4:
special treatment of win in 1 and CAPT_CLOSS

check_loss(pcs, idx0, table, occ, p) does a 1-ply forward analysis on position idx0 to check whether all successor positions in table[] (with the opponent to move) are won.
The pieces (piece types) of the side to move are in pcs[]. Their positions on the board are in p[].
occ is the occupancy bitboard.
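The 1-ply forward verification can be sketched as below. This is a simplification, not the real check_loss(): the actual code walks the board via pcs[]/p[] and the occupancy bitboard, whereas here the successor results are simply handed in as an array (the opponent's DTZ if the successor is won for the opponent, or -1 if it is not).

```c
/* Returns -1 if the position is not lost, else its DTZ.  The losing side
 * steers toward the longest defense, hence the max over successors. */
static int check_loss_sketch(const int *succ_dtz, int nmoves)
{
    int worst = 0;

    if (nmoves == 0)
        return 0;        /* no moves: treated as mate (DTZ 0) in this sketch;
                            the real code also has to recognise stalemate */
    for (int i = 0; i < nmoves; i++) {
        if (succ_dtz[i] < 0)
            return -1;   /* some move reaches a non-won position: not lost */
        if (succ_dtz[i] > worst)
            worst = succ_dtz[i];
    }
    return worst + 1;
}
```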

Marking of wins and potential losses is done by a 1-ply retro analysis.
These retro functions all have the same structure.
See the RETRO macro in generic.c and MARK_PIVOT0(mark_changed) (white king), MARK_PIVOT1(mark_changed) (black king) and MARK(mark_changed) (remaining pieces).
For retro king moves it must be checked whether the ancestor position has KK on the diagonal, since then two mirror positions have to be marked.

Since different threads might simultaneously mark the same position, there is some atomic locking. If a position can be set both to CHANGED and to a WIN value, it should get the WIN value. Similarly, if a position can get two different WIN values, the one with the smaller DTZ should prevail. See e.g. SET_CHANGED(): only write CHANGED if the value is still UNKNOWN. On non-ancient CPUs the lock prefixes don't really hurt as long as the cache lines are uncontended.
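The two marking rules can be sketched with C11 atomics. The function names are illustrative, not the actual macros; the value encoding follows the table earlier in the thread (UNKNOWN = 0xfe, CHANGED = 0xfd, win codes are small numbers where smaller means smaller DTZ), and special codes like ILLEGAL and MATE are ignored for brevity.

```c
#include <stdatomic.h>

#define UNKNOWN 0xfe
#define CHANGED 0xfd

/* Mark as a potential loss, but never downgrade a stored win:
 * write CHANGED only if the cell still holds UNKNOWN. */
static void set_changed(_Atomic unsigned char *cell)
{
    unsigned char expected = UNKNOWN;
    atomic_compare_exchange_strong(cell, &expected, CHANGED);
}

/* Store a win, keeping the smaller DTZ if two threads race. */
static void set_win(_Atomic unsigned char *cell, unsigned char win)
{
    unsigned char cur = atomic_load(cell);
    while (cur > win) {        /* UNKNOWN, CHANGED and larger-DTZ wins lose */
        if (atomic_compare_exchange_weak(cell, &cur, win))
            return;            /* on failure cur is reloaded; retry */
    }
}
```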

Initialisation of captures and illegal/in-check positions is done with retro analysis.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: endgame table generation

Post by syzygy »

Dave Gomboc wrote: Sun Sep 25, 2022 9:29 am I am aware of a general approach called the "grandfather algorithm", though I don't recall at the moment if that is actually specifically defined in a publication somewhere. It seems to me that there may also be other reasonable general approaches that I'm unaware of, though.
I use the grandfather approach. The other general approach is out-counting. See
https://open-chess.org/viewtopic.php?t=779
Initially I also implemented out-counting, but it was slower.
Out-counting avoids the "check_loss()" 1-ply forward analysis to verify losses by keeping track of the number of moves that have not yet been proven to lose. In this case some of the byte range has to be used to count the number of moves. Move counting has to be done carefully to deal correctly with the diagonal symmetry.
hgm wrote: Sat Sep 17, 2022 4:01 pm There are algorithms that use one byte per position, to store the DTx, and algorithms that use a bit per position, and crank out a new bitmap for each new DTx.
That's interesting. I am familiar with using a counter (might be larger than a byte, if using DTM or if tracking accurate DTC past 100 ply) per position, but I wasn't aware that some folks were instead using a separate bit plane per DTx. Has anyone done a resource (time, memory) analysis of this choice?
Ken Thompson used such an approach to generate 5-piece tables:
https://pdos.csail.mit.edu/~rsc/thompson86endgame.pdf

I believe Thompson describes out-counting in a paper on the generation of 6-piece tables, but I cannot find this now.
hgm wrote: Sat Sep 17, 2022 4:01 pm End-games with pawns are much simpler, as these have no symmetry. But they require more memory.
This seems incorrect to me: there are fewer symmetries, but not none. If nothing else, there's always the "invert colours, flip the side to move, and interchange ranks: 1<->8, 2<->7, 3<->6, 4<->5" symmetry. Also, unless castling is legal in a position (and the tables actually handle positions where castling is possible rather than bailing with "don't know" -- are there even any such tables that support castling built and used with open-source code?), there would also be an "interchange files: a<->h, b<->g, c<->f, d<->e" symmetry available.
The diagonal symmetry of non-pawn tables is a complication, but dealing correctly with pawns (double push, promotion, en passant) is more work to get right. Doing the generation by pawn slices to save memory (and to get DTZ right -> pawn move resets the counter) is a further complication.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: endgame table generation

Post by syzygy »

syzygy wrote: Sun Sep 25, 2022 4:14 pmI use the grandfather approach. The other general approach is out-counting.
A third approach is pure forward analysis. This is terribly slow, in particular if there are many drawn positions, since you have no way of limiting the number of positions to check in each iteration (except that you can skip the proven wins and losses).

For some tables the generation can probably be sped up by using pure forward analysis in the first few iterations.

This is in particular the case for suicide chess where captures are forced (and the goal is to lose all your pieces including the king). Tables with lots of Qs and Rs typically only have short wins and losses and no draws, so a few forward passes would solve it completely. My generator (adapted for suicide chess) also needs only a few passes, but they take incredibly long due to all the retro'ing (which marks each ancestor position multiple times).
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: endgame table generation

Post by syzygy »

syzygy wrote: Sun Sep 25, 2022 4:14 pmI believe Thompson describes out-counting in a paper on the generation of 6-piece tables, but I cannot find this now.
Found it:
https://ia800703.us.archive.org/26/item ... ndgame.pdf
Apparently he could not get out-counting to work (on his limited hardware).
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: endgame table generation

Post by Rein Halbersma »

syzygy wrote: Sun Sep 25, 2022 4:24 pm
syzygy wrote: Sun Sep 25, 2022 4:14 pmI use the grandfather approach. The other general approach is out-counting.
A third approach is pure forward analysis. This is terribly slow, in particular if there are many drawn positions, since you have no way of limiting the number of positions to check in each iteration (except that you can skip the proven wins and losses).

For some tables the generation can probably be sped up by using pure forward analysis in the first few iterations.

This is in particular the case for suicide chess where captures are forced (and the goal is to lose all your pieces including the king). Tables with lots of Qs and Rs typically only have short wins and losses and no draws, so a few forward passes would solve it completely. My generator (adapted for suicide chess) also needs only a few passes, but they take incredibly long due to all the retro'ing (which marks each ancestor position multiple times).
Ralph Gasser has a formula for this (see section 3.3.2 of one of his Nine Men's Morris papers).

The forward approach is better than the backward approach if the number of resolved wins times the average number of predecessors is larger than the remaining unidentified positions:

Code: Select all

#Win * #AvgPred > #Unidentified
So indeed, if a lot of wins get identified in the early iterations, the forward approach dominates the backward approach. In games like checkers/draughts, the forced captures and the few plies after that are typically done with the forward approach.
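The criterion above as a predicate, with a made-up numeric instance: the function name and the numbers in the test are illustrative, only the inequality comes from Gasser.

```c
/* Gasser's criterion: a forward pass dominates when the retro marking work
 * (#Win * #AvgPred) exceeds the forward scanning work (#Unidentified). */
static int forward_pass_cheaper(unsigned long long wins,
                                unsigned long long avg_pred,
                                unsigned long long unidentified)
{
    return wins * avg_pred > unidentified;
}
```

For example, 1,000,000 newly resolved wins with ~8 predecessors each means 8,000,000 retro markings; against 20,000,000 unidentified positions the backward pass is still cheaper, but at 5,000,000 wins the forward pass takes over.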
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: endgame table generation

Post by syzygy »

Rein Halbersma wrote: Sun Sep 25, 2022 5:11 pmRalph Gasser has a formula for this (see section 3.3.2 of one of his Nine Men's Morris papers).

The forward approach is better than the backward approach if the number of resolved wins times the average number of predecessors is larger than the remaining unidentified positions:

Code: Select all

#Win * #AvgPred > #Unidentified
So indeed, if a lot of wins get identified in the early iterations, the forward approach dominates the backward approach. In games like checkers/draughts, the forced captures and the few plies after that are typically done with the forward approach.
It seems with his approach each potentially losing position (i.e. each predecessor of a newly found winning position) is immediately checked for being a loss, which means some/many positions may be checked multiple times in the course of a single pass.

My approach only marks each potential loss as such and checks them only in the next pass. This means the number of forward checks will at most be #Unidentified (and will usually be lower), even if #Win * #AvgPred > #Unidentified.

But marking the predecessor positions has a cost too, in particular if almost all Unidentified/UNKNOWN positions end up being marked anyway.

For regular chess there is probably not much to gain here (but perhaps for KQQvKQQ). (It could be added relatively easily to my generator. Instead of doing a check_loss() on each CHANGED position, do it on each UNKNOWN position, and skip the marking of potential losses in the preceding iteration.)