DTM50

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

DTM50

Post by syzygy »

To quote from here:
DTM50 conceptually stores a distance-to-mate value for each pair (position, rule50_counter). So that is 100x as much raw data as for DTM. The sequence of values for a particular position is far from random, but it is not immediately obvious how to find an efficient encoding that fits into a good compression scheme. And generating these values efficiently is another problem.
The generation process is actually not that difficult.

Let's write DTM(p, n) for the distance to mate of position p if there are n ply left on the 50-move counter.

So DTM(p, 1) = k > 0 if position p has a mate in 1 (in which case k=1) or if position p has a capture or pawn move leading to a losing position for the other side to move.

DTM(p, 1) = k < 0 if the player to move in position p cannot avoid playing a losing pawn move or capture.

This means that a table of all DTM(p, 1) values can be generated by 1 ply of retrograde analysis from all mate positions and all positions after a capture or pawn move.

Doing another ply of retrograde analysis leads to a table of all DTM(p, 2) values.

Yet another ply will give a table of all DTM(p, 3) values. In this step, some of the DTM(p, 1) winning values can be overwritten by a smaller winning value. Actually, this can already happen during the generation of the DTM(p, 2) table: instead of playing a winning pawn move (1 ply to a 50-move reset), in rare positions there will be a shorter path to mate by first sacking the queen (2 plies to a 50-move reset). Or instead of a capture (1 ply to a reset), the shortest mate might require a check that forces the losing side to move a pawn (2 plies to a reset).

Repeating this until DTM(p, 100), we end up with the distance to mate for each position while respecting the 50-move rule, assuming the 50-move counter has just been reset. This is a very useful table, because entering the table is possible only through a capture, and that capture resets the 50-move counter. The only problem with the table is that it does not contain sufficient information to find the complete DTM50-optimal mating line. For that we also need the DTM(p, c) tables for 1 <= c <= 99.

So we need to save DTM(p, 100) and we need to save information describing exactly the values that were overwritten with smaller values in the process of generating DTM(p, 1), DTM(p, 2), DTM(p, 3), ..., DTM(p, 100).
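
To make that bookkeeping concrete, here is a minimal sketch in Python (purely illustrative; DRAW, split_position and lookup are names I am making up here, and the per-position sequences are assumed to come out of the generation passes described above). It keeps only the DTM(p, 100) value plus the overwrite points, and shows that this is enough to recover DTM(p, n) for any n:

Code:

DRAW = None

def split_position(seq):
    """seq[c] = DTM(p, c + 1) for c = 0..99.  Returns (dtm100, events), where
    events lists the overwrites as (n, value) pairs with n descending."""
    dtm100 = seq[99]
    events = []
    prev = dtm100
    for n in range(99, 0, -1):               # walk the 50-move counter downwards
        if seq[n - 1] != prev:
            events.append((n, seq[n - 1]))
            prev = seq[n - 1]
    return dtm100, events

def lookup(dtm100, events, n):
    """Reconstruct DTM(p, n) from the saved data."""
    value = dtm100
    for at, v in events:                     # events are sorted by descending n
        if n <= at:
            value = v
        else:
            break
    return value

# Toy position: mate in 14 plies, but mate in 18 plies once fewer than 13 plies
# remain (the longer line starts with a zeroing move), and drawn below 5 plies.
seq = [DRAW] * 4 + [18] * 8 + [14] * 88
dtm100, events = split_position(seq)
assert (dtm100, events) == (14, [(12, 18), (4, DRAW)])
assert all(lookup(dtm100, events, n + 1) == seq[n] for n in range(100))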

The DTM(p, 100) tables are the tables that will be probed during the search (because they will be probed immediately after a capture) and they are the tables that are used by the generator to seed the capture positions. Their size will be comparable to normal DTM tables and they can be in the same format as the existing Syzygy tables.

The information encoding the values being overwritten during the generation of DTM(p, 1), ..., DTM(p, 100) would only be probed at the root. So access time can be longer, which means we can allow its decompression to be more expensive, and it can also be stored on a slower disk if necessary. But I don't know yet what kind of format would be best.

The generation process I describe above will need 2 bytes per position. (My (unreleased) DTM generator only needs 1 byte per position but it uses an approach that does not keep track of the number of plies since the last zeroing move.) For 6-piece tables using 2 bytes is not a problem. For 7-piece tables it is a bit more problematic but not necessarily fatal (a more efficient indexing scheme during generation will reduce requirements again).
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: DTM50

Post by noobpwnftw »

How about we compress them separately? In one table we store the values for p itself (the 50-move counter at 100), in the other those for counters 1 to 99. Then we double the decompression time but reduce the number of (p, x) combinations by an order of magnitude.

Considering that, the 1-to-99 part could probably be an extension of the Syzygy WDL tables: instead of having 5 states as of now, they would have 100 + 3 states storing the move counter for both sides.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: DTM50

Post by syzygy »

Instead of DTM(p, 100) it is probably better to write DTM(100).

So DTM(100) is like a normal table with for each position the distance to mate assuming the 50-move counter is 0.

DTM(99) is similar, but for the 50-move counter at 1 ply.

So we have 100 tables DTM(100), DTM(99), ..., DTM(1).

The DTM(100) table is the most important one, comparable to WDL in the current WDL/DTZ scheme. It has one value per position, running from 1 to max DTM. The compression scheme can handle that (like it handles it now for DTZ).

Each of DTM(n) for 1 <= n <= 99 also has one value per position.
We could store them all like KRNvKBN.dtm50.n, with n running from 1 to 100. But that would be terribly inefficient, because the difference between DTM(n) and DTM(n-1) will be relatively small.

If you fix a particular position, the value of DTM(n) for that position as n goes from 100 to 1 will monotonically increase starting from its DTM(100) value until it jumps to infinity (meaning drawn by the 50-move rule). We should somehow encode by how much the value jumps and where.

It would be possible to store the delta tables DTM(n+1) - DTM(n) for n = 1, 2, ..., 99. Most of those values will be zero, so the tables will compress quite well. It would be quite a lot of work to find out the value of DTM(n) for n small (you'd have to start with DTM(100) and check each intermediate delta table), but that might still be OK.

It would be possible to encode a number of successive delta tables as one table. For example, if the maximum jump from DTM(n+1) to DTM(n) is 5 moves, three delta tables could be combined into one table taking values from 0..215 (=6^3-1).
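
As a trivial illustration of that combination (Python, assuming the 0..5 bound just mentioned):

Code:

def pack3(d1, d2, d3):
    """Combine three successive delta values (each 0..5) into one symbol 0..215."""
    assert all(0 <= d <= 5 for d in (d1, d2, d3))
    return (d1 * 6 + d2) * 6 + d3

def unpack3(s):
    d3 = s % 6
    d2 = (s // 6) % 6
    d1 = s // 36
    return d1, d2, d3

assert pack3(5, 5, 5) == 215
assert unpack3(pack3(2, 0, 4)) == (2, 0, 4)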

But hopefully there is a better way to do this.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: DTM50

Post by petero2 »

Here is my suggestion:

Store the DTM(100) table as "usual".

Store all DTM(x) tables for x<100 in a single file, in an "interleaved" format. That is, the value for position i in the DTM(x) table is stored at position:

i * 99 + (99 - x)

in the file. Instead of storing the actual value, you store the delta compared to the "x+1" value, as you already mentioned. A special "inf" delta represents the transition to a drawn position.

Each chess position is therefore represented by a sequence of 99 consecutive values in the file. As you suggest, most values will be zero so a standard Huffman or range encoding would likely do a pretty good job compressing the data. However, you can also take advantage of the following properties of the sequence for a single position:
  • If the value for x=100 is draw, all values in the sequence are 0, so no bits are required to encode them.
  • If the value for x=100 is mate in v plies, the first 100-v values in the sequence are 0, so no bits are required to encode them.
  • If the value for some x=a<100 is "inf", all values for x<a are 0, so no bits are required to encode them.
This scheme has the advantage that decoding any position for any x will require at most two seeks and a small amount of sequential reads.
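
To make the layout concrete, here is a small sketch in Python (names invented for illustration; the delta stream is shown as an in-memory list rather than the compressed file):

Code:

INF = float("inf")

def record_index(i, x):
    """Record number of the delta for position i at counter x (1 <= x <= 99)."""
    return i * 99 + (99 - x)

def dtm_at(x, dtm100, deltas):
    """Rebuild DTM(x) for one position from its DTM(100) value and its 99 deltas
    (deltas[k] is the value stored at record  record_index(i, 99 - k))."""
    if dtm100 is None:                       # drawn at x = 100: drawn for every x
        return None
    value = dtm100
    for cur in range(99, x - 1, -1):         # apply deltas for x = 99 down to x
        d = deltas[99 - cur]
        if d == INF:                         # the "inf" delta: drawn from here on
            return None
        value += d
    return value

# Toy position: DTM(100) = 14, rising to 18 once fewer than 13 plies remain,
# drawn once fewer than 5 plies remain.
deltas = [0] * 87 + [4] + [0] * 7 + [INF] + [0] * 3
assert dtm_at(100, 14, deltas) == 14
assert dtm_at(13, 14, deltas) == 14 and dtm_at(12, 14, deltas) == 18
assert dtm_at(4, 14, deltas) is None

The three properties above then simply mean that the encoder never has to spend bits on the leading zeros of such a delta list.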
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: DTM50

Post by Sesse »

If you can afford expensive decompression, I would suppose a variant of arithmetic coding (or rANS) would be the way to go. If you think of each position as a series of 100 DTM values (one for each value of the ply counter, PC), it will be something like

draw draw draw draw 18 18 18 18 14 14 14 14 2 2 2 2 2 … 2

I.e., if the PC is very low, it's a draw (some sort of special symbol; let's represent it by 100 for the time being, but we can of course be more clever); if the PC is a bit higher, it's DTM 18, etc. Delta coding this gives

100 0 0 0 82 0 0 0 4 0 0 0 12 0 0 0 0 … 0

The probability distribution will generally depend on two things: What is the value of PC, and was the last delta value a zero or not? So that gives you 200 different probability distributions, which you can just store nicely in the header of the file.
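
As a sketch of the context selection (Python, only the bookkeeping; the rANS/arithmetic coder itself is not shown, just how the 200 histograms it would be initialised from could be gathered):

Code:

from collections import Counter

def context(pc, last_delta_was_zero):
    """Map (PC, previous-delta-was-zero) to one of 200 context ids."""
    return pc * 2 + (1 if last_delta_was_zero else 0)

def gather_histograms(delta_sequences):
    """delta_sequences: per-position delta lists, one entry per PC value 0..99.
    Returns the 200 symbol histograms an entropy coder would be built from."""
    hist = [Counter() for _ in range(200)]
    for seq in delta_sequences:
        last_zero = True                     # arbitrary choice before the first symbol
        for pc, delta in enumerate(seq):
            hist[context(pc, last_zero)][delta] += 1
            last_zero = (delta == 0)
    return hist

# The example sequence above, padded with trailing zeros to 100 values.
example = [100, 0, 0, 0, 82, 0, 0, 0, 4, 0, 0, 0, 12] + [0] * 87
hists = gather_histograms([example])
assert hists[context(4, True)][82] == 1      # the 82 at PC 4 followed a zero delta
assert hists[context(4, False)][82] == 0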

The other alternative is a run-length encoding ala JPEG, which again gives you 200 different probability distributions (100 for the delta, and 100 for the number of equal values)—although you could model them as one joint distribution if you really wanted to, I'm not sure it's workable at low precision (your alphabet gets really large).
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: DTM50

Post by syzygy »

I agree that the DTM(1-99) information corresponding to one position should be stored together.

Instead of 99 values per position, it is probably possible to encode the increasing sequence for a position (minus the DTM(100) value) somewhat efficiently in a variable number of bits and devise an indexing scheme that allows quick access to the start of that information for a particular position.

Positions that are drawn according to WDL50 or illegal will not be looked up, so they can have an arbitrary encoding (I guess they will need some form of run-length encoding).

The current scheme stores all information in fixed-size blocks and has a 2-layer indexing scheme to find the block containing a particular position and the offset into that block (i.e. the number of positions that need to be skipped when decompressing the block). That should work here as well.
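
Schematically, the lookup would go something like this (a simplified sketch in Python, not the actual on-disk structures; K and the toy block sizes are made up):

Code:

K = 4096                                     # sparse index granularity (illustrative)

def build_index(counts):
    """counts[b] = number of positions whose data is stored in block b.
    Remember, for every K-th position, its block and offset within that block."""
    index, block, offset = [], 0, 0
    for i in range(0, sum(counts), K):
        while i - offset >= counts[block]:   # advance to the block containing position i
            offset += counts[block]
            block += 1
        index.append((block, i - offset))
    return index

def locate(i, counts, index):
    """Return (block, number of positions to skip inside that block) for position i."""
    block, skip = index[i // K]
    pos_in_block = skip + (i % K)
    while pos_in_block >= counts[block]:     # may spill into later blocks
        pos_in_block -= counts[block]
        block += 1
    return block, pos_in_block

counts = [1000, 7000, 2500, 6000]            # toy per-block position counts
idx = build_index(counts)
assert locate(999, counts, idx) == (0, 999)
assert locate(1000, counts, idx) == (1, 0)
assert locate(8200, counts, idx) == (2, 200)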
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: DTM50

Post by syzygy »

Sesse wrote: Wed May 23, 2018 10:45 pm If you can afford expensive decompression, I would suppose a variant of arithmetic coding (or rANS) would be the way to go. If you think of each position as a series of 100 DTM values (one for each value of the ply counter, PC), it will be something like

draw draw draw draw 18 18 18 18 14 14 14 14 2 2 2 2 2 … 2

I.e., if the PC is very low, it's a draw (some sort of special symbol; let's represent it by 100 for the time being, but we can of course be more clever); if the PC is a bit higher, it's DTM 18, etc. Delta coding this gives

100 0 0 0 82 0 0 0 4 0 0 0 12 0 0 0 0 … 0

The probability distribution will generally depend on two things: What is the value of PC, and was the last delta value a zero or not? So that gives you 200 different probability distributions, which you can just store nicely in the header of the file.

The other alternative is a run-length encoding ala JPEG, which again gives you 200 different probability distributions (100 for the delta, and 100 for the number of equal values)—although you could model them as one joint distribution if you really wanted to, I'm not sure it's workable at low precision (your alphabet gets really large).
This looks nice.

One could also consider encoding the number of ply until the next jump and the size of the jump together.
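
For example (a quick Python illustration with made-up names), the value sequence of a position could be turned into such pairs like this:

Code:

def to_jump_pairs(values):
    """values[k] = DTM at counter 100 - k, i.e. the sequence for one position
    read from counter 100 down to 1.  Returns (run, jump) pairs, where run is
    the number of ply before the value changes and jump is the change
    (None = the jump to 'drawn by the 50-move rule')."""
    pairs, run, prev = [], 0, values[0]
    for v in values[1:]:
        run += 1
        if v != prev:
            pairs.append((run, None if v is None else v - prev))
            run, prev = 0, v
    return pairs

# Same toy position as before: DTM 14, rising to 18 below 13 ply on the
# counter, drawn below 5 ply.
values = [14] * 88 + [18] * 8 + [None] * 4
assert to_jump_pairs(values) == [(88, 4), (8, None)]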

There will also need to be a way to encode strings of illegal/drawn values. (Winning positions for which the shortest mate starts with a capture and losing positions for which the longest mate starts with a capture can also be skipped in the DTM(1-99) table.)

Or we pick an encoding scheme that just encodes the information in a compact way but respecting byte boundaries, and we use a general compression algorithm to compress that further. That would need a relatively large block size to be efficient or some kind of offline dictionary approach. This might work better if there is correlation between the "values" of adjacent positions, but maybe there isn't much of that (apart from the strings of illegal/drawn values).
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: DTM50

Post by Sesse »

syzygy wrote: Wed May 23, 2018 11:39 pm One could also consider encoding the number of ply until the next jump and the size of the jump together.
Yes, that's what I meant by run-length coding.
syzygy wrote: Or we pick an encoding scheme that just encodes the information in a compact way but respecting byte boundaries, and we use a general compression algorithm to compress that further. That would need a relatively large block size to be efficient or some kind of offline dictionary approach. This might work better if there is correlation between the "values" of adjacent positions, but maybe there isn't much of that (apart from the strings of illegal/drawn values).
Very few general compression algorithms support offline dictionaries in practice (and it's rarely obvious how to construct them). But yes, if your block size is large enough, you can use a general compressor to good success.

A trick that is often useful (especially with LZ-type compressors) is transposing data so that data that looks similar lives together. E.g., instead of having DTM, move, DTM, move, DTM, move…, you have DTM, DTM, DTM, …, move, move, move, …. It feels really obvious in retrospect, but especially together with delta-coding, it can make for amazing results.
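
A toy illustration in Python (zlib standing in for whatever compressor is actually used; the numbers only show the direction of the effect):

Code:

import zlib

# toy (DTM, move) records, one byte each
records = [(dtm, 7) for dtm in range(0, 200, 2)]

interleaved = bytes(b for rec in records for b in rec)
transposed  = bytes(dtm for dtm, _ in records) + bytes(mv for _, mv in records)

# Same information, but similar bytes now sit next to each other,
# which LZ-style compressors can exploit.
print(len(zlib.compress(interleaved)), len(zlib.compress(transposed)))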
Andrew
Posts: 231
Joined: Thu Mar 09, 2006 12:51 am
Location: Australia

Re: DTM50

Post by Andrew »

Hi, just wondering if there has been any progress in Syzygy DTM generation since this thread started? It would be very interesting to know if, say, 3- or 4-piece tables could be generated with a size significantly less than what we get for the Nalimov files.

For normal engine-engine games etc these won't be of much use.

However there are a lot of positions from studies etc where having DTM might be very useful.

For example look at the various arves files https://rybkaforum.net/cgi-bin/rybkafor ... ?tid=32814

A lot of these positions are easy to get a TB mate score, but a mate distance is much harder.

I'd love to be able to use Stockfish, McCain or another engine to look at these positions using DTM; I believe DTM could provide some great improvements! (I have seen this a few times with Fritz 14, which uses Nalimov.)

Andrew

PS: I just found this site, which looks very interesting, but more details would be useful. It seems they have worked out some better compression ideas?

https://chess-db.com/public/research/en ... bases.html
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: DTM50

Post by syzygy »

Andrew wrote: Thu May 02, 2019 11:09 am Hi, just wondering if there has been any progress in Syzygy DTM generation since this thread started? It would be very interesting to know if, say, 3- or 4-piece tables could be generated with a size significantly less than what we get for the Nalimov files.
I have not worked on DTM50, which this thread is about.

I have implemented DTM but not released it. There are still some potential improvements I want to try. And there are a few ways to do "half-sided" tables which would save a lot of storage space but at the same time increase access time, and I haven't made up my mind about those.
Andrew wrote: PS: I just found this site, which looks very interesting, but more details would be useful. It seems they have worked out some better compression ideas?

https://chess-db.com/public/research/en ... bases.html
Not better ideas, they just trade memory space for access time. I can compress a 32-man TB to 0 bits by letting the probing code do a very deep search. Access time will be very bad.