How is work on 8-man tablebases progressing?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5646
Joined: Tue Feb 28, 2012 11:56 pm

Re: How is work on 8-man tablebases progressing?

Post by syzygy »

Koistinen wrote: Sat Sep 28, 2024 6:30 pm Then storage would naïvely be 10x100=1000 times larger, so that would be the most fun to reduce.
I think the format should ideally store the DTM50 values for "50-move counter = 0" separately from the information for positive 50-move counter (either in a separate file or in a separate part of the table). If an engine probes the table during the search of a root position which itself has too many pieces to be in the TBs, the engine will only probe the tables immediately after a capture, so with 50-move counter = 0. The extra information for 50-move counter > 0 is only needed when probing a TB root position, so it is OK if access to this information is relatively slow (e.g. because it is compressed in relatively large blocks that need to be decompressed as a whole).

So for move-counter = 0 a regular format can be used such as the Syzygy DTZ format (I have already used this format for DTM tables).
The signs of the values do not need to be stored since WDL50 is available and should be accessed first. For the same reason draws can be encoded with arbitrary values. A few more tricks are possible. The table should probably be 2-sided (or things may start to get tricky with the 50-move rule and in any event probing during search would be slow).
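The probing order described above can be sketched as follows. This is a minimal illustration with hypothetical callback names (`probe_wdl`, `probe_dtm_raw` are not the real probing API): because WDL50 is consulted first, the DTM50 table for 50-move counter = 0 needs to store neither signs nor meaningful values for draws.

```python
# Hypothetical probing sketch: WDL50 first, then the unsigned DTM50 value.
WDL_LOSS, WDL_BLESSED_LOSS, WDL_DRAW, WDL_CURSED_WIN, WDL_WIN = -2, -1, 0, 1, 2

def probe_dtm50(pos, probe_wdl, probe_dtm_raw):
    """Return a signed DTM50 value, recovering the sign from WDL50."""
    wdl = probe_wdl(pos)
    if wdl == WDL_DRAW:
        return 0                      # stored DTM value is arbitrary for draws
    raw = probe_dtm_raw(pos)          # unsigned distance stored in the table
    return raw if wdl > 0 else -raw
```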

The next question is how to encode the extra information for 50-move counter > 0.
The DTM50 value for 50-move counter = 0 will remain valid for 50-move counter = n up to some value of n. Then it will jump to something higher or to infinity (meaning cursed win/loss), and this may repeat a few times. So these jumps somehow have to be recorded, but the number of jumps may vary from position to position.
It might be acceptable to store information only for one side-to-move.
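The step-function behaviour described above can be sketched as follows. This is an assumed representation, not a proposed file format: a base value for counter = 0 plus a sorted list of (threshold, value) jumps, with `None` standing for a cursed win/loss.

```python
def dtm50_at(base_value, jumps, counter):
    """Evaluate DTM50 as a step function of the 50-move counter.

    base_value: DTM50 for counter = 0.
    jumps: list of (threshold, value) pairs sorted by threshold; for
           counters >= threshold the value applies (None = cursed win/loss).
    """
    value = base_value
    for threshold, new_value in jumps:
        if counter >= threshold:
            value = new_value
        else:
            break
    return value
```

For example, a position that is mate in 20 with a fresh counter, mate in 30 once the counter reaches 60, and only a cursed win from counter 90 on would be `(20, [(60, 30), (90, None)])`.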

Not all the "jumps" may have to be stored. For example, if the whole table contains no cursed positions, then the extra information is only relevant if the winning side decides to throw away many moves instead of simply playing the moves that lead to the fastest mate. If the engine that is accessing the tables is winning, there is no reason for that engine to not use the tables and play the fastest mate. If the engine is losing but the opponent is throwing away moves, perhaps the engine should focus on maximizing distance-to-zeroing move rather than distance to mate. But how to decide which of these jumps may be relevant with optimal play by the winning side and which not... needs some thought.

Another question is whether cursed wins in the DTM50 (with move-counter = 0) table should be stored as draws (or rather as arbitrary values for best compression, since the WDL50 table will tell us that it is a draw) or whether a useful value could be stored. I guess one could just continue iterating over the DTM50 table during generation to complete it (as Uri suggested) and not store any info in the "extra information" part for these positions. Is it worth it to have some kind of fake DTM values if the DTZ50+ tables already give enough information to steer the game to a win (if the 50-move counter is ignored, or the losing side plays inaccurately)?
Koistinen
Posts: 29
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: How is work on 8-man tablebases progressing?

Post by Koistinen »

Has DTM50 been computed even for 5-man?
I think it could lead to finding positions interesting for problemists because it has the paradox that the fewer moves you have the longer the mate can get.
Potentially, though unlikely, a position could simultaneously be mate in 50..150; more likely something like that but less spectacular.
BTW, I made a mistake earlier about the naïve storage required: it would be more like 100*9/7 of that of DTZ50. 9 because the distance to mate is at most 50*6 = 300, which needs 9 bits, and 7 because DTZ50 requires 7 bits to store a result.
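The arithmetic can be checked quickly (assuming 9 bits per stored DTM value and one value per 50-move counter setting):

```python
max_dtm = 50 * 6                  # at most ~6 zeroing phases of 50 moves each
dtm_bits = max_dtm.bit_length()   # 300 fits in 9 bits
ratio = 100 * dtm_bits / 7        # 100 counter values vs 7 bits for DTZ50
print(dtm_bits, round(ratio, 1))  # 9 128.6
```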
syzygy
Posts: 5646
Joined: Tue Feb 28, 2012 11:56 pm

Re: How is work on 8-man tablebases progressing?

Post by syzygy »

Koistinen wrote: Sun Sep 29, 2024 11:36 am Has DTM50 been computed even for 5-man?
At least to some extent:
http://galen.metapath.org/egtb50/
I have so far been able to build 3-man, 4-man, 5-man, and some 6-man tables, as well as the one 7-man table. General 6-man tables remain outside my reach, due to the memory and, to a lesser extent, the time requirements. Software and hardware improvements may make them more feasible in the future, although fully general 6-man tables would require an "unreasonable" amount of memory with the current software.
Some threads on this (already from 2013):
https://kirill-kryukov.com/chess/discus ... php?t=7204
https://kirill-kryukov.com/chess/discus ... php?t=7314

These tables seem to be private and might still not be in a format that is suitable for distribution and for probing with good performance.

32 GB of RAM should be plenty to generate the DTM50 6-piece tables. With 32 GB you can keep the tables for the current and the previous iteration fully in memory, so you can compare them and see which DTM50 values have decreased (because of more plies being available under the 50-move rule: first iteration generates DTM50 for 50-move counter=100, i.e. mate positions, each next iteration for 50-move counter=99,98,etc. I guess the generator cannot do 2 plies per iteration.). Those positions for which DTM50 value decreased (corresponding to a "jump" as I mentioned earlier) have to be stored to a per-ply file. These per-ply files then have to be merged later into a file format that may still have to be invented.
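The delta step described above can be sketched like this (an assumed layout, not the real generator): with both iterations fully in memory, a single pass keeps only the positions whose DTM50 value changed, to be written to a per-ply file.

```python
# Sketch: compare the in-memory tables of two successive iterations.
def iteration_delta(prev, cur):
    """Return (position, new_value) for every position whose value changed."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if v != p]
```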

And with 64 GB you can simplify generation a bit by having 2 bytes per position.
BTW, I made a mistake earlier about naïve storage required, and it would be more like 100*9/7 of that of DTZ50. 9 because the distance to mate is at most 50*6 and 7 because DTZ50 requires 7 bits for storing result.
Most DTZ50 tables, especially the larger ones with pawns, tend to have very low DTZ50 values and therefore compress very well. For DTM50 this will be very different. The size of compressed DTM50, even without the info for 50-move counter > 0, will be several times the size of DTZ50. And storing them 2-sided (basically required if you want to use them during search) will again multiply the size by a factor of 2.5-3. But if it's a terabyte for 6-piece tables (i.e. the DTM50 table for 50-move counter=0)... it is 2024 now.
User avatar
phhnguyen
Posts: 1484
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: How is work on 8-man tablebases progressing?

Post by phhnguyen »

Koistinen wrote: Fri Sep 27, 2024 10:57 pm
phhnguyen wrote: Fri Sep 27, 2024 3:13 pm What do you expect to be improvements for 7-man?
Possible improvements I see are:
  • Increased usability. Make it easy to implement and use. This would help experimentation.
  • More types, such as DTM, DTZ50, DTMZ50 available for all.
  • Different file formats to support different usage goals, such as requiring less space.
I agree that what you mentioned above are good improvements and may improve usability. However, IMHO those improvements are mostly on the programming side and are not the keys to increasing popularity, which is the current main dilemma of the 7-man EGTB (not many people use it). There are two keys to improving popularity:
1) improve engines’ strength
2) make it much smaller in size (easier to download)

For 1), I don’t think any other EGTB can beat Syzygy with the DTZ50 metric. They can’t beat it on probing speed either, since Syzygy is the smallest and uses only small parts (the WDL files) for probing during search. For 2), Syzygy is 8 times smaller than Lomonosov's 7-man set. It looks like no new one can beat it on size.

In other words, the Syzygy 7-man EGTB is currently the best and most optimised one. There is not much room for newcomers to beat it. Anyone who wants to improve on the 7-man EGTB should carefully weigh the work against the gain.

We may expect Ronald de Man to improve Syzygy in future versions, say, make it smaller. But I doubt he could reduce it significantly, say, cut 50% of the size, which would be a good enough level to boost popularity.
https://banksiagui.com
A feature-rich chess GUI, based on the open-source Banksia chess tournament manager
syzygy
Posts: 5646
Joined: Tue Feb 28, 2012 11:56 pm

Re: How is work on 8-man tablebases progressing?

Post by syzygy »

syzygy wrote: Sun Sep 29, 2024 10:51 pm 32 GB of RAM should be plenty to generate the DTM50 6-piece tables. With 32 GB you can keep the tables for the current and the previous iteration fully in memory, so you can compare them and see which DTM50 values have decreased (because of more plies being available under the 50-move rule: first iteration generates DTM50 for 50-move counter=100, i.e. mate positions, each next iteration for 50-move counter=99,98,etc. I guess the generator cannot do 2 plies per iteration.). Those positions for which DTM50 value decreased (corresponding to a "jump" as I mentioned earlier) have to be stored to a per-ply file. These per-ply files then have to be merged later into a file format that may still have to be invented.

And with 64 GB you can simplify generation a bit by having 2 bytes per position.
I now think 64 GB is more realistic for generating 6-piece pawnless DTM50 tables, with 2 bytes per position and two copies of the full 2-sided table (so 2 * 2 * 462 * 64^4 * 2 bytes = 57.75 GB).

Perhaps only a 1-sided second copy is needed, which would bring the requirement to 43.3 GB (not sure).
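Both figures check out, assuming the usual 462 king configurations for a pawnless table and 64^4 squares for the other four pieces:

```python
positions = 2 * 462 * 64**4             # 2-sided pawnless 6-piece index space
two_copies = positions * 2 * 2          # 2 bytes/position, two 2-sided copies
print(two_copies / 2**30)               # 57.75 (GiB)
three_tables = positions * 2 * 3 // 2   # second copy only 1-sided
print(three_tables / 2**30)             # 43.3125
```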

I thought 1 byte per position should be enough by using the same range-reduction trick I use for DTZ (and DTM), but it would get too messy to not be able to hold the whole range of possible DTM values in memory. My DTM generator does use the trick, but its approach to generating the table (basically calculating all DTM=n positions in iteration n) cannot be used for DTM50.

Anyway, 64 GB is not a shocking amount.

Each iteration n (from 1 to 100) will output a table to disk with the DTM values that have changed compared with the previous iteration. Some positions will have a DTM value for the first time (if their DTZ = n), other positions will have an improved DTM value. These delta tables can probably be stored using a regular compression method and most will end up being quite small.

The difficult part is then to combine those tables into something usable. Currently I am thinking of identifying the "DTM jumps" for each position, and creating a number of "jump" tables 1,2,3,4,... where table i encodes the i-th jump for each position. It seems this can be done by reading out each of the 100 delta tables position by position and storing the jumps (viewed from the iteration 100 table and looking back) position by position to the jump tables. These jump tables then seem to be similar enough to regular tablebases that we can compress them with the existing tablebase compression method. Perhaps they should all be in a single file for convenience.
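The merge can be sketched like this, under an assumed data layout (not the real one): `deltas` maps a 50-move counter value to the `{position: value}` entries of that iteration's delta table, and a position's value for counter q is the one recorded at the smallest counter >= q.

```python
def build_jumps(deltas, pos):
    """Turn per-ply delta entries for one position into (base, jumps).

    base is the counter=0 value; jumps is a list of (threshold, value)
    pairs: for 50-move counters >= threshold, value replaces base.
    Jump i would then go into the i-th "jump" table.
    """
    changes = sorted((c, d[pos]) for c, d in deltas.items() if pos in d)
    base = changes[0][1]                     # value from the last iteration
    jumps = [(c_lo + 1, v_hi)                # value found at the earlier
             for (c_lo, _), (_, v_hi)        # iteration applies above the
             in zip(changes, changes[1:])]   # later iteration's counter
    return base, jumps
```

For instance, a position first solved as mate in 1 at the counter=100 iteration, improved to 5 at counter=60 and to 9 at counter=0, yields base 9 with jumps to 5 (counter >= 1) and 1 (counter >= 61).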

It seems to me some of the "later" DTM jumps (occurring for very high 50-move counter values, so at early iterations) will not be relevant for optimal play by the winning side and might not have to be stored. But it might be tricky to identify those. For example, if a table has max-DTZ = 60 ply, then it seems extremely unlikely that optimal DTM50 play by the winning side could ever end up in a position in the same table with the 50-move counter at 80, where at the same time that position has a delayed DTM50 mate for 50-move counter at 80. But I don't think this can be mathematically ruled out without inspecting the whole DTM50 tree.
(Looking at the max DTM50 values could give some bounds, but I'd like to have some more precise criterion if it existed.)

The probing code would first probe the main DTM50 table (generated at iteration 100, valid for 50-move counter = 0), then look up the first "DTM jump" which encodes until which value of the 50-move counter the main DTM50 table is valid and by how much it increases after that. Then if necessary it looks up the second jump, etc.
syzygy
Posts: 5646
Joined: Tue Feb 28, 2012 11:56 pm

Re: How is work on 8-man tablebases progressing?

Post by syzygy »

syzygy wrote: Mon Sep 30, 2024 11:32 pm It seems to me some of the "later" DTM jumps (occurring for very high 50-move counter values, so at early iterations) will not be relevant for optimal play by the winning side and might not have to be stored. But it might be tricky to identify those. For example, if a table has max-DTZ = 60 ply, then it seems extremely unlikely that optimal DTM50 play by the winning side could ever end up in a position in the same table with the 50-move counter at 80, where at the same time that position has a delayed DTM50 mate for 50-move counter at 80. But I don't think this can be mathematically ruled out without inspecting the whole DTM50 tree.
(Looking at the max DTM50 values could give some bounds, but I'd like to have some more precise criterion if it existed.)
The trick seems to be to determine the "max DTZ" within the DTM50 table: the longest sequence of DTM50-optimal moves that stays within the table, i.e. without a zeroing move (= the last iteration at which a new/better mate was found). So this is not the "real" max DTZ value but generally a somewhat higher value.

If this is 90 plies, then any DTM50-optimal mate sequence, at least if both sides play optimally, will never reach a 50-move counter above 90.

I think this should also be true if the losing side plays suboptimally, but I might be missing something. (But if we finished DTM50 generation in 90 iterations, I don't see how there could be a DTM50 mate sequence, with optimal play by the winning side and any play by the losing side, that could stay in the table for more than 90 ply.)

If we can put such an upper bound on the longest sequence with optimal DTM50 play by the winning side, then we know that a game that enters the table will never reach a move-counter higher than this "max DTZ", e.g. 90 plies, if the winning side plays optimally. Then we know that, for this table, there is no need to store information about DTM50 for positions with move-counter > 90.
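Under that assumption, the pruning itself would be trivial (using the hypothetical (threshold, value) jump layout from earlier in the thread):

```python
def prune_jumps(jumps, max_dtz):
    """Drop jump entries whose threshold exceeds the in-table "max DTZ":
    with optimal play by the winning side the 50-move counter never gets
    that high, so those entries can never be probed."""
    return [(threshold, value) for threshold, value in jumps
            if threshold <= max_dtz]
```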

So this would eliminate the need to store information about silly winning queen sacs that are the "optimal DTM50 move" in a position with the 50-move counter at 98 or 99. A table which has such imbalanced material that the winning side can afford saccing its queen will have a "max DTZ" of maybe 20 or at least not anything near 98 or 99.

If my logic is correct, then a "max DTZ" of at most 50 ply would mean that there is never a need to delay a mate (as long as the game stays within the table): the DTM50 table for "50-move counter=0" is all you need. In practice this will probably also be the case for higher "max DTZ" values. I expect most "delayed mates" will be at relatively high values of the 50-move counter, say above 70, so if the 50-move counter always stays below 70, then the delayed mates cannot be reached, which is good.

So it seems possible to "safely" decide to not store the full info for all the possible values of the 50-move counter. Unless we really want perfect information even for positions only reachable if the winning side screws up heavily (but the losing side could then just use DTZ tables to try to reach a draw).
Koistinen
Posts: 29
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: How is work on 8-man tablebases progressing?

Post by Koistinen »

syzygy wrote: Sun Sep 29, 2024 10:51 pm
Koistinen wrote: Sun Sep 29, 2024 11:36 am Has DTM50 been computed even for 5-man?
At least to some extent:
http://galen.metapath.org/egtb50/
I have so far been able to build 3-man, 4-man, 5-man, and some 6-man tables, as well as the one 7-man table. General 6-man tables remain outside my reach, due to the memory and, to a lesser extent, the time requirements. Software and hardware improvements may make them more feasible in the future, although fully general 6-man tables would require an "unreasonable" amount of memory with the current software.
Most DTZ50 tables, especially the larger ones with pawns, tend to have very low DTZ50 values and therefore compress very well. For DTM50 this will be very different. The size of compressed DTM50, even without the info for 50-move counter > 0, will be several times the size of DTZ50. And storing them 2-sided (basically required if you want to use them during search) will again multiply the size by a factor of 2.5-3. But if it's a terabyte for 6-piece tables (i.e. the DTM50 table for 50-move counter=0)... it is 2024 now.
Thanks, I had missed/forgotten that those DTM50 tables had been computed. No source and no tables available though, so it would be nice if someone made them available.
My storage comparison was for naïve storage. You are right that compressed it is likely to be different.

BTW, I remember seeing a request for 8-man Syzygy by someone claiming to have the storage needed. (Maybe not the RAM.)
Maybe if something usable were written, the next Oak Ridge supercomputer could run it and make it available as a promotion.
If storage of the result could be done at about 1 MUSD it might be cheap enough for that purpose.
Either way, the code would be usable for smaller tables as well, lowering the threshold for experimentation.