Can EGTB storage requirements be reduced using this scheme?


mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mar »

hgm wrote: Tue Dec 08, 2020 11:33 am Similarly, in drawn KQKP you can keep checking from I don't know how many different squares before you have to repeat or allow the Pawn to promote.
exactly, this is why I've recently added a special recognizer that scales KQkp down a lot when the pawn is about to promote - search will take care of the rest, no more fantasized high scores.
Martin Sedlak
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Can EGTB storage requirements be reduced using this scheme?

Post by hgm »

That is still a rather coarse solution, which could lead your engine to happily convert to a lost KQKP. A more accurate version would only downscale with a Rook or Bishop Pawn, when the Queen is not blocking it. That would still misjudge a number of positions that are won because the attacking King is already close by. But for those the winning conversion is nearby, so that doesn't hurt too much. The important thing is that you don't misevaluate conditions that could be maintained for a very long time.
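
A minimal sketch of that condition, assuming White has the queen and Black the pawn; the structures and function names are illustrative, not taken from any particular engine:

Code: Select all

// Sketch of the downscaling condition described above: treat KQ vs KP
// as drawish only when the pawn is a rook or bishop pawn one step from
// promotion and the queen is not blockading the promotion square.
struct Square { int file; int rank; };   // file 0..7 = a..h, rank 0..7 = ranks 1..8

static bool is_rook_or_bishop_file(int file)
{
    return file == 0 || file == 2 || file == 5 || file == 7;   // a, c, f, h
}

bool downscale_KQKP(Square whiteQueen, Square blackPawn)
{
    bool aboutToPromote = blackPawn.rank == 1;                 // black pawn on its 7th rank
    bool queenBlocks    = whiteQueen.file == blackPawn.file &&
                          whiteQueen.rank == 0;                // queen sits on the promotion square
    // A fuller recognizer would also look at how close the white king is,
    // since many of these positions are still won when it is nearby.
    return is_rook_or_bishop_file(blackPawn.file) && aboutToPromote && !queenBlocks;
}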
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

phhnguyen wrote: Tue Dec 08, 2020 10:12 am BTW, Fishtest has revealed that the Syzygy 6-men tables (no data for 7-men) can help SF gain about 13 Elo, a good amount but not really vital IMO. You may remove the EGTB and then gain back about 5-10 Elo by improving the endgame code of SF.
It's the difference between perfect evaluation/play and imperfect evaluation/play. To me, it's completely vital; improving Stockfish, on the other hand, is not something I care much about. I want to know, not guess, whether some ending is won, lost, or drawn. For a program's eval, it can also make a difference whether it makes sense to search any deeper or not.
phhnguyen wrote: Tue Dec 08, 2020 10:12 am With the current setup (using Syzygy), you may also save a lot of space by simply deleting many "redundant" endgames, say KQvK, KRvK, KQQvK, KQQvKB... SF can win those endgames easily without help from Syzygy. I guess that your shallow search works successfully mostly for those endgames ;)

For the rest of the endgames, SF will struggle badly, so your shallow search should struggle too. If SF can't find the result and/or how to make progress, there is no magic for your shallow search either.
I don't understand. My scheme would make SF play perfectly once it reaches the 7-piece endgame. No "struggle" involved. With the caveat that right now I'm considering the space savings for WDL, not thinking about the DTC endings. With more pieces on the board, the question is what the tradeoff is between perfect accuracy and the additional time spent per tablebase probe.
phhnguyen wrote: Tue Dec 08, 2020 10:12 am Furthermore, for a given endgame, it is almost useless if your shallow search can solve not 100% but only a smaller percentage of all positions, since EGTBs already have a very efficient way to store data; any way to store "only 30%" of the EGTB data won't work, or the space saving is not worth it at all. You must invest a lot of labor to find a new efficient way to store "30%" of the data, but that is a big challenge IMO.
No, my scheme solves 100% of positions perfectly. That's why I calculated the entropy data - to show that it is a promising way to save a lot of space.
phhnguyen wrote: Tue Dec 08, 2020 10:12 am Note that the size of an EGTB is the key to success, BUT the efficiency and a few other factors may contribute too. E.g., Scorpio's EGTB uses a lot of code/search, thus its data is significantly smaller than Syzygy's (about 1/3), but it is not as successful as Syzygy in terms of popularity.
Aren't Scorpio EGTBs only up to 5 pieces? The space savings don't matter much before 7 pieces.
Last edited by mmt on Tue Dec 08, 2020 4:30 pm, edited 4 times in total.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

hgm wrote: Tue Dec 08, 2020 11:33 am I also doubt the usefulness of shallow searches in difficult end-games. Take KPK as an example. It takes a very deep search to recognize most of the drawn positions (i.e. those where you don't quickly lose the Pawn) without an EGT. Initially you can always make progress pushing the Pawn, until the defending King makes his final stand on the last rank, and you have no choice but to repeat positions or stalemate him. And that is a very favorable example: you don't have much latitude for fooling around without repeating to delay pushing the Pawn and approaching the resolution, as you have only a King, and it must keep protecting the Pawn.

Similarly, in drawn KQKP you can keep checking from I don't know how many different squares before you have to repeat or allow the Pawn to promote.

Shallow searches are only any good when they also probe the EGT, or have hard-coded rules for some critical positions that can be quickly reached (e.g. the opposition rule in KPK for drawing, and the 'rule of squares' for wins).
I calculated the accuracy percentages for the hardest (largest) 7-piece EGTBs in the previous post. A shallow SF search is accurate enough to work together with the EGTBs while saving a lot of space.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

mmt wrote: Tue Dec 08, 2020 5:17 am OK, I ran the sampling code on random positions from 62 Syzygy WDL EGTBs that are over 20 GB each using 1.0 as the eval cutoff.

Code: Select all

Number of nodes for SF    Accuracy percentage    Entropy bits per position
50                        87%                    0.60
100                       88%                    0.45
200                       89%                    0.36
400                       90%                    0.30
800                       90%                    0.30
1600                      91%                    0.27
3200                      92%                    0.20
6400                      93%                    0.16
12800                     94%                    0.11
The baseline number of entropy bits per position is 0.85. So 200 nodes could result in space savings of up to 58%, 1600 in 68%, and 12800 in 87%. If we add 10% overhead for this B+ tree-like indexing, it's still a lot of savings. While this scheme would be slower, the smaller files would be easier to cache in RAM, which could counteract the performance penalty somewhat.
Ignore the entropy bits in the above table - I made a mistake and those are only for KRPPvKRN, not all EGTBs. The iterative deepening idea I outlined in the previous post with 50-800 nodes gives the following:
Average number of nodes needed for SF: 162
Accuracy: 94%
Ratio of entropy bits with SF vs original baseline: 0.29

So this scheme reduces entropy by around 71% (before compression and ignoring cursed wins).
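
One plausible way to arrive at per-position entropy figures like these (not necessarily how they were computed above) is to tally sampled (predicted, actual) WDL pairs and take the empirical conditional entropy of the tablebase value given the shallow-search prediction:

Code: Select all

// Sketch: empirical conditional entropy H(actual WDL | predicted WDL),
// in bits per position, from sampled (predicted, actual) pairs.
// The counts below are placeholders, not real measurements.
#include <cmath>
#include <cstdio>

int main()
{
    // counts[p][a] = sampled positions where the shallow search predicted p
    // and the tablebase said a (0 = loss, 1 = draw, 2 = win).
    double counts[3][3] = {
        { 900,  40,  10 },
        {  60, 950,  50 },
        {  10,  30, 950 },
    };

    double total = 0;
    for (int p = 0; p < 3; p++)
        for (int a = 0; a < 3; a++)
            total += counts[p][a];

    double condEntropy = 0;                         // H(actual | predicted)
    for (int p = 0; p < 3; p++) {
        double rowTotal = counts[p][0] + counts[p][1] + counts[p][2];
        if (rowTotal == 0) continue;
        for (int a = 0; a < 3; a++) {
            if (counts[p][a] == 0) continue;
            double joint = counts[p][a] / total;    // P(p, a)
            double cond  = counts[p][a] / rowTotal; // P(a | p)
            condEntropy -= joint * std::log2(cond);
        }
    }
    std::printf("bits per position on top of the prediction: %.3f\n", condEntropy);
    return 0;
}

Dividing that figure by the 0.85-bit baseline gives the kind of entropy ratio quoted above.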
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

mmt wrote: Wed Dec 09, 2020 6:57 am Average number of nodes needed for SF: 162
Accuracy: 94%
Ratio of entropy bits with SF vs original baseline: 0.29
After some more manual optimization of the number of nodes to be calculated and the cutoffs for considering a score to be winning or drawing, the average number of nodes needed is just 90, with an entropy ratio of 0.30. The accuracy stays at 94%.
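
A rough sketch of that kind of node-budgeted classification loop; the search hook, the budgets and the draw cutoff are illustrative assumptions rather than the actual Stockfish interface or the tuned values:

Code: Select all

// Sketch: classify a position as win/draw/loss using shallow searches
// with increasing node budgets and score cutoffs, as described above.
#include <cmath>
#include <functional>

enum Wdl { WDL_LOSS, WDL_DRAW, WDL_WIN, WDL_UNKNOWN };

// searchNodes(budget) stands in for a node-limited engine search of the
// current position, returning a score in pawns from the side to move's view.
Wdl classify(const std::function<double(int)>& searchNodes)
{
    const int    budgets[]  = { 50, 100, 200, 400, 800 };   // illustrative budgets
    const double winCutoff  = 1.0;   // |score| >= 1.0 pawn -> treat as decisive
    const double drawCutoff = 0.2;   // |score| <= 0.2 pawn -> treat as a draw (assumed value)

    for (int budget : budgets) {
        double score = searchNodes(budget);
        if (score >=  winCutoff) return WDL_WIN;
        if (score <= -winCutoff) return WDL_LOSS;
        if (std::fabs(score) <= drawCutoff) return WDL_DRAW;
        // Ambiguous score: retry with a larger node budget.
    }
    return WDL_UNKNOWN;   // prediction failed: the true value would be stored explicitly
}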
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Can EGTB storage requirements be reduced using this scheme?

Post by hgm »

Basically what you propose is to extend the DTx encoding with a code for 'unknown'. When the EGT probe hits such a position it cannot count it as an EGT hit, so it has to search on. (Unless it has some hard-coded recognizer that immediately evaluates it.) This is like a partial EGT, where you don't know in advance whether a position is in the EGT; the probing itself has to reveal that.

In principle this will work. If you only 'left out' positions that resolve quickly, it would not require very much extra search compared to a complete EGT. The KPK end-game evaluation in KingSlayer also works like that (but through pattern recognition rather than table probing). It recognizes the obvious draws (defending King in the path of the Pawn or next to it, attacking King behind the Pawn or at the same rank, or one rank in front of it with opposition). For positions with the defending King elsewhere it just searches on; this will either quickly reach a recognized drawn position, or the position will be lost, and the normal evaluation will return a winning score.

In the EGT case the hope is that grouping many positions under the code 'unknown' makes the EGT better compressible. If an overwhelmingly large fraction of the positions is labelled 'unknown', this will obviously be the case, e.g. through the proposed run-length encoding. But that is just an implementation detail of the compression technique that is used.

The main problem seems to be recognizing which positions can be left out. Having to do a shallow search on each of the positions in the EGT could take a thousand times as long as generating the EGT in the first place. And it is also a bit unclear what the criterion should be. Requiring that the search should be able to find the correct score without any EGT probing might be too conservative; you would leave in many positions that you might have been able to take out, because probing the remaining positions would have sped up the search enough compared to a probeless search.

A simplistic version of this idea would be to only include positions with white to move and DTx a multiple of 3 (or draws). You would then never have to search more than 6 ply before hitting a tabulated position.
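
The probing side of such a partial table could look roughly like this; Position, probe_partial() and successors() are hypothetical placeholders rather than the Syzygy API, and mates, stalemates and captures into other tables are ignored for brevity:

Code: Select all

// Sketch: probe a partial WDL table that stores an explicit 'unknown'
// code for omitted positions; when the probe comes back unknown,
// resolve the position with a short search whose nodes probe again.
#include <vector>

enum Wdl { WDL_LOSS = -1, WDL_DRAW = 0, WDL_WIN = 1, WDL_UNKNOWN = 2 };

struct Position { /* engine-specific contents omitted */ };
Wdl probe_partial(const Position& pos);                 // may return WDL_UNKNOWN
std::vector<Position> successors(const Position& pos);  // positions after each legal move

// With the "white to move and DTx a multiple of 3" storage rule suggested
// above, a tabulated position is always reached within about 6 plies.
Wdl resolve(const Position& pos, int ply = 0, int maxPly = 6)
{
    Wdl direct = probe_partial(pos);
    if (direct != WDL_UNKNOWN || ply >= maxPly)
        return direct;

    bool resolvedAny = false;
    int  best = -1;                                     // from the side to move's view
    for (const Position& child : successors(pos)) {
        Wdl v = resolve(child, ply + 1, maxPly);
        if (v == WDL_UNKNOWN) continue;
        resolvedAny = true;
        int negated = -static_cast<int>(v);             // negamax
        if (negated > best) best = negated;
    }
    return resolvedAny ? static_cast<Wdl>(best) : WDL_UNKNOWN;
}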
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

hgm wrote: Wed Dec 09, 2020 12:00 pm The main problem seems to be recognizing which positions can be left out. Having to do a shallow search on each of the positions in the EGT could take a thousand times as long as generating the EGT in the first place.
Somebody familiar with SF should be able to find out pretty easily how much time it takes SF to evaluate a 7-piece position for 90 nodes. But you're right, it would probably take much longer to generate these EGTBs, so it might only make sense to do it for the few most commonly used ones.
hgm wrote: Wed Dec 09, 2020 12:00 pm And it is also a bit unclear what the criterion should be. Requiring that the search should be able to find the correct score without any EGT probing might be too conservative; you would leave in many positions that you might have been able to take out, because probing the remaining positions would have sped up the search enough compared to a probeless search.
Whether it makes sense to probe the tablebase in the shallow search would probably depend on the endgame - the ones where we can remove fewer positions would benefit more.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by syzygy »

mmt wrote: Tue Dec 08, 2020 4:19 pm I don't understand. My scheme would make SF play perfectly once it reaches the 7-piece endgame. No "struggle" involved.
What he means is that your call to SF to predict the correct TB value will struggle to return something useful.
mmt wrote: With the caveat that right now I'm considering the space savings for WDL, not thinking about the DTC endings.
Before you can think about space savings, you will have to swallow the huge loss of compression efficiency caused by rendering all the current compression tricks useless. Your approach first needs to make up for that loss before you can think about gaining anything.
mmt wrote: The space savings don't matter much before 7 pieces.
Did you do the math? With perfect indexing (one index value per legal position, which is unachievable in practice) and 5 positions per byte (3^5 = 243 < 256) as the only compression, you would need 705 GB to store all 6-piece WDL tables (and you lose the 50-move info). The Syzygy TBs need about 68 GB (including 50-move info). This makes a big difference in probing speed, unless your machine has half a TB of RAM that you have no other use for.
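
For reference, the "5 positions per byte" packing mentioned here is plain base-3 encoding; a minimal sketch:

Code: Select all

// Sketch: pack five ternary WDL values (0 = loss, 1 = draw, 2 = win)
// into one byte, since 3^5 = 243 < 256.  No further compression.
#include <cstdint>

std::uint8_t pack5(const std::uint8_t wdl[5])
{
    std::uint8_t byte = 0;
    for (int i = 4; i >= 0; --i)
        byte = static_cast<std::uint8_t>(byte * 3 + wdl[i]);   // each wdl[i] must be 0, 1 or 2
    return byte;
}

void unpack5(std::uint8_t byte, std::uint8_t wdl[5])
{
    for (int i = 0; i < 5; ++i) {
        wdl[i] = byte % 3;
        byte   = static_cast<std::uint8_t>(byte / 3);
    }
}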
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by syzygy »

phhnguyen wrote: Tue Dec 08, 2020 10:12 am With the current setup (using Syzygy), you may also save a lot of space by simply deleting many "redundant" endgames, say KQvK, KRvK, KQQvK, KQQvKB... SF can win those endgames easily without help from Syzygy. I guess that your shallow search works successfully mostly for those endgames ;)
The lopsided WDL tables compress very well, so leaving them out doesn't save a lot of space. Most of the data already covers the not-so-trivial positions.

For example, KQQvKB.rtbw is 32976 bytes, KRRvKQ.rtbw is 4068432 bytes.