Since it's best to have 7-piece EGTBs on an SSD, more people would use them if the storage requirements could be cut down. I'm assuming that Syzygy TBs are efficiently compressed already. One idea to cut down the size of WDL EGTBs is:
- For all positions in EGTBs run SF without EGTBs for 200 nodes
- See if the SF score matches the true results using some cutoff (e.g. if the score is <-1.5 and we are getting mated, between -1.5 and 1.5 and it's a draw, or >1.5 and we have a mate). Do it single-threaded so it's deterministic
- Based on some quick tests, we should get 97% agreement
- We only need to store the true values where they differ from SF, so for about 3% of cases
- Store the exceptions as a stream of direct value, gap, direct value, gap, ... (with gaps of, say, up to 7 bits) and compress it
- Include a copy of SF with the EGTBs
- At probe time, run SF for 200 nodes instead of reading a stored value, which works for 97% of cases
Would this result in substantial savings and how much slower would it be? I haven't looked deeply into this yet so I hope someone more knowledgeable can comment. Something similar could also be tried for DTZ EGTBs.
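To make the agreement test concrete, here is a minimal Python sketch. `score_to_wdl` applies the 1.5-pawn cutoff from the list above, and `agreement_rate` counts how often the engine's guess matches the tablebase. In the real scheme the scores would come from a deterministic single-threaded 200-node Stockfish search and the true values from a Syzygy WDL probe; both are outside this sketch, so they appear here as plain inputs.

```python
CUTOFF = 1.5  # pawns, the cutoff suggested above

def score_to_wdl(cp):
    """Map an engine score (in pawns) to a win/draw/loss guess."""
    if cp > CUTOFF:
        return 1      # predicted win
    if cp < -CUTOFF:
        return -1     # predicted loss
    return 0          # predicted draw

def agreement_rate(samples):
    """samples: iterable of (engine_score, true_wdl), true_wdl in -1/0/1.

    In the real scheme engine_score comes from a fixed 200-node SF search
    and true_wdl from a Syzygy WDL probe of the same position.
    """
    samples = list(samples)
    hits = sum(score_to_wdl(cp) == wdl for cp, wdl in samples)
    return hits / len(samples)
```

Only the positions where `score_to_wdl` disagrees with the tablebase (roughly 3% by the estimate above) would need to be stored.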
Can EGTB storage requirements be reduced using this scheme?
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Can EGTB storage requirements be reduced using this scheme?
The problem is that you cannot do random access into such a data set. You would have to run through the entire EGT, adding up the sizes of all the gaps, to know which positions the stored DTx values are for.
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Can EGTB storage requirements be reduced using this scheme?
You can store the indexes of, let's say, every 100th element and read in just the matching block of 100 elements (then do a binary search or just scan them sequentially). This would increase the storage only marginally and probably wouldn't be too bad in terms of performance because of the caching behavior.
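A sketch of this sampled-index layout in Python, assuming the exception indices arrive sorted. `K = 100` follows the post; the final general-purpose compression pass over the stream is left out, and the function names are my own.

```python
import bisect

K = 100  # one side-index entry per K exceptions, as suggested above

def encode_exceptions(exceptions):
    """exceptions: sorted list of (position_index, wdl_value) pairs."""
    stream, side_index = [], []
    prev = 0
    for n, (idx, val) in enumerate(exceptions):
        if n % K == 0:
            # block start: remember the absolute index and stream offset
            side_index.append((idx, len(stream)))
            stream.append((0, val))
        else:
            stream.append((idx - prev, val))  # store only the gap
        prev = idx
    return stream, side_index

def probe(stream, side_index, idx):
    """Return the stored WDL value, or None if the engine prediction holds."""
    keys = [k for k, _ in side_index]
    b = bisect.bisect_right(keys, idx) - 1
    if b < 0:
        return None
    cur, off = side_index[b]
    cur -= stream[off][0]  # first entry in a block carries gap 0
    for gap, val in stream[off:off + K]:
        cur += gap
        if cur == idx:
            return val
        if cur > idx:
            break
    return None
```

A probe touches only the side index and one block of at most K entries, so random access stays cheap while the side index adds just one entry per 100 exceptions.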
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Can EGTB storage requirements be reduced using this scheme?
What data did you get from that quick test? If an engine could get 97% of positions correct with such a shallow search, we wouldn't need EGTBs at all.
Some big problems may be:
- you can't get 97% of all positions right with such a shallow search. Drawn positions usually require a significantly deeper search than winning/losing ones
- you can't get distance-to-convert/distance-to-mate with such a shallow search either. For example, if a position has DTC/DTM 30, you usually need to search to at least depth 30
Without knowing DTC/DTM, the engine's search usually stalls and can't make any progress, even when it knows the position is a win/draw/loss.
https://banksiagui.com
The chess GUI with the most features, based on the open-source Banksia chess tournament manager
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Can EGTB storage requirements be reduced using this scheme?
I'll do a more proper test and post the results. It's not really meaningful to just say that 97% of all positions are predicted correctly: for something like KQRBvKR, the table will be strongly compressed anyway because 99% of all positions in it are wins. So the baseline is not 1/3 but could be much higher depending on the EGTB.
Yeah, the accuracy will vary quite a bit based on the EGTB. You could do iterative deepening to look at more nodes if the results after 200 nodes are not decisive or use a specialized NN trained for a given EGTB if you want to get fancy...
Right, I'm more interested in evaluation than actually playing out the positions here. I have most WDL 7-piece TBs but no DTC on my computer. You'd need to modify this approach to produce a playable move... But for the majority of positions, you don't need to make a move that will result in the fastest conversion, so maybe a version of it could do ok as well.
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Can EGTB storage requirements be reduced using this scheme?
I am sure a few thousand more nodes are not enough for complicated endgame positions. E.g., if an endgame has DTC/DTM around 30, you may need to search to a depth of about 30 to see whether a given position is a draw or a win/loss.
You will face other problems:
- you'd need roughly one NN per endgame, so in the end you may save almost nothing (in terms of space) compared with using an EGTB
- NN search may be much slower than probing an EGTB. Note that to query an EGTB file, an engine usually reads in a tiny chunk of data, and that data essentially is the result, ready to use. In contrast, for a NN you have to load the whole network file, build the network, and then run inference before you have a result
- so far (from many discussions I have read), NNs are not good at solving endgames
https://banksiagui.com
The chess GUI with the most features, based on the open-source Banksia chess tournament manager
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Can EGTB storage requirements be reduced using this scheme?
It doesn't really matter though if we just want to get the highest possible percentage of correct answers from SF relative to the nodes evaluated and only care about WDL. What I mean is something like this:
Evaluate 50 nodes. If the score is <-3.0 or >3.0, or between -0.3 and 0.3, we're done. If not, we evaluate 200 nodes, and if the score is <-2.0 or >2.0, or between -0.5 and 0.5, we're done. If not, we evaluate 800 nodes and just use whatever score SF gets.
I don't think this would be an issue, the 7-piece EGTBs are huge. Some are over 20 GBs.
phhnguyen wrote: ↑Tue Dec 08, 2020 3:51 am - NN search maybe so slow compare with using EGTB. Note that for querying an EGTB file, an engine usually reads in a tiny data chunk and data is almost the result - ready to use. In the contrast for NN, you have to load all data of a NN file, build a NN network then computing before having the result,
If the networks are not too big, they would be in RAM already. But yeah, it would be slower.
Right, if they don't help compared to SF's NNUE then there is no point in using them. Just I haven't seen it tried yet.
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Can EGTB storage requirements be reduced using this scheme?
OK, I ran the sampling code on random positions from 62 Syzygy WDL EGTBs that are over 20 GB each, using 1.0 as the eval cutoff.
The baseline number of entropy bits per position is 0.85. So 200 nodes could result in space savings of up to 58%, 1600 in 68%, 12800 in 87%. If we add 10% overhead for this B+ tree-like indexing, it's still a lot of savings. While this scheme would be slower, the files would be smaller so they could be easier to cache in RAM and this could counteract the performance penalty somewhat.
Number of nodes for SF Accuracy percentage Entropy bits per position
50 87% 0.60
100 88% 0.45
200 89% 0.36
400 90% 0.30
800 90% 0.30
1600 91% 0.27
3200 92% 0.20
6400 93% 0.16
12800 94% 0.11
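One rough way to sanity-check numbers like these is to turn an agreement rate into an entropy estimate: one agree/disagree flag per position, plus (at worst) log2(3) bits for the true WDL value at each mismatch. This is my own simplification and will not reproduce the table exactly, since the measured stream benefits from gap coding and a skewed value distribution, but it gives an upper-bound flavor of the calculation.

```python
import math

def residual_bits(agreement, wdl_outcomes=3):
    """Crude bits-per-position estimate for a given agreement rate.

    Entropy of the agree/disagree flag, plus log2(wdl_outcomes) bits
    for the true value at each mismatch (a worst-case assumption).
    """
    h_flag = 0.0
    for q in (agreement, 1.0 - agreement):
        if q > 0:
            h_flag -= q * math.log2(q)
    return h_flag + (1.0 - agreement) * math.log2(wdl_outcomes)
```

For example, perfect agreement costs zero bits, and higher agreement always yields a smaller estimate, matching the downward trend in the table.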
-
- Posts: 1437
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Can EGTB storage requirements be reduced using this scheme?
mmt wrote: ↑Tue Dec 08, 2020 5:17 am The baseline number of entropy bits per position is 0.85. So 200 nodes could result in space savings of up to 58%, 1600 in 68%, 12800 in 87%. If we add 10% overhead for this B+ tree-like indexing, it's still a lot of savings. While this scheme would be slower, the files would be smaller so they could be easier to cache in RAM and this could counteract the performance penalty somewhat.
You may save 100% by removing the EGTB completely.
BTW, Fishtest has revealed that the Syzygy 6-men tables (there is no data for 7-men) help SF gain about 13 Elo, a good amount but not really vital IMO. You could remove the EGTBs and then gain back about 5-10 Elo by improving SF's endgame code.
As things currently stand (using Syzygy), you could also save a lot of space by simply deleting many "redundant" endgames, say KQvK, KRvK, KQQvK, KQQvKB... SF can win those endgames easily without help from Syzygy. I guess your shallow search works successfully mostly for those endgames.
For the rest of the endgames, SF will struggle badly, so your shallow search should struggle too. If SF can't find the result and/or a way to make progress, there is no magic that will help your shallow search either.
Furthermore, for a given endgame, it is almost useless if your shallow search solves less than 100% of positions: EGTBs already store their data very efficiently, so a scheme that stores "only 30%" of the EGTB data may not work, or the space saving may not be worth it at all. You would have to invest a lot of labor to find a new efficient way to store that "30%" of the data, which is a big challenge IMO.
Note that the size of an EGTB is key to its success, BUT efficiency and a few other factors contribute too. E.g., Scorpio's EGTB uses a lot of code/search, so its data is significantly smaller than Syzygy's (about 1/3), but it is not as successful as Syzygy in terms of popularity.
https://banksiagui.com
The chess GUI with the most features, based on the open-source Banksia chess tournament manager
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Can EGTB storage requirements be reduced using this scheme?
I also doubt the usefulness of shallow searches in difficult endgames. Take KPK as an example. Without an EGT, it takes a very deep search to recognize most of the drawn positions (i.e. those where you don't quickly lose the Pawn). Initially you can always make progress by pushing the Pawn, until the defending King makes his final stand on the last rank and you have no choice but to repeat positions or stalemate him. And that is a very favorable example: you have little latitude for fooling around to delay pushing the Pawn and approaching the resolution without repeating, as you have only a King, and it must keep protecting the Pawn.
Similarly, in drawn KQKP you can keep checking from I don't know how many different squares before you have to repeat or allow the Pawn to promote.
Shallow searches are only any good when these also probe the EGT. Or have hard-coded rules for some critical positions that can be quickly reached. (E.g. the opposition rule in KPK for drawing, and the 'rule of squares' for wins.)
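One of those hard-coded rules, the "rule of the square", is simple enough to sketch. This simplified version considers a white pawn racing a lone black king with the defender to move, and deliberately ignores the attacking king and stalemate tricks, so it is a heuristic, not a full KPK oracle.

```python
def pawn_outruns_king(pawn_file, pawn_rank, king_file, king_rank):
    """Rule of the square for a white pawn vs. a lone black king.

    Files and ranks are 0..7; the white pawn promotes on rank 7 and the
    defending king moves first. Returns True if the pawn promotes.
    Simplification: the white king is ignored entirely.
    """
    steps = 7 - pawn_rank          # moves the pawn needs to promote
    if pawn_rank == 1:
        steps -= 1                 # double step from the starting rank
    # Chebyshev distance from the king to the promotion square
    king_dist = max(abs(king_file - pawn_file), abs(7 - king_rank))
    return king_dist > steps       # outside the square: can't catch it
```

For example, a pawn on e5 outruns a king on a8 but not one on b8, which is the classic textbook illustration of the rule.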