Can EGTB storage requirements be reduced using this scheme?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

syzygy wrote: Fri Dec 11, 2020 1:19 am What he means is that your call to SF to predict the correct TB value will struggle to return something useful.
The accuracy depends a lot on the ending. KQRPvKRB, for example, is 99.5% correct (roughly 90% wins, 9% draws, 1% losses; the table is about 10 GB).
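
As a back-of-the-envelope sketch of what that accuracy could buy, using only the 90/9/1 split and the 99.5% figure above and ignoring all the positional structure that real compressors exploit, one can compare per-position entropies (a rough upper-bound estimate, not a measurement):

from math import log2

# Rough entropy estimate for the KQRPvKRB numbers above (99.5% predictor
# accuracy, ~90/9/1 W/D/L split).  This ignores all positional structure,
# so it only bounds what a "store the residual" scheme could hope for.
def H(ps):
    # Shannon entropy in bits of a discrete distribution
    return -sum(p * log2(p) for p in ps if p > 0)

wdl = [0.90, 0.09, 0.01]   # raw W/D/L frequencies
acc = 0.995                # fraction of positions the shallow search gets right

raw_bits = H(wdl)                                   # no predictor
residual_bits = H([acc, 1 - acc]) + (1 - acc) * 1   # correctness flag + at most 1 bit when wrong

print(f"no predictor:   {raw_bits:.3f} bits/position")
print(f"with predictor: {residual_bits:.3f} bits/position (upper bound)")
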
syzygy wrote: Fri Dec 11, 2020 1:19 am Before you can think about space savings, you will have to swallow the huge loss of compression efficiency by rendering useless all the current compression tricks. Your approach first needs to make up for that loss before you can think about gaining something.
True, I'm not familiar enough with how space-efficient your encoding scheme is or where it falls on the space vs. probing-time spectrum. This is just an exploratory post to see if I'm missing something; I haven't started writing any code yet.
syzygy wrote: Fri Dec 11, 2020 1:19 am Did you do the math? With perfect indexing (one index value per legal position, which is unachievable in practice) and 5 positions per byte (3^5 = 243 < 256) as only compression, you would need 705 GB to store all 6-piece WDL tables (and you lose 50-move info). The Syzygy TBs need about 68 GB (including 50-move info). This makes a big difference in probing speed, unless your machine has a half TB of RAM that you have no other use for.
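
For reference, the "5 positions per byte" packing from the quote is just base-3 coding of the W/D/L values; a minimal sketch:

# Five ternary W/D/L values (0 = loss, 1 = draw, 2 = win) fit in one byte
# because 3**5 = 243 <= 256.
def pack5(vals):
    assert len(vals) == 5 and all(v in (0, 1, 2) for v in vals)
    b = 0
    for v in reversed(vals):
        b = b * 3 + v
    return b          # 0..242

def unpack5(b):
    vals = []
    for _ in range(5):
        vals.append(b % 3)
        b //= 3
    return vals

assert unpack5(pack5([2, 0, 1, 1, 2])) == [2, 0, 1, 1, 2]
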
I do have half a TB of RAM actually, and I will likely have 1 TB soon. Since only a fraction of the tables is needed for analysis, it should fit in RAM on a smaller system as well, so it doesn't need much special handling, and I don't consider the size significant.
Angrim
Posts: 97
Joined: Mon Jun 25, 2012 10:16 pm
Location: Forks, WA
Full name: Ben Nye

Re: Can EGTB storage requirements be reduced using this scheme?

Post by Angrim »

Interesting idea, not one that I'd use, but it's at least fun to think about. So the basic idea is that you search each position in the tables for a fixed number of nodes, use a simple heuristic to classify the position as w/l/d, and compare that to the real value; if the search result is right, you store a special <search solves> value in the table. Since the heuristic won't give distance to mate, you can only do w/d/l with this, so the table would have 6 possible values w/d/l/w50/l50/<search> with <search> being a large majority of the results. I think that cursed win/blessed losses are rare enough that it would be sane to just always store those and not worry about trying to find them with search. This should give a decent compression gain, but it does have a few downsides.
Issues:
1. Time to compress: this would be a lot slower than the current method. Fortunately you only need to compress once, so that might not be a fatal flaw (but see 5).
2. You need to have the exact version of Stockfish that was used for the compression running in memory whenever you do a probe. Even Stockfish itself would need to carry around an older version of Stockfish to ensure that the results match up. This is solvable, but very annoying to deal with.
3. Time to decompress: w/d/l tables are usually probed in the search, not just at the root of the tree, so speed matters, and this would be somewhat slow. On top of the current decompression time, a <search> result costs not only a 200-node search but also the overhead of running a second program and handing it a new position to search. Swapping between programs is slow. This is probably a fatal flaw.
4. Engine origin confusion: note that we have an entire forum dedicated to arguing about engine origins. Imagine how much worse it would get if an engine claiming not to be based on Stockfish had a copy of Stockfish running in memory whenever it ran.
5. Inevitable obsolescence: Stockfish is constantly improving, so the version used to make the tables would be obsolete long before the tables were done, and a newer version would probably give better compression, so there would be a constant temptation to restart the compression with a better version. This might not be an issue for everyone, but it would annoy me.

A possible solution to a lot of that: instead of Stockfish, use a simple mate solver to attempt to decide whether a position is w/l/d. It would be less accurate, but it could be much faster, use minimal memory, and clearly not be Stockfish :P Whether that would work depends on how accurate it could be made.
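
A minimal sketch of that compression pass, assuming two hypothetical callables: true_wdl(pos) for the real tablebase value and search_wdl(pos, nodes) for the fixed-node search (or mate-solver) guess:

SEARCH = "<search>"

def build_symbols(positions, true_wdl, search_wdl, nodes=200):
    # true_wdl(pos)          -> 'W', 'D', 'L', 'W50' or 'L50' (hypothetical)
    # search_wdl(pos, nodes) -> 'W', 'D' or 'L' (hypothetical, deterministic)
    symbols = []
    for pos in positions:
        truth = true_wdl(pos)
        if truth in ("W50", "L50"):
            symbols.append(truth)      # cursed values: rare, always stored
        elif search_wdl(pos, nodes) == truth:
            symbols.append(SEARCH)     # the large majority of positions
        else:
            symbols.append(truth)      # search is wrong: store the real value
    return symbols                     # handed to the usual compressor afterwards
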
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

Thanks for the post.
Angrim wrote: Fri Dec 11, 2020 6:06 am Since the heuristic won't give distance to mate, you can only do w/d/l with this, so the table would have 6 possible values w/d/l/w50/l50/<search> with <search> being a large majority of the results. I think that cursed win/blessed losses are rare enough that it would be sane to just always store those and not worry about trying to find them with search.
If you do a search even when the position is marked as having the true value stored, you need to store just 2 possible values instead of 3 (ignoring cursed wins/losses) because you know that it won't be the W, D or L value returned by the search.
Angrim wrote: Fri Dec 11, 2020 6:06 am 3. Time to decompress: w/d/l tables are usually probed in the search, not just at the root of the tree, so speed matters, and this would be somewhat slow. On top of the current decompression time, a <search> result costs not only a 200-node search but also the overhead of running a second program and handing it a new position to search. Swapping between programs is slow. This is probably a fatal flaw.
I don't think it would be that slow. First, you probably need just something like 70 nodes on average, depending on the endgame. Second, you would set up the board for SF and call the search function directly, not using UCI or FENs. This function could be further optimized as compared to the regular SF search function (it's just endgames, very shallow, no castling...). You would only use this probing when the remaining depth is like 2 or 3. If there are indeed space savings, the effects of a slower search would be counteracted by having more RAM available for e.g. regular hash tables.
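
A minimal sketch of that probe path, assuming two hypothetical callables: shallow_wdl(pos) wrapping the stripped-down deterministic search, and lookup(pos) returning the decompressed table entry (None when the search is marked correct for that position, otherwise the 1-bit residual):

OTHER = {"W": ("D", "L"), "D": ("W", "L"), "L": ("W", "D")}

def probe_wdl(pos, shallow_wdl, lookup):
    predicted = shallow_wdl(pos)       # e.g. ~70 nodes on average, deterministic
    residual = lookup(pos)
    if residual is None:               # marked: the prediction is the true value
        return predicted
    return OTHER[predicted][residual]  # one bit picks one of the two other outcomes
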
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by syzygy »

mmt wrote: Fri Dec 11, 2020 2:16 am I do have half a TB of RAM actually, and I will likely have 1 TB soon. Since only a fraction of the tables is needed for analysis, it should fit in RAM on a smaller system as well, so it doesn't need much special handling, and I don't consider the size significant.
Then I guess you're the lonely outlier in this respect.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by syzygy »

mmt wrote: Fri Dec 11, 2020 7:32 am Thanks for the post.
Angrim wrote: Fri Dec 11, 2020 6:06 am Since the heuristic won't give distance to mate, you can only do w/d/l with this, so the table would have 6 possible values w/d/l/w50/l50/<search> with <search> being a large majority of the results. I think that cursed win/blessed losses are rare enough that it would be sane to just always store those and not worry about trying to find them with search.
If you do a search even when the position is marked as having the true value stored, you need to store just 2 possible values instead of 3 (ignoring cursed wins/losses) because you know that it won't be the W, D or L value returned by the search.
How are you going to "mark" a position "as having the true value stored" without encoding this in the value you store for the position?
And for a "marked" position, how are you going to store a choice from W/D/L with just 2 possible values? The true value is one of three values W, D, L.
Angrim wrote: Fri Dec 11, 2020 6:06 am 3. Time to decompress: w/d/l tables are usually probed in the search, not just at the root of the tree, so speed matters, and this would be somewhat slow. On top of the current decompression time, a <search> result costs not only a 200-node search but also the overhead of running a second program and handing it a new position to search. Swapping between programs is slow. This is probably a fatal flaw.
I don't think it would be that slow. First, you probably need just something like 70 nodes on average, depending on the endgame.
Do you mean a fixed but different number of nodes per endgame? That is going to be a lot of tuning for each table.
Second, you would set up the board for SF and call the search function directly, not using UCI or FENs. This function could be further optimized as compared to the regular SF search function (it's just endgames, very shallow, no castling...). You would only use this probing when the remaining depth is like 2 or 3. If there are indeed space savings, the effects of a slower search would be counteracted by having more RAM available for e.g. regular hash tables.
SF gathers huge amounts of statistics while searching to be used for move ordering. You will have to rip that out of your "probing code SF", which will cost a lot of efficiency. Same for TT. This is the case even if you freeze the SF version you start with.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

syzygy wrote: Sat Dec 12, 2020 1:43 am How are you going to "mark" a position "as having the true value stored" without encoding this in the value you store for the position?
One way could be like I wrote earlier: true value (1 bit, ignoring cursed wins/losses), gap (maybe 7 bits, maybe depends on the endgame), true value, gap... A sort of simplified RLE. I don't know if it's near optimal. I do know that my scheme reduces entropy significantly compared to storing all values (I know that you're not exactly storing all values in your tables), which should result in better compression ratios.
syzygy wrote: Sat Dec 12, 2020 1:43 am And for a "marked" position, how are you going to store a choice from W/D/L with just 2 possible values? The true value is one of three values W, D, L.
If the search returns W, then 0=D, 1=L. If the search returns D, then 0=W, 1=L. If the search returns L, then 0=W, 1=D.
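
A minimal sketch of the encoding side of that "gap + residual" stream (7-bit gaps as suggested above; gaps longer than 127 would need some escape code, which is omitted here, and cursed values are ignored):

# Residual bit using the mapping above (conditioned on what the search returned).
RESIDUAL = {"W": {"D": 0, "L": 1}, "D": {"W": 0, "L": 1}, "L": {"W": 0, "D": 1}}

def encode_residual_stream(pairs):
    # pairs: (predicted, truth) per position, in index order
    stream, gap = [], 0
    for predicted, truth in pairs:
        if predicted == truth:
            gap += 1                   # run of positions the search already solves
        else:
            stream.append((gap, RESIDUAL[predicted][truth]))  # 7-bit gap + 1-bit value
            gap = 0
    return stream
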
syzygy wrote: Sat Dec 12, 2020 1:43 am Do you mean a fixed but different number of nodes per endgame? That is going to be a lot of tuning for each table.
Not a single fixed number but a few parameters for each endgame for SF invocations that use the iterative deepening scheme I described earlier.

Tuning should be easy: you just take a random sample of, let's say, 10k positions from the endgame and optimize the parameters (the number of nodes searched and the cutoff scores for draws and for wins/losses, for, say, 3 iterations) using your favorite method, to get the best score for whatever tradeoff between size and speed you want. That's what I did manually to go down from an average of 200 nodes to 90 nodes without any loss of accuracy for 62 large EGTBs.
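
A minimal sketch of that tuning step, simplified to a single node budget rather than the 3-iteration scheme; predict(pos, nodes, draw_cut, win_cut) and true_wdl(pos) are hypothetical callables, the grid values are placeholders, and alpha sets how strongly the node budget is penalized against errors:

import random
from itertools import product

def tune_endgame(positions, predict, true_wdl, alpha=0.001, sample_size=10_000):
    sample = random.sample(positions, min(sample_size, len(positions)))
    best = None
    for nodes, draw_cut, win_cut in product((50, 70, 100, 150, 200),
                                            (0.5, 1.0, 1.5),
                                            (1.5, 2.0, 3.0)):
        errors = sum(predict(p, nodes, draw_cut, win_cut) != true_wdl(p)
                     for p in sample)
        cost = errors / len(sample) + alpha * nodes   # wrong predictions cost size, nodes cost speed
        if best is None or cost < best[0]:
            best = (cost, nodes, draw_cut, win_cut)
    return best
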
syzygy wrote: Sat Dec 12, 2020 1:43 am SF gathers huge amounts of statistics while searching to be used for move ordering. You will have to rip that out of your "probing code SF", which will cost a lot of efficiency. Same for TT. This is the case even if you freeze the SF version you start with.
Maybe it would matter, maybe not. Needs to be checked. It should not matter for LC0 for example - the probing speed of my scheme should be fast enough to run at all leaves.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by syzygy »

mmt wrote: Sat Dec 12, 2020 3:43 am
syzygy wrote: Sat Dec 12, 2020 1:43 am How are you going to "mark" a position "as having the true value stored" without encoding this in the value you store for the position?
One way could be like I wrote earlier: true value (1 bit, ignoring cursed wins/losses), gap (maybe 7 bits, maybe depends on the endgame), true value, gap... A sort of simplified RLE. I don't know if it's near optimal. I do know that my scheme reduces entropy significantly compared to storing all values (I know that you're not exactly storing all values in your tables), which should result in better compression ratios.
syzygy wrote: Sat Dec 12, 2020 1:43 am And for a "marked" position, how are you going to store a choice from W/D/L with just 2 possible values? The true value is one of three values W, D, L.
If the search returns W, then 0=D, 1=L. If the search returns D, then 0=W, 1=L. If the search returns L, then 0=W, 1=D.
Ok, so you are basically storing two tables. A first table with a value for each position indicating whether a shallow search returns the correct value, and a second table with a value for each position for which the shallow search does not return the correct value.

With the right compression method, I suspect it would be more efficient and simpler to combine these two tables into one.

The simplest way would be three values 0, 1, 2.
0 = use predicted outcome
1 = use most likely outcome among W, D, L different from the predicted outcome
2 = use least likely outcome among W, D, L different from the predicted outcome
where most/least likely is determined according to the frequencies of W, D, L as outcomes for the particular table.

This can probably be improved by conditioning on the predicted outcome. For example, if the predicted outcome is W, then D might be more likely than L even if for the table as a whole L is more likely than D.
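
A minimal sketch of that combined encoding, with the conditional W/D/L frequencies gathered in a first pass over the table (the numbers in the example are placeholders):

def encode_symbol(predicted, truth, freq):
    # 0 = prediction correct, 1 = the more likely of the two other outcomes,
    # 2 = the less likely one, conditioned on the predicted outcome.
    # freq[p][t] = how often true value t occurs when the predictor says p.
    if truth == predicted:
        return 0
    others = sorted((o for o in "WDL" if o != predicted),
                    key=lambda o: freq[predicted][o], reverse=True)
    return 1 if truth == others[0] else 2

freq = {"W": {"D": 9, "L": 1}, "D": {"W": 5, "L": 5}, "L": {"W": 1, "D": 9}}
assert encode_symbol("W", "W", freq) == 0
assert encode_symbol("W", "D", freq) == 1   # given a predicted W, a true D is more common than a true L here
assert encode_symbol("W", "L", freq) == 2
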
syzygy wrote: Sat Dec 12, 2020 1:43 am Do you mean a fixed but different number of nodes per endgame? That is going to be a lot of tuning for each table.
Not a single fixed number but a few parameters for each endgame for SF invocations that use the iterative deepening scheme I described earlier.

Tuning should be easy: you just take a random sample of, let's say, 10k positions from the endgame and optimize the parameters (the number of nodes searched and the cutoff scores for draws and for wins/losses, for, say, 3 iterations) using your favorite method, to get the best score for whatever tradeoff between size and speed you want. That's what I did manually to go down from an average of 200 nodes to 90 nodes without any loss of accuracy for 62 large EGTBs.
OK, that is indeed doable.

Using a predictor and encoding only the delta between the predictions and the true values is not a new idea but, as far as I am aware, has never worked (or perhaps: has never been implemented) especially well.

Syzygy tables use the following "predictor":
- if there is a winning capture, it is a win (meaning that the position can be encoded with whatever value compresses best);
- if there is a drawing capture, it is at least a draw (meaning that a draw can be encoded as a loss for that position if that is more space efficient);
- otherwise, the position is like any other position in the table, and the compression algorithm will ensure that more likely outcomes are encoded more space-efficiently.
The (partial) "don't care" aspect of this encoding is used in the first ("Re-Pair"-based) compression phase (one can do much better than statically assigning the same value to all don't cares). This also takes care of illegal positions/index values, which can also be encoded with whatever value compresses best.

One can go further by looking at checking moves. If there is a winning checking move, the position is a win, etc. But this is getting a lot more expensive, since you would need to probe the same 7-piece table many times (the captures by definition enter smaller tables), potentially leading to a combinatorial explosion. Perhaps it is better to look for forced check, evasion, winning/drawing-capture combinations, which avoids having to probe the same 7-piece table multiple times during a probe (this is something I wish I had tried when writing my generator).

I think a regular predictor (e.g. based on SF) cannot fully make use of the definite information (including no information) that can be obtained by a relatively cheap single round of captures (perhaps including checking moves). However, it should be possible to combine the approaches. This does seem tricky, though.

If the approaches are combined, the regular predictor's main task is to do well on "quiet" positions that do not have a winning capture or winning checking move. I doubt that SF is the best possible predictor in terms of time vs quality.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Can EGTB storage requirements be reduced using this scheme?

Post by petero2 »

mmt wrote: Mon Dec 07, 2020 6:03 pm One idea to cut down the size of WDL EGTBs is:
- For all positions in EGTBs run SF without EGTBs for 200 nodes
- See if the SF score matches the true results using some cutoff (e.g. if the score is <-1.5 and we are getting mated, between -1.5 and 1.5 and it's a draw, or >1.5 and we have a mate). Do it single-threaded so it's deterministic
You may be interested in this post and the following discussion:

Re-Pair compression questions

It describes my unfinished attempt to improve compression of the Syzygy wdl tables using decision trees. It would be possible to extend that scheme with predicates of the type "searchScore <= x", where searchScore is some deterministic function of the position to probe.
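
To illustrate, a tiny sketch of a decision-tree node of that kind, where the features dict would hold board-derived values plus a deterministic shallow-search score (the names, thresholds and example tree are made up; a real tree would mix ordinary board-feature predicates in as well):

def classify(features, node):
    # node is either a leaf WDL string or (feature_name, threshold, left, right)
    while not isinstance(node, str):
        name, threshold, left, right = node
        node = left if features[name] <= threshold else right
    return node

tree = ("searchScore", -1.5, "L",
        ("searchScore", 1.5, "D", "W"))
print(classify({"searchScore": 2.3}, tree))   # -> 'W'
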
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Can EGTB storage requirements be reduced using this scheme?

Post by Sesse »

phhnguyen wrote: Tue Dec 08, 2020 10:12 am BTW, Fishtest has revealed that the Syzygy 6-men (no data for 7-men) can help SF to gain about 13 Elo, a good amount but not really vital IMO. You may remove the EGTB then gain back about 5-10 Elo by improving the endgame code for SF.
I think this is a case where it's easy to be a bit “Elo blind”. Elo can only say something about winning chances in machine play, and if you care about the quality of scores (for human analysis), that's not the same thing. E.g. evaluating the best move in a dead-drawn position as +100 costs 0 Elo if there are no losing moves in the position. But for human analysis, it's clearly interesting to know whether it's possible to hold a draw or not. (This is an extreme example, of course, since one could hope that you'd choose a different move earlier to avoid the dead draw.) I frequently see this when my own site reports +0.05 or whatever in endgame situations where other broadcasters report +2 or +3; most likely, it wouldn't matter in self-play since there's no win to be had anyway, but it matters for me.

Tuning engines through self-play has yielded fantastic results over the last decades, probably 1000 Elo or more, and we know of no better way. But it has also left gaping holes when we try to use the scores (which, when tuning, are only used for choosing moves relative to each other) as values in their own right. Fortresses are another typical example.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Can EGTB storage requirements be reduced using this scheme?

Post by mmt »

Thanks for the posts, good info. I will do some reading and playing around with code and see where it goes.