EGTB encoding

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

EGTB encoding

Post by Edmund »

Endgame tablebase formats (other than wdl) usually store for each position the distance in moves to a certain target (zeroing, conversion, mate, rule). Is there a reason not to store the move to the position closest to the target instead?

The latter could be better compressible for most endgames. Assuming trivial encoding, the maximum number of legal moves in a position of a certain piece configuration would usually be smaller than the maximum number of moves to reach the target.

Distance-to-target formats can be further compressed by combining search with reducing the precision or skipping positions. e.g. only store white-to-move positions, and do a 2 ply search to find the best move; this halfs the number of positions. In theory this can be further extended, but the probing could become very time-consuming. For best-move formats an interesting analogy could be to run a deterministic search without lookups to improve the move ordering and thus reduce the entropy when storing the order of the best move.

Are you aware of any experiments in this direction?
Koistinen
Posts: 18
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: EGTB encoding

Post by Koistinen »

I have been planning to look into the possibility of using an imperfect oracle for wdl as well as for best move, then have a lookup table for exceptions.

If the oracle is good enough, the total size might be reduced much.
I think the oracle could be allowed time for more than a 2 ply search for best move, perhaps a 10-ply would be reasonable as when you need the move, you don't need many.
The wdl probe needs to be fast as you want to use it near leaves of the search tree.
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: EGTB encoding

Post by Edmund »

interesting. I am less optimistic that "storing exceptions" for moves would be a space saver. Think of some complicated endgame. A short search (even 10 ply) is totally clueless. It may provide some candidate moves, but thats it. Already KQKR or KQKN contain some tricky positions. You would have too many exceptions. Better is to use evaluation to reduce entropy and then apply some standard compression algorithm.

What have you got in mind for wdl as an oracle and an approach to store exceptions?
Did you run any tests?
petero2
Posts: 690
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: EGTB encoding

Post by petero2 »

This might be of interest:

http://talkchess.com/forum3/viewtopic.p ... 36#p796713

Storing exceptions in a separate table would have a lot of overhead because you would need to store positions and values, instead of just values. In my experiment I therefore used the oracle to remap the values stored in the tablebase, so that 0 means "the oracle is right", 1 means "the oracle's second guess is right", etc. If you have an oracle that is almost always right, most values will be 0 and the compression will be really good.

In my case I used a decision tree as the oracle, but other types of oracles would work too.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: EGTB encoding

Post by syzygy »

Edmund wrote: Tue Sep 19, 2023 12:17 am Endgame tablebase formats (other than wdl) usually store for each position the distance in moves to a certain target (zeroing, conversion, mate, rule). Is there a reason not to store the move to the position closest to the target instead?

The latter could be better compressible for most endgames. Assuming trivial encoding, the maximum number of legal moves in a position of a certain piece configuration would usually be smaller than the maximum number of moves to reach the target.
You would need to store the moves both for wtm and for btm. If you store dtz (or dtm), you only need the "smaller" of wtm and btm, which reduces storage by at least a factor of 3 after compression. Also, most DTZ values are really small, so they compress very well. This is different for DTM.

But if we have WDL available, then in a winning position you could use WDL to find the list of winning moves, perhaps use some prediction algorithm to rank the moves in a predicted order of best to worse, and use this to compress the rank of the actual best move. The better the prediction, the better this will work. For positions that are drawn or lost you wouldn't need to store anything (i.e. you pick the value that compresses best).

It would be interesting to see how the size compares to 1-sided DTZ tables.
Compared with 2-sided DTM tables I do not doubt that it will be a win in size. It probably also beats 1-sided DTM.

Not having information on how close you are to a mate (or winning conversion) is of course a downside. In case of DTZ, you would lose the possibility of knowing whether an initial cursed win is now a real win. Potentially a downside that can be lived with, though.
Distance-to-target formats can be further compressed by combining search with reducing the precision or skipping positions. e.g. only store white-to-move positions, and do a 2 ply search to find the best move; this halfs the number of positions.
That is being done already. Plus other tricks.
In theory this can be further extended, but the probing could become very time-consuming.
Indeed, in theory a 32-piece tablebase can be compressed to the size of the program searching positions to mate. In the end it is a trade off.
Are you aware of any experiments in this direction?
No, but please have a go at it :-)
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: EGTB encoding

Post by syzygy »

Edmund wrote: Tue Sep 19, 2023 7:44 pm interesting. I am less optimistic that "storing exceptions" for moves would be a space saver. Think of some complicated endgame. A short search (even 10 ply) is totally clueless. It may provide some candidate moves, but thats it. Already KQKR or KQKN contain some tricky positions. You would have too many exceptions. Better is to use evaluation to reduce entropy and then apply some standard compression algorithm.
Also it takes a LOT of time to perform a "short" 10-ply search for every position of a 7- or 8-piece table.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: EGTB encoding

Post by syzygy »

syzygy wrote: Tue Sep 19, 2023 11:11 pmIt would be interesting to see how the size compares to 1-sided DTZ tables.
Compared with 2-sided DTM tables I do not doubt that it will be a win in size. It probably also beats 1-sided DTM.
But if you anyway lose distance information, then in my opinion the target to beat is 1-sided DTZ, not DTM.

You could store either the move that minimizes DTM or the move that minimizes DTZ (but you'd have to be consistent throughout the table). Minimizing DTM would give nicer play, perhaps at the cost of a larger table (not sure about that).

If you want to take into account the 50-move rule, then storing the best move comes with extra disadvantages, but it would still be possible to store the move that minimizes the distance to a position that is won under the 50-move rule.
Koistinen
Posts: 18
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: EGTB encoding

Post by Koistinen »

Edmund wrote: Tue Sep 19, 2023 7:44 pm What have you got in mind for wdl as an oracle and an approach to store exceptions?
Did you run any tests?
I have not run any tests, only been thinking of it.
One idea is to build an evaluator by defining lots of properties of a position and then correlate them with the actual values.
Find some way to combine it all and then find out if searching a few ply helps any.
I guess having one evaluator per endgame class might help.
I don't know which way is best, I just find the question interesting.
It is also possible to have uncertainty part of the evaluation and then decide whether to look up the value based on uncertainty.
To store exceptions I think a multilayered approach is best, but testing is needed.
Same idea might work for dtz50, say having an evaluator which gives a range of dtz50 values.
The possible ways to do it are many, maybe you need some luck to find a good way.
syzygy wrote: Tue Sep 19, 2023 11:15 pm Also it takes a LOT of time to perform a "short" 10-ply search for every position of a 7- or 8-piece table.
You would only need to do the search individually when you want the value for a specific position.
syzygy wrote: Tue Sep 19, 2023 11:26 pm But if you anyway lose distance information, then in my opinion the target to beat is 1-sided DTZ, not DTM.
Yes, 1-sided dtz50 is the target to beat.
Perhaps compression of the wdl tables is more important as for the dtz50 values, a network query might be quick enough that you don't need them locally.
syzygy wrote: Tue Sep 19, 2023 11:11 pm No, but please have a go at it :-)
I like that attitude!
petero2 wrote: Tue Sep 19, 2023 7:54 pm This might be of interest:

http://talkchess.com/forum3/viewtopic.p ... 36#p796713

Storing exceptions in a separate table would have a lot of overhead because you would need to store positions and values, instead of just values. In my experiment I therefore used the oracle to remap the values stored in the tablebase, so that 0 means "the oracle is right", 1 means "the oracle's second guess is right", etc. If you have an oracle that is almost always right, most values will be 0 and the compression will be really good.

In my case I used a decision tree as the oracle, but other types of oracles would work too.
How to best store exceptions depends. To me it looks like a rich field, finding oracles and computing and storing exceptions.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: EGTB encoding

Post by syzygy »

Koistinen wrote: Wed Sep 20, 2023 8:03 pm
syzygy wrote: Tue Sep 19, 2023 11:15 pm Also it takes a LOT of time to perform a "short" 10-ply search for every position of a 7- or 8-piece table.
You would only need to do the search individually when you want the value for a specific position.
I am talking about the generation phase.
Koistinen
Posts: 18
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: EGTB encoding

Post by Koistinen »

syzygy wrote: Fri Sep 22, 2023 8:37 pm
Koistinen wrote: Wed Sep 20, 2023 8:03 pm
syzygy wrote: Tue Sep 19, 2023 11:15 pm Also it takes a LOT of time to perform a "short" 10-ply search for every position of a 7- or 8-piece table.
You would only need to do the search individually when you want the value for a specific position.
I am talking about the generation phase.
That is what makes it fun to do. Seems impossible at first but isn't if you look at it from the correct perspective.