Forget Syzygy -- Presenting the Emanuel Torresbase!

klx
Posts: 179
Joined: Tue Jun 15, 2021 8:11 pm
Full name: Emanuel Torres

Re: Forget Syzygy -- Presenting the Emanuel Torresbase!

Post by klx »

AndrewGrant wrote: Fri Jun 25, 2021 1:04 am Engines can achieve depth 20 easily, but it's heavily pruned. If you want a proven tree, your AlphaBeta is very limited in its depth without unsafe pruning decisions.
I respectfully disagree. We can do safe pruning. If you look at this page, we can expect an effective branching factor of around 2. 2^20 is about 1 million, and we should be able to analyze more than 1 million nodes per second.

The other thing is that we know this is an endgame (<= 8 pieces) with a terminal position within <= 20 plies. So I would expect the branching factor to be even lower.
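As a rough back-of-the-envelope check of the estimate above: a depth-d search with effective branching factor b visits about b^d nodes. The sketch below is illustrative only. The function name is made up, the 1 million nodes/second figure is the one from the post, and the b = 1.5 and b = 3 rows are added purely to show how sensitive the estimate is to the branching factor.

```python
def search_seconds(branching_factor: float, depth: int, nodes_per_second: float) -> float:
    """Rough time estimate: nodes ~ b**d, time = nodes / nodes_per_second."""
    nodes = branching_factor ** depth
    return nodes / nodes_per_second


if __name__ == "__main__":
    # 1e6 nodes/second is the figure assumed in the post, not a measurement.
    for b in (1.5, 2.0, 3.0):
        nodes = b ** 20
        print(f"b={b}: {nodes:,.0f} nodes, {search_seconds(b, 20, 1e6):,.2f} s")
```

With b = 2 this gives about one second for 20 plies, but the estimate grows very quickly if the real branching factor is higher.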
[Moderation warning] This signature violated the rule against commercial exhortations.

Post by klx »

phhnguyen wrote: Fri Jun 25, 2021 2:09 am 1) I agree that a large share of non-draw positions, not 99% but about 90%, are easy to search out in reasonable time, say within a 3-second threshold
2) However, 10% of non-draw positions may take much, much longer, even hours
3) The most trouble is with drawn positions. You can't easily search and conclude that a position is a draw. Sometimes you need to search over 100 plies (because of the 50-move rule) and/or for hours
4) The mix of 2 and 3 leaves you with no good solution
This is not how the Emanuel Torresbase ("ETB") works. It does the full retrograde analysis, and stores the DTM only for the tiny portion of positions with depth > N, plus all draws. So the only positions we have to alpha-beta search are the ones with depth <= N (for example, 20). All the others we can instantly look up in the database.
phhnguyen wrote: Fri Jun 25, 2021 2:09 am Your idea of collecting data for some specific positions only (then searching or doing whatever from them) is a bad one, since they will eat a lot of space, memory, and processing time (you may need to load them all into memory and then run binary searches for a given position). Your data may end up quite close in size to Syzygy and/or not be worth the trouble.
I respectfully disagree. ETB can be configured for speed (hash map) or memory efficiency (binary search). For example, with 1 billion positions, a binary search does ~30 queries. In practice, this will be not slower but faster than a "constant time" DTM lookup, since the ETB is orders of magnitude smaller than the DTM table, so it fits in lower-latency storage media. "Constant time" is not constant time.
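The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the actual ETB code: `etb_probe`, `max_probes`, the integer position keys, and the convention that a stored 0 means "draw" are all assumptions, and `search` stands in for whatever alpha-beta routine resolves the omitted (depth <= N) positions.

```python
from bisect import bisect_left
from typing import Callable, Sequence


def etb_probe(key: int,
              stored_keys: Sequence[int],
              stored_dtm: Sequence[int],
              search: Callable[[int], int]) -> int:
    """Return the stored DTM if `key` is in the sparse table, else fall back to search."""
    i = bisect_left(stored_keys, key)  # binary search over the sorted keys
    if i < len(stored_keys) and stored_keys[i] == key:
        return stored_dtm[i]           # deep win/loss or draw: stored explicitly
    return search(key)                 # omitted position: resolve it by searching


def max_probes(n: int) -> int:
    """Worst-case number of comparisons for a binary search over n sorted entries."""
    return n.bit_length()              # equals floor(log2(n)) + 1 for n >= 1


if __name__ == "__main__":
    print(max_probes(1_000_000_000))   # probes needed for 1 billion entries
```

For 1 billion entries `max_probes` gives 30, which matches the "~30 queries" figure in the post.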
phhnguyen
Posts: 1437
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Post by phhnguyen »

klx wrote: Fri Jun 25, 2021 3:30 am
phhnguyen wrote: Fri Jun 25, 2021 2:09 am 1) I agree that a large share of non-draw positions, not 99% but about 90%, are easy to search out in reasonable time, say within a 3-second threshold
2) However, 10% of non-draw positions may take much, much longer, even hours
3) The most trouble is with drawn positions. You can't easily search and conclude that a position is a draw. Sometimes you need to search over 100 plies (because of the 50-move rule) and/or for hours
4) The mix of 2 and 3 leaves you with no good solution
This is not how the Emanuel Torresbase ("ETB") works. It does the full retrograde analysis, and stores the DTM only for the tiny portion of positions with depth > N, plus all draws. So the only positions we have to alpha-beta search are the ones with depth <= N (for example, 20). All the others we can instantly look up in the database.
phhnguyen wrote: Fri Jun 25, 2021 2:09 am Your idea of collecting data for some specific positions only (then searching or doing whatever from them) is a bad one, since they will eat a lot of space, memory, and processing time (you may need to load them all into memory and then run binary searches for a given position). Your data may end up quite close in size to Syzygy and/or not be worth the trouble.
I respectfully disagree. ETB can be configured for speed (hash map) or memory efficiency (binary search). For example, with 1 billion positions, a binary search does ~30 queries. In practice, this will be not slower but faster than a "constant time" DTM lookup, since the ETB is orders of magnitude smaller than the DTM table, so it fits in lower-latency storage media. "Constant time" is not constant time.
Looks like you were not aware of how huge an EGTB is. 1 billion positions require over 8 GB of RAM (hash key + some extra info); just reading that much into memory is no fun for any app/purpose. In contrast, a standard EGTB doesn't require reading all data into memory (only a block of a few KB each time, or a few tens of bytes in the case of Syzygy). One endgame (even a 6-man one) alone may far exceed 1 billion positions. Even if you have some tricks to store a smaller number rather than all of them, say only 10%, the total number of positions you must write down is still huge and may exceed the RAM of modern computers.

BTW, these are just some of my thoughts and experience. They may be outdated. You may have better, cleverer ideas. Good luck, and remember to update us whether your project succeeds or fails.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager

Post by klx »

phhnguyen wrote: Fri Jun 25, 2021 4:40 am Looks like you were not aware of how huge an EGTB is. 1 billion positions require over 8 GB of RAM (hash key + some extra info); just reading that much into memory is no fun for any app/purpose. In contrast, a standard EGTB doesn't require reading all data into memory (only a block of a few KB each time, or a few tens of bytes in the case of Syzygy). One endgame (even a 6-man one) alone may far exceed 1 billion positions. Even if you have some tricks to store a smaller number rather than all of them, say only 10%, the total number of positions you must write down is still huge and may exceed the RAM of modern computers.
The idea is for the ETB (this is the Emanuel Torresbase) to be about 1% of the size of a conventional DTM database. In this case, the non-search lookups will most likely outperform a DTM lookup in practice, even though they are slower in theory (for example, binary search vs. array lookup). The reason is that in practice, you will be able to fit it in lower-latency storage.

When your DTM data fits on HDD, ETB fits on SSD. When your DTM fits on SSD, ETB fits in RAM. When your DTM fits in RAM, ETB fits in L3 cache. Etc.

You also would not necessarily keep the whole thing in RAM to do binary search, just a small index, and then access the relevant part on disk on demand.
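One way to picture the "small index in RAM, rest on disk" idea is a sparse index over sorted blocks. The sketch below is an assumption about how that could look, not a description of the real ETB: `build_blocks`, `lookup`, and the (key, value) layout are hypothetical, and the in-memory list `blocks` stands in for fixed-size blocks that would live on disk.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Block = List[Tuple[int, int]]  # sorted (position key, stored value) pairs


def build_blocks(entries: List[Tuple[int, int]], block_size: int) -> Tuple[List[int], List[Block]]:
    """Split sorted entries into blocks and keep only each block's first key as the index."""
    entries = sorted(entries)
    blocks = [entries[i:i + block_size] for i in range(0, len(entries), block_size)]
    index = [block[0][0] for block in blocks]  # this small list is what stays in RAM
    return index, blocks


def lookup(key: int, index: List[int], blocks: List[Block]) -> Optional[int]:
    """Find the one block that could hold `key`, then binary-search inside it."""
    b = bisect_right(index, key) - 1           # last block whose first key is <= key
    if b < 0:
        return None                            # key is smaller than everything stored
    block = blocks[b]                          # stand-in for a single disk read
    keys = [k for k, _ in block]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return block[i][1]
    return None                                # not stored: caller would fall back to search
```

With this layout a probe costs one binary search over the in-memory index plus one block read, regardless of how many blocks exist.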

Hope this makes sense!

Post by phhnguyen »

klx wrote: Fri Jun 25, 2021 5:14 am
phhnguyen wrote: Fri Jun 25, 2021 4:40 am Looks like you were not aware of how huge an EGTB is. 1 billion positions require over 8 GB of RAM (hash key + some extra info); just reading that much into memory is no fun for any app/purpose. In contrast, a standard EGTB doesn't require reading all data into memory (only a block of a few KB each time, or a few tens of bytes in the case of Syzygy). One endgame (even a 6-man one) alone may far exceed 1 billion positions. Even if you have some tricks to store a smaller number rather than all of them, say only 10%, the total number of positions you must write down is still huge and may exceed the RAM of modern computers.
The idea is for the ETB (this is the Emanuel Torresbase) to be about 1% of the size of a conventional DTM database. In this case, the non-search lookups will most likely outperform a DTM lookup in practice, even though they are slower in theory (for example, binary search vs. array lookup). The reason is that in practice, you will be able to fit it in lower-latency storage.

When your DTM data fits on HDD, ETB fits on SSD. When your DTM fits on SSD, ETB fits in RAM. When your DTM fits in RAM, ETB fits in L3 cache. Etc.

You also would not necessarily keep the whole thing in RAM to do binary search, just a small index, and then access the relevant part on disk on demand.

Hope this makes sense!
IMO, your 1% figure looks unrealistic. Both drawn and hard positions far exceed that number, and you should add them to your database too, since they are not easy to search out. I think the real number is between 10% and 30%, and with those numbers it may be too hard to balance benefit against trouble. Be aware that each position in your database now takes more space than in a traditional one, say 8 times as much, because of storing hash keys or something similar.
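To make the size argument concrete, here is a small worked calculation. It is a simplification under stated assumptions: every stored entry is taken to cost the same multiple of a traditional entry (the "8 times" figure from the post), and compression of either format is ignored. `relative_size` is a made-up helper name.

```python
def relative_size(stored_fraction: float, entry_overhead: float) -> float:
    """Sparse-table size as a multiple of the full table's size."""
    return stored_fraction * entry_overhead


if __name__ == "__main__":
    # 1% is the figure claimed for ETB; 10% and 30% are the range suggested in the reply.
    for fraction in (0.01, 0.10, 0.30):
        ratio = relative_size(fraction, 8)
        print(f"store {fraction:.0%} of positions -> {ratio:.2f}x the full table")
```

Under these assumptions, storing 1% of positions yields a table 0.08x the size of the full one, while storing 30% yields one larger than the full table.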

You could do some homework (I did this myself once) to get the real number: scan positions from an EGTB (ignoring small, easy endgames), pick them randomly (we can't do all of them since that takes too much time), and try to search each within a time threshold, say 3 seconds. Some endgames, though not all, may become a nightmare to search, so you must be patient until the end.
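The sampling experiment described above might be organized like this. It is only a skeleton: `estimate_hard_fraction` is a hypothetical name, positions are represented by plain table indices, and `solved_in_time` is a placeholder for a real routine that sets up the position and runs a search under the time limit.

```python
import random
from typing import Callable


def estimate_hard_fraction(num_positions: int,
                           sample_size: int,
                           solved_in_time: Callable[[int], bool],
                           seed: int = 0) -> float:
    """Estimate the share of positions a search fails to resolve within the time limit."""
    rng = random.Random(seed)                                # fixed seed: repeatable sample
    sample = rng.sample(range(num_positions), sample_size)   # draw without replacement
    hard = sum(1 for index in sample if not solved_in_time(index))
    return hard / sample_size
```

The returned fraction is the share of sampled positions that would have to be stored explicitly rather than searched.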
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Post by Rebel »

klx wrote: Fri Jun 25, 2021 5:14 am The idea is for the ETB (this is the Emanuel Torresbase) to be about 1% of the size of a conventional DTM database.
Just start with KPK and post the size.
klx wrote: Fri Jun 25, 2021 5:14 am Hope this makes sense!
Nope.
90% of coding is debugging, the other 10% is writing bugs.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Post by yurikvelo »

klx wrote: Thu Jun 24, 2021 6:09 pm in the Emanuel Torresbase, we store a special identifier instead of the actual DTM for these "easily-won/lost" positions

Syzygy 7 men: 16.7 TiB
Minus 99.95%: 8.5 GiB
Syzygy is not DTM, it is DTZ. It doesn't store any DTM information.
DTM is Nalimov/Lomonosov.

So, potentially, the Emanuel Torresbase could be a little smaller than Nalimov, but 10x larger than Syzygy.
During query of a position, if this special value is found, we do alpha-beta to find the outcome. In other words, we can cut out up to 99.95% of the table!
Your special identifier consumes the same amount of space as a Syzygy identifier.
The expected reduction of the Nalimov DTM bases is very small.

Post by phhnguyen »

klx wrote: Thu Jun 24, 2021 6:09 pm 1. The vast majority of positions are won within a small number of moves. For example, from the syzygy stats site it seems that close to 99% of positions are won in less than 20 plies (depending on the table; for some it is much higher, like 99.95%).

2. We can trivially search to depth 20 plies for endgames with alpha-beta in a fraction of a second.
I think you have a wrong assumption here. Syzygy uses the DTZ50 metric, which is somewhat a kind of DTC (depth to conversion). It means that when it says 20 plies for a given position, you need to search 20 plies to reach a conversion (capture, pawn push, mate...). If it is not a mate, you may not know the result yet, and your search function can't stop after that 20th ply but has to search beyond it.

For some 6-man endgames, the Nalimov EGTB has to use 2 bytes for its DTM. That means a search function has to work to a depth of over 256 to complete. That is why I said (above) that some endgames require hours or even longer of searching.

Post by klx »

Rebel wrote: Fri Jun 25, 2021 6:01 am Just start with KPK and post the size.
Ok, so the problem with KPK is that there are a lot of draws, which will add space. We can test another set with fewer draws.

Post by klx »

Rebel wrote: Fri Jun 25, 2021 6:01 am Just start with KPK and post the size.
Alright, with the Kolmogorov extension I can do KPK in 1248 bytes, with millisecond lookup time!