Hi there, I have come up with a pretty revolutionary idea to vastly reduce the size of a DTM endgame database.
Here are the facts that led to my discovery:
1. The vast majority of positions are won within a small number of moves. For example, judging from the syzygy stats site, close to 99% of positions are won in fewer than 20 plies (depending on the table; for some it is more like 99.95%).
2. We can trivially search to depth 20 plies for endgames with alpha-beta in a fraction of a second.
So, in the Emanuel Torresbase, we store a special identifier instead of the actual DTM for these "easily-won/lost" positions. When a position is queried and this special value is found, we run alpha-beta to find the outcome. In other words, we can cut out up to 99.95% of the table!
The Emanuel Torresbase is fully adjustable. We can move the threshold up or down to trade disk usage against compute time. With threshold 0, it becomes a plain endgame database. With threshold Infinity, it becomes pure alpha-beta search. The threshold can be configured per table, either in plies or as a percentage of positions.
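To make the store-or-search idea concrete, here is a minimal sketch on a toy take-away game rather than chess (remove 1 to 3 tokens, the player facing an empty pile has lost). The game, the `SENTINEL` marker and the names `exact_dtm`, `build_table`, `search` and `probe` are all assumptions for illustration, not an existing implementation. The toy game has no draws, so it says nothing about how drawn positions would be handled.

```python
from functools import lru_cache

MOVES = (1, 2, 3)   # toy game: remove 1-3 tokens; a player facing 0 tokens has lost
SENTINEL = None     # hypothetical marker for "shallow result, re-derive by search"


@lru_cache(maxsize=None)
def exact_dtm(n):
    """Full solve of pile size n: (side_to_move_wins, plies_until_game_end)."""
    children = [exact_dtm(n - m) for m in MOVES if m <= n]
    if not children:
        return (False, 0)
    losing_replies = [plies for won, plies in children if not won]
    if losing_replies:                       # the winner picks the fastest win
        return (True, 1 + min(losing_replies))
    return (False, 1 + max(plies for _, plies in children))  # the loser delays


def build_table(max_n, threshold):
    """Keep exact values only for positions deeper than `threshold` plies."""
    return [SENTINEL if exact_dtm(n)[1] <= threshold else exact_dtm(n)
            for n in range(max_n + 1)]


def search(n, depth):
    """Unpruned depth-limited minimax; returns None if unresolved within depth."""
    legal = [m for m in MOVES if m <= n]
    if not legal:
        return (False, 0)
    if depth == 0:
        return None
    results = [search(n - m, depth - 1) for m in legal]
    losing_replies = [r[1] for r in results if r is not None and not r[0]]
    if losing_replies:
        return (True, 1 + min(losing_replies))
    if all(r is not None for r in results):  # every move is a proven loss
        return (False, 1 + max(r[1] for r in results))
    return None


def probe(table, n, threshold):
    """Return the stored value, or search when the sentinel is found."""
    entry = table[n]
    return search(n, threshold) if entry is SENTINEL else entry


THRESHOLD = 6
table = build_table(40, THRESHOLD)
stored = sum(entry is not SENTINEL for entry in table)
print(f"{stored} of {len(table)} entries stored explicitly")
```

The search here is unpruned, so a sentinel probe is only exact because every sentinel position is known to resolve within `THRESHOLD` plies.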
The Emanuel Torresbase reduces the size of existing databases and paves the way for an 8-men database.
Syzygy 7 men: 16.7 TiB
Minus 99.95%: 8.5 GiB
Estimated 8 men size: 8.5 GiB * (16.7 TiB / 149.2 GiB) = 974 GiB
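The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same estimate: it assumes, as the post does, that the removed 99.95% of entries cost no space at all, and it reads the 149.2 GiB figure as the size of the next-smaller table set (that reading is an assumption).

```python
GIB_PER_TIB = 1024

syzygy_7_gib = 16.7 * GIB_PER_TIB        # 7-men size from the post, in GiB
kept_fraction = 1 - 0.9995               # share of entries still stored
reduced_7_gib = syzygy_7_gib * kept_fraction

# Growth factor from the 149.2 GiB set to the 7-men set (assumed to carry over).
growth = syzygy_7_gib / 149.2
estimated_8_gib = 8.5 * growth           # the post rounds the 7-men result to 8.5

print(f"reduced 7-men: {reduced_7_gib:.2f} GiB")
print(f"growth factor: {growth:.1f}x")
print(f"estimated 8-men: {estimated_8_gib:.0f} GiB")
```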
Emanuel Torresbase!
							Forget Syzygy -- Presenting the Emanuel Torresbase!
Moderator: Ras
- klx
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Forget Syzygy -- Presenting the Emanuel Torresbase!
[Moderation warning] This signature violated the rule against commercial exhortations.
			
- hgm
- Posts: 28396
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
Sounds like total nonsense. For one, 'a fraction of a second' is about a million times slower than 'a fraction of a micro-second', which is what a conventional EGT probe takes.
- klx
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
If you are in this position and need to find the best move, sub-second or sub-microsecond performance might not matter.
			
- klx
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
Also, the Emanuel Torresbase is adjustable if you need more speed.
			
- hgm
- Posts: 28396
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
Indeed. But that is not how EGT are used. Their main use is probing close to the leaves of the search tree, to see whether the position there is won, lost or drawn, so that you can go for the won positions and avoid the others. And if the tree has a million nodes, the difference between using conventional EGT and your method is whether you complete the search in 1 sec or in a year.
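A back-of-the-envelope sketch of that gap, with per-probe costs that are assumptions chosen for illustration (1 microsecond for a table lookup, 0.1 s for a search-backed probe) and one probe per node:

```python
SECONDS_PER_DAY = 86_400
PROBES = 1_000_000  # "a million nodes", assuming one probe per node


def total_seconds(seconds_per_probe, probes=PROBES):
    """Total probing time if every probe costs the same."""
    return probes * seconds_per_probe


table_lookup = total_seconds(1e-6)    # assumed 1 microsecond per conventional probe
search_backed = total_seconds(0.1)    # assumed 0.1 s per search-backed probe

# Per-probe cost at which a million probes would take a full year.
per_probe_for_a_year = 365 * SECONDS_PER_DAY / PROBES

print(f"table lookups:  {table_lookup:.1f} s")
print(f"search-backed:  {search_backed / SECONDS_PER_DAY:.2f} days")
print(f"one year needs: {per_probe_for_a_year:.1f} s per probe")
```

Under these assumed costs the search-backed total is about five orders of magnitude larger; the exact figure depends entirely on how long each fallback search takes.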
- klx
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
Oh ok, didn't know this. Just learned about endgame databases last week, so I'm not up to speed with all the intricacies yet. How can we make the best use of the Emanuel Torresbase then? I feel like I'm on to something.
			
- AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
So this is just a theory? You've not actually done anything to bring it to reality?
- klx
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
AndrewGrant wrote: ↑Thu Jun 24, 2021 11:30 pm So this is just a theory? You've not actually done anything to bring it to reality?
Well, I did the investigation and math in the post above as a kind of proof of concept, and I plan to start on the actual implementation this weekend. Mostly I wanted to get some feedback and to share the idea, since I have never heard of this concept before, in case it ends up laying the foundation for an 8-men database.
			
- AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
klx wrote: ↑Fri Jun 25, 2021 12:52 am Well I did the investigation and math in the post above as a kind of proof of concept, but for the actual implementation I plan to start this weekend. Mostly wanted to get some feedback, and to share the idea since I never heard of this concept before and in case this ends up laying the foundation for 8 men database.
Engines can achieve depth 20 easily, but that search is heavily pruned. If you want a proven tree, your AlphaBeta is very limited in its depth without unsafe pruning decisions.
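To put a rough number on "very limited": with perfect move ordering, plain alpha-beta on a uniform tree with branching factor b and depth d still visits b^ceil(d/2) + b^floor(d/2) - 1 leaves (the Knuth-Moore minimal tree). The branching factor of 10 below is an assumption for illustration, and the formula ignores transpositions, which matter a lot in endgames.

```python
def minimal_alpha_beta_leaves(branching, depth):
    """Leaves visited by alpha-beta with perfect move ordering on a uniform tree."""
    return branching ** ((depth + 1) // 2) + branching ** (depth // 2) - 1


BRANCHING = 10  # assumed average number of legal moves, for illustration only

for depth in (4, 8, 12, 16, 20):
    leaves = minimal_alpha_beta_leaves(BRANCHING, depth)
    print(f"depth {depth:2d}: {leaves:,} leaves")
```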
- phhnguyen
- Posts: 1525
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Forget Syzygy -- Presenting the Emanuel Torresbase!
klx wrote: ↑Thu Jun 24, 2021 6:09 pm Hi there, I have come up with a pretty revolutionary idea to vastly reduce the size of a DTM endgame database.
So it is just an idea, not a real practice nor a real base!
I guess the reality may be much harder than you think. From my experience (I did some studies a few years ago):
1) I agree that a large share of non-draw positions (not 99%, but about 90%) is easy to search out in a reasonable time, say within a 3-second threshold
2) However, the remaining 10% of non-draw positions may take much, much longer, even hours
3) The biggest trouble is with drawn positions. You can't easily search and conclude that a position is a draw. Sometimes you need to search over 100 plies (because of the 50-move rule) and/or for hours
4) The mix of 2 and 3 leaves you with no good solution
Your idea of storing data for only some specific positions (and then searching or doing whatever else from them) is a bad one, since that data will eat a lot of space, memory, and processing time (you may need to load it all into memory and then binary-search for a given position). Your data may end up quite close in size to Syzygy and/or not be worth the trouble. Note that a standard EGTB stores no position identifiers (such as hash keys or FENs) and requires no searching: the probe function uses index functions to locate the data it needs in the files exactly and instantly, without loading everything or doing any search.
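As a rough illustration of what "index functions" means here, a sketch of a direct index for a 3-piece table (two kings plus one extra piece). The layout is a deliberately naive assumption with no symmetry reduction or compression and is not Syzygy's actual scheme; `position_index` and `probe` are hypothetical names.

```python
TABLE_SIZE = 64 * 64 * 64 * 2  # every square triple, times side to move


def position_index(white_king, black_king, piece, white_to_move):
    """Map a position straight to its slot; nothing is stored to identify it."""
    for square in (white_king, black_king, piece):
        if not 0 <= square < 64:
            raise ValueError("squares must be in 0..63")
    side = 0 if white_to_move else 1
    return ((white_king * 64 + black_king) * 64 + piece) * 2 + side


def probe(table, white_king, black_king, piece, white_to_move):
    """One arithmetic step and one array read: no search, no key comparison."""
    return table[position_index(white_king, black_king, piece, white_to_move)]


table = bytearray(TABLE_SIZE)                  # one byte per position, all zero
table[position_index(4, 60, 12, True)] = 17    # store an arbitrary example value
print(probe(table, 4, 60, 12, True))
```

With a file on disk, the same index would become a seek offset, which is why only the needed part of the file has to be read.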
IMHO, it turns out the best solution for a mix of data-searching is something similar to a bitbase (https://www.chessprogramming.org/Endgame_Bitbases).
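For reference, a minimal sketch of how a bitbase can pack win/draw/loss results at 2 bits per position, so four positions share one byte. The value encoding and the function names are assumptions for illustration, not the format of any particular bitbase.

```python
WDL_LOSS, WDL_DRAW, WDL_WIN = 0, 1, 2  # assumed 2-bit encoding


def make_bitbase(positions):
    """Allocate 2 bits per position, rounded up to whole bytes."""
    return bytearray((positions + 3) // 4)


def set_wdl(bitbase, index, value):
    """Overwrite the 2-bit slot for `index` without touching its neighbours."""
    byte, slot = divmod(index, 4)
    shift = slot * 2
    bitbase[byte] = (bitbase[byte] & ~(0b11 << shift) & 0xFF) | (value << shift)


def get_wdl(bitbase, index):
    """Read the 2-bit slot for `index`."""
    byte, slot = divmod(index, 4)
    return (bitbase[byte] >> (slot * 2)) & 0b11


bitbase = make_bitbase(10)
set_wdl(bitbase, 5, WDL_WIN)
set_wdl(bitbase, 6, WDL_DRAW)
print(len(bitbase), get_wdl(bitbase, 5), get_wdl(bitbase, 6), get_wdl(bitbase, 7))
```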
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager