A different tablebase encoding format
Moderators: hgm, Rebel, chrisw
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: A different tablebase encoding format
Because there are so many unknowns about the target environment, the only reasonable approach is to deliver the tablebase files in uncompressed format only (or a gzip of each entire file) and let the end user decide which compression scheme, if any, is appropriate.
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
syzygy wrote: There is no single solution that is optimal for all hardware combinations, but using no compression at all seems to be about the worst one could do.
sje wrote: I disagree. If the request frequency is high enough, at some point no compression will beat any compression with respect to total time usage.
I don't see at all why request frequency would be important.
Only if everything fits in RAM will no compression outperform compression. But then why not use compression and go from n-piece tablebases to (n+1)-piece tablebases?
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: A different tablebase encoding format
So much depends on storage media, access time, access latency, demand frequency, and other factors that only empirical data from real-world implementations can provide answers. And those answers will be as different as the host environments themselves.
My experience with Symbolic's earlier incarnation running on the chess servers with 60 GB of uncompressed tablebase files is that there is simply no time available for decompression calculations given a high access frequency.
The latency added by the decompression computation hurts, but a bigger cost is that potentially many blocks have to be transferred and decompressed just to fetch a single score value.
A five-man endgame tablebase is a five-dimensional object, and any linear map means that most accesses will not be physically close in the file's data. So no matter how the indexing is done, many decompressed blocks may never be referenced again after the very first access, each fetched for just a single score value. But the long wad of decompressed data will still eat RAM until automatically purged by Unix or intentionally released by the program.
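The locality problem described above can be illustrated with a toy index function. This is a sketch only, not any real tablebase's indexing scheme; the function name and square values are hypothetical:

```python
# Toy sketch: map five piece squares to a linear index by treating
# them as base-64 digits (each square numbered 0..63).
def index5(squares):
    idx = 0
    for sq in squares:
        idx = idx * 64 + sq
    return idx

# Moving just the first piece by one square shifts the linear index
# by 64^4 = 16,777,216 entries, so two positions a search visits in
# succession can land in blocks nowhere near each other in the file.
a = index5([12, 4, 28, 60, 33])
b = index5([13, 4, 28, 60, 33])
```

Any permutation of the pieces in the index suffers the same way for some subset of probes: with five dimensions flattened into one, most neighboring positions cannot all be neighbors in the file.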
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
What might be interesting to know is that practically nothing that you write applies to modern tablebase compression schemes like mine.
State-of-the-art tablebase compression versus no compression is not a trade-off but essentially a no-brainer. Obviously there is no single compression scheme that is optimal for every hardware combination, but it is very easy to do better than no compression at all on any hardware (unless the uncompressed tables fully fit in RAM, etc.).
Just an arbitrary data point: final 10-piece position in game 9 of the last TCEC final. Over 250K hits/s for Stockfish (30966K hits in 2m01), over 1M hits/s for Komodo (11567K in 11s), no noticeable slowdown in nps for either.
http://tcec.chessdom.com/archive.php?se=6&sf&ga=9
-
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: A different tablebase encoding format
I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, so compression hurts.
Ronald uses DTZ50 at the root, which is comparatively not time-critical, and WDL during the search, which fits in memory. Compression gives you a higher hit rate since more entries fit in memory, negating the decompression cost by reducing the chance of having to read from disk.
DTM data is not known for compressing well, so modern DTM tablebases use heavy compression (Gaviota uses LZMA).
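The split described here can be sketched as a toy probe routine. All names and values below are hypothetical stand-ins, not the real Syzygy probing API:

```python
# Toy sketch of the two-table probing strategy: the small WDL data is
# cached in RAM and answers every search-node probe; the bulkier DTZ
# data is consulted only at the root, where latency barely matters.
WDL_CACHE = {"KBNvK": 2}    # 2 = win, 0 = draw, -2 = loss (hypothetical)
DTZ_TABLE = {"KBNvK": 31}   # plies to the next zeroing move (hypothetical)

def probe(endgame, at_root):
    if at_root:
        return ("dtz", DTZ_TABLE[endgame])   # slow path, root only
    return ("wdl", WDL_CACHE[endgame])       # fast path, every search node
```

The design point is that only the fast path needs to be fast, so the compression scheme for the DTZ/DTM side can favor ratio over probe speed.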
Matthew:out
Some believe in the almighty dollar.
I believe in the almighty printf statement.
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
Re: A different tablebase encoding format
ZirconiumX wrote: I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, thus compression hurts.
A modern scheme separates the WDL fraction from the bulk data. The probing speed of the bulk data is largely irrelevant for search. This is true for any metric.
[Account deleted]
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
ZirconiumX wrote: I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, thus compression hurts.
Ronald uses DTZ50 at root, which is comparatively not time critical, and WDL during search, which fits in memory, and compression gives you a higher hit rate as more entries fit in memory, thus negating the decompression cost by reducing chance of having to read from disk.
DTM is not well known for its compression abilities, so modern DTM tablebases use heavy compression (Gaviota uses LZMA).
We're comparing comparable things: suitable formats for (high-frequency) probing during the search. (Anyway, your point is still that DTM needs a compression scheme, which Steven sort of denies.)
The choice to place WDL data in separate files (removing this data from the DTZ or DTM data essentially by dropping the sign and compressing draws as "don't cares") is part of the design of the format. I'll admit that there are legitimate reasons for choosing against this separation, but either way compression schemes can be found that are going to be a win on all practical hardware. (Keeping full uncompressed tables in RAM is not practical.)
Compression schemes that rely on large block sizes are not very suitable for this imho.
It might be that my compression scheme is not great for DTM values; I have never tried this. It seems to work pretty well for DTZ. In the case of separate WDL and DTM/DTZ, the latter only being probed at the root, a compression scheme for DTM/DTZ based on large block sizes should obviously be fine.
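The "don't care" idea mentioned above can be shown with a toy example. This is an illustration of the principle only, with hypothetical values, not the actual Syzygy compressor:

```python
# Once WDL lives in a separate file, a drawn entry in the DTZ stream
# may take any value, so we overwrite draws with whatever makes the
# longest runs (here: the previous entry), helping the compressor.
DRAW = None  # marker for a drawn position; real result kept in WDL file

def fill_dont_cares(values):
    out, prev = [], 0
    for v in values:
        v = prev if v is DRAW else v
        out.append(v)
        prev = v
    return out
```

A real compressor chooses don't-care values jointly with its model rather than just repeating the neighbor, but the freedom being exploited is the same.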
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
DTM vs WLD
Consider the endgame KBNK WTM.
With a DTM tablebase, any probe at any depth gives an immediate, branch-ending result with an exact score, just as if the entire subtree had been searched up to 65 ply deep. That exact score can be backed up in the search and eventually be made visible to the user via mate announcement or a resignation.
With a WLD tablebase, a probe will only show "White wins" (or draws in a very few and obvious cases). Since nearly all of the score entries will be "White wins", it's no surprise that a high compression can be had. But a simple "White wins" score can't distinguish among wins of varying lengths, so it's necessary to have either some extra, specific code or a separate DTM tablebase. But why not use that separate DTM tablebase in the first place? If extra, specific code is needed, why not just have that code instead of any tablebase?
If tablebases are made, or at least made available, in uncompressed DTM format, then that means that a chess program author is free to use that data for building custom compressed versions or building WLD versions, or versions with both compression and WLD. But you can't go from WLD to DTM, and one compression scheme might be wholly inappropriate for differing endgames.
I have no interest in WLD. If someone wants to take my data to make a WLD tablebase, that doesn't bother me at all. The same if they want to use some compression scheme. But they're not forced into either; they have a choice and that's the point.
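The claim that a DTM probe is branch-ending can be sketched as follows: the stored distance-to-mate converts directly into a mate score at the current ply, which then backs up through the tree like any ordinary score. The mate constant and function name here are hypothetical:

```python
MATE = 32000  # hypothetical engine mate-score constant

def dtm_to_score(dtm, ply):
    # dtm > 0: the side to move mates in dtm plies;
    # dtm < 0: the side to move gets mated in -dtm plies.
    if dtm == 0:
        return 0  # draw
    if dtm > 0:
        return MATE - (ply + dtm)     # shorter mates score higher
    return -(MATE - (ply - dtm))
```

Because the score encodes the ply distance, the search automatically prefers the shortest win among tablebase hits, which is what makes the mate announcement visible at the root.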
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: DTM vs WLD
Re-read my last sentence in the above post.