A different tablebase encoding format
Moderators: hgm, Rebel, chrisw
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: A different tablebase encoding format
Because there are so many unknowns about the target environment, the only reasonable approach is to deliver the tablebase files in uncompressed format only (or a gzip of each entire file) and let the end user decide which compression scheme, if any, is appropriate.
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
syzygy wrote: There is no single solution that is optimal for all hardware combinations, but using no compression at all seems to be about the worst one could do.
sje wrote: I disagree. If the request frequency is high enough, at some point no compression will beat any compression with respect to total time usage.
I don't see at all why request frequency would be important.
Only if everything fits in RAM will no compression outperform compression. But then why not use compression and go from n-piece tablebases to (n+1)-piece tablebases?
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: A different tablebase encoding format
So much depends on storage media, access time, access latency, demand frequency, and other factors that only empirical data from real-world implementations can provide answers. And those answers will be as different as the host environments themselves.
My experience with Symbolic's earlier incarnation running on the chess servers with 60 GB of uncompressed tablebase files is that there is simply no time available for decompression calculations given a high access frequency.
The latency added by the decompression computation hurts, but a bigger cost is that potentially many blocks have to be transferred and decompressed just to fetch a single score value.
A five-man endgame tablebase is a five-dimensional object, and any linear map means that most accesses will not be physically close in the file's data. So no matter how the indexing is done, many decompressed blocks may never be referenced again after the very first access, each fetched for just a single score value. But the long wad of decompressed data will still eat RAM until automatically purged by Unix or intentionally released by the program.
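The locality problem described above can be illustrated with a toy index function. This is a sketch only, not any real tablebase's indexing scheme; the function name and square values are hypothetical:

```python
# Toy sketch: map five piece squares to a linear index by treating
# them as base-64 digits (each square numbered 0..63).
def index5(squares):
    idx = 0
    for sq in squares:
        idx = idx * 64 + sq
    return idx

# Moving just the first piece by one square shifts the linear index
# by 64^4 = 16,777,216 entries, so two positions a search visits in
# succession can land in blocks nowhere near each other in the file.
a = index5([12, 4, 28, 60, 33])
b = index5([13, 4, 28, 60, 33])
```

Any permutation of the pieces in the index suffers the same way for some subset of probes: with five dimensions flattened into one, most neighboring positions cannot all be neighbors in the file.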
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
What might be interesting to know is that practically nothing that you write applies to modern tablebase compression schemes like mine.
State-of-the-art tablebase compression versus no compression is not a trade-off but essentially a no-brainer. Obviously there is no single compression scheme that is optimal for every hardware combination, but it is very easy to do better than no compression at all on any hardware (unless the uncompressed tables fully fit in RAM, etc.).
Just an arbitrary data point: final 10-piece position in game 9 of the last TCEC final. Over 250K hits/s for Stockfish (30966K hits in 2m01), over 1M hits/s for Komodo (11567K in 11s), no noticeable slowdown in nps for either.
http://tcec.chessdom.com/archive.php?se=6&sf&ga=9
-
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: A different tablebase encoding format
I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, so compression hurts.
Ronald uses DTZ50 at the root, which is comparatively not time-critical, and WDL during the search, which fits in memory. Compression gives you a higher hit rate since more entries fit in memory, negating the decompression cost by reducing the chance of having to read from disk.
DTM data is not known for compressing well, so modern DTM tablebases use heavy compression (Gaviota uses LZMA).
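The split described here can be sketched as a toy probe routine. All names and values below are hypothetical stand-ins, not the real Syzygy probing API:

```python
# Toy sketch of the two-table probing strategy: the small WDL data is
# cached in RAM and answers every search-node probe; the bulkier DTZ
# data is consulted only at the root, where latency barely matters.
WDL_CACHE = {"KBNvK": 2}    # 2 = win, 0 = draw, -2 = loss (hypothetical)
DTZ_TABLE = {"KBNvK": 31}   # plies to the next zeroing move (hypothetical)

def probe(endgame, at_root):
    if at_root:
        return ("dtz", DTZ_TABLE[endgame])   # slow path, root only
    return ("wdl", WDL_CACHE[endgame])       # fast path, every search node
```

The design point is that only the fast path needs to be fast, so the compression scheme for the DTZ/DTM side can favor ratio over probe speed.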
Matthew:out
Some believe in the almighty dollar.
I believe in the almighty printf statement.
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
Re: A different tablebase encoding format
ZirconiumX wrote: I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, thus compression hurts.
A modern scheme separates the WDL fraction from the bulk data. The probing speed of the bulk data is largely irrelevant for search. This is true for any metric.
[Account deleted]
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A different tablebase encoding format
ZirconiumX wrote: I think you're both comparing apples and oranges.
Steven seems to be on about DTM, which is very slow to probe, thus compression hurts.
Ronald uses DTZ50 at root, which is comparatively not time critical, and WDL during search, which fits in memory, and compression gives you a higher hit rate as more entries fit in memory, thus negating the decompression cost by reducing chance of having to read from disk.
DTM is not well known for its compression abilities, so modern DTM tablebases use heavy compression (Gaviota uses LZMA).
We're comparing comparable things: suitable formats for (high-frequency) probing during the search. (Anyway, your point is still that DTM needs a compression scheme, which Steven sort of denies.)
The choice to place WDL data in separate files (removing this data from the DTZ or DTM data essentially by dropping the sign and compressing draws as "don't cares") is part of the design of the format. I'll admit that there are legitimate reasons for choosing against this separation, but either way compression schemes can be found that are going to be a win on all practical hardware. (Keeping full uncompressed tables in RAM is not practical.)
Compression schemes that rely on large block sizes are not very suitable for this imho.
It might be that my compression scheme is not great for DTM values; I have never tried this. It seems to work pretty well for DTZ. In the case of separate WDL and DTM/DTZ, the latter only being probed at the root, a compression scheme for DTM/DTZ based on large block sizes should obviously be fine.
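The "don't care" idea mentioned above can be shown with a toy example. This is an illustration of the principle only, with hypothetical values, not the actual Syzygy compressor:

```python
# Once WDL lives in a separate file, a drawn entry in the DTZ stream
# may take any value, so we overwrite draws with whatever makes the
# longest runs (here: the previous entry), helping the compressor.
DRAW = None  # marker for a drawn position; real result kept in WDL file

def fill_dont_cares(values):
    out, prev = [], 0
    for v in values:
        v = prev if v is DRAW else v
        out.append(v)
        prev = v
    return out
```

A real compressor chooses don't-care values jointly with its model rather than just repeating the neighbor, but the freedom being exploited is the same.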
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
DTM vs WLD
Consider the endgame KBNK WTM.
With a DTM tablebase, any probe at any depth gives an immediate, branch-ending result with an exact score, just as if the entire subtree had been searched up to 65 ply deep. That exact score can be backed up in the search and eventually be made visible to the user via mate announcement or a resignation.
With a WLD tablebase, a probe will only show "White wins" (or draws in a very few and obvious cases). Since nearly all of the score entries will be "White wins", it's no surprise that a high compression can be had. But a simple "White wins" score can't distinguish among wins of varying lengths, so it's necessary to have either some extra, specific code or a separate DTM tablebase. But why not use that separate DTM tablebase in the first place? If extra, specific code is needed, why not just have that code instead of any tablebase?
If tablebases are made, or at least made available, in uncompressed DTM format, then that means that a chess program author is free to use that data for building custom compressed versions or building WLD versions, or versions with both compression and WLD. But you can't go from WLD to DTM, and one compression scheme might be wholly inappropriate for differing endgames.
I have no interest in WLD. If someone wants to take my data to make a WLD tablebase, that doesn't bother me at all. The same if they want to use some compression scheme. But they're not forced into either; they have a choice and that's the point.
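The claim that a DTM probe is branch-ending can be sketched as follows: the stored distance-to-mate converts directly into a mate score at the current ply, which then backs up through the tree like any ordinary score. The mate constant and function name here are hypothetical:

```python
MATE = 32000  # hypothetical engine mate-score constant

def dtm_to_score(dtm, ply):
    # dtm > 0: the side to move mates in dtm plies;
    # dtm < 0: the side to move gets mated in -dtm plies.
    if dtm == 0:
        return 0  # draw
    if dtm > 0:
        return MATE - (ply + dtm)     # shorter mates score higher
    return -(MATE - (ply - dtm))
```

Because the score encodes the ply distance, the search automatically prefers the shortest win among tablebase hits, which is what makes the mate announcement visible at the root.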
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: DTM vs WLD
Re-read my last sentence in the above post.