Rein Halbersma wrote:
> syzygy wrote:
> > I might have a look into this. At some point I was considering adding a run-length encoding option, but my first experiments weren't very promising.
>
> The Chinook project also pioneered partial information databases (at least a draw, at most a draw), which also have this type of values. There is a paper on their website about this. IIRC, compression suffers a bit, but it's the same principle.

Not really the same principle, but from the compression point of view it might indeed be the same.
A complication is that my WDL tables can have up to 5 values (because I distinguish between real wins/losses and wins/losses that are drawn by the 50-move rule). But of course not all tables are affected by the 50-move rule.
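For concreteness, the five values distinguish real wins and losses from those that the 50-move rule turns into draws. A minimal sketch of such a five-value alphabet (the names and numeric values here are illustrative, not taken from the actual code):

```python
# Illustrative five-value WDL alphabet: wins/losses spoiled by the
# 50-move rule get their own symbols.  Names and numbers are my own
# choice for this sketch, not the actual tablebase encoding.
from enum import IntEnum

class WDL(IntEnum):
    LOSS = -2          # real loss
    BLESSED_LOSS = -1  # loss, but drawn under the 50-move rule
    DRAW = 0
    CURSED_WIN = 1     # win, but drawn under the 50-move rule
    WIN = 2

# Tables unaffected by the 50-move rule only ever use three of these.
assert len(WDL) == 5
```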
Rein Halbersma wrote:
> Maybe I should read your code in more detail, but can you explain how you use Huffman coding to get a fixed-length 64-byte code block? I thought that Huffman is a form of fixed-to-variable encoding (each fixed-size input character to a variable-length bitstring)?

I just encode symbols into Huffman codes until the 64-byte block runs out of space.
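The basic fixed-size-block scheme can be sketched as follows. The codeword table below is invented for illustration; the point is only that codewords are appended until the next one no longer fits, and every block begins on a symbol boundary so each block decodes independently.

```python
# A minimal sketch of packing variable-length Huffman codewords into
# fixed 64-byte blocks (the code table is a made-up example).
BLOCK_BITS = 64 * 8  # 64-byte blocks

# Hypothetical prefix-free (Huffman-style) code for three symbols.
CODES = {"W": "0", "D": "10", "L": "11"}

def pack_blocks(symbols):
    """Pack codewords into blocks of at most BLOCK_BITS bits each."""
    blocks, current = [], ""
    for s in symbols:
        code = CODES[s]
        if len(current) + len(code) > BLOCK_BITS:
            blocks.append(current)  # block is full: flush it
            current = ""
        current += code
    if current:
        blocks.append(current)
    return blocks

blocks = pack_blocks(["W"] * 300 + ["D", "L"] * 150)
assert all(len(b) <= BLOCK_BITS for b in blocks)
```

The unused bits at the end of each flushed block are exactly the truncation overhead mentioned in this discussion.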
I do use some additional tweaks. For example, the symbol that does not fit usually corresponds to the concatenation of a pair of other symbols. If the Huffman code of the first symbol of that pair does fit, and the Huffman code of the second symbol of the pair is shorter than the Huffman code of the original symbol, then I add the Huffman code of the first symbol to the block I am filling and start the next block with the Huffman code of the second symbol. Another tweak is to check whether the symbol that no longer fits is itself the first half of some pair whose combined symbol's Huffman code does fit (which in effect appends some random data to the end of the block, but that does not hurt). These tweaks only marginally improve compression, but the improvement is free, so I take it.
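The first of these tweaks, splitting a pair across a block boundary, could be sketched roughly like this. The symbol table and code lengths are invented for illustration; the real dictionary is built recursively from pairs of symbols.

```python
# A rough sketch of the pair-splitting tweak (symbol table invented).
BLOCK_BITS = 16  # unrealistically small so the tweak triggers easily

# symbol -> (Huffman code length in bits, (left, right) pair or None)
SYMBOLS = {
    "A": (3, None),
    "B": (4, None),
    "AB": (5, ("A", "B")),  # "AB" is the concatenation of "A" and "B"
}

def place(remaining_bits, symbol):
    """Decide how to finish a block when `symbol` arrives.

    Returns (emit_now, carry): symbols whose codes go into the current
    block, and the symbol (if any) that must start the next block.
    """
    length, pair = SYMBOLS[symbol]
    if length <= remaining_bits:
        return [symbol], None  # the code simply fits
    if pair is not None:
        left, right = pair
        left_len = SYMBOLS[left][0]
        right_len = SYMBOLS[right][0]
        # Split only if the first half fits and the second half's code
        # is shorter than the code of the original symbol.
        if left_len <= remaining_bits and right_len < length:
            return [left], right
    return [], symbol  # nothing fits: carry the whole symbol over

assert place(3, "AB") == (["A"], "B")    # pair split across blocks
assert place(8, "AB") == (["AB"], None)  # plenty of room: no split
```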
But you're right that this does give some truncation overhead. I think it amounts to something like 5 bits per block on average.