Compression of chess databases

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Compression of chess databases

Post by Daniel Shawul »

Here are the two files I produced:
SAN (about 32 bits/move), 7-zipped to 4.1 MB: https://dl.dropbox.com/u/55295461/san.7z
8 bits/move, 7-zipped to 2.47 MB: https://dl.dropbox.com/u/55295461/gmv.7z
Extract the SAN archive to see that the moves are stored as plain SAN text, at about 32 bits/move, and yet the compressed size is still good.
Note that the uncompressed SAN file is about 17 MB, much bigger than the uncompressed 8 bits/move file at 4.4 MB, and yet the results after compression are close.
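
To put the sizes quoted in this post on a common scale, here is a rough back-of-the-envelope sketch in Python. It assumes decimal megabytes, that the 8 bits/move file holds exactly one byte per move with no headers, and that both files contain the same moves; none of this is stated precisely in the post, so treat the results as estimates only.

```python
# Rough bits-per-move estimates from the file sizes quoted in the post.
# Assumptions (not from the post): 1 MB = 1,000,000 bytes, and the
# 8 bits/move file is exactly one byte per move with no header overhead.
MB = 1_000_000

raw_8bit_mb = 4.4          # uncompressed 8 bits/move file
move_count = raw_8bit_mb * MB  # one byte per move, so bytes == moves


def bits_per_move(size_mb: float) -> float:
    """Convert a file size in MB to average bits spent per move."""
    return size_mb * MB * 8 / move_count


sizes_mb = {
    "SAN, uncompressed": 17.0,
    "SAN, 7z": 4.1,
    "8 bits/move, 7z": 2.47,
}

for label, size in sizes_mb.items():
    print(f"{label}: {bits_per_move(size):.2f} bits/move")
```

Under these assumptions the raw SAN text comes out at roughly 31 bits/move, the compressed SAN at roughly 7.5 bits/move, and the compressed 8 bits/move file at roughly 4.5 bits/move.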
phhnguyen
Posts: 1431
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Compression of chess databases

Post by phhnguyen »

Thank you.

You are right about my poor compression rate. I tested using 1 byte/move and then compressing the move-only file, and I still get bigger numbers: 2.9 MB using zip on my Mac and 2.7 MB using 7z on Windows.

Perhaps it is not really a problem with the compressor but with the move order of the generator: yours may simply be better than mine (so the compressor predicts more easily). After generating, I sort the moves in a simple order (Queen first, Rook second, ... King last), just to be sure that the index system won't break even if I change the generator later. That move order is not used by the search engine. If you take yours from a search engine, that may be the reason. Do you take one like that?
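
The index scheme described in this post can be sketched roughly as follows in Python. This is only an illustration: the move representation as `(piece, from_sq, to_sq)` tuples and the names `PIECE_RANK`, `canonical_order`, `encode_move` and `decode_move` are hypothetical, and the post only fixes Queen first, Rook second and King last, so the order of the pieces in between is an assumption.

```python
# Sketch of a 1 byte/move index that does not depend on generator order.
# PIECE_RANK is hypothetical: the post only says Queen first, Rook second,
# King last; the B/N/P order in between is assumed for illustration.
PIECE_RANK = {"Q": 0, "R": 1, "B": 2, "N": 3, "P": 4, "K": 5}


def canonical_order(moves):
    """Sort (piece, from_sq, to_sq) tuples into a fixed, generator-independent order."""
    return sorted(moves, key=lambda m: (PIECE_RANK[m[0]], m[1], m[2]))


def encode_move(legal_moves, played):
    """Return the index of the played move in the canonical list (fits in one byte)."""
    index = canonical_order(legal_moves).index(played)
    if index > 255:
        raise ValueError("more than 256 legal moves; index does not fit in a byte")
    return index


def decode_move(legal_moves, index):
    """Recover the move from its index, given the same legal-move set."""
    return canonical_order(legal_moves)[index]


# Toy legal-move list in arbitrary generator order.
legal = [("K", "e1", "f1"), ("P", "e2", "e4"), ("Q", "d1", "h5"), ("R", "a1", "b1")]
code = encode_move(legal, ("P", "e2", "e4"))
print(code, decode_move(legal, code))
```

Because the list is re-sorted before indexing, reversing or reordering the generator output leaves every index unchanged, which is the property the post is after.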
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Compression of chess databases

Post by Daniel Shawul »

phhnguyen wrote: Thank you.

You are right about my poor compression rate. I tested using 1 byte/move and then compressing the move-only file, and I still get bigger numbers: 2.9 MB using zip on my Mac and 2.7 MB using 7z on Windows.
You should get the same numbers, because now we are using the same database and the same 8-bit format. If you are in doubt, extract the files I gave you and compress them again with 7z. You will get about 2.5 MB with the default settings, but if you use maximum compression from the command line you will squeeze out a little more and get to around 2.467 MB. Same story with the SAN file. That reminds me: the SAN file shrank from 4.1 MB to 3.5 MB using nanozip, but the gmv file did not decrease by much. Remember that the SAN file started from an astounding 32 bits/move on average.
phhnguyen wrote: Perhaps it is not really a problem with the compressor but with the move order of the generator: yours may simply be better than mine (so the compressor predicts more easily). After generating, I sort the moves in a simple order (Queen first, Rook second, ... King last), just to be sure that the index system won't break even if I change the generator later. That move order is not used by the search engine. If you take yours from a search engine, that may be the reason. Do you take one like that?
My move generator is as random as yours; in fact it is worse, since I don't sort. I just generate captures, then non-captures. The best test to eliminate any prediction effect would be to randomly shuffle the moves after generation, but I have no time to do that...
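
A toy version of that shuffle test could look like the Python sketch below. It does not use real games: the index stream is synthetic, and the assumption that the played move tends to sit near the front of the generated list (the `expovariate(0.3)` model) is purely hypothetical, as are the names `index_stream`, `ordered_size` and `shuffled_size`. It uses Python's standard `lzma` module as a stand-in for 7z.

```python
import lzma
import random


def index_stream(n_positions, shuffle, seed=1):
    """Build a synthetic 1 byte/move stream of move-list indices.

    Toy model (an assumption, not real chess data): each position has
    20-40 legal moves and the played move tends to be near the front of
    the generated list. With shuffle=True the list is randomly permuted
    before the index of the played move is taken.
    """
    model_rng = random.Random(seed)        # drives the toy game model
    shuffle_rng = random.Random(seed + 1)  # drives the random re-ordering
    out = bytearray()
    for _ in range(n_positions):
        n_moves = model_rng.randint(20, 40)
        played = min(int(model_rng.expovariate(0.3)), n_moves - 1)
        if shuffle:
            order = list(range(n_moves))
            shuffle_rng.shuffle(order)
            played = order.index(played)   # position after shuffling
        out.append(played)
    return bytes(out)


ordered_size = len(lzma.compress(index_stream(20000, shuffle=False)))
shuffled_size = len(lzma.compress(index_stream(20000, shuffle=True)))
print("ordered:", ordered_size, "bytes; shuffled:", shuffled_size, "bytes")
```

If the real databases behave like this toy model, shuffling should make the compressed file noticeably larger; if the sizes barely change on real data, move order is not what explains the difference between the two generators.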