Here are the two files I produced:
SAN (32 bits/move), 7-zipped to 4.1 MB: https://dl.dropbox.com/u/55295461/san.7z
8 bits/move, 7-zipped to 2.47 MB: https://dl.dropbox.com/u/55295461/gmv.7z
Extract the SAN archive to see that the moves are stored as plain SAN text, at about 32 bits/move, and yet the compressed size is still good.
Note that the uncompressed SAN file is about 17 MB, much bigger than the uncompressed 8 bits/move file at 4.4 MB, and yet the results after compression are close.
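To put the quoted sizes on a common scale, here is a rough back-of-the-envelope sketch in Python. It assumes that "mb" means 10**6 bytes and that the 8 bits/move file holds exactly one byte per move with negligible overhead, so its raw size approximates the number of moves. Both assumptions are mine, not something stated in the post.

```python
MB = 1_000_000  # assumption: "mb" in the post means 10**6 bytes

# File sizes quoted in the post, in bytes.
san_raw, san_7z = 17 * MB, 4.1 * MB
gmv_raw, gmv_7z = 4.4 * MB, 2.47 * MB

# Assumption: one byte per move in the 8-bit file, no headers,
# so the raw size of that file is roughly the move count.
n_moves = gmv_raw


def bits_per_move(size_bytes, moves=n_moves):
    """Average number of bits spent per move for a file of the given size."""
    return size_bytes * 8 / moves


for name, size in [
    ("SAN raw", san_raw),
    ("SAN 7z", san_7z),
    ("8-bit raw", gmv_raw),
    ("8-bit 7z", gmv_7z),
]:
    print(f"{name:10s} {bits_per_move(size):6.2f} bits/move")
```

Under those assumptions the raw SAN file comes out at roughly 31 bits/move, which agrees with the "about 32 bits/move" figure, while the compressed files land at roughly 7.5 and 4.5 bits/move.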
Compression of chess databases
Moderators: hgm, Dann Corbit, Harvey Williamson
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
-
phhnguyen
- Posts: 1431
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Compression of chess databases
Thank you.
You are right about my poor compression rate. I tested using 1 byte/move and then compressing the move-only file, and I still get bigger numbers: 2.9 MB using zip on my Mac, and 2.7 MB using 7z on Windows.
Perhaps the problem is not really the compressor but the move order of the generator: yours may simply be better than mine (so the compressor predicts more easily). After generating, I sort the moves in a simple order (Queen first, Rook second, ... King last), just to be sure that even if I change the generator later, the index system won't be affected. That move order is not the one used by the search engine. If you take yours from the search engine, that may be the reason. Do you use one like that?
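The idea of sorting generated moves into a fixed order, so that a one-byte index stays valid even if the generator changes, can be sketched roughly as below. This is only an illustration: the move tuples, the function names, and the middle of the piece order (Bishop, Knight, Pawn) are hypothetical; only "Queen first, Rook second, ... King last" comes from the post.

```python
# Hypothetical piece priority. Only "Queen first, Rook second, ... King last"
# is from the post; the positions of B, N and P are assumed for illustration.
PIECE_ORDER = {"Q": 0, "R": 1, "B": 2, "N": 3, "P": 4, "K": 5}


def canonical_order(moves):
    """Sort (piece, from_square, to_square) tuples into a fixed order
    that does not depend on the order the generator produced them in."""
    return sorted(moves, key=lambda m: (PIECE_ORDER[m[0]], m[1], m[2]))


def encode_move(legal_moves, move):
    """Return the index of `move` in the canonical list (fits in one byte)."""
    index = canonical_order(legal_moves).index(move)
    if index > 255:
        raise ValueError("move index does not fit in 8 bits")
    return index


def decode_move(legal_moves, index):
    """Recover the move from its index, given the same legal move set."""
    return canonical_order(legal_moves)[index]


# Two generators that emit the same moves in different orders.
gen_a = [("K", "e1", "f1"), ("Q", "d1", "d4"), ("R", "a1", "a8"), ("P", "e2", "e4")]
gen_b = list(reversed(gen_a))
print(encode_move(gen_a, ("R", "a1", "a8")), encode_move(gen_b, ("R", "a1", "a8")))
```

Both calls give the same index, because the index depends only on the set of legal moves and the sort key, not on the generator's own ordering.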
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Compression of chess databases
phhnguyen wrote: "Thank you. You are right about my poor compression rate. I tested using 1 byte/move and then compressing the move-only file, and I still get bigger numbers: 2.9 MB using zip on my Mac, and 2.7 MB using 7z on Windows."
You should get the same numbers, because now we are using the same database and the same 8-bit format. If you are in doubt, extract the files I gave you, then compress them again with 7z. You will get about 2.5 MB with the default settings, but if you use maximum compression from the command line you will squeeze out a few more bytes and get to around 2.467 MB. Same story with the SAN file. That reminds me: the SAN file shrank from 4.1 MB to 3.5 MB using nanozip, but the gmov file did not decrease by much. Remember that the SAN file started from an astounding 32 bits/move on average.
phhnguyen wrote: "Perhaps the problem is not really the compressor but the move order of the generator: yours may simply be better than mine (so the compressor predicts more easily). After generating, I sort the moves in a simple order (Queen first, Rook second, ... King last), just to be sure that even if I change the generator later, the index system won't be affected. That move order is not the one used by the search engine. If you take yours from the search engine, that may be the reason. Do you use one like that?"
My move generator is as random as yours; in fact it is worse, since I don't sort. I just generate captures, then non-captures. The best test to eliminate any prediction would be to randomly shuffle the moves after generation, but I have no time to do that...
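For anyone who wants to try the random-shuffle test, a minimal sketch with synthetic data might look like the following. Nothing here is measured on the actual database: the number of positions, the fixed count of 30 legal moves, and the assumption that the played move tends to sit early in generator order are all invented for illustration. Python's `lzma` module stands in for 7z.

```python
import lzma
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
N_POSITIONS = 20_000     # synthetic sample size (assumption)
N_LEGAL = 30             # pretend every position has 30 legal moves

ordered, shuffled = bytearray(), bytearray()
for _ in range(N_POSITIONS):
    moves = list(range(N_LEGAL))  # generator order: 0, 1, 2, ...
    # Synthetic assumption: the played move tends to be early in generator order.
    played = min(int(rng.expovariate(0.3)), N_LEGAL - 1)
    ordered.append(moves.index(played))   # 8-bit index in generator order
    rng.shuffle(moves)                    # random order after generation
    shuffled.append(moves.index(played))  # 8-bit index in the shuffled list

size_ordered = len(lzma.compress(bytes(ordered)))
size_shuffled = len(lzma.compress(bytes(shuffled)))
print(f"generator order: {size_ordered} bytes, shuffled: {size_shuffled} bytes")
```

If shuffling makes the compressed file noticeably larger on real data, the generator's move order was helping the compressor; if the sizes stay about the same, it was not. Note that a real encoder would need a reproducible shuffle so the decoder can rebuild the same list.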