rvida wrote: I think 50% reduction can be easily achieved using a domain specific predictor (i.e. a chess engine). Right now I am running some tests on a sample PGN database and the prediction rates are very compression friendly. Looks like 3-3.5 bits per move is easily achievable while still maintaining the ability to do random accesses with game granularity.
More details later...

Here is my experiment on a PGN of 50861 games (from Olivier Deville's collection).
I used Critter 1.6a as a predictor doing a 1 ply full width search + qsearch. Prediction is quite good even though these are human games.
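To make the idea concrete, here is a rough Python sketch of how a played move could be turned into a symbol: order the legal moves by the predictor's score and emit the position of the move actually played. This is only an illustration, not the code used for the experiment. The function names, the `(move, score)` input format and the tie-break on the move string are all assumptions, and the scores stand in for whatever the engine's 1 ply search returns.

```python
def order_moves(scored_moves):
    """Sort (move, score) pairs best first.

    Ties are broken on the move string (an assumption) so that the
    encoder and the decoder always arrive at the same ordering.
    """
    return sorted(scored_moves, key=lambda ms: (-ms[1], ms[0]))


def rank_symbol(scored_moves, played):
    """Encoder side: 1-based rank of the played move in the ordered list."""
    for rank, (move, _score) in enumerate(order_moves(scored_moves), start=1):
        if move == played:
            return rank
    raise ValueError(f"{played!r} is not among the legal moves")


def move_from_symbol(scored_moves, symbol):
    """Decoder side: recover the move from its rank."""
    return order_moves(scored_moves)[symbol - 1][0]


if __name__ == "__main__":
    # Hypothetical scores in centipawns for four legal moves.
    scored = [("e2e4", 30), ("d2d4", 30), ("g1f3", 25), ("a2a3", -10)]
    sym = rank_symbol(scored, "g1f3")
    print(sym, move_from_symbol(scored, sym))
```

The decoder has to run the same predictor on the same position to rebuild the ordering, so only the rank needs to be stored. Since each game starts from a known position, this is compatible with random access at game granularity.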
This is the resulting symbol frequency distribution:
Code:
# 1: 35.90540% (1587274)
# 2: 16.77592% (741615)
# 3: 10.12348% (447530)
# 4: 6.53990% (289110)
# 5: 4.82253% (213190)
# 6: 3.69800% (163478)
# 7: 3.16578% (139950)
# 8: 2.64557% (116953)
# 9: 2.23507% (98806)
#10: 1.83142% (80962)
#11: 1.56484% (69177)
#12: 1.64580% (72756)
#13: 1.20171% (53124)
#14: 1.07487% (47517)
#15: 0.92266% (40788)
#16: 0.82613% (36521)
#17: 0.82412% (36432)
#18: 0.75637% (33437)
#19: 0.69821% (30866)
#20: 0.62128% (27465)
#21: 0.40441% (17878)
#22: 0.38331% (16945)
#23: 0.30782% (13608)
#24: 0.23928% (10578)
#25: 0.19614% (8671)
#26: 0.17318% (7656)
#27: 0.10458% (4623)
#28: 0.08406% (3716)
#29: 0.05757% (2545)
#30: 0.04440% (1963)
#31: 0.03561% (1574)
#32: 0.02531% (1119)
rest: 0.06524% (2884)
total games: 50861
total halfmoves: 4420711
halfmoves/game avg = 86.92 min = 39 max = 391
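As a sanity check on the 3-3.5 bits per move estimate, the order-0 Shannon entropy of the table above can be computed directly from the counts. The small Python script below is a sketch under two assumptions: an ideal entropy coder (e.g. arithmetic coding) with a static model, and the "rest" bucket treated as a single symbol, which slightly underestimates the real cost because those moves still need extra bits to be told apart.

```python
import math

# Occurrence counts for symbols #1..#32, copied from the table above,
# followed by the "rest" bucket as one extra symbol.
COUNTS = [
    1587274, 741615, 447530, 289110, 213190, 163478, 139950, 116953,
    98806, 80962, 69177, 72756, 53124, 47517, 40788, 36521,
    36432, 33437, 30866, 27465, 17878, 16945, 13608, 10578,
    8671, 7656, 4623, 3716, 2545, 1963, 1574, 1119,
    2884,
]


def entropy_bits(counts):
    """Order-0 Shannon entropy in bits per symbol."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    total = sum(COUNTS)
    h = entropy_bits(COUNTS)
    print(f"halfmoves:        {total}")
    print(f"entropy:          {h:.3f} bits per halfmove")
    print(f"ideal moves size: {h * total / 8 / 1e6:.2f} MB")
```

The counts add up to the 4420711 halfmoves reported, and the entropy comes out at about 3.3 bits per halfmove, inside the range quoted. This covers the move data only, not headers or other PGN tags.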