Tablebase Compression

Discussion of chess software programming and technical issues.

Moderator: Ras

Michael Neish
Posts: 70
Joined: Wed Apr 05, 2006 9:22 am

Tablebase Compression

Post by Michael Neish »

Hi,

I'm sure that this has been mentioned before, but I can't remember the rebuttal (if there was one).

I don't really know how these databases are set up, so this may seem naive, but here goes ...

Couldn't tablebases be massively compressed by storing instead a reduced number of positions?

For example, if a position is known to be a win, but it takes 50 plies to play out, could a position be stored every ten plies, and the program be relied on to find its way from one to the next by means of normal search?

Opinions welcome. Thanks.

Regards,

Mike.
Dann Corbit
Posts: 12799
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Tablebase Compression

Post by Dann Corbit »

Michael Neish wrote:Hi,

I'm sure that this has been mentioned before, but I can't remember the rebuttal (if there was one).

I don't really know how these databases are set up, so this may seem naive, but here goes ...

Couldn't tablebases be massively compressed by storing instead a reduced number of positions?

For example, if a position is known to be a win, but it takes 50 plies to play out, could a position be stored every ten plies, and the program be relied on to find its way from one to the next by means of normal search?

Opinions welcome. Thanks.

Regards,

Mike.
The positions are already reduced a great deal.
First, the rotations and reflections are removed.
Also, positions that are not possible (like two adjoining kings) are eliminated.

So what you end up with is a small file compared to what you might think would be needed.

After which, the tablebase files are typically compressed with a compression library.

Consider this file:
kqkb.gtb.cp4 which has a size of 935,406 bytes.

Try to come up with a scheme that gives the exact win, loss, or draw answer for every king-and-queen versus king-and-bishop position that can possibly occur, and make it fit into less than one million bytes.

By the way, the position itself is never stored in the tablebase file. The position is used to compute an address for lookup in the file, and the entry at that address holds the answer.
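The address computation Dann describes can be sketched as follows. This toy indexer for a 3-piece KQ vs. K table is a hypothetical illustration only; it ignores the symmetry and legality reductions he mentions, which real tablebase generators apply before indexing:

```python
# Toy index function for a 3-piece (KQ vs K) table, side to move fixed.
# Each piece occupies one of 64 squares, so every position maps to a
# unique number in [0, 64**3); that number is the entry's offset in the
# file.  The position itself is never stored -- only the result at that
# offset.  Real tablebases first fold out rotations/reflections and
# illegal positions, which shrinks the address space considerably.

def kqk_index(white_king: int, white_queen: int, black_king: int) -> int:
    """Map three squares (0..63 each) to a unique table offset."""
    assert all(0 <= sq < 64 for sq in (white_king, white_queen, black_king))
    return (white_king * 64 + white_queen) * 64 + black_king

# Lookup is then a plain array access: result = table[kqk_index(wk, wq, bk)]
```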
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Tablebase Compression

Post by Dirt »

Say you're searching, and near your maximum depth you see a capture that leaves few enough pieces on the board to be in a tablebase. It's not good to have to look ten plies deeper to know whether it's a win or not.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Tablebase Compression

Post by jwes »

Michael Neish wrote:Hi,

I'm sure that this has been mentioned before, but I can't remember the rebuttal (if there was one).

I don't really know how these databases are set up, so this may seem naive, but here goes ...

Couldn't tablebases be massively compressed by storing instead a reduced number of positions?

For example, if a position is known to be a win, but it takes 50 plies to play out, could a position be stored every ten plies, and the program be relied on to find its way from one to the next by means of normal search?

Opinions welcome. Thanks.

Regards,

Mike.
It could work, though it would be 4, 8, or 16 plies. You would arithmetically shift the distance to mate or conversion right by the number of bits you wanted to save, and add 1 to wins. It would be a compromise between bitbases and tablebases. Someone would have to try it to see how deep you would have to search to force mate from any tablebase position.
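The shift trick above can be sketched like this. The function names and the recovery bound are my reading of the description, not an existing implementation; dropping the low bits stores only a rounded distance, and a normal search of at most the corresponding upper bound is then guaranteed to find the win:

```python
def compress_dtm(dtm: int, bits: int) -> int:
    """Drop the low `bits` bits of a distance-to-mate value.
    With bits=3, every DTM from 40..47 stores as the same value 5."""
    return dtm >> bits

def dtm_upper_bound(stored: int, bits: int) -> int:
    """Largest DTM a stored value could represent.  A normal search of
    at most this many plies recovers the exact winning line, which is
    the bitbase/tablebase compromise described in the post."""
    return ((stored + 1) << bits) - 1

# Example: a 47-ply win stored with 3 bits shaved off.
stored = compress_dtm(47, 3)          # stores 5 instead of 47
bound = dtm_upper_bound(stored, 3)    # search at most 47 plies to win
```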
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Tablebase Compression

Post by Edmund »

jwes wrote:
Michael Neish wrote:Hi,

I'm sure that this has been mentioned before, but I can't remember the rebuttal (if there was one).

I don't really know how these databases are set up, so this may seem naive, but here goes ...

Couldn't tablebases be massively compressed by storing instead a reduced number of positions?

For example, if a position is known to be a win, but it takes 50 plies to play out, could a position be stored every ten plies, and the program be relied on to find its way from one to the next by means of normal search?

Opinions welcome. Thanks.

Regards,

Mike.
It could work, though it would be 4, 8, or 16 plies. You would arithmetically shift the distance to mate or conversion right by the number of bits you wanted to save, and add 1 to wins. It would be a compromise between bitbases and tablebases. Someone would have to try it to see how deep you would have to search to force mate from any tablebase position.
We might be able to combine this trick with bitbases, i.e. use bitbases for decisions in the search tree and use this compressed DTZ database at the root. In combination with the bitbases, this reduced tablebase might even be compressed further.
LiquidNitrogenOverclocker

Re: Tablebase Compression

Post by LiquidNitrogenOverclocker »

Michael Neish wrote:Couldn't tablebases be massively compressed by storing instead a reduced number of positions?
What you are talking about is commonly referred to as "run-length encoding", and it is used most often in the world of checkers programming (not for the specific implementation you mentioned, but it works on the same principle).

The problem would be more difficult to implement for Distance To Conversion than for Distance To Win. Recall that DTC only counts moves to either a pawn promotion or a piece capture, so for any position it knows the shortest path to convert into another database. When a move is made, what is not known is WHICH type of conversion it is heading for, unless it is THE converting move itself.

A player can be winning, and the DTC could be odd or even.

In Distance To Win databases, the winner always wins in an odd number of plies, the loser always loses in an even number of plies.

This is very helpful for compression for two reasons.

1) You don't need to attach "win or lose" information to the data, it is in the odd/even number that is stored at the index.

2) If a database is dominated by one value (maybe most of the Q vs. R positions where the side with the Q is to move) then it is very possible to get extremely long runs of "true distance rounded to the nearest 10" as you suggested.

Instead of seeing data like:

47,45,45,45,43,49,33,35,21,13,41,27 you could compress it to

6:49, 2:39, 21, 13, 41, 27 which would preserve the same info.
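The run-length stage of this idea can be sketched as follows. The "count:value" notation mirrors the example above; note the post rounds only the leading entries up to the next value ending in 9 (to create the runs) and leaves the trailing singletons exact, so the code below shows just the encoding step applied to the already-rounded sequence:

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a sequence: runs longer than one become
    'count:value', single values are kept bare, matching the
    notation used in the post."""
    parts = []
    for value, group in groupby(values):
        n = len(list(group))
        parts.append(f"{n}:{value}" if n > 1 else f"{value}")
    return ", ".join(parts)

# The post's sequence after rounding the first eight entries up to the
# next value ending in 9 (49 and 39 respectively):
rounded = [49, 49, 49, 49, 49, 49, 39, 39, 21, 13, 41, 27]
print(rle_encode(rounded))  # 6:49, 2:39, 21, 13, 41, 27
```

Long runs of one dominant value, as in the Q vs. R case described above, are exactly where this scheme pays off.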

I think Ed Gilbert did implement this "rounding" function for his 10-piece Distance To Conversion databases in the game of checkers (he computed 8.5 trillion endgame positions over the course of about 4.2 computer years).