Looking for a specification for Syzygy tablebases

clayt · Post by **clayt** » Tue Oct 11, 2022 11:50 pm

I'm currently in the process of implementing a Syzygy tablebase prober for my engine. I know of many implementations of probers, but haven't seen any documentation that fully describes how a tablebase is formatted and what the "correct" way to work with one is. Accordingly, I'm looking for some kind of specification on the binary format for Syzygy tablebases. Do you guys know of one?

expositor · Post by **expositor** » Wed Oct 12, 2022 5:41 am

As far as I'm aware, Ronald de Man's implementation is the specification (if a specification can be said to exist at all). I'd love to be proven wrong, however! since I'm planning to write my own implementation soon, as well.

lithander · Post by **lithander** » Wed Oct 12, 2022 1:42 pm

I've also very recently entered the rabbithole of trying to implement tablebase probing in Leorik without using a 3rd party library (because there's none in C# afaik) and I found https://chess.stackexchange.com/questio ... nformation to provide a good overview but in the end I'm just stepping through the probing code in C and try to understand what it does and port a subset of it to C#.

I don't use Ronald de Man's implementation as my only guide though but also look at Fathom.

Michel · Post by **Michel** » Wed Oct 12, 2022 5:15 pm

clayt wrote: ↑Tue Oct 11, 2022 11:50 pm I'm currently in the process of implementing a Syzygy tablebase prober for my engine. I know of many implementations of probers, but haven't seen any documentation that fully describes how a tablebase is formatted and what the "correct" way to work with one is. Accordingly, I'm looking for some kind of specification on the binary format for Syzygy tablebases. Do you guys know of one?

I was planning to write a specification some time ago but I gave up. It's not a trivial task.

syzygy · Post by **syzygy** » Sat Oct 15, 2022 10:55 pm

Let's see. Let's limit to WDL for now.

A pawnless tablebase file typically has 2 tables: wtm, btm.
A pawnful tablebase file typically has 8 tables: wtm,bt,m with leading pawn on a-d (so: wtm-a, btm-a, wtm-b, btm-b, wtm-c, btm-c, wtm-d, wtm-d, in that order).

Symmetric tables like KQvKQ and KPvKP only have 1 or 4 tables (wtm).

-------------

The main header of the tablebases file:
bytes 0-3: magic number
byte 4:
bit 0 is set for a non-symmetric table, i.e. separate wtm and btm.
bit 1 is set for a pawnful table.
bits 4-7: number of pieces N (N=5 for KRPvKR)

Then, for a pawnless tablebase file, N+1 bytes.
The lower nibbles of these bytes relate to the wtm table, the higher to the btm table.
The nibbles in the first byte are important for mapping positions to an index value, but I won't get into this detail now.
The nibbles in the next N bytes list the piece types in a particular order. (P=1,N=2,B=3,R=4,Q=5,K=6 for white, add 8 for black.)

For a pawnful tables, there are 4x (N+1) (1 pawn) or 4x (N+2) bytes (>= 2 pawns).
The first 1 or 2 bytes I won't try to explain now, then N bytes to list the pieces types in a particular order.

Example: KQvKR

Code: Select all

00000000  71 e8 23 5d 41 11 56 ec  65 ce 00 06 13 00 4e 00

So a magic number 0x5d23e871 (for WDL tables).
Byte 4 = 0x41 means 3 pieces, non-symmetric.
Byte 5 = 0x11, so values 1 and 1 for wtm and btm to be used in index calculations.
Then 56, ec, 65, ce, meaning order wK, bR, wQ, bK for wtm, and wQ, bK, wK, bR for btm.

Next, there may be a padding byte to align the position within the tablebase file to a multiple of 2 bytes.

-------------

For each table:

8 bytes with information about the index structure into the table.
byte 0: flags (*)
byte 1: blocksize
byte 2: idxbits
byte 3: num_blocks - real_num_blocks
bytes 4-7: real_num_blocks.

Then a specification of the canonical Huffman code used for the table (to encode RePair symbols):
byte 0: max_len
byte 1: min_len
Followed by (max_len - min_len + 1) u16 values.

Then information specifying the RePair code book.
bytes 0-1: number of symbols (max 4094 I think, maybe 4095 is allowed).
For each symbol 3 bytes.
Most symbols pair two other symbols. The 3 bytes = 2x12 bits specify these two other symbols.
The terminal symbols have the encode a value from 0-4095 and have the other 12 bits set to 0xfff.

Next there may be a padding byte to align to a multiple of 2.

(*) if bit 7 of "flags" is set, then the table is "constant", i.e. all positions have the same value, which is written in byte 1, and the rest is skipped, also for the next parts.

-------------

A data structure that is used to quickly go from the "index" of a TB position to the relevant compressed data block that contains the value for that position.

Now, for each table:
num_indices entries of 6 bytes, where

Code: Select all

size_t num_indices = (tb_size + (1ULL << idxbits) - 1) >> idxbits;

where tb_size is the number to the number of positions in the table (so the size of the range of index values).
(Here idxbits is a per-table value, and tb_size will depend on file of the leading pawn if >=2 pawns.)

And again for each table:
num_blocks entries of 2-byte values.

Then padding bytes to align to a multiple of 64.

-------------

The actual compressed table data.
For each table:
real_num_blocks blocks of (1 << blocksize) bytes.
Note again that real_num_blocks and blocksize are per-table values.
Each table is padded to a multiple of 64 bytes.

For WDL tables, blocksize is usually 6 (-> 64 bytes). Tables which compress very well have blocksize 5 (-> 32 bytes).

-------------

16-byte checksum (not MD5 but calculated with cityhash. See calc_checksum() in checksum.c.
This checksum is not checked by the probing code but is intended for checking the integrity of the file, e.g. after download.

clayt · Post by **clayt** » Sat Oct 15, 2022 11:16 pm

Thank you! This is a big help.

syzygy · Post by **syzygy** » Sun Oct 16, 2022 3:27 am

syzygy wrote: ↑Sat Oct 15, 2022 10:55 pmNow, for each table:
num_indices entries of 6 bytes, where
Code: Select all
size_t num_indices = (tb_size + (1ULL << idxbits) - 1) >> idxbits;
where tb_size is the number to the number of positions in the table (so the size of the range of index values).
(Here idxbits is a per-table value, and tb_size will depend on file of the leading pawn if >=2 pawns.)

And again for each table:
num_blocks entries of 2-byte values.

Each 2-byte value corresponds to the number of positions stored in a compressed block minus 1. (The maximum number of positions in a (32- or 64-byte) block is therefore 65536.)

So given the index value of a TB position, you can find the block that stores the position by subtracting the number of positions in block 0, 1, 2, 3, ...., and stop just before you go negative. This block is then decompressed and the relevant value is extracted.

To avoid always having to loop over the num_blocks entries starting from block 0, there are the num_indices entries with 6 bytes.

Basically the whole index range 0...(tb_size-1) is split into num_indices sections of size 2^idxbits. (The last section will normally be incomplete.)
Each of the 6-byte entries consists of a 4-byte value and a 2-byte value (in that order).
The 4-byte value is the number of the compressed block which contains the value exactly in the middle of the section, i.e. position k * 2^idxbits + 2^(idxbits-1) for the kth 6-byte entry.
The 2-byte value is is the offset of the same position within its compressed block.

So the probing code works as follows:
- given a position, an index value idx is calcuated.
- k = idx / 2^idxbits now points to the relevant 6-byte entry.
- the 6-byte entry gives the compressed block and the offiset into this block for the kth "middle" value.
- idx -= idx - (k * 2^idxbits + 2^(idxbits-1)) is our position relative to this kth "middle" value.
- now subtract or add block sizes (counted in # of positions in a compressed block) until we have the right compressed block for the position being probed and the offset within that block.

Code: Select all

  if (litIdx < 0)
    while (litIdx < 0)
      litIdx += d->sizeTable[--block] + 1;
  else
    while (litIdx > d->sizeTable[block])
      litIdx -= d->sizeTable[block++] + 1;

And then decompress that block until we have the right value.

idxbits is chosen such that the 2^idxbits corresponds to about 16 x the average block size (in # positions). If a range corresponds to 16 blocks, a probe needs max 8 of these substractions or additions, so 4 on average. (But number of posiitons in a block can vary quite a lot from block to block.)

real_num_blocks is the actual number of compressed blocks.
Since the index structure essentially assumes that the "middle" of the final 2^idxbits section corresponds to an existing index value, the block sizes are padded with values of 65535 (corresponding to 65536 positions) to make sure that this middle position is covered. These padded values add a few blocks to real_num_blocks to get num_blocks.

Looking for a specification for Syzygy tablebases

Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases

Re: Looking for a specification for Syzygy tablebases