Header format for polyglot books

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Header format for polyglot books

Post by Don »

Don wrote:
mcostalba wrote:
Michel wrote: Please feel free to comment.
How can you be sure that an opening position doesn't have zero key ? IOW what about hash collisions ?

P.S: In case someone plans to answer something along the lines of "the probabilty of this is very very small", "doens't happens in practice", etc please do me a favor and be so kind to post instead "collision avoidance is not guaranteed". Thanks in advance.

P.P.S: Giving the above, perhaps the "move" field should keep a not valid move value, for instance where from square equals destination square.
The probability is quite small but the point is that it could in fact produce an error. Of course the current polyglot format has the same issue with EVERY record, it's possible that a collision will occur and the move will also be valid. So what is so special about key zero in this respect?

The best solution of course is to make sure the move field is zero too or some other invalid move.
I did not fully understand the format, but after looking more closely I thing the proposal is a good one. I would suggest that all programs from now on consider "zero key" records to be invalid to avoid problems.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
hgm
Posts: 28384
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Header format for polyglot books

Post by hgm »

One could argue that including code to test for move legality is a bad idea, because it only drives up the probability of malfunctioning. Because the chances that the extra code required to do the checking will cause a crash, because it is hit by a cosmic ray or some other natural disaster in the cases that there is no collision is so much larger than the probability that there will be a collission.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Header format for polyglot books

Post by Don »

hgm wrote:One could argue that including code to test for move legality is a bad idea, because it only drives up the probability of malfunctioning. Because the chances that the extra code required to do the checking will cause a crash, because it is hit by a cosmic ray or some other natural disaster in the cases that there is no collision is so much larger than the probability that there will be a collission.
Legal checking can be seen as roughly equivalent to adding "bits" of extra key. So it would be a benefit for all positions, not just the "zero record."

How many bits of extra key equivalent is move verification? I'll bet it's not much, perhaps 2 or 3 bits at most.

What is the probability of a collision in a 2 million position polyglot book?

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
hgm
Posts: 28384
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Header format for polyglot books

Post by hgm »

The probability that a probe for a position not in the book would turn up a false hit on a book position would be 2M/2^64 = 2^21/2^64 = 1/2^53 ~ 1e-16. Adding the header without guaranteeing the move field would be invalid would drive this up to 1.0000005e-16.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Header format for polyglot books

Post by Michel »

I refactored the utility for handling pgheaders and added some regression tests. The utility is now available here

http://hardy.uhasselt.be/Toga/pgheader.tar.gz

The api is contained in the source files pgheader.h and pgheader.c. pgheader.h provides the following functions

Code: Select all

int  pgheader_create  (char **header, const char *variants, const char *comment);
int  pgheader_read    (char **header, const char *infile);
int  pgheader_write   (const char *header, const char *infile, const char *outfile);
int  pgheader_delete  (const char *infile, const char *outfile);
void pgheader_error   (const char *prompt, int ret);
pgheader --help yields the following output

Code: Select all

pgheader add <options>
Add a header
Options:
-i, --in <file>             input file
-o, --out <file>            output file (default: book.bin)
-v, --variants <variants>   comma separated list of supported variants (default: normal)
-c, --comment <comment>     free format string, may contain newlines encoded as two char strings \n
                            (default: "")

pgheader show <options>
Print the header
Options:
-i, --in <file>             input file

pgheader delete <options>
Delete the header
Options:
-i, --in <file>             input file
-o, --out <file>            output file (default: book.bin)

pgheader test
Run regression tests
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Header format for polyglot books

Post by Michel »

The gnuchess book is the first available book with the new header!

Code: Select all

http://hardy.uhasselt.be/Toga/gnuchess-release/gnuchess.bin

Code: Select all

$ file gnuchess.bin
gnuchess.bin: Polyglot chess opening book (version 1.0)

$ pgheader show -i gnuchess.bin
@PG@
1.0
1
normal
Created by GnuChess.
petero2
Posts: 725
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Header format for polyglot books

Post by petero2 »

Michel wrote:
IIRC, the 0x0 magic maps to
[d]8/8/8/8/8/8/8/8 w - - 0 1
which would never happen in a FIDE-compliant game of chess but I have no idea about other variants.
Nice point! Luckily it will not cause a hash collision in other variants either since for non FIDE chess the keys are xor'ed with a variant key which is non-zero.
But the point is not valid, because the empty board is not the only position that hashes to 0. Another example is
[d]2b1k3/4p3/3p1p2/p2P2p1/P2P4/2P2PP1/4P3/2NQKB2 b - - 0 1
This is just nitpicking though. I don't believe this header format will cause problems in practice. If you are worried about hash collisions, you should not use the polyglot book format.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Header format for polyglot books

Post by Michel »

But the point is not valid, because the empty board is not the only position that hashes to 0. Another example is
[d]2b1k3/4p3/3p1p2/p2P2p1/P2P4/2P2PP1/4P3/2NQKB2 b - - 0 1

Wonderful!!! I was not aware that positions that hashed to 0x0 were explicitly known. And it is even a position that looks vaguely reasonable.
This is just nitpicking though. I don't believe this header format will cause problems in practice. If you are worried about hash collisions, you should not use the polyglot book format.
Exactly. But _if_ you are worried then you can just regard 0x0 as invalid for book lookup.
User avatar
hgm
Posts: 28384
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Header format for polyglot books

Post by hgm »

How did you get this position? Exhaustive search for 'reasonable positions'? (With in each file white-pieces, white pawns, black pawns, black pieces (all zero or more) in that order? Or did you confine pieces to the back rank (and Kings to the e-file)?)

I guess one could make a table of hash keys of all reasonable Pawn stuctures (or as may as fit in memory) and then exhaustively generate piece-only keys until you match one of the Pawn keys. If you can store 0.5G Pawn keys, you would on average have to generate 32G piece-only keys to get a collision, which sounds doable.
Last edited by hgm on Sun Sep 23, 2012 4:07 pm, edited 1 time in total.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Header format for polyglot books

Post by Don »

hgm wrote:How did you get this position? Exhaustive search for 'reasonable positions'? (With in each file white-pieces, white pawns, black pawns, black pieces (all zero or more) in that order?)
Maybe we can get Steven Edwards to run perft 14 next - and modify his software to report polyglot keys that are zero! If there are none, then we will at least know we can use this system up to 14 ply without breaking any software.

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.