Creating Books from .PGN files

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Creating Books from .PGN files

Post by Dave_N »

I had a look at the Polyglot format and found that it doesn't give win/loss/draw percentages and instead gives a single weight, so I decided to have another look at creating books. My first thought was to modify the Polyglot book-creation code to store white wins in the "weight" field and draws or black wins in the "learn" field. Then I decided to ignore Polyglot and look at making my own format from my own pgn tree code...

At the moment I have written code that will merge games into one game with variations, however I cannot detect transpositions. So for example, if two games take a Ruy Lopez line with the Marshall attack, and in one line black plays b5 after a6 while the other has Nf6 first (as in the main line), then the games do not rejoin after the positions become equal; instead the second game is stored as a variation of the first.

Obviously I could not compare FENs in realistic time to detect the transposition, so I would (probably) need some kind of hash table, unless I create a FEN tree ... I am sure this has been tried before, but is it practical?
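A hash table keyed on Zobrist hashes is the usual way to detect transpositions while merging: XOR one fixed random 64-bit number per (square, piece) pair into a key, and two positions reached by different move orders get the same key. A minimal sketch (the names and board encoding here are my own, not from any existing book code):

```cpp
#include <cstdint>

// One fixed random 64-bit number per (square, piece) pair. XORing the
// numbers of all occupied squares gives a position key that is identical
// for transpositions, whatever move order reached the position.
struct Zobrist {
    uint64_t table[64][12];
    Zobrist() {
        uint64_t s = 0x9E3779B97F4A7C15ULL;              // fixed seed: reproducible keys
        for (int sq = 0; sq < 64; ++sq)
            for (int pc = 0; pc < 12; ++pc) {
                s ^= s << 13; s ^= s >> 7; s ^= s << 17; // xorshift64 step
                table[sq][pc] = s;
            }
    }
};

// board[sq] = piece code 0..11, or -1 for an empty square.
uint64_t key_of(const Zobrist& z, const int board[64]) {
    uint64_t k = 0;
    for (int sq = 0; sq < 64; ++sq)
        if (board[sq] >= 0)
            k ^= z.table[sq][board[sq]];
    return k;
}
```

While merging, a `std::unordered_map<uint64_t, Node*>` from key to tree node would let the two Marshall move orders rejoin at the first shared key. (A full version would also fold side to move, castling and en-passant rights into the key.)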
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Creating Books from .PGN files

Post by hgm »

Polyglot format is what you could call a 'cooked' book, which does as much of the processing as possible in advance. In the end the only thing that is relevant (and observable) is the probability with which every move is played. In some ('raw') formats these probabilities are calculated in the probing code from the basic statistics, but Polyglot books store those probabilities directly, as the weights. Polyglot's book-building function sets the weights to the number of half-points scored with the move, which is a dubious method that, IMO, could be much improved on.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

Yeah, I read that the weight is (wins+losses)/2, and the improvement I considered would be at least to score the win percentage more highly. I like to use win/draw as separate percentages, and if the learn variable isn't used I would try to overwrite that.

(I realized the FEN tree would be ridiculous).

I also googled mathematical techniques for computing hash keys, and so far I don't have a solution for checking the linear independence without brute force ...

I don't know if any book formats use a pgn tree to compute the openings, but I am sure it's a huge amount of time for larger books.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Creating Books from .PGN files

Post by hgm »

No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
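To make that example concrete: under Polyglot's rule the 2-out-of-100 move gets weight 2 against weight 1 for the 1-out-of-2 move, so it is preferred 2:1 despite scoring 2%. One possible repair (purely illustrative, my own, not from Polyglot or anyone's engine) is to scale the frequency by the score percentage, so a move that always loses drops to weight zero:

```cpp
// Polyglot's builder: weight = half-points scored with the move.
double polyglot_weight(int wins, int draws, int losses) {
    (void)losses;                       // losses contribute nothing
    return wins + 0.5 * draws;
}

// Hypothetical blend: frequency times score-percentage squared. The
// 1-of-2 move now outweighs the 2-of-100 move, and an always-losing
// move gets weight 0 no matter how often it was played.
double blended_weight(int wins, int draws, int losses) {
    int games = wins + draws + losses;
    if (games == 0) return 0.0;
    double score = (wins + 0.5 * draws) / games;   // 0..1
    return games * score * score;
}
```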
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Creating Books from .PGN files

Post by kbhearn »

You can eliminate false-positive hash hits in a book by simply storing the full position (4 bits per square = 32 bytes, plus either a byte for flags, or encode them into the squares; only 4-5 64-bit compares to validate a position). You'd still use a hash key of some sort to spread the positions evenly amongst the buckets (you don't want to have to check every position for a match); you just wouldn't have to worry about the odd pair of positions that causes a false positive :)
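The 4-bits-per-square packing might look like this (the encoding, 0 for empty and 1..15 for pieces plus whatever flags get folded in, is an assumption for illustration):

```cpp
#include <array>
#include <cstdint>

// Pack 64 squares at 4 bits each into four 64-bit words, 32 bytes total.
// Validating a book hit is then four integer compares
// (std::array::operator==) instead of rebuilding and comparing a FEN.
std::array<uint64_t, 4> pack(const int board[64]) {
    std::array<uint64_t, 4> w{};                   // zero-initialized words
    for (int sq = 0; sq < 64; ++sq)
        w[sq / 16] |= (uint64_t)(board[sq] & 0xF) << ((sq % 16) * 4);
    return w;
}
```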
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Creating Books from .PGN files

Post by hgm »

With a 64-bit key, like Polyglot books use, worrying about collisions seems a bit paranoid. Even with 4 billion positions in the book it would be unlikely for there to be a single collision. Doubling the size of the book by using stronger keys would probably hurt more than accepting that you play an illegal opening move once every quadrillion games...
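The birthday bound makes that concrete: among n random 64-bit keys the expected number of colliding pairs is roughly n(n-1)/2^65, which stays below one even at n = 4 billion. A quick check:

```cpp
#include <cmath>

// Expected number of colliding key pairs among n uniform 64-bit keys:
// the birthday approximation n*(n-1) / 2^65.
double expected_collisions(double n) {
    return n * (n - 1.0) / std::ldexp(1.0, 65);   // ldexp(1, 65) == 2^65
}
```

For n = 4e9 this comes out to about 0.43 expected collisions in the entire book.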

The Polyglot (and XBoard) probing code doesn't even use hashing, but a binary search. (The moves are sorted by key in the book.)
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Creating Books from .PGN files

Post by Michel »

Polyglot (and XBoard) probing code don't even use hashing, but a binary search. (The moves are sorted by key in the book.)
For really large books the binary search should probably be replaced by interpolation search. That would dramatically reduce the number of seeks necessary, as interpolation search is O(log log N) while binary search is O(log N).

I always wanted to implement this but never got around to it. Part of the reason is that there are currently some "limitations" in Polyglot which prevent you from making large books. That should be fixed first.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

Perhaps detecting collisions during book creation and having resolution code ... I don't understand how the book is binary searched, and the interpolation isn't something I can implement, but I am going to attempt some modifications - if I make any progress I'll post some source.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

Well, I can get this version
http://alpha.uhasselt.be/Research/Algeb ... t-release/
(the first tar.gz)
to compile. I can't see the problem with book size on a first read through book_make.cpp.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Creating Books from .PGN files

Post by hgm »

Binary search is simply bisection: you start in the middle, to see if the key there is smaller or larger than the one you are searching for, and then you know whether it should be in the first or second half of the book, and you look in the middle of that half, etc., until the section the searched entry must be in has <= 1 entry in it.