sampling a polyglot book ?

Discussion of chess software programming and technical issues.

Moderator: Ras

lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

sampling a polyglot book ?

Post by lucasart »

Is there a tool out there to generate an EPD file by sampling from a Polyglot book?

For example, I want to play up to 8 moves from the book, dump the resulting FEN into the EPD file, and repeat. Of course you would specify the number of samples, so as to obtain an EPD file with as many lines as desired.

This will be useful for my CLI tournament program, which will use EPD only for the moment.

An added advantage is that you can actually see what these positions are and run an automated sanity check on them, for example by analyzing each one for 1 second with a top-level engine to make sure the score is close enough to zero, so that the position isn't biased toward white or black.
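
A minimal sketch of the score check described above, assuming the engine speaks UCI and that its output lines have already been captured (launching the engine and sending `position fen ...` / `go movetime 1000` is left out). The names `last_score_cp` and `is_balanced` and the 30 centipawn threshold are illustrative assumptions, not taken from any existing tool.

```python
import re

# UCI engines report evaluations as "score cp <n>" or "score mate <n>"
# inside their "info" lines.
SCORE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def last_score_cp(info_lines, mate_value=100000):
    """Return the last reported score in centipawns, or None if none was seen.

    Mate scores are mapped to +/- mate_value (an arbitrary large number).
    """
    score = None
    for line in info_lines:
        match = SCORE.search(line)
        if not match:
            continue
        kind, value = match.group(1), int(match.group(2))
        if kind == "cp":
            score = value
        else:
            score = mate_value if value > 0 else -mate_value
    return score


def is_balanced(info_lines, max_cp=30):
    """True if the final score is within max_cp centipawns of zero."""
    score = last_score_cp(info_lines)
    return score is not None and abs(score) <= max_cp
```

Positions for which `is_balanced` returns False would simply be dropped from the EPD file.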

As we all know, the best test suite is a fixed set of positions that does not change. Selecting randomly from a book increases the variance of the estimator for two reasons: (1) it adds another source of randomness whose effect on the result cannot, by definition, be accounted for by any tool (bayeselo or other); (2) automated book generation introduces crappy book lines, because thoroughly testing every position is hardly possible when there are millions of them.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: sampling a polyglot book ?

Post by Michel »

The polyglot book entries are hash codes. So they cannot immediately be converted to a FEN.

However, polyglot includes a utility that can dump the lines from the book, where a line for white is defined as

<white book><black arbitrary><white book><black arbitrary>....

and a line for black is defined as

<white arbitrary><black book><white arbitrary><black book><white arbitrary>....

Code:

       polyglot dump-book

       PolyGlot supports the following options

       -bin (default: book.bin)
           Input file in PolyGlot book format.

       -color
           The color for whom to generate the lines.

       -out (default: book_<color>.txt)
           The name of the output file.
Here are the first few lines of performance.bin for white

Dump of "/home/vdbergh/SRC/CHESS/Toga142JD_linux_version/performance.bin" for white.

Code:

1: 1. e4{33%} a6 2. d4{100%} b5 3. Nf3{64%} e6 4. Bd3{100%} Bb7 5. O-O{75%} c5 6. c3{100%} Nf6 7. Re1{100%}
2: 1. e4{33%} a6 2. d4{100%} b5 3. Nf3{64%} e6 4. Bd3{100%} Bb7 5. Qe2{25%}
3: 1. e4{33%} a6 2. d4{100%} b5 3. Nf3{64%} Bb7 4. Bd3{100%} e6 {trans: line=1, ply=8}
4: 1. e4{33%} a6 2. d4{100%} b5 3. Bd3{36%} Bb7 4. Nf3{100%} {trans: line=3, ply=7}
5: 1. e4{33%} a6 2. d4{100%} e6 3. Nf3{56%} b5 {trans: line=1, ply=6}
6: 1. e4{33%} a6 2. d4{100%} e6 3. Nf3{56%} c5 4. c3{100%} d5 5. e5{100%} Bd7 6. Bd3{100%} cxd4 7. Nxd4{100%} Nc6 8. Nxc6{100%}
7: 1. e4{33%} a6 2. d4{100%} e6 3. Bd3{44%}
8: 1. e4{33%} b6 2. d4{100%} e6 3. Nf3{57%} Bb7 4. Bd3{100%} c5 5. c3{87%} Nf6 6. Qe2{61%} cxd4 7. cxd4{100%}
9: 1. e4{33%} b6 2. d4{100%} e6 3. Nf3{57%} Bb7 4. Bd3{100%} c5 5. c3{87%} Nf6 6. Qe2{61%} d5 7. e5{100%}
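
The dump format above is easy to post-process. Here is a rough sketch of a parser for one dump line, assuming the format is exactly as shown in the sample (a line index, move numbers, a `{NN%}` weight after each book move, and an optional trailing `{trans: line=..., ply=...}` marker). The function name `parse_dump_line` is made up for illustration, and the format of a dump for black is assumed to follow the same pattern.

```python
import re

# A move token is SAN text, optionally followed by a {NN%} book weight.
MOVE = re.compile(r"^([^{]+)(?:\{(\d+)%\})?$")
TRANS = re.compile(r"\{trans: line=(\d+), ply=(\d+)\}")


def parse_dump_line(line):
    """Split one dump-book line into (line_no, moves, trans).

    moves is a list of (san, percent) pairs; percent is None for the
    side whose moves are not taken from the book.
    trans is (line, ply) if the line transposes into an earlier one.
    """
    number, _, body = line.partition(":")
    trans = None
    found = TRANS.search(body)
    if found:
        trans = (int(found.group(1)), int(found.group(2)))
        body = body[:found.start()]
    moves = []
    for token in body.split():
        if re.fullmatch(r"\d+\.", token):
            continue  # skip move numbers such as "1."
        match = MOVE.match(token)
        if match:
            pct = match.group(2)
            moves.append((match.group(1), int(pct) if pct else None))
    return int(number), moves, trans
```

The resulting SAN move list can then be replayed on a board to obtain the FEN at the end of the line.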
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: sampling a polyglot book ?

Post by lucasart »

Thanks. So there's no tool that directly does it then.

On second thought, I'll probably add Polyglot book support to my CLI. I googled around and it seems hard to find books in EPD format, so I'll have to do my own programmatically, so I might as well do full Polyglot support.

I had a look at the code from Stockfish (book.h and book.cpp), and it seems that adapting it to my program would be easy (just use my own Board class instead of SF's Position class, and I can almost copy/paste the rest). As my program is GPL, there would be no licensing issues in doing that. And of course, I'll add a big thanks to Marco Costalba in the credits section.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: sampling a polyglot book ?

Post by Evert »

I think I based Jazz' polybook code on the code in polyglot itself. It's not very hard; my main annoyance with it is the second set of hash codes that are needed. There's an easy solution for that, of course, but I'm irrationally attached to my own hash codes.

It's actually not very hard to create a set of starting positions from a polyglot book: just walk each variation encountered in the opening book and spit out the FEN at the leaf node. You still need the code to read the book, obviously.
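
For reference, the book-reading part can be sketched as follows. This assumes the usual Polyglot layout of 16-byte big-endian entries (64-bit key, 16-bit move, 16-bit weight, 32-bit learn value) and the standard move bit packing. The helper names are illustrative only, and computing the Zobrist key of a position (which needs the Polyglot random-number table) is not shown.

```python
import struct
from collections import namedtuple

Entry = namedtuple("Entry", "key move weight learn")
ENTRY = struct.Struct(">QHHI")  # big-endian, 8 + 2 + 2 + 4 = 16 bytes

PROMOTION = {0: "", 1: "n", 2: "b", 3: "r", 4: "q"}


def read_entries(data):
    """Decode all complete 16-byte entries from the raw bytes of a .bin file."""
    usable = len(data) - len(data) % ENTRY.size
    return [Entry(*ENTRY.unpack_from(data, off))
            for off in range(0, usable, ENTRY.size)]


def decode_move(move):
    """Turn the packed 16-bit move into coordinate notation, e.g. 'e2e4'.

    Note: castling is stored as the king moving onto its own rook
    (e.g. 'e1h1'), which the caller has to translate.
    """
    to_file, to_row = move & 7, (move >> 3) & 7
    from_file, from_row = (move >> 6) & 7, (move >> 9) & 7
    promo = (move >> 12) & 7
    files = "abcdefgh"
    return (files[from_file] + str(from_row + 1) +
            files[to_file] + str(to_row + 1) + PROMOTION.get(promo, "?"))


def move_shares(entries, key):
    """Map each book move for `key` to its share of the total weight."""
    hits = [e for e in entries if e.key == key]
    total = sum(e.weight for e in hits)
    if total == 0:
        return {}
    return {decode_move(e.move): e.weight / total for e in hits}
```

With `data = open("book.bin", "rb").read()`, `read_entries(data)` gives the whole book in memory. Real book files are sorted by key, so a binary search would be the natural lookup; the linear scan in `move_shares` just keeps the sketch short.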

I still prefer starting from an EPD set though. I just wish I had some endgame specific ones, for testing end-game evaluation. That's obviously a lot harder than opening positions because if the positions are balanced, they're probably drawn...
hgm
Posts: 28464
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: sampling a polyglot book ?

Post by hgm »

You could use XBoard for this. Just install a program that does nothing as a WB v1 engine (it doesn't matter which program, as long as it is not an engine), set a TC of 1 sec (0:01 min), let it use the GUI book, set the option to save the final position of each game to a file (in Options -> Save), and run a match with as many games as you need.

That should give you your sampling of the terminal book positions, as the first 'engine' to be out of book will forfeit on time after 1 sec.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: sampling a polyglot book ?

Post by Michel »

In a typical polyglot book the majority of positions are not directly connected to the root (for example, in a single-color book _no_ position is connected to the root). So if you only follow the lines that stay completely in the book, you miss a lot of positions.
hgm
Posts: 28464
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: sampling a polyglot book ?

Post by hgm »

True. But in that case, what would be meant by a representative sample from the book?

Of course the method I sketched can also be used for one-color books, when you let the side for which the book contains no moves be played by a real engine. The opponent, not being an engine, then still forfeits as soon as he gets out of book. (To speed things up you could write an engine that resigns immediately in response to 'go'.)
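
A minimal sketch of such a resigning dummy, assuming a WinBoard-protocol engine talking over stdin/stdout. It only reacts to `go`, as described above, and silently ignores everything else. Whether that is enough in a given XBoard setup (for example when the dummy is handed an opponent move instead of `go`) is untested here.

```python
import sys


def respond(command):
    """Return the lines to send back to the GUI for one incoming command."""
    if command == "go":
        return ["resign"]  # give up as soon as we are asked to think
    return []              # everything else is silently ignored


def main():
    """Read commands line by line until 'quit' or end of input."""
    for line in sys.stdin:
        command = line.strip()
        if command == "quit":
            break
        for reply in respond(command):
            print(reply, flush=True)
```

To use it as an engine, add a call to `main()` at the bottom of the script and register the script as a WB engine.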
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: sampling a polyglot book ?

Post by Michel »

hgm wrote: "But in that case, what would be meant by a representative sample from the book?"
What I defined as a "line" in my post above (bookmoves for one player, arbitrary moves for the other player, end=bookmove) is the right thing I think for books which are meant to be used as repertoires (like performance.bin). Using a book as repertoire means you are not assuming the opponent is using the same book (which will be the case if you are playing on a server for example).

Of course for a tournament book the criteria are different. Perhaps repertoire books should not be used in tournaments since they could conceivably skew the results (except in book tournaments of course).
hgm
Posts: 28464
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: sampling a polyglot book ?

Post by hgm »

OK, this is one definition, but probably not what the OP had in mind. If you want to make a sampling for the purpose of using it as start positions for a tourney, you don't want the side that has no book moves to play as a random mover. He must make reasonable moves, or most positions you get would already be decided in favor of the book side.

Best would actually be to use an engine, or a group of engines, that have their own book, and let the dummy play with the Polyglot book as GUI book.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: sampling a polyglot book ?

Post by Michel »

hgm wrote: "you don't want the side that has no book moves to play as a random mover"
Not as a random mover since he is only allowed to move to positions in the book (if there is no such position the line ends right there). This is how human opening repertoires are written.

But in retrospect, while I think this is the right definition of a "line", I agree with you that this is not a good way of sampling positions from the book, since the opponent would still be allowed to make fairly bad moves.