Creating Books from .PGN files


Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

This is completely obvious, of course: vectors pre-allocate large areas of memory. The simple fix is to store the move list as a string. The method works for smaller PGNs.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

Even storing a string for the move list seems to cause too much memory allocation; however, squeezing the memory requirements a lot has made the experiment successful.
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Creating Books from .PGN files

Post by Harald »

Dave_N wrote:
Harald wrote:Perhaps this post has some examples for parts of this thread:
http://www.talkchess.com/forum/viewtopi ... ght=python
In fact I have memory issues for files > 4 MB. I have to rethink my loading strategy because I created a vector of strings for each game, so either I have a memory leak or 4 MB inflates to > 40 MB once the vectors have been created... I don't parse the games until they have been selected, and after the merge the game that was inserted into the root game is deleted.
I also found a bug in vector<Object*>::clear() where the memory that was allocated does not disappear (according to Task Manager).
What I did was: store the entire list of PGN games in one big binary game
tree with thousands of variations. I count the leaves (+win =draw -loss) and
back up these counts to the root (the opening position). I can use filters like
tree depth or number of games when I output the tree. Perhaps this is
a way to select an opening book? For a book I would store the hash value
of each position together with some statistics.
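
A minimal sketch of the counting idea described above, not Harald's actual code. It assumes a std::map of children per node instead of a binary (first-child/next-sibling) tree, identifies moves by their SAN text, and the names Node, add_game and count_nodes are made up for illustration. Incrementing the counters along each game's path should give the same totals as counting the leaves and backing them up to the root.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node per position reached; counters are from White's point of view.
struct Node {
    int wins = 0, draws = 0, losses = 0;
    std::map<std::string, std::unique_ptr<Node>> children;  // keyed by SAN move text (assumption)
    int games() const { return wins + draws + losses; }
};

// result: +1 = White win, 0 = draw, -1 = Black win.
inline void add_game(Node& root, const std::vector<std::string>& moves, int result) {
    auto bump = [result](Node& n) {
        if (result > 0) ++n.wins;
        else if (result < 0) ++n.losses;
        else ++n.draws;
    };
    Node* node = &root;
    bump(*node);  // every game passes through the opening position
    for (const std::string& m : moves) {
        std::unique_ptr<Node>& child = node->children[m];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
        bump(*node);
    }
}

// Count the nodes that survive a filter on game count and tree depth.
inline int count_nodes(const Node& n, int min_games, int max_depth) {
    if (max_depth < 0 || n.games() < min_games) return 0;
    int total = 1;
    for (const auto& kv : n.children)
        total += count_nodes(*kv.second, min_games, max_depth - 1);
    return total;
}
```

For a real book the SAN key would be replaced by a position hash, so that transpositions share one set of statistics.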

Now to the bug: vector::clear() does not free the memory behind the
stored object pointers. That's obvious. But it also does not free/delete the
memory used by the vector itself. That is not a bug. You should read some
documentation about clear(), reserve() and resize(). If you want to give
the vector's memory back to the (system) memory pool, then you can use
a trick with swap(): instantiate a second, empty vector of the same type
and then swap the contents. That may help.
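
The swap() trick might look roughly like the sketch below. Game and the helper names are placeholders invented for the example. Note that a freshly constructed vector having zero capacity is what the common standard libraries do, not something the sketch itself enforces.

```cpp
#include <cstddef>
#include <vector>

struct Game { int id; };  // placeholder type for the example

// Delete the pointed-to objects, then release the vector's own buffer.
inline void release_all(std::vector<Game*>& games) {
    for (Game* g : games) delete g;     // clear() alone would leak these objects
    games.clear();                      // size becomes 0, capacity is kept
    std::vector<Game*>().swap(games);   // the empty temporary takes the old buffer and frees it
}

// Capacity left behind after clear(): still at least n.
inline std::size_t capacity_after_clear(std::size_t n) {
    std::vector<int> v(n);
    v.clear();
    return v.capacity();
}

// Capacity left behind after swapping with an empty temporary.
inline std::size_t capacity_after_swap(std::size_t n) {
    std::vector<int> v(n);
    std::vector<int>().swap(v);
    return v.capacity();
}
```

With C++11 or later, shrink_to_fit() after clear() is an alternative, but it is only a non-binding request. Even when the vector does release its buffer, Task Manager may not show a drop, because the allocator can keep the freed memory for reuse.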

Harald
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: Creating Books from .PGN files

Post by Dave_N »

Quick update: I have changed the Polyglot version that I downloaded to build books with win/draw/loss numbers ...

The book_save() function now contains this ...

Code:

      ASSERT(keep_entry(pos));

      write_integer(file,8,Book->entry[pos].key);
      write_integer(file,2,Book->entry[pos].move);
      write_integer(file,2,entry_score(&Book->entry[pos]));
      write_integer(file,2,entry_score_blackwin(&Book->entry[pos])); // formerly 0 padded to 64 bit
      write_integer(file,2,Book->entry[pos].n); // formerly 0 padded to 64 bit
and the move loop in book_insert() now looks like this

Code:

      while (pgn_next_move(pgn,string,256)) {

         if (ply < MaxPly) {

            move = move_from_san(string,board);

            if (move == MoveNone || !move_is_legal(move,board)) {
               my_fatal("book_insert(): illegal move \"%s\" at line %d, column %d,game %d\n",string,pgn->move_line,pgn->move_column,pgn->game_nb);
            }

            pos = find_entry(board,move);

            Book->entry[pos].n++;
            if( result == 1 )
               Book->entry[pos].sum++;
            if( result == -1 )
               Book->entry[pos].black_win++; //((result == 0)? ? Book->entry[pos].draw);

            if (Book->entry[pos].n >= COUNT_MAX) {
               halve_stats(board->key);
            }

            move_do(board,move);
            ply++;
            //result = -result;
         }
      }
As can be seen, I have stored white wins in "sum" and black wins in "black_win", and I no longer invert the result after each move.

I think I would prefer it if the halve_stats() function could be removed. To maintain compatibility I could try to keep the original "sum" variable and simply calculate the correct value; perhaps black_win could store the number of wins for the side to move. Then I could maintain compatibility with formats that do not use the padding section ...

An extended format that saves 32-bit integers may also be worth considering, to avoid the need for halve_stats().

I am also thinking about modifications to book_merge().
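
For illustration only, here is one way such an extended entry could be serialized. The standard Polyglot entry is 16 bytes (8-byte key, 2-byte move, 2-byte weight, 4-byte learn), stored big-endian. The WideEntry layout, its field names and the 22-byte size below are assumptions made up for this sketch, not an existing format, and put_be is a stand-in for Polyglot's write_integer() that writes into a byte vector rather than a file.

```cpp
#include <cstdint>
#include <vector>

// Append the low `size` bytes of n, most significant byte first (big-endian).
inline void put_be(std::vector<std::uint8_t>& out, int size, std::uint64_t n) {
    for (int i = size - 1; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF));
}

// Hypothetical extended entry with 32-bit counters.
struct WideEntry {
    std::uint64_t key;         // position hash
    std::uint16_t move;        // encoded move
    std::uint32_t games;       // total games through this move
    std::uint32_t white_wins;
    std::uint32_t black_wins;  // draws = games - white_wins - black_wins
};

// Pack one entry into 8 + 2 + 4 + 4 + 4 = 22 bytes.
inline std::vector<std::uint8_t> pack(const WideEntry& e) {
    std::vector<std::uint8_t> out;
    put_be(out, 8, e.key);
    put_be(out, 2, e.move);
    put_be(out, 4, e.games);
    put_be(out, 4, e.white_wins);
    put_be(out, 4, e.black_wins);
    return out;
}
```

With 32-bit counters no halving is needed until a move has been played about four billion times. The price is that existing readers, which expect 16-byte entries, could not load such a file.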
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Creating Books from .PGN files

Post by Don »

hgm wrote: No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
You can go by the total count or the win percentage if you modify the numbers somehow to account for that. If you want to go by the win percentage you could add K pseudo-draws to spread out the noise a bit. So you might set the weight to something like:

weight = (wins + (K + draws) / 2) / (wins + draws + K)

where K is a constant such as 50. It could be anything, depending on how much weight you want to give to the a priori expectation that "all moves are equal".

If you want to go by frequency played there is certainly some mathematical way to gradually fold in extra weight for particularly good results.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Creating Books from .PGN files

Post by Don »

hgm wrote: No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
Here is what a Polyglot web page says:
weight

In the Polyglot source code this field is called "count" but it is in fact a measure for the quality of the move. It should be at least one.
The Polyglot book generator sets it to 2*(wins)+(draws), globally scaled to fit into 16 bits. A move with a weight of zero is deleted from the book. This is just a convention and book authors are free to set this field according to their taste (as long as it is at least one).

If random play is enabled in Polyglot then the probability that a move is selected is its weight divided by the sum of the weights of all the moves in the given position.
There is an additional field called "learn" that could be used, but I think it is basically undefined, so it can be used for anything.
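
The selection rule quoted above (probability = weight divided by the sum of the weights) could be sketched as follows. The function names are invented, and the sketch assumes a non-empty list of weights that are all at least one, as the quoted text requires. It is not Polyglot's own code.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Unscaled weight as described in the quoted text: 2*wins + draws.
inline std::uint32_t raw_weight(std::uint32_t wins, std::uint32_t draws) {
    return 2 * wins + draws;
}

// Map r in [0, sum of weights) to a move index; index i is hit by weights[i] values of r.
inline std::size_t pick_index(const std::vector<std::uint16_t>& weights, std::uint32_t r) {
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (r < weights[i]) return i;
        r -= weights[i];
    }
    return weights.size() - 1;  // only reached if r was out of range
}

// Draw r uniformly and pick a move with probability weight / sum.
inline std::size_t pick_random(const std::vector<std::uint16_t>& weights, std::mt19937& rng) {
    const std::uint32_t sum =
        std::accumulate(weights.begin(), weights.end(), std::uint32_t{0});
    std::uniform_int_distribution<std::uint32_t> dist(0, sum - 1);
    return pick_index(weights, dist(rng));
}
```

With weights 6, 3 and 1 the first move is chosen 60% of the time, the second 30% and the third 10%.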
flok

Re: Creating Books from .PGN files

Post by flok »

hgm wrote: No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
Maybe http://www.evanmiller.org/how-not-to-so ... ating.html solves that problem (quantity versus factor).
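
Assuming the linked article is the one that recommends ranking by the lower bound of a Wilson score interval, a sketch of that idea applied to book moves could look like this. Treating the score fraction (wins + draws/2) / games as if it were a simple success rate is an approximation made for the example, and the function name and the default z = 1.96 (roughly 95% confidence) are choices made here.

```cpp
#include <cmath>

// Lower bound of the Wilson score interval for a success fraction score/games.
// score is wins + draws/2 (an approximation: draws count as half a success).
inline double wilson_lower_bound(double score, double games, double z = 1.96) {
    if (games <= 0.0) return 0.0;  // no games: no evidence the move is any good
    const double p = score / games;
    const double z2 = z * z;
    const double centre = p + z2 / (2.0 * games);
    const double margin = z * std::sqrt((p * (1.0 - p) + z2 / (4.0 * games)) / games);
    return (centre - margin) / (1.0 + z2 / games);
}
```

Under this ranking a move that scored 1 out of 2 still beats one that scored 2 out of 100, but it ranks below a move that scored 60 out of 100, because two games are too few to trust the 50%.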