Creating Books from .PGN files
Re: Creating Books from .PGN files
This is completely obvious, of course: vectors pre-allocate large areas of memory. The simple fix is to store the move list as a string. The method works for smaller PGN files.
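A minimal sketch of that fix, with a hypothetical Game struct. A std::vector<std::string> per game pays for one std::string object per move (typically 24 or 32 bytes on 64-bit implementations, even for "e4") plus the vector's spare capacity. Here the moves are joined into one space-separated string, and shrink_to_fit() asks the implementation to drop unused capacity (a non-binding request). The moves are split out again only when a game is actually needed.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-game record: the whole move list lives in one string,
// e.g. "e4 e5 Nf3 Nc6 Bb5", instead of a vector<string> with one entry per move.
struct Game {
    std::string moves;
};

// Build the compact record from a list of SAN moves (which contain no spaces).
Game make_game(const std::vector<std::string>& san_moves) {
    Game g;
    std::size_t total = 0;
    for (const auto& m : san_moves) total += m.size() + 1;
    g.moves.reserve(total);              // one allocation for the whole list
    for (const auto& m : san_moves) {
        assert(m.find(' ') == std::string::npos);
        if (!g.moves.empty()) g.moves += ' ';
        g.moves += m;
    }
    g.moves.shrink_to_fit();             // non-binding request to release slack
    return g;
}

// Split the move list back into moves only when the game is selected.
std::vector<std::string> split_moves(const Game& g) {
    std::vector<std::string> out;
    std::istringstream in(g.moves);
    std::string m;
    while (in >> m) out.push_back(m);
    return out;
}
```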
Re: Creating Books from .PGN files
Even storing the move list as a string seems to cause too much memory allocation; however, squeezing the memory requirements a lot has made the experiment successful.
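One further way to squeeze memory, sketched here under stated assumptions: keep the raw PGN text in one std::string and store only an (offset, length) pair per game, so a game costs two integers until it is selected and parsed. The sketch assumes the file fits in memory and that every game begins with an `[Event ` tag at the start of a line (real PGN files can deviate); GameIndex, index_games() and game_text() are hypothetical names. It needs C++17 for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical index entry: where one game's text sits inside the big buffer.
struct GameIndex {
    std::size_t offset;
    std::size_t length;
};

// Scan the whole PGN text once and record where each game starts.
// Assumes every game begins with "[Event " at the start of a line.
std::vector<GameIndex> index_games(const std::string& pgn) {
    std::vector<std::size_t> starts;
    const std::string tag = "[Event ";
    std::size_t pos = pgn.find(tag);
    while (pos != std::string::npos) {
        if (pos == 0 || pgn[pos - 1] == '\n') starts.push_back(pos);
        pos = pgn.find(tag, pos + 1);
    }
    std::vector<GameIndex> index;
    index.reserve(starts.size());
    for (std::size_t i = 0; i < starts.size(); ++i) {
        std::size_t end = (i + 1 < starts.size()) ? starts[i + 1] : pgn.size();
        index.push_back({starts[i], end - starts[i]});
    }
    return index;
}

// View one game's text without copying it; parse it only when it is selected.
std::string_view game_text(const std::string& pgn, const GameIndex& g) {
    assert(g.offset + g.length <= pgn.size());
    return std::string_view(pgn).substr(g.offset, g.length);
}
```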
Re: Creating Books from .PGN files
Dave_N wrote:
Harald wrote: Perhaps this post has some examples for parts of this thread:
http://www.talkchess.com/forum/viewtopi ... ght=python
In fact I have memory issues for files larger than 4 MB. I have to rethink my loading strategy, because I created a vector of strings for each game, so either I have a memory leak or 4 MB inflates to more than 40 MB once the vectors have been created... I don't parse the games until they have been selected, and after the merge the game that was inserted into the root game is deleted.
I also found a bug in vector<Object*>::clear() where the memory that was allocated does not disappear (according to Task Manager).
What I did was: store the entire list of PGN games in one big binary game tree with thousands of variations. I count the leaves (+win =draw -loss) and back up these counts to the root (the opening position). I can use filters like tree depth or number of games when I output the tree. Perhaps this is a way to select an opening book? For a book I would store the hash value of each position together with some statistics.
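A minimal sketch of such a counting game tree, assuming moves arrive as SAN strings and each result is +1, 0 or -1 from White's point of view (Node and add_game() are hypothetical names). Adding a game increments the win/draw/loss counters on every node along its path, which gives the same totals as counting the leaves and backing the counts up to the root. A book writer could then output only the nodes that pass a depth or game-count filter.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node: one position, reached by the moves on the path to it.
struct Node {
    int wins = 0, draws = 0, losses = 0;                      // White's point of view
    std::map<std::string, std::unique_ptr<Node>> children;    // keyed by SAN move

    int games() const { return wins + draws + losses; }
};

// Record one game (+1 White win, 0 draw, -1 Black win) in every node
// on its path, root included, so each node holds the totals of its subtree.
void add_game(Node& root, const std::vector<std::string>& moves, int result) {
    assert(result >= -1 && result <= 1);
    auto count = [result](Node& n) {
        if (result > 0) ++n.wins;
        else if (result < 0) ++n.losses;
        else ++n.draws;
    };
    Node* node = &root;
    count(*node);
    for (const auto& m : moves) {
        std::unique_ptr<Node>& child = node->children[m];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
        count(*node);
    }
}
```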
Now to the bug: vector::clear() does not free the memory behind the stored object pointers. That's obvious. But it also does not free/delete the memory used by the vector itself. That is not a bug. You should read some documentation about clear(), reserve() and resize(). If you want to give the vector's memory back to the (system) memory pool, then you can use a trick with swap(): instantiate a second, empty vector of the same type and then swap the contents. That may help.
Harald
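A minimal sketch of the swap trick, with a hypothetical Object type that counts live instances so the effect is observable. clear() only destroys the pointers, so each pointee has to be deleted first; swapping with an empty temporary then hands the buffer to the temporary, which frees it when it is destroyed. Since C++11, shrink_to_fit() is an alternative, but it is only a non-binding request. Task Manager may still not show the memory coming back, because the C runtime's allocator can keep freed blocks for reuse instead of returning them to the operating system.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type; 'live' counts existing instances.
struct Object {
    static int live;
    Object() { ++live; }
    ~Object() { --live; }
};
int Object::live = 0;

// Delete the pointees, then release the vector's own buffer.
void release_all(std::vector<Object*>& v) {
    for (Object* p : v) delete p;      // clear() alone would leak these objects
    v.clear();                         // size() becomes 0, capacity() is unchanged
    std::vector<Object*>().swap(v);    // the temporary takes the buffer and frees it
}
```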
Re: Creating Books from .PGN files
Quick update: I have changed the Polyglot version that I downloaded so that it builds books with win/draw/loss numbers...
The book_save() function now contains this:
```c
/* one 16-byte entry, the same size as a standard Polyglot entry */
ASSERT(keep_entry(pos));
write_integer(file,8,Book->entry[pos].key);
write_integer(file,2,Book->entry[pos].move);
write_integer(file,2,entry_score(&Book->entry[pos]));
write_integer(file,2,entry_score_blackwin(&Book->entry[pos])); // formerly written as 0 (padding)
write_integer(file,2,Book->entry[pos].n);                      // formerly written as 0 (padding)
```
and the move loop in book_insert() now looks like this:
```c
while (pgn_next_move(pgn,string,256)) {
   if (ply < MaxPly) {
      move = move_from_san(string,board);
      if (move == MoveNone || !move_is_legal(move,board)) {
         my_fatal("book_insert(): illegal move \"%s\" at line %d, column %d, game %d\n",
                  string,pgn->move_line,pgn->move_column,pgn->game_nb);
      }
      pos = find_entry(board,move);
      Book->entry[pos].n++;                             /* games through this move */
      if (result == 1)  Book->entry[pos].sum++;         /* White won */
      if (result == -1) Book->entry[pos].black_win++;   /* Black won */
      /* no separate draw counter */
      if (Book->entry[pos].n >= COUNT_MAX) {
         halve_stats(board->key);
      }
      move_do(board,move);
      ply++;
      /* result = -result; */  /* disabled: result stays from White's point of view */
   }
}
```
As can be seen, I have stored White wins in "sum" and Black wins in "black_win", and I no longer invert the result for each move.
I think I would prefer to get rid of the halve_stats() function. To maintain compatibility I could try to keep the original "sum" variable and simply calculate the correct value; perhaps black_win could store the number of wins for the side to move, so that I could stay compatible with formats that do not use the padding section...
Perhaps an extended format with 32-bit integers is also worth considering, to avoid the need for halve_stats().
I am also thinking about modifications to book_merge().
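One possible shape for the extended format with 32-bit integers, purely hypothetical: it is not part of the Polyglot specification, and standard Polyglot readers would not understand it. The sketch keeps the 8-byte key and 2-byte move of a normal entry, then stores 32-bit White-win, draw and Black-win counters, big-endian like the rest of a Polyglot file. Two zero bytes pad the entry to 24 bytes, a multiple of 8. With 32-bit counters, halving would only be needed after about four billion games through one move. ExtEntry, put_be() and write_ext_entry() are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical extended book entry (not a Polyglot format).
struct ExtEntry {
    std::uint64_t key;          // Polyglot hash key of the position
    std::uint16_t move;         // move encoded as in Polyglot
    std::uint32_t white_wins;
    std::uint32_t draws;
    std::uint32_t black_wins;
};

constexpr std::size_t kExtEntrySize = 24;

// Append 'size' bytes of 'value', most significant byte first.
void put_be(std::vector<std::uint8_t>& out, std::uint64_t value, int size) {
    for (int i = size - 1; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Serialize one entry: 8 + 2 + 2 (padding) + 4 + 4 + 4 = 24 bytes.
void write_ext_entry(std::vector<std::uint8_t>& out, const ExtEntry& e) {
    const std::size_t before = out.size();
    put_be(out, e.key, 8);
    put_be(out, e.move, 2);
    put_be(out, 0, 2);                 // padding: entry size stays a multiple of 8
    put_be(out, e.white_wins, 4);
    put_be(out, e.draws, 4);
    put_be(out, e.black_wins, 4);
    assert(out.size() - before == kExtEntrySize);
}
```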
Re: Creating Books from .PGN files
hgm wrote:
No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
You can go by the total count, or by the win percentage if you modify the numbers somehow to account for that. If you want to go by the win percentage, you could add K pseudo-draws to spread out the noise a bit. So you might set the weight to something like:
weight = (wins + (K + draws) / 2) / (wins + draws + losses + K)
where K is a constant such as 50; it could be anything, depending on how much weight you want to give to the a priori expectation that "all moves are equal".
If you want to go by the frequency played, there is certainly some mathematical way to gradually fold in extra weight for particularly good results.
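A sketch of the pseudo-draw weight, assuming the denominator is the total number of games (wins + draws + losses) plus K, so that a move that only ever loses drifts toward 0 instead of staying at 0.5. weight() is a hypothetical helper, and K = 50 is just the example value from the post. An unplayed move starts at 0.5, and real results pull it away from 0.5 as the game count grows.

```cpp
#include <cassert>

// Score in [0, 1] with K pseudo-draws mixed in (K = 50 is only an example).
double weight(int wins, int draws, int losses, double k = 50.0) {
    assert(k > 0.0);
    const double games = wins + draws + losses;
    return (wins + (k + draws) / 2.0) / (games + k);
}
```

With K = 50, hgm's example comes out the way he wants: a move that scored 2 out of 100 gets 27/150 = 0.18, while one that scored 1 out of 2 gets 26/52 = 0.5.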
Re: Creating Books from .PGN files
hgm wrote:
No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
Here is what a Polyglot web page says:
weight
In the Polyglot source code this field is called "count" but it is in fact a measure for the quality of the move. It should be at least one.
The Polyglot book generator sets it to 2*(wins)+(draws), globally scaled to fit into 16 bits. A move with a weight of zero is deleted from the book. This is just a convention and book authors are free to set this field according to their taste (as long as it is at least one).
If random play is enabled in Polyglot then the probability that a move is selected is its weight divided by the sum of the weights of all the moves in the given position.
There is an additional field that could be used, called "learn", but I think it is basically undefined, so it can be used for anything.
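A sketch of the two rules quoted from the Polyglot page, with hypothetical names throughout. The raw weight is 2*wins + draws, with wins assumed to be counted for the side to move. All weights in a position are divided by one common factor so the largest fits into 16 bits; that scaling rule is an assumption for illustration (the generator code quoted earlier in the thread halves counters with halve_stats() instead). Moves whose weight ends up as zero are dropped, and a move is picked with probability weight / sum of weights. pick_index() takes the random number as a parameter so the selection can be tested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct BookMove {
    int move;                        // hypothetical move id
    std::uint32_t wins, draws;       // assumed: from the side to move's point of view
    std::uint16_t weight = 0;
};

// weight = 2*wins + draws, divided by one common factor so the largest fits in 16 bits.
void assign_weights(std::vector<BookMove>& moves) {
    std::uint64_t max_raw = 0;
    for (const auto& m : moves)
        max_raw = std::max<std::uint64_t>(max_raw, 2ULL * m.wins + m.draws);
    const std::uint64_t divisor = max_raw > 0xFFFF ? (max_raw + 0xFFFE) / 0xFFFF : 1;
    for (auto& m : moves)
        m.weight = static_cast<std::uint16_t>((2ULL * m.wins + m.draws) / divisor);
    // A weight of zero means "not in the book".
    moves.erase(std::remove_if(moves.begin(), moves.end(),
                               [](const BookMove& m) { return m.weight == 0; }),
                moves.end());
}

// Index of the move selected by r, where r is uniform in [0, sum of weights).
int pick_index(const std::vector<BookMove>& moves, std::uint32_t r) {
    for (std::size_t i = 0; i < moves.size(); ++i) {
        if (r < moves[i].weight) return static_cast<int>(i);
        r -= moves[i].weight;
    }
    return -1;                       // r was out of range
}

// Pick a move with probability weight / (sum of weights); -1 if there is none.
int pick_random(const std::vector<BookMove>& moves, std::mt19937& rng) {
    std::uint32_t total = 0;
    for (const auto& m : moves) total += m.weight;
    if (total == 0) return -1;
    std::uniform_int_distribution<std::uint32_t> dist(0, total - 1);
    return pick_index(moves, dist(rng));
}
```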
Re: Creating Books from .PGN files
hgm wrote:
No, weight = wins + draws/2. But it makes no sense to me to prefer a move that scored 2 out of 100 over one that scored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At some point the algorithm must be able to decide that a move that always loses is no good...
Maybe http://www.evanmiller.org/how-not-to-so ... ating.html solves that problem (quantity versus factor).
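The link is truncated, but it appears to point to Evan Miller's article "How Not To Sort By Average Rating", which ranks items by the lower bound of the Wilson score confidence interval rather than by the raw average. A sketch of that idea applied to book moves, assuming a draw counts as half a success (an adaptation; the article deals with positive/negative ratings) and z = 1.96 (roughly 95% confidence). wilson_lower_bound() is a hypothetical helper. The bound rises with both the score and the number of games, so for hgm's example the move that scored 1 out of 2 ranks above the one that scored 2 out of 100.

```cpp
#include <cassert>
#include <cmath>

// Lower bound of the Wilson score interval for a move's success rate.
// score = wins + draws / 2 (draws as half a success), games = total games.
double wilson_lower_bound(double score, double games, double z = 1.96) {
    if (games <= 0.0) return 0.0;            // no information: rank last
    assert(score >= 0.0 && score <= games);
    const double p = score / games;
    const double z2 = z * z;
    const double centre = p + z2 / (2.0 * games);
    const double margin = z * std::sqrt((p * (1.0 - p) + z2 / (4.0 * games)) / games);
    return (centre - margin) / (1.0 + z2 / games);
}
```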