Book learning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Book learning

Post by CRoberson »

I think I see the disconnect between Eelco and HGM. Eelco assumes that a program should play all openings equally well; I don't. A program may play all openings reasonably well, but some openings will be played better than others, because every program has a playing style/personality, just as humans do. It is quite normal for a human to play some opening, like the Black Knights' Tango, well, yet play every line of some other opening (say, the French) poorly.

The program's bias/style/personality comes, for the most part, from the terms and coefficients of the position evaluator. Everybody self-tunes or auto-tunes these, and I don't believe there exists one perfect tuning that applies to all openings.

There is also Eelco's point about using a move the program doesn't like because it was used by a Master. Moves have been played by Masters for years until a bust for the line was found, which means that not all moves made by Masters can be trusted. That only strengthens the argument that you shouldn't force a program to play a line it doesn't like.

Yes, I read the Polyglot learning code as well as the move-choice code. I do not like it at all. I would not allow a move that is only slightly liked to be picked in a tournament over one that is heavily liked, but Polyglot will.
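
For illustration, this is roughly how weighted-random book-move selection behaves, and what a deterministic tournament mode would do instead. This is a sketch, not Polyglot's actual code; the entry list and the deterministic flag are hypothetical:

```python
import random

def pick_book_move(entries, deterministic=False):
    """entries: list of (move, weight) pairs for the current position.

    Weighted-random selection draws a move with probability proportional
    to its weight, so a lightly weighted move can still be picked over a
    heavily weighted one.  The deterministic mode sketched here always
    takes the top-weighted move instead.
    """
    if deterministic:
        return max(entries, key=lambda e: e[1])[0]
    total = sum(w for _, w in entries)
    r = random.uniform(0, total)
    for move, w in entries:
        r -= w
        if r <= 0:
            return move
    return entries[-1][0]  # numerical safety net
```

With entries like [("e2e4", 90), ("d2d4", 10)], the random mode still plays d2d4 about one game in ten, which is exactly the tournament behavior being objected to.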
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Book learning

Post by hgm »

Are you talking about the book-making code of Polyglot now? I noticed that there is something very strange there, as it would prefer a move with a score of 10 out of 100 over one with a score of 5 out of 7 (with a 2:1 ratio).

I have been thinking about replacing that too by a more fundamentally sound algorithm to determine weights. But it is not really related to the learning.
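
The anomaly is easy to state in code: the 2:1 preference suggests the builder weights a move by the raw number of points it scored, which ignores sample size entirely. A sketch (the weighting rule is inferred from the 2:1 ratio, not read from the source):

```python
def score_rate(points, games):
    # What should matter: average points per game.
    return points / games

# Move A: 10 points out of 100 games; move B: 5 points out of 7 games.
a_points, a_games = 10, 100
b_points, b_games = 5, 7

# Weighting by raw points scored prefers A over B by 2:1...
assert a_points == 2 * b_points
# ...even though B is far stronger per game played.
assert score_rate(b_points, b_games) > score_rate(a_points, a_games)
```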
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Book learning

Post by Gian-Carlo Pascutto »

I noticed that there is something very strange there, as it would prefer a move with a score of 10 out of 100 over one with a score of 5 out of 7 (with a 2:1 ratio).
Do such cases really happen? I doubt there are many positions where you can find such a situation.
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Book learning

Post by hgm »

Well, the example is very extreme, of course. But it could very well happen that a variation that was believed to be good for the opponent was played a lot, and that the common defense against it was move A, which thus got many plays but a rather poor score. Then a novelty B was found in this position which basically refuted the previous move. It was played a few times, much less often than A, but with a very good score, which convinced people that it was indeed a refutation. So they stopped playing the previous move, and B never got the chance to come anywhere near A's number of plays. This makes scenarios where A has 40% out of 1000 plays while B has 60% out of 100 plays not unrealistic. Polyglot would continue to prefer the poor move by 400:60. Even worse, it would not discourage the preceding move from being played: all 600 points scored against it while it was met by the inferior reply A continue to count, although they are in fact no longer relevant, since good players will now surely meet it with move B.

Now I know of course that serious book builders would select the games from which they build the book by date, pay attention to the ratings, etc., exactly to cleanse the input data of such misleading cues. But this is a case that could very well be handled automatically, by proper application of minimax. If the 40% vs 60% scores for moves A and B lead us to favor B over A in play by 90% vs 10%, the average score of this node should be set to 58% (90% x 60% + 10% x 40%), so 42% for the opponent's preceding move, based on 1/(0.81/100 + 0.01/1000) ≈ 123 'virtual' games, rather than the 640 out of 1100 games (58%) that naive counting would credit the opponent with.
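
The arithmetic above can be checked with a short sketch (Python purely for illustration; all numbers are from the A/B example):

```python
# Minimaxed node score, as in the A/B example above.
p = [0.9, 0.1]      # probabilities of playing B and A after learning
s = [0.6, 0.4]      # score rates: B scored 60% of 100, A scored 40% of 1000
n = [100, 1000]     # number of games behind each score rate

node_score = sum(pi * si for pi, si in zip(p, s))   # ~0.58 for the side to move
opponent_score = 1 - node_score                     # ~0.42 for the preceding move

# Effective ('virtual') number of games: the variance of the weighted
# mixture is sum(p_i^2 * var_i) with var_i proportional to 1/n_i, hence
virtual_games = 1 / sum(pi * pi / ni for pi, ni in zip(p, n))   # ~123
```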
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Book learning

Post by Dann Corbit »

hgm wrote:Well, the example is very extreme, of course. But it could very well happen that a variation that was believed to be good for the opponent was played a lot, and that the common defense against it was move A, which thus got many plays but a rather poor score. Then a novelty B was found in this position which basically refuted the previous move. It was played a few times, much less often than A, but with a very good score, which convinced people that it was indeed a refutation. So they stopped playing the previous move, and B never got the chance to come anywhere near A's number of plays. This makes scenarios where A has 40% out of 1000 plays while B has 60% out of 100 plays not unrealistic. Polyglot would continue to prefer the poor move by 400:60. Even worse, it would not discourage the preceding move from being played: all 600 points scored against it while it was met by the inferior reply A continue to count, although they are in fact no longer relevant, since good players will now surely meet it with move B.

Now I know of course that serious book builders would select the games from which they build the book by date, pay attention to the ratings, etc., exactly to cleanse the input data of such misleading cues. But this is a case that could very well be handled automatically, by proper application of minimax. If the 40% vs 60% scores for moves A and B lead us to favor B over A in play by 90% vs 10%, the average score of this node should be set to 58% (90% x 60% + 10% x 40%), so 42% for the opponent's preceding move, based on 1/(0.81/100 + 0.01/1000) ≈ 123 'virtual' games, rather than the 640 out of 1100 games (58%) that naive counting would credit the opponent with.
Why not mini-max the book with alpha-beta upon update? Doesn't that remove the problem of newly discovered refutations converging only slowly?
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Book learning

Post by hgm »

What do you mean by 'on update'? Writing of the learn info? I guess the whole learning could be conducted as a kind of Monte-Carlo UCT search.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Book learning

Post by Dann Corbit »

hgm wrote:What do you mean by 'on update'? Writing of the learn info? I guess the whole learning could be conducted as a kind of Monte-Carlo UCT search.
I mean like this:
When you write out an EPD position record into your learn database, it carries a list of numbers:
0. Depth of search
1. Centipawn evaluation
2. Computer Standard time control wins for side to move
3. Computer Standard time control losses for side to move
4. Computer Standard time control draws for side to move
5. Computer Blitz time control wins for side to move
6. Computer Blitz time control losses for side to move
7. Computer Blitz time control draws for side to move
8. Historic wins for side to move
9. Historic losses for side to move
10. Historic draws for side to move

and whatever else you would like.
If you play enough games, eventually the wins, losses and draws will allow you to make the best possible choices. However, you also have depth and ce (centipawn evaluation) figures, which can be used together with all of your other stored analyzed positions (about 3/4 of a million will do for starters), and you can run alpha-beta against these positions with the new information.
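
A record along those lines could be sketched like this (field and method names are hypothetical, purely to make the layout concrete):

```python
from dataclasses import dataclass

@dataclass
class LearnRecord:
    """Per-position learn data keyed by an EPD string; a sketch of the
    record described above, not an actual implementation."""
    epd: str
    depth: int = 0               # 0. depth of search
    ce: int = 0                  # 1. centipawn evaluation, side to move
    std_wld: tuple = (0, 0, 0)   # 2-4. standard-TC wins/losses/draws
    blitz_wld: tuple = (0, 0, 0) # 5-7. blitz-TC wins/losses/draws
    hist_wld: tuple = (0, 0, 0)  # 8-10. historic wins/losses/draws

    def score_rate(self):
        # Pool all sources: (wins + draws/2) / games, or None if unplayed.
        w = self.std_wld[0] + self.blitz_wld[0] + self.hist_wld[0]
        l = self.std_wld[1] + self.blitz_wld[1] + self.hist_wld[1]
        d = self.std_wld[2] + self.blitz_wld[2] + self.hist_wld[2]
        games = w + l + d
        return (w + 0.5 * d) / games if games else None
```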

PV nodes are very rare. Therefore, we should maintain a list of every PV node that has ever been analyzed, along with as much statistical information as possible. These PV nodes can be used to pre-load the hash table, and when we exit the program we can update them with the information gained from analysis and from the game outcome.

Is my meaning clear?
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Book learning

Post by hgm »

Yes, but Polyglot books are a lot more simplistic than what you describe. Each move has only a weight, which is supposed to stay fixed, plus a learnScore and nrOfPlays. And the two learn fields are not even updated from PV nodes in the search, but from moves played at game level. (Polyglot has no knowledge of the search.)

During the building of a book from a PGN collection, one can of course use an intermediate representation that stores much more information.
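
For reference, each entry in a Polyglot .bin book is a fixed 16-byte big-endian record, which shows how little room there is: a position key, a move, a weight, and one 32-bit learn field (how that field is split into a score and a play count is up to the implementation). A minimal pack/unpack sketch:

```python
import struct

# Polyglot .bin entry: Zobrist key of the position, encoded move,
# fixed selection weight, and a 32-bit learn field.  Big-endian.
ENTRY = struct.Struct(">QHHI")   # 8 + 2 + 2 + 4 = 16 bytes

def pack_entry(key, move, weight, learn=0):
    return ENTRY.pack(key, move, weight, learn)

def unpack_entry(raw):
    key, move, weight, learn = ENTRY.unpack(raw)
    return {"key": key, "move": move, "weight": weight, "learn": learn}
```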
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Book learning

Post by Don »

Gian-Carlo Pascutto wrote:Ah, right, then it's not so bad. But if the 5% move then scores 4 straight losses, it will still be at 10%, which doesn't look desirable to me.

I'm not sure why you want to force exploration of the moves. This is useful if there is no strong prior, but here there is good information already about what moves are likely to be good.
If the learning is done off-line, it's OK. Doesn't this remind you of MCTS in some ways?
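
The MCTS analogy can be made concrete: an off-line learner could pick the move to try next with a UCB1-style rule, trading off a move's observed score rate against how rarely it has been explored. A sketch (not any engine's actual code; the move tuples are hypothetical):

```python
import math

def uct_pick(moves, c=1.4):
    """moves: list of (name, points, games) for one book position.
    Pick the move maximizing score rate plus a UCB1-style
    exploration bonus that shrinks as a move accumulates games."""
    total = sum(g for _, _, g in moves)
    def ucb(m):
        _, pts, g = m
        if g == 0:
            return float("inf")   # always try unexplored moves once
        return pts / g + c * math.sqrt(math.log(total) / g)
    return max(moves, key=ucb)[0]
```

With a 40%-in-100-games move against a 100%-in-5-games move, the rule prefers the lightly sampled move until its score rate has been tested on more games, which is exactly the forced exploration being discussed.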
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Book learning

Post by Don »

Dann Corbit wrote:
hgm wrote:Well, the example is very extreme, of course. But it could very well happen that a variation that was believed to be good for the opponent was played a lot, and that the common defense against it was move A, which thus got many plays but a rather poor score. Then a novelty B was found in this position which basically refuted the previous move. It was played a few times, much less often than A, but with a very good score, which convinced people that it was indeed a refutation. So they stopped playing the previous move, and B never got the chance to come anywhere near A's number of plays. This makes scenarios where A has 40% out of 1000 plays while B has 60% out of 100 plays not unrealistic. Polyglot would continue to prefer the poor move by 400:60. Even worse, it would not discourage the preceding move from being played: all 600 points scored against it while it was met by the inferior reply A continue to count, although they are in fact no longer relevant, since good players will now surely meet it with move B.

Now I know of course that serious book builders would select the games from which they build the book by date, pay attention to the ratings, etc., exactly to cleanse the input data of such misleading cues. But this is a case that could very well be handled automatically, by proper application of minimax. If the 40% vs 60% scores for moves A and B lead us to favor B over A in play by 90% vs 10%, the average score of this node should be set to 58% (90% x 60% + 10% x 40%), so 42% for the opponent's preceding move, based on 1/(0.81/100 + 0.01/1000) ≈ 123 'virtual' games, rather than the 640 out of 1100 games (58%) that naive counting would credit the opponent with.
Why not mini-max the book with alpha-beta upon update? Doesn't that remove the problems of refutations being discovered converging slowly?
I think mini-maxing the book is a very good way to customize a book for a given program. In general you want your program to come out of book in a position that it thinks is good. A significant percentage of Komodo's losses in tournament games happen when we come out of book in what Komodo considers a losing (or close to losing) position.

However, I believe that for this to work the program being tuned must search every book position to ensure there is a valid choice (at least from the program's point of view). For example:

In position P the book gives Ne5 and d4, but both moves are inferior to c3, which is not in the book. In this case, any mini-max that involves position P will be in error: the line might actually be good once c3 is allowed, yet the computer will avoid it when playing White, or as Black will think it a great position to force White into.

So my idea for mini-maxing the book is to evaluate every position and, where the computer's choice differs, insert it into the book as well. In the cases where the computer has its own TN, you might also verify the new choice with additional thinking time, to avoid gratuitously adding moves to the book (which might then also get recursively expanded).
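
That idea can be sketched as a negamax over the book that plugs holes before scoring a node. All the hooks (engine_eval, engine_best, make_move) are hypothetical stand-ins for a real engine search:

```python
def minimax_book(book, pos, engine_eval, engine_best, make_move):
    """Negamax over book positions, from the side to move's view.
    book: position -> list of book moves.  engine_eval(pos) and
    engine_best(pos) stand in for a real search.  If the engine's own
    choice is missing from the book it is inserted first, so the
    minimax cannot be fooled by an omitted move like c3 above."""
    moves = book.get(pos)
    if not moves:                        # out of book: trust the search
        return engine_eval(pos)
    best = engine_best(pos)
    if best not in moves:                # plug the hole before minimaxing
        book[pos] = moves = moves + [best]
    return max(-minimax_book(book, make_move(pos, m), engine_eval,
                             engine_best, make_move)
               for m in moves)
```

On a toy book where P offers only Ne5 and d4 but the engine prefers c3, the call both corrects P's minimax value and leaves c3 in the book for later expansion.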

I have never seen a book that didn't have some omission of this kind. Cilkchess lost a game to "King" one year because the book did not consider an obvious choice. In general I have not paid a lot of attention to the book in our programs, and it has cost us dearly. There are a few programs that do not test very well or rank very high on the rating lists but get much better results in tournaments because of exceptionally good book preparation. Once you get a terrible position, it is difficult to recover even against a program two or three hundred Elo weaker.