On Opening books in 2015

zullil · Post by **zullil** » Tue Mar 10, 2015 11:39 am

Zenmastur wrote: So the point is, just because the displayed PV is NOT a valid line of play (because it doesn't correspond to the score given for the root move) doesn't mean that the score for the root move is wrong! It simply means the PV is unreliable and has to be created from scratch. This fact has absolutely no effect on the soundness of the selected move. The PV displayed is simply window dressing, and nothing more!

Right. This is a point that seems widely misunderstood. If nothing else, surely everyone should see that the "later" moves in the PV must have been searched less deeply than the first move.

Laskos · Post by **Laskos** » Tue Mar 10, 2015 12:37 pm

zullil wrote:
Zenmastur wrote: So the point is, just because the displayed PV is NOT a valid line of play (because it doesn't correspond to the score given for the root move) doesn't mean that the score for the root move is wrong! It simply means the PV is unreliable and has to be created from scratch. This fact has absolutely no effect on the soundness of the selected move. The PV displayed is simply window dressing, and nothing more!
Right. This is a point that seems widely misunderstood. If nothing else, surely everyone should see that the "later" moves in the PV must have been searched less deeply than the first move.

What is widely understood by book builders is that an engine (at any time control) jumping from move to move based solely on its eval in the opening is inferior to a book, a fact often confirmed by the engine itself, when it is surprised by the book moves towards the end of the book line. It's not uncommon that a book engine exits from the book with 30 cp swing in its favor against an engine solely using its jumping from move to move eval, the engine itself often admits that it erred in the opening. Irrespective of time control. Books, unlike engines' own eval, are not collections of best 1-movers.

zullil · Post by **zullil** » Tue Mar 10, 2015 12:54 pm

Laskos wrote: What is widely understood by book builders is that an engine (at any time control) jumping from move to move based solely on its eval in the opening is inferior to a book, a fact often confirmed by the engine itself, when it is surprised by the book moves towards the end of the book line. It's not uncommon that a book engine exits from the book with 30 cp swing in its favor against an engine solely using its jumping from move to move eval, the engine itself admits that it erred in the opening. Irrespective of time control. Books, unlike engines' own eval, are not collections of best 1-movers.

Didn't intend to malign book creators, or anyone else. Simply noting that many folks on this forum seem to believe that the entire PV delivered by an engine carries the same "reliability" as the first move.

reflectionofpower · Post by **reflectionofpower** » Tue Mar 10, 2015 1:06 pm

Laskos wrote:
zullil wrote:
Zenmastur wrote: So the point is, just because the displayed PV is NOT a valid line of play (because it doesn't correspond to the score given for the root move) doesn't mean that the score for the root move is wrong! It simply means the PV is unreliable and has to be created from scratch. This fact has absolutely no effect on the soundness of the selected move. The PV displayed is simply window dressing, and nothing more!
Right. This is a point that seems widely misunderstood. If nothing else, surely everyone should see that the "later" moves in the PV must have been searched less deeply than the first move.
What is widely understood by book builders is that an engine (at any time control) jumping from move to move based solely on its eval in the opening is inferior to a book, a fact often confirmed by the engine itself, when it is surprised by the book moves towards the end of the book line. It's not uncommon that a book engine exits from the book with 30 cp swing in its favor against an engine solely using its jumping from move to move eval, the engine itself often admits that it erred in the opening. Irrespective of time control. Books, unlike engines' own eval, are not collections of best 1-movers.

nice point

zullil · Post by **zullil** » Tue Mar 10, 2015 1:48 pm

Laskos wrote: Books, unlike engines' own eval, are not collections of best 1-movers.

Each move in a book line should be a best move in its position, so books are indeed collections of best 1-movers. The point is that one less-than-best move can be fatal, and no engine plays perfectly. Even an engine that chooses a best move with probability 0.99 has only a probability of 0.99^20 = 0.82 of playing correctly in a twenty move line (assuming independence, which seems a bad assumption

).

Seems to me that a book is analogous to a fully-trained neural net, while a book-less engine is analogous to a completely untrained one.

Zenmastur · Post by **Zenmastur** » Tue Mar 10, 2015 2:02 pm

Laskos wrote:
zullil wrote:
Zenmastur wrote: So the point is, just because the displayed PV is NOT a valid line of play (because it doesn't correspond to the score given for the root move) doesn't mean that the score for the root move is wrong! It simply means the PV is unreliable and has to be created from scratch. This fact has absolutely no effect on the soundness of the selected move. The PV displayed is simply window dressing, and nothing more!
Right. This is a point that seems widely misunderstood. If nothing else, surely everyone should see that the "later" moves in the PV must have been searched less deeply than the first move.
What is widely understood by book builders is that an engine (at any time control) jumping from move to move based solely on its eval in the opening is inferior to a book, a fact often confirmed by the engine itself, when it is surprised by the book moves towards the end of the book line. It's not uncommon that a book engine exits from the book with 30 cp swing in its favor against an engine solely using its jumping from move to move eval, the engine itself often admits that it erred in the opening. Irrespective of time control. Books, unlike engines' own eval, are not collections of best 1-movers.

You seem to be implying that non-book builders don't know that a single eval isn't as good as a book. I don't think this is true.

What is apparently not widely understood by book builders is there is a way to build books that is more accurate and perfectly tuned to a particular engine without painstakingly hand building and tuning them. And a major portion of the work can be done during normally testing.

I would like to be able to tell you that all of the work could be done during testing but I'm sure that the development community would object to "exploration" games being run during testing for fear that it might taint the results of the tests. I'm pretty sure that it wouldn't (the number of 'exploration' game is something like k*log(n/h)^j+c where c, h, j, and k are constants and n is the total number of games played. So it should be a pretty small number compared to the number of "exploitation" games) but the idea that it "could" is probably enough to dissuade the testers from doing this during normal testing. The "exploration" games can be done in stages based on depth and can be done anytime machines are available and don't have other testing work or when the testing cue has more machines than needed.

Periodically, the individual books from the tests machines can be collected and merged into a master book and then redistributed back to the test machines. The only requirement for doing this is that the master count and count that each machine records must be kept separate so that during the next merge only new game are added to the master counts. This requires two count fields in each record. A master count and a new count.

This book will be tuned to the engine that is used to create it because the "rewards" are simply the score from any particular positions. This type of book will self tune to maximize the "rewards" and therefore the engines game results. No hand tuning required. e.g. if the engine doesn't know how to play the KID the lines chosen will reflect that fact.

Regards,

Zen

jdart · Post by **jdart** » Tue Mar 10, 2015 2:32 pm

All tuning methods are imperfect IMO.

If you base on evals, you are always cutting the search off after some time. And there is always a chance that deeper search reveals a very different PV and score. Plus scores in many cases do not predict the game result well, unless they are very large.

If you base on game results, then you are tuning the early opening and midgame moves based on a reward that is very far out and where there are usually many opportunities to branch away from the PV.

Tuning helps. Automated tuning can work. But it is imperfect.

--Jon

Zenmastur · Post by **Zenmastur** » Wed Mar 11, 2015 12:30 am

jdart wrote:All tuning methods are imperfect IMO.

If you base on evals, you are always cutting the search off after some time. And there is always a chance that deeper search reveals a very different PV and score. Plus scores in many cases do not predict the game result well, unless they are very large.

If you base on game results, then you are tuning the early opening and midgame moves based on a reward that is very far out and where there are usually many opportunities to branch away from the PV.

Tuning helps. Automated tuning can work. But it is imperfect.

--Jon

I agree that all methods are imperfect. However, this isn't a justification for not trying to improve the situation. Having a self-generating, self-tuning book that is orders of magnitude more flexible than current books is a step in the right direction IMHO. While not having a plan is a good way to see that little or no improvement is ever made.

Evals are good if you don't have anything else to base your judgment on. i.e. The position is not part of the playable book. Keeping track of the deepest search done on each position, the number of nodes searched and its score while not perfect are useful. Keeping game statistics for each position is also useful. Being able to compare the two is very nice! e.g. If the deepest search on a position is 40 plies, required 20 billion nodes, and returns a score of +1.65 and your average results from the position is 0.1 clearly there's a problem. Not having to manually intervene to fix the problem is priceless!! Better yet is the fact that this would be a VERY unlikely event with this type of book because the self tuning feature would keep this from happening.

Neither of the metrics you mentioned are perfect. I think Crafty uses an intermediate score that is modified in some way. I don't see any reason why more than one metric can't be stored in the book. Disk space is cheap, and there really aren't that many position in the openings. E.G. I took 4,837,576 games comprising 399,266,847 moves and made a book from them. In the first 60 plies there were 175,512,625 unique positions. Of these, only 14,038,521 position were seen more than once. Of these, only 1,090,115 positions were seen 5 times or more which is approximately 1 game in a million. If the book is only 10-plies long this number drops to 99,334. Of these 99,334 position not all of them would be played on a regular basis due to the effects of the algorithms move selection and constant tuning.

This type of book would be larger than current books because the record size will be larger and more positions will be stored, but this seems like a minor issue since most people have no problem finding tens or hundreds of gigabytes to store table-bases. In the end the size of the book will depend mostly on its maximum depth and to a lessor extent on how much information about each position is stored.

Regards,

Zen

jdart · Post by **jdart** » Wed Mar 11, 2015 3:40 am

I weight book moves by frequency and win/loss. Plus quite a bit of manual tuning. I do not store evals in the book but I have thought about doing that.

The problem is, if I have multiple sources of information, I do not know how to combine them. What do you do if a position looks bad from a win/loss point of view but in fact was played frequently by strong players? Maybe there is a way to equalize but most historical games did not find it. Or maybe it used to be a good move but now there is a refutation.

Or suppose you have an eval for a move but it is shallow and you also have some win/loss information but based on few games. You don't know much about it and what you do know might point in different directions. What should the weight be?

I am sure you could invent reasonable heuristics for this. Or use some "hyper-parameter" learning to learn what to weight. But I don't think it is simple.

--Jon

Zenmastur · Post by **Zenmastur** » Wed Mar 11, 2015 6:48 am

jdart wrote:I weight book moves by frequency and win/loss. Plus quite a bit of manual tuning. I do not store evals in the book but I have thought about doing that.

The problem is, if I have multiple sources of information, I do not know how to combine them.

Why would you want to combine them? They're not closely related so I don't see any benefit to this. Instead of combining them, compare them. Pawn advantage can be converted to ELO and ELO can be converted to a win/loss score ratio. If the win-loss-draw ratio doesn't correspond to the position's eval then something is amiss and some action needs to be taken. If however, they do closely correspond, all is well and the engine can progress to the next stage in its move selection. In a tournament you would want to turn off the active learning so the program doesn't start testing some untried line of play and end up spoiling its tournament results. This doesn't mean that all learning is turned off for tournaments, just the "active" exploration of new lines is inhibited. Passive data collection would go on as usual.

So the question remains, what do you do when the eval/pawn advantage doesn't match the engines win/loss statistics, or the win/loss statistics of third parties from the same position. (I separate these two because they shouldn't be mixed as they quantify completely separate experience sets and therefore should be kept separately in the position records.) I wouldn't worry too much about being unable to match the win/loss ratio of third parties unless it is way off. If its way off the program may have a bug or be lacking in it's evaluation function. In tournaments I don't play openings I don't understand if I can avoid it and I don't care if everyone else in the world has a plus score with that opening. I play openings I do understand and have done well with in the past. When the program's eval says +1.65 and its losing more often than it should be then the bandit algorithm can do several things. The first thing it's likely to do is to adjust the play percentages of the available moves. It does this all the time anyway so nothing is unusual about this. In most instances it will already have tried other moves so it may have a substitute move waiting in the wings. Other than adjusting the play percentages for each move it can select an untried move, this is the exploration part of the algorithm that actively increases it's knowledge about the opening while maximizing its game score. As a last resort, and if time controls permit it could extend the time allotted to this move to see if this will significantly change its eval. i.e. overcome the horizon effect. This is easy to do if most of your games are played at fast time controls.

jdart wrote:What do you do if a position looks bad from a win/loss point of view but in fact was played frequently by strong players? Maybe there is a way to equalize but most historical games did not find it. Or maybe it used to be a good move but now there is a refutation.

The good part about these algorithms are that YOU don't have to do too much other than insert it into your book routines. If the book routines are designed properly and a "NEW" refutation move is found by a third party, feed those games into the book update routines. Ideally this should be enough to get the book routines onto the right track. Of course this won't fix any problems that occur after the last book move is played. So, this will not fix an engine that is deficient in some way. It will however, learn to avoid positions the engine doesn't score well with. So if the book routines starts to avoiding lots of lines of play it's a sign the engine has a problem.

jdart wrote:Or suppose you have an eval for a move but it is shallow and you also have some win/loss information but based on few games. You don't know much about it and what you do know might point in different directions. What should the weight be?

I am sure you could invent reasonable heuristics for this. Or use some "hyper-parameter" learning to learn what to weight. But I don't think it is simple.

I think you are thinking about this all wrong! You don't assign weights because you don't get to pick the moves that are played. Period! Full stop! These are pretty well developed algorithms. Some of them have been proven optimal for the tasks they were design for. So as a programmer you need to pick one of these algorithms and then adapt it for use with chess opening-books. Most of the decisions will be about what features you want and the best way to implement them and not about which move to select. This may sound restrictive, but its no more restrictive than you wanting to play the Sicilian as white only to have your opponent answer e4 with e5, e6, or c3 etc.

If you're seriously considering adding this type of book to an engine the first thing I would do is download a few of the papers on the subject and read then. This should answer most of your questions. If you want the titles of a few papers, ask and you shall receive!

Regards,

Zen

On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015

Re: On Opening books in 2015