Thanks for giving us your permission to use “YOUR” ideas.
That's very noble of you!
What you're describing is a testing framework, much like the SF testing framework! Gee, I wonder where I've heard this before?... Write a simple Alpha-Beta engine …, and run XXXXX games in parallel …. The XXXXX games will try to create … an opening book.
Well, that's close, but a MAB-based learning book needs more than just the PV nodes to function properly. It will use non-PV nodes to store information about what has and hasn't actually been seen in OTB play, and to store search information (in lieu of actual game play for non-played moves of interest). For example, if a leaf node in the book is reached for a second time, the MAB algorithm would have you randomly select one of the as-yet-unplayed moves to determine its reward structure. It will continue to do this until all possible moves from this position have been "sampled". This is a highly sub-optimal strategy for a chess opening book (many of these moves will drop pieces or worse).

Instead, a single move should be selected and searched. If it returns a "reasonable" score (i.e. not too much less than the score of the best "sampled" or searched moves), then it can be played OTB. If the search returns a winning score (i.e. a mate score or a tablebase score that represents a win) AND this score is better than any previously sampled move, then it should be played. If, however, the search returns a considerably worse score than "sampled" moves, then a few options are available. If it's a mate score and we are losing, then the move should be marked as such and it will NEVER be used by the book. If not too much time has been used searching this move and enough time is left on the clock, a second as-yet-unsampled move can be searched. If not enough time remains, a previously "sampled" move can be played. This saves time in "growing" the book and keeps the program from making terrible moves in the opening just to "claim" that it has sampled each move at least once.

Dann Corbit wrote: ↑Wed Jul 17, 2019 11:27 pm
Or, perhaps, why aren't modern opening books database files that contain:
1. Number of visits
2. WDL information (#Wins / #Losses / #Draws)
3. Depth of search
4. ce score
5-999: whatever else you want to collect.
An opening book should be a collection of PV nodes with as much data attached to the nodes as we can possibly find useful.
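The per-position record enumerated in the quote above could be collected into a simple structure. This is only a sketch of the idea; the field names, the key type, and the scoring helper are all illustrative assumptions, not taken from any existing book format:

```python
from dataclasses import dataclass

@dataclass
class BookRecord:
    """One opening-book position record holding the statistics listed
    above. All names here are illustrative, not a real book format."""
    zobrist_key: int        # position hash used as the lookup key (assumed)
    visits: int = 0         # 1. number of visits
    wins: int = 0           # 2. WDL information: #Wins
    draws: int = 0          #    ... #Draws
    losses: int = 0         #    ... #Losses
    search_depth: int = 0   # 3. depth of the deepest search stored
    ce_score: int = 0       # 4. ce score (centipawns)
    best_move: str = ""     # 5-999: whatever else is useful to collect

    def win_rate(self) -> float:
        """Fraction of points scored: (wins + draws/2) / visits."""
        return (self.wins + 0.5 * self.draws) / self.visits if self.visits else 0.0
```

A record like this is cheap enough that it can be stored for every node the book has ever seen, not just the playable ones.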
The search scores for such moves should be placed in non-leaf records (i.e. records that are NOT part of the "playable" book) along with pertinent information such as search depth, nodes searched, best move, etc. That doesn't make them part of the "playable" book, AND most such records will never become a playable part of the book, but they do hold valuable information that the book algorithm can use to effectively balance the "need" to play "good" moves with the "want" to explore new opening moves in the expectation of finding better lines of play.
So, you're on the right track in that more information needs to be stored, but not "JUST" about PV nodes. Non-PV nodes need to be saved and information about them stored for the book to be able to "learn" which moves are best.
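The expansion policy described above (search one unsampled move, then decide whether it is playable, a forced win, a move to ban forever, or data-only) can be sketched as a small classifier. The score thresholds and the mate-score convention below are assumptions for illustration, not values from any particular engine:

```python
# Assumed engine conventions (illustrative, not from any real engine):
MATE_SCORE = 30000   # magnitude of a mate score
MATE_BAND = 1000     # scores within this band of MATE_SCORE count as mates
MARGIN = 50          # centipawns a move may trail the best and stay playable

def classify_sampled_move(search_score: int, best_known_score: int) -> str:
    """Decide what the book does with a freshly searched, never-played move.

    Returns one of:
      'play'       - winning score better than anything sampled: play it OTB
      'playable'   - close enough to the best sampled score to be trusted
      'never-play' - a proven loss; keep the record but NEVER select it
      'info-only'  - considerably worse: store in a non-leaf record only
    """
    if search_score <= -(MATE_SCORE - MATE_BAND):        # we are getting mated
        return "never-play"
    if search_score >= MATE_SCORE - MATE_BAND and search_score > best_known_score:
        return "play"
    if search_score >= best_known_score - MARGIN:        # "reasonable" score
        return "playable"
    return "info-only"
```

Moves classified as `info-only` or `never-play` still get records in the book; they simply never become part of the "playable" set.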
I disagree!

dragontamer5788 wrote: ↑Wed Jul 17, 2019 11:50 pm
I guess I was pushing MCTS specifically. The mathematical MCTS "understanding" of a node is that it is an "armed bandit" (American slang for "slot machine"). If you have a room filled with hundreds of thousands of slot machines, the MCTS methodology was created to "solve" the slot machine problem (aka the Multi-armed Bandit problem).
How do you "search" for the best slot machine in a room filled with many, many slot machines, where the "best" slot machine is the one that gives you the highest win rate (or the lowest loss rate)? MCTS is an exploration-vs-exploitation methodology that will asymptotically find the best slot machine (if given enough time). It is also proven to have the best asymptotic complexity for the multi-armed bandit problem. (Much like how Knuth proved alpha-beta pruning to be the best asymptotic search for speeding up minimax, MCTS has been mathematically proven to solve the multi-armed bandit problem.)
So other algorithms can be faster only by a constant factor compared to MCTS (where "faster" is the number of times a node is visited). Something faster may exist, but it will be in the same complexity class as MCTS.
All the math behind the multi-armed bandit problem, MCTS, and all that seems to elegantly map to the opening book. Far better than, say, PV-search, alpha-beta, or other concepts. In effect: a given position in an opening book is a "slot machine". It has some win/loss ratio, but we don't know it, and the only way to find out is to repeatedly play that slot machine and collect statistics. But we have other slot machines (other positions) we also need to collect statistics from. MCTS seems to be an unusually good "match" to the opening book problem.

Using purely random move choices makes this a no-go proposition, because of the losses they will cause and because the rate at which useful knowledge would be gained would be glacially slow. To make this a viable solution for chess opening books, AB engines or similar methods should be used to greatly speed up the process and avoid most if not all random choices; i.e. few if any random moves will be needed, and no random play-outs should be used.

I proposed this approach a few years ago, and again more recently, and said essentially the same thing, but it hasn't provided much motivation for the community to adopt a "standard" learning book. Or any new books, much less a learning book, as far as I can tell. Maybe the threat posed by NN engines will provide the required motivation. I guess we'll see.
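For reference, the exploration-vs-exploitation balance in the quoted MCTS description is usually implemented with the UCB1 rule. Here is a minimal sketch of how a book's per-move (reward, visits) statistics could drive move selection; the function name and the reward encoding are assumptions for illustration:

```python
import math

def ucb1_select(stats: list[tuple[float, int]], total_visits: int,
                c: float = math.sqrt(2)) -> int:
    """Return the index of the 'arm' (candidate book move) with the
    highest UCB1 value: mean reward plus an exploration bonus.

    stats: (total_reward, visits) per candidate move. An unvisited move
    is always selected first, mirroring the 'sample every move once'
    behaviour discussed above.
    """
    best, best_value = 0, float("-inf")
    for i, (reward, visits) in enumerate(stats):
        if visits == 0:
            return i  # unvisited arms have an effectively infinite UCB value
        value = reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if value > best_value:
            best, best_value = i, value
    return best
```

With `stats = [(6.0, 10), (3.0, 4), (0.0, 0)]` the unvisited third move is chosen, which is exactly the "blindly sample everything" behaviour criticized earlier in the thread; a chess book would replace that branch with the guided-search policy described above.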