Have there been any attempts to use MCTS just for tuning of the eval weights of a "regular" chess engine?
Tuning the eval to predict the outcome of an alpha-beta search is a bit hopeless because of all the tactics that can't be encoded in a typical evaluation function. MCTS might average out the tactics sufficiently that its results can directly be used for tuning. The engine would then use alpha-beta for playing games.
Google's AlphaGo team has been working on chess
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
-
- Posts: 273
- Joined: Wed Aug 24, 2016 9:49 pm
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
Have there been any attempts to use MCTS just for tuning of the eval weights of a "regular" chess engine?
Tuning the eval to predict the outcome of an alpha-beta search is a bit hopeless because of all the tactics that can't be encoded in a typical evaluation function. MCTS might average out the tactics sufficiently that its results can directly be used for tuning. The engine would then use alpha-beta for playing games.

I don't think that would work particularly well. No matter what optimization algorithm one uses, we only have a linear function of evaluation parameters, which I doubt can encode tactics (very well). Neural networks aren't like that and can even learn non-linear functions well.

As for tuning the "regular" evaluation parameters: I don't quite understand what you mean by "using MCTS to train eval parameters".
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Google's AlphaGo team has been working on chess
CheckersGuy wrote:
As for tuning the "regular" evaluation parameters: I don't quite understand what you mean by "using MCTS to train eval parameters".

Run random simulations to get some sort of "averaged" estimation of the evaluation, the idea being that the tactics of the position are washed out sufficiently by the nature of MCTS. Then tune the eval weights to better predict that washed-out, tactic-free estimation.

I don't really expect it to work, but I wouldn't expect AlphaZero Chess to work, either.
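The proposal can be sketched on a toy problem (not real chess): average many noisy playouts into a smoothed target value per position, then fit linear eval weights to predict that target by gradient descent. All names and numbers here are illustrative assumptions, not anyone's actual tuner.

```python
import random

# Toy sketch: positions are small feature vectors, the "MCTS-averaged" target is
# a noisy hidden signal, and we fit linear eval weights to predict the smoothed
# target with plain stochastic gradient descent on the squared error.
random.seed(0)

def averaged_rollout_value(features, n_playouts=100):
    """Stand-in for an MCTS-style estimate: many noisy playouts, averaged."""
    true_value = 0.9 * features[0] - 0.4 * features[1]  # hidden "outcome" signal
    return sum(true_value + random.gauss(0, 0.5) for _ in range(n_playouts)) / n_playouts

positions = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(200)]
targets = [averaged_rollout_value(p) for p in positions]

weights = [0.0, 0.0]
for _ in range(500):
    for feats, target in zip(positions, targets):
        pred = sum(w * f for w, f in zip(weights, feats))
        err = pred - target
        weights = [w - 0.05 * err * f for w, f in zip(weights, feats)]

print([round(w, 1) for w in weights])  # should land near the hidden [0.9, -0.4]
```

The averaging is what makes the target learnable by a linear function: each individual playout is wildly noisy, but the mean over 100 of them is close to the underlying signal.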
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Google's AlphaGo team has been working on chess
Henk wrote:
That's fast, show us some games.

??

If this was meant for me: it would take thousands of CPU/GPU hours to even begin to train any reasonable NN. It has been estimated that AlphaGo Zero would take something like 1,700 years on typical consumer hardware. For chess, I only got it to start running, and am still trying to understand it. There is also a Connect 4 game he did, which is a simpler starting point. The author is considering a distributed approach, I think.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
Run random simulations to get some sort of "averaged" estimation of the evaluation, the idea being that the tactics of the position are washed out sufficiently by the nature of MCTS. Then tune the eval weights to better predict that washed-out, tactic-free estimation.
I don't really expect it to work, but I wouldn't expect AlphaZero Chess to work, either.

Alpha0's NN has only tactics of depth 8 encoded in its weights, and it works. MCTS should be used for playing, not training.

Here is what could actually work. Instead of alpha-beta, use MCTS identical to Alpha0's. When you reach a leaf node, instead of only performing eval, perform an optimized MPV depth-8 search + QS (over the moves that wouldn't be cut by LMR) and return the score. Transform the best move's score into v, and the ratio of the move scores into a probability for each move to be played, all using some linear function.

If you could, for example, perform this MPV depth-8 search + QS in 10 ms, then on a 64-core machine you'd have 6400 MCT searches performed each second. That is still quite shy of the 80k performed in Alpha0, but with a more powerful machine you'd get there.

And despite all the hype about Alpha0's NN eval, I really doubt that it is better than a depth-8 MPV search + QS of SF.
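The "transform scores into v and priors" step might look something like this sketch. The particular squash and normalisation below are illustrative choices (the post only says "some linear function"); the function name and scale constant are made up.

```python
def leaf_from_mpv(move_scores_cp, scale=0.004):
    """Turn multi-PV root scores (centipawns) into (v, priors), per the recipe
    above: a linear squash of the best score to [-1, 1] for v, and the ratio
    of (shifted) move scores as the probability of each move being played."""
    best = max(move_scores_cp.values())
    worst = min(move_scores_cp.values())
    v = max(-1.0, min(1.0, scale * best))                 # linear map, clamped
    shifted = {m: s - worst + 1 for m, s in move_scores_cp.items()}
    total = sum(shifted.values())
    priors = {m: s / total for m, s in shifted.items()}   # ratio of move scores
    return v, priors

v, priors = leaf_from_mpv({"e2e4": 35, "d2d4": 30, "g1f3": 10})
print(v, priors)
```

This is exactly the interface an AlphaZero-style MCTS needs at a leaf: a scalar value for backup plus a prior over moves for the selection rule.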
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Google's AlphaGo team has been working on chess
Milos wrote:
Here is what could actually work. Instead of alpha-beta, use MCTS identical to Alpha0's. When you reach a leaf node, instead of only performing eval, perform an optimized MPV depth-8 search + QS and return the score.

I think this is more or less what Stéphane is now working on.

If I now understand the AlphaZero MCTS correctly, it is actually a sort of book-builder search: build a tree by expanding leaf nodes and back up the results to the root, using some rule to select the next leaf node to be expanded.

If this works well with SF's alpha-beta search and evaluation, that would be very funny.
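The book-builder picture can be sketched on a toy tree: repeatedly pick a leaf, expand it with evaluated children, and back the values up to the root negamax-style. The selection rule here (walk down the currently best reply) and the leaf values are illustrative assumptions.

```python
# Toy "book-builder" search: grow a tree one leaf expansion at a time, then
# back leaf evaluations up to the root. Values are from the side to move.
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

    def is_leaf(self):
        return not self.children

def best_leaf(node):
    """Selection rule (one illustrative choice): follow the best reply down."""
    while not node.is_leaf():
        node = max(node.children, key=lambda c: -c.value)
    return node

def expand(leaf, child_values):
    """Expansion: attach children with their (made-up) leaf evaluations."""
    leaf.children = [Node(v) for v in child_values]

def backup(node):
    """Negamax backup of leaf evaluations toward the root."""
    if not node.is_leaf():
        node.value = max(-backup(c) for c in node.children)
    return node.value

root = Node(0.0)
expand(best_leaf(root), [0.3, -0.1])   # first expansion: two replies at the root
backup(root)
expand(best_leaf(root), [0.2, 0.6])    # second expansion, at the now-best leaf
backup(root)
print(round(root.value, 2))
```

Each expansion refines the root value: after the second expansion the previously attractive reply turns out to be worse, and the root score shifts accordingly.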
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
I think this is more or less what Stéphane is now working on.
If I now understand the AlphaZero MCTS correctly, it is actually a sort of book-builder search.
If this works well with SF's alpha-beta search and evaluation, that would be very funny.

Yes, this is exactly what Stéphane is working on (I just found his fork a moment ago), except that he uses a regular search instead of an MPV one.

You are right about MCTS. The selection rule is actually quite a simple solution to the multi-armed bandit problem, where nodes that have a higher visit count get explored less in the future.
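That bandit rule is usually some variant of UCB1 (AlphaZero itself uses a prior-weighted variant, PUCT); a minimal sketch of plain UCB1 selection:

```python
import math

def ucb1_pick(children, c=1.4):
    """children: list of (mean_value, visit_count) pairs. Return the index
    maximizing UCB1 = mean + c * sqrt(ln(parent_visits) / visits).
    The exploration term shrinks as a child's visit count grows, which is
    exactly the "higher visit count gets explored less" behaviour."""
    parent_visits = sum(n for _, n in children)

    def score(child):
        mean, n = child
        if n == 0:
            return float("inf")        # always try an unvisited child first
        return mean + c * math.sqrt(math.log(parent_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))

# A slightly worse move with few visits can outrank a better, heavily-visited one:
print(ucb1_pick([(0.55, 400), (0.50, 10)]))
```

With enough visits the bonus of the neglected child dominates, so every move keeps getting revisited occasionally; that is the bandit guarantee behind the rule.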
-
- Posts: 273
- Joined: Wed Aug 24, 2016 9:49 pm
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
If I now understand the AlphaZero MCTS correctly, it is actually a sort of book-builder search. Build a tree by expanding leaf nodes and back up the results to the root, using some rule to select the next leaf node to be expanded.

The article on Wikipedia actually describes this fairly well: https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

Basically, MCTS consists of three steps:
1. Selection
2. Expansion
3. Simulation
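In the classic formulation, the "simulation" step is a uniformly random playout to the end of the game; a minimal sketch on a toy game (take-1-or-2 Nim, chosen purely for illustration):

```python
import random

# Classic MCTS "simulation" step on a toy game: from the given position, play
# uniformly random moves to the very end and return +1 if the player to move
# at the start of the playout wins, -1 otherwise (last stone taken wins).
def random_playout(stones, rng):
    to_move = 0                                  # 0 = playout's starting player
    while True:
        take = rng.choice([1, 2]) if stones >= 2 else 1
        stones -= take
        if stones == 0:
            return 1 if to_move == 0 else -1
        to_move ^= 1

# Averaging many playouts gives the Monte Carlo value estimate of a position:
estimate = sum(random_playout(4, random.Random(i)) for i in range(2000)) / 2000
print(round(estimate, 2))
```

The single-playout result is almost pure noise; only the average over many playouts carries positional information, which is why playout count matters so much in classic MCTS.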
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Google's AlphaGo team has been working on chess
CheckersGuy wrote:
the article on Wikipedia actually describes this fairly well: https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

But what AlphaZero is doing seems quite different from this:

Wikipedia wrote:
The application of Monte Carlo tree search in games is based on many playouts. In each playout, the game is played out to the very end by selecting moves at random.

They are not playing out any games "to the very end". And the randomness of the selection is also quite limited (if not completely absent: the paper states that the edge is selected that maximizes an "upper confidence bound").

CheckersGuy wrote:
Basically, MCTS consists of three steps:
1. Selection
2. Expansion
3. Simulation

AlphaZero seems to use the term "simulation" for selection plus expansion. I see selection and I see expansion, but I don't see a separate "simulation".
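The point can be made concrete with a sketch of one AlphaZero-style "simulation": walk deterministically to a leaf, expand it, score it with a value function instead of a random playout, and back the value up. The node structure and helper callbacks here are made up for illustration.

```python
# One AlphaZero-style "simulation" (sketch): selection + expansion + evaluation
# + backup, with no playout anywhere.
class Node:
    def __init__(self):
        self.children = {}          # move -> Node
        self.visits = 0
        self.total_value = 0.0

def simulation(root, select_edge, expand, value_fn):
    path, node = [root], root
    while node.children:                    # selection: deterministic rule
        node = node.children[select_edge(node)]
        path.append(node)
    expand(node)                            # expansion: attach children
    v = value_fn(node)                      # evaluation replaces the playout
    for n in reversed(path):                # backup along the selected path
        n.visits += 1
        n.total_value += v
        v = -v                              # flip perspective each ply

def expand(n):
    if not n.children:
        n.children = {"a": Node(), "b": Node()}

def select_edge(n):
    # least-visited child, as a stand-in for a UCB-style selection rule
    return min(n.children, key=lambda m: n.children[m].visits)

root = Node()
simulation(root, select_edge, expand, lambda n: 0.5)
simulation(root, select_edge, expand, lambda n: 0.5)
print(root.visits, root.total_value)
```

Nothing random and nothing "played out to the very end" happens here, yet the visit counts and backed-up values behave exactly like those of a classic MCTS tree.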
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
[...] And the randomness of the selection is also quite limited (if not completely absent: the paper states that the edge is selected that maximizes an "upper confidence bound").

Randomness has never been a requirement in the tree part of the search. The earliest description I remember of this type of algorithm is MoGo's UCT search, which picked the move that maximizes a formula called UCB1 ("UCB" stands for "Upper Confidence Bound", just like in the recent paper). Actually, I imagine most implementations don't have any random element in the tree part of the playout.

AlphaGo initially combined the results of playouts with the evaluation at the leaf of the tree (that was probably its biggest innovation), and then AlphaGo Zero and AlphaZero did away with the playout altogether, using only an evaluation function. Still, the whole mechanism for growing the tree and for backing up values is the same, so it feels like the MCTS label fits.
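AlphaGo's combination of playout result and leaf evaluation was a simple linear mix; the published AlphaGo paper weighted the two equally (lambda = 0.5), and setting the playout weight to zero recovers the AlphaGo Zero / AlphaZero case:

```python
def leaf_value(value_net_eval, playout_result, lam=0.5):
    """AlphaGo-style mixed leaf value: (1 - lam) * value-network evaluation
    plus lam * rollout outcome. lam = 0.5 matches the published AlphaGo
    setting; lam = 0 means evaluation only, as in AlphaGo Zero / AlphaZero."""
    return (1 - lam) * value_net_eval + lam * playout_result

print(round(leaf_value(0.3, 1.0), 2))   # mixed estimate
print(leaf_value(0.3, 1.0, lam=0))      # evaluation only
```

Seen this way, the AlphaGo-family engines differ only in this one leaf-evaluation knob, while the tree growth and backup machinery stays the same, which supports the point that the MCTS label still fits.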