Michel wrote:
gladius wrote:
Daniel Shawul wrote:
Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leela-chess use ... Maybe the policy network helps there, but I am not so sure that starting with an algorithm that is +700 Elo weaker is such a good idea.
Agreed, it's certainly worth investigating other approaches, and the 700 Elo is no joke. This quote from the paper is interesting though (even if it doesn't really have any justification):
AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domain-independent search.
I am quite suspicious of that theory. It is clear that minimax also averages in some way. Otherwise searching deeper would not produce more reliable scores.
It seems to me that this theory could be tested by creating an A-B engine that would use the value head of Leela-chess's NN (and perhaps the policy head for move ordering) and comparing it to the MCTS version of Leela-chess.
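For what it's worth, here is a toy sketch of the effect the paper is describing at a single max node (made-up numbers, not any engine's code): if every leaf has the same true value and each evaluation adds independent noise, the average of the leaf scores concentrates around the truth, while the max operator picks out the largest error. As noted above, a deep minimax tree is not a single max node, so this illustrates the claim rather than settles it.

```cpp
// Toy experiment: N leaves with identical true value, each evaluated
// with independent Gaussian noise. Compare the error of the averaged
// backup against the error of the max (one-ply minimax) backup.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 0.5);  // per-leaf eval error
    const double true_value = 0.0;   // every leaf has the same true score
    const int leaves = 64, trials = 10000;

    double avg_abs_err = 0.0, max_abs_err = 0.0;
    for (int t = 0; t < trials; ++t) {
        std::vector<double> evals(leaves);
        for (double& e : evals) e = true_value + noise(rng);
        double mean = 0.0;
        for (double e : evals) mean += e;
        mean /= leaves;
        double mx = *std::max_element(evals.begin(), evals.end());
        avg_abs_err += std::abs(mean - true_value);
        max_abs_err += std::abs(mx - true_value);
    }
    // The averaged error shrinks roughly as 1/sqrt(leaves); the max
    // operator instead propagates the largest positive error upward.
    std::printf("mean |error|: averaged backup %.3f, max backup %.3f\n",
                avg_abs_err / trials, max_abs_err / trials);
}
```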
Minimax does not average; the root node just takes the score of one of the leaf nodes. So theoretically, going from an iterative-deepening (ID) depth of d to d+1, you can go from a winning score to a losing score.
What we do in regular alpha-beta search is assume there is no uncertainty in the evaluation function and grow the tree for as long as time allows. The outcomes of the shallower ID iterations are thrown away.
MCTS does not take this approach; it mixes different ID search depths. Averaging the scores of the children is a way of dealing with the uncertainty that comes from children being searched to different depths: the more a node is visited, the greater the depth to which it is searched.
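Concretely, the standard backup keeps a per-node visit count and a running mean, so a result bubbled up from a deeply searched child and one from a barely visited child get equal weight in the parent's value. A minimal sketch (the names are mine, not Leela's or Scorpio's):

```cpp
// Standard MCTS backup: every node along the simulation path updates a
// running mean of all results that have passed through it, regardless of
// how deep the subtree below each child has grown.
#include <cstdio>

struct Node {
    int visits = 0;
    double value = 0.0;  // running mean of all backed-up results
};

// Called once per node on the path from the leaf back to the root.
void backup(Node& n, double result) {
    ++n.visits;
    n.value += (result - n.value) / n.visits;  // incremental mean update
}

int main() {
    Node n;
    // A deep, accurate result and three shallow, noisy ones all carry
    // the same weight in the mean.
    for (double r : {0.3, 0.1, 0.5, 0.2}) backup(n, r);
    std::printf("visits=%d mean=%.3f\n", n.visits, n.value);  // mean=0.275
}
```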
And yet, as scorpio-mcts-min shows, using the minimax operator, even when comparing scores of children searched to significantly different depths, gives a +600 Elo boost in chess. Even in a standard alpha-beta searcher with heavy LMR + null move, averaging is not used to compensate for the fact that children are not explored equally.
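A sketch of the minimax alternative under the same assumptions (again mine, not Scorpio's actual code): instead of a running mean, a node's value becomes the best negated score among its children, whatever their visit counts.

```cpp
// Minimax-style MCTS backup: refresh a node from its children using the
// negamax max, not an average, even when the children have been searched
// to very different effective depths.
#include <algorithm>
#include <cstdio>
#include <vector>

struct MNode {
    double value = 0.0;            // negamax score from the side to move
    std::vector<MNode*> children;  // expanded children only
};

void backup_minimax(MNode& n) {
    if (n.children.empty()) return;  // a leaf keeps its NN/eval score
    double best = -1e9;
    for (const MNode* c : n.children)
        best = std::max(best, -c->value);  // negamax convention
    n.value = best;
}

int main() {
    MNode a{0.40, {}}, b{-0.10, {}}, root;  // child scores from the child's side
    root.children = {&a, &b};
    backup_minimax(root);
    std::printf("root value = %.2f\n", root.value);  // max(-0.40, 0.10) = 0.10
}
```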
Daniel