Discussion of chess software programming and technical issues.
Moderators: hgm, Harvey Williamson, bob
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

Daniel Shawul
 Posts: 3504
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

Contact:
Post
by Daniel Shawul » Tue Mar 20, 2018 6:37 pm
I did a comparison of the two backup operators (minmax and averaging) for MCTS, and the fullwidth alphabeta rollouts with nullmove + LMR.
Preliminary results of an ongoing round robin tournament at 20+0.1
Code: Select all
1. scorpiomctsab 232 41 41 367 74.8% 76 19.3%
2. scorpiomctsmin 203 40 40 366 71.9% 67 22.4%
3. scorpiomctsminold 32 39 39 367 48.9% 9 24.3%
4. scorpiomctsavg 466 76 76 366 4.4% 156 3.3%
The pure MCTS with MINMAX backup operators (scorpiomctsmin) is now dangerously close to scorpiomctsab (alphabeta rollouts with nullmove and lmr). scorpiomctsminold is the older version of the MINMAX backup operator that also did a little bit of averaging. What it does is it picks the child with the highest score but then the parents statistics is averaged instead of being set the the negative of the best child's score. When i made it to backup purely the minimax score by doing the later, i got a +170 jump in strength, and now it is rivaling alphabeta rollouts with nullmove + lmr.
Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leelachess use ... Maybe the policy network helps there but i am not so sure to start with an algorithm that is +700 elo weaker is such a good idea.
Danel

noobpwnftw
 Posts: 237
 Joined: Sun Nov 08, 2015 10:10 pm
Post
by noobpwnftw » Tue Mar 20, 2018 7:40 pm
I honestly think that it is just not ideal to use MCTS architecture in chess.
The results that manifested otherwise does not essentially prove MCTS is better suited for chess but actually a possible indication that move ordering is more important than PV depth.
So would you like to try something like using shallow a/b searches(which will eventually do full eval on every position) to sort moves?
The way we order moves is fast and brief, and 0 actually did the opposite, this is what I think fundamentally different on how to shape the search tree if you take away all the other fanciness.

gladius
 Posts: 535
 Joined: Tue Dec 12, 2006 9:10 am
Post
by gladius » Tue Mar 20, 2018 10:02 pm
Daniel Shawul wrote:Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leelachess use ... Maybe the policy network helps there but i am not so sure to start with an algorithm that is +700 elo weaker is such a good idea.
Agreed, it's certainly worth investigating other approaches, and the 700 ELO is no joke
. This quote from the paper is interesting though (even if it doesn't really have any justification):
AlphaZero evaluates positions using nonlinear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alphabeta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domainindependent search.

jhellis3
 Posts: 348
 Joined: Fri Aug 16, 2013 10:36 pm
Post
by jhellis3 » Tue Mar 20, 2018 10:36 pm
This is quite interesting. It would also be useful to see how the approaches scale vs each other with increasing time.

Daniel Shawul
 Posts: 3504
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

Contact:
Post
by Daniel Shawul » Wed Mar 21, 2018 1:46 am
Final result with 20+0.1
Code: Select all
1. scorpiomctsab 245 25 25 1002 75.3% 81 19.2%
2. scorpiomctsmin 208 24 24 1002 71.5% 68 24.2%
3. scorpiomctsminold 42 24 24 1002 49.8% 13 20.6%
4. scorpiomctsavg 493 50 50 1002 3.4% 165 3.2%
I will increase the time control to 60+0.1 and report the result.

Michel
 Posts: 1966
 Joined: Sun Sep 28, 2008 11:50 pm
Post
by Michel » Wed Mar 21, 2018 7:05 am
gladius wrote:Daniel Shawul wrote:Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leelachess use ... Maybe the policy network helps there but i am not so sure to start with an algorithm that is +700 elo weaker is such a good idea.
Agreed, it's certainly worth investigating other approaches, and the 700 ELO is no joke
. This quote from the paper is interesting though (even if it doesn't really have any justification):
AlphaZero evaluates positions using nonlinear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alphabeta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domainindependent search.
I am quite suspicious of that theory. It is clear that minimax also averages in some way. Otherwise searching deeper would not produce more reliable scores.
It seems to me that this theory could be tested by creating an AB engine that would use the value head of Leelachess's NN (and perhaps the policy head for move ordering) and comparing it to the MCTS version of Leelachess.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Daniel Shawul
 Posts: 3504
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

Contact:
Post
by Daniel Shawul » Wed Mar 21, 2018 1:46 pm
Here are the results with a longer time control of 60+0.1
Code: Select all
1. scorpiomctsab 303 26 26 1000 80.4% 100 18.4%
2. scorpiomctsmin 208 24 24 1000 71.3% 69 27.6%
3. scorpiomctsminold 16 24 24 1000 44.2% 6 22.0%
4. scorpiomctsavg 493 43 43 1000 4.1% 165 5.6%
The gap has now increased to an +800 elo. It seems averaging hurts more with longer time control. The gap between alphabeta rollouts and minmaxMCTS has alos increased to +90 elo now.
Daniel

Daniel Shawul
 Posts: 3504
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

Contact:
Post
by Daniel Shawul » Thu Mar 22, 2018 5:10 pm
Two issues with the current averaging scheme
a) I was averaging raw scores instead of winning probabilities i.e. logistic(score). So computing the winning probabilities with logistic(score), averaging them, and then convert back to scores with logit(p), I get a slightly improved performance for mctsavg which is about 570 elos compared to mctsab.
b) Unfortunately, i was using a minimax operator at the leaves for _all_ schemes by mistake, hence the mctsavg backup operator was benefitting from that. When i made it to use averaging at the leaves, the elo went down to 800 or maybe even more.
Another issue is that what AlphaZero does to compute average is to keep nodes visit count (N) and total scores (W). This does not forget old score of a node once it becomes expanded. I believe the average should be computed from scratch from the scores of the children each time we backup. That is the reason why we have scorpiomctsminold because i was doing averaging with old scores after i compute the minmax score for the children. If not doing that benefits the MINIMAX backup method by 170 elos, it should do the same for the AVG backup operator as well. This is like averaging scores of two different iterative deepening searches. We just forget the old ID search and use the latest.
Daniel

Daniel Shawul
 Posts: 3504
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

Contact:
Post
by Daniel Shawul » Thu Mar 22, 2018 5:25 pm
Michel wrote:gladius wrote:Daniel Shawul wrote:Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leelachess use ... Maybe the policy network helps there but i am not so sure to start with an algorithm that is +700 elo weaker is such a good idea.
Agreed, it's certainly worth investigating other approaches, and the 700 ELO is no joke
. This quote from the paper is interesting though (even if it doesn't really have any justification):
AlphaZero evaluates positions using nonlinear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alphabeta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domainindependent search.
I am quite suspicious of that theory. It is clear that minimax also averages in some way. Otherwise searching deeper would not produce more reliable scores.
It seems to me that this theory could be tested by creating an AB engine that would use the value head of Leelachess's NN (and perhaps the policy head for move ordering) and comparing it to the MCTS version of Leelachess.
Minmax does not average, the root node just takes the score of one of the leaf nodes. So theoretically going from ID depth of d to d+1, you can end up from a winning score to a loosing score.
What we do in regular alphabeta search is to assume there is no uncertainty in the evaluation function, and grow the tree as long as time allows us. The outcomes from a shallow ID search are thrown out.
MCTS does not use this approach and mixes different ID search depths. So averaging the score of children is a way of dealing with the uncertainity of different children being searched to different depths. You would search a node to a larger depth if you visited it more.
And yet, as scorpiomctsmin, shows using the minmax operator even comapring scores of children searched to significantly different depths gives a +600 elo boost in chess. Even in a standard alphabeta searcher with heavy LMR + nullmove, averaging is not used to counter for the fact that children are not explored equally.
Daniel

IQ
 Posts: 161
 Joined: Thu Dec 17, 2009 9:46 am
Post
by IQ » Fri Mar 23, 2018 8:42 am
Daniel Shawul wrote:And yet, as scorpiomctsmin, shows using the minmax operator even comapring scores of children searched to significantly different depths gives a +600 elo boost in chess. Even in a standard alphabeta searcher with heavy LMR + nullmove, averaging is not used to counter for the fact that children are not explored equally.
I pretty firmly believe that deep mind just wanted to show that the same framework is able to "conquer" three different games. I think that the AlphaZero DNN as an evaluation function with a traditional tree search, instead of the averaging MCTS would be even stronger. Their argument about "averaging" out errors seems rather unconvincing without further evidence. But as they already got good results with the same MCTS approach as in the other games (Go, Shogi) they probably didn't even see the need to investigate how their DNN would behave in a more traditional tree search and just made an offhand remark to justify it.