Michel wrote:
gladius wrote:
Daniel Shawul wrote:
Backing up the average score is terribly bad for chess MCTS, which btw is what AlphaZero and Leela-chess use ... Maybe the policy network helps there, but I am not so sure that starting with an algorithm that is +700 Elo weaker is such a good idea.
Agreed, it's certainly worth investigating other approaches, and the 700 Elo is no joke. This quote from the paper is interesting though (even if it doesn't really have any justification):
AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domain-independent search.
I am quite suspicious of that theory. It is clear that minimax also averages in some way. Otherwise searching deeper would not produce more reliable scores.
It seems to me that this theory could be tested by creating an A-B engine that would use the value head of Leela-chess's NN (and perhaps the policy head for move ordering) and comparing it to the MCTS version of Leela-chess.
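For what it's worth, here is a toy sketch of the effect the paper is describing at a single max node (made-up numbers, not any engine's code): if every leaf has the same true value and each evaluation adds independent noise, the average of the leaf scores concentrates around the truth, while the max operator picks out the largest error. As noted above, a deep minimax tree is not a single max node, so this illustrates the claim rather than settles it.

```cpp
// Toy experiment: N leaves with identical true value, each evaluated
// with independent Gaussian noise. Compare the error of the averaged
// backup against the error of the max (one-ply minimax) backup.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 0.5);  // per-leaf eval error
    const double true_value = 0.0;   // every leaf has the same true score
    const int leaves = 64, trials = 10000;

    double avg_abs_err = 0.0, max_abs_err = 0.0;
    for (int t = 0; t < trials; ++t) {
        std::vector<double> evals(leaves);
        for (double& e : evals) e = true_value + noise(rng);
        double mean = 0.0;
        for (double e : evals) mean += e;
        mean /= leaves;
        double mx = *std::max_element(evals.begin(), evals.end());
        avg_abs_err += std::abs(mean - true_value);
        max_abs_err += std::abs(mx - true_value);
    }
    // The averaged error shrinks roughly as 1/sqrt(leaves); the max
    // operator instead propagates the largest positive error upward.
    std::printf("mean |error|: averaged backup %.3f, max backup %.3f\n",
                avg_abs_err / trials, max_abs_err / trials);
}
```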
Minimax does not average; the root node just takes the score of one of the leaf nodes. So theoretically, going from an iterative-deepening (ID) depth of d to d+1, you can go from a winning score to a losing score.
What we do in regular alpha-beta search is assume there is no uncertainty in the evaluation function and grow the tree for as long as time allows. The outcomes of the shallower ID iterations are thrown away.
MCTS does not take this approach; it mixes different ID search depths. Averaging the scores of the children is a way of dealing with the uncertainty that comes from children being searched to different depths: the more a node is visited, the greater the depth to which it is searched.
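Concretely, the standard backup keeps a per-node visit count and a running mean, so a result bubbled up from a deeply searched child and one from a barely visited child get equal weight in the parent's value. A minimal sketch (the names are mine, not Leela's or Scorpio's):

```cpp
// Standard MCTS backup: every node along the simulation path updates a
// running mean of all results that have passed through it, regardless of
// how deep the subtree below each child has grown.
#include <cstdio>

struct Node {
    int visits = 0;
    double value = 0.0;  // running mean of all backed-up results
};

// Called once per node on the path from the leaf back to the root.
void backup(Node& n, double result) {
    ++n.visits;
    n.value += (result - n.value) / n.visits;  // incremental mean update
}

int main() {
    Node n;
    // A deep, accurate result and three shallow, noisy ones all carry
    // the same weight in the mean.
    for (double r : {0.3, 0.1, 0.5, 0.2}) backup(n, r);
    std::printf("visits=%d mean=%.3f\n", n.visits, n.value);  // mean=0.275
}
```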
And yet, as scorpio-mcts-min shows, using the minimax operator, even when comparing scores of children searched to significantly different depths, gives a +600 Elo boost in chess. Even in a standard alpha-beta searcher with heavy LMR + null move, averaging is not used to compensate for the fact that children are not explored equally.
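A sketch of the minimax alternative under the same assumptions (again mine, not Scorpio's actual code): instead of a running mean, a node's value becomes the best negated score among its children, whatever their visit counts.

```cpp
// Minimax-style MCTS backup: refresh a node from its children using the
// negamax max, not an average, even when the children have been searched
// to very different effective depths.
#include <algorithm>
#include <cstdio>
#include <vector>

struct MNode {
    double value = 0.0;            // negamax score from the side to move
    std::vector<MNode*> children;  // expanded children only
};

void backup_minimax(MNode& n) {
    if (n.children.empty()) return;  // a leaf keeps its NN/eval score
    double best = -1e9;
    for (const MNode* c : n.children)
        best = std::max(best, -c->value);  // negamax convention
    n.value = best;
}

int main() {
    MNode a{0.40, {}}, b{-0.10, {}}, root;  // child scores from the child's side
    root.children = {&a, &b};
    backup_minimax(root);
    std::printf("root value = %.2f\n", root.value);  // max(-0.40, 0.10) = 0.10
}
```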
Daniel