I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. That bias fades as the search traverses the tree more often, approaching the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations rather than the text; this is not really hard to understand.

Milos wrote:
You really don't get how UCT without rollouts operates, do you?

CheckersGuy wrote:
Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This depends entirely on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to the two candidate moves, you could get quite deep searches.
Try simulating it on paper; it would help.
It has nothing to do with the NN output, not even with the distribution of probabilities, because the algorithm tries to explore the tree as evenly as possible.
What I can immediately tell you is that in the training matches (800 MCTS simulations per move) the depth is hardly ever over 10, and in the actual games against SF (80000 MCTS simulations per move) the expected depth reached is 20 at most.
The even distribution only happens after many, many traverses through the tree. If you only do a few simulations, the bias from the NN is very strong. This concept is very similar to RAVE and/or UCT, which is commonly used in MCTS...
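For reference, here is a minimal sketch of the standard UCT selection rule (this is plain UCT, not AlphaZero's prior-weighted variant, and all names are illustrative). The exploration bonus shrinks as a child's visit count grows, which is the same mechanism that eventually evens out exploration:

```python
import math

# Hedged sketch of the standard UCT selection rule:
#     score_i = Q_i + c * sqrt(ln(N) / n_i)
# Q_i: mean value of child i, n_i: its visit count, N: parent visits.
# Function and variable names here are illustrative, not from any library.

def uct_select(children, c=1.4):
    """children: list of (q, n) pairs; returns the index of the child to visit."""
    total = sum(n for _, n in children)

    def score(q, n):
        if n == 0:
            return float("inf")  # unvisited children are always tried first
        return q + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(*children[i]))

# A heavily visited child loses its exploration bonus:
# child 0 has the better mean value but 100 visits, child 1 only 2.
print(uct_select([(0.6, 100), (0.5, 2)]))  # selects index 1
```

Note how the bonus depends only on visit counts, not on any prior: this is the sense in which UCT "has nothing to do with NN output".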
In the AlphaZero paper the prior's influence is lowered the following way: P(s,a) / (1 + N(s,a)), where N(s,a) is the number of traverses through the node via move a and P(s,a) is the move probability from the network.
Now give the candidate move a 99% probability and draw the picture yourself: it takes quite a few iterations to lower that term enough to encourage searching other moves.
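That picture can be drawn in a few lines of code. This is a hedged toy simulation of just the P(s,a) / (1 + N(s,a)) term described above (the paper's full PUCT rule also multiplies in a constant and a square root of the parent visit count, and weighs in the value estimate Q; all names below are illustrative):

```python
# Toy simulation: a 99% prior versus a 1% prior under the
# dilution term P(s,a) / (1 + N(s,a)).  This isolates one term of
# the AlphaZero selection rule; it is a sketch, not the full PUCT.

priors = {"candidate": 0.99, "other": 0.01}
visits = {"candidate": 0, "other": 0}

def score(move):
    return priors[move] / (1 + visits[move])

first_other = None  # iteration at which "other" is first explored
for i in range(200):
    best = max(priors, key=score)
    visits[best] += 1
    if best == "other" and first_other is None:
        first_other = i

# The 99% move soaks up roughly the first 100 visits before the
# 1% move is tried even once.
print(visits, "first visit to 'other' at iteration", first_other)
```

This matches the point above: with only a few hundred simulations, the prior dominates where the search goes, and the "even distribution" only emerges much later.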