Post subject: Re: Google's AlphaGo team has been working on chess    Posted: Wed Dec 13, 2017 9:28 pm

Milos wrote:
 CheckersGuy wrote: I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. This bias shrinks as the search traverses the tree more often, until you get to the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations and not at the text, and this is not really hard to understand. The even distribution only happens after many, many traversals through the tree. If you only do a few searches, the bias of the NN is very high. This concept is very similar to RAVE and/or UCT, which are commonly used in MCTS... In the AlphaZero paper they lower the probabilities the following way: P(s,a)/(1+N(s,a)), where N is the number of traversals through the node and P is the move probability. Now give the candidate move 99% probability and draw the picture yourself. It takes some iterations to lower the probability enough to encourage searching other moves.

So many wrong things.
At the beginning of training the NN weights are random, so the output probabilities are random and roughly uniform, which gives exactly the shallowest possible tree.
Even for a highly selective NN (later in training), in each node there are at least a few best moves that have similar probability, 0.5-0.7 (and A0 uses a UCT policy that selects other candidates much more often than the original UCB1, which is what you wrote up, even though you omitted the sqrt of the total visit count). 0.99 never happens unless the NN actually sees a mate in a few moves, so your example is totally irrelevant.

I was obviously talking about the trained network, and in the first few iterations the move probabilities provide a stronger bias. 0.99 may be a little high. As for a few moves having 0.5-0.7 probability, I don't think so, because the sum of all the probabilities should be 1. If you had more than two moves with a probability of 0.5-0.7, this would be wrong already.
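
To make both points concrete, here is a rough Python sketch. It uses the full exploration term from the paper, c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)), including the sqrt that was left out above. Everything else is my own simplification: the function names are made up, c_puct = 1.0 is just a placeholder, and Q is assumed to be 0 for every move, which a real search would not do. So treat it as an illustration of how the prior bias decays with visits, not as how A0 actually behaves.

```python
import math


def puct_bonus(prior, child_visits, parent_visits, c_puct=1.0):
    """Exploration term: c_puct * P * sqrt(parent visits) / (1 + child visits)."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def simulate(priors, n_sims, c_puct=1.0):
    """Pick moves by exploration bonus alone (assumption: Q = 0 for all moves).

    Returns the visit count of each move after n_sims selections.
    """
    visits = [0] * len(priors)
    for _ in range(n_sims):
        # max(..., 1) keeps the bonus non-zero on the very first selection;
        # this is a convenience of the sketch, not something from the paper.
        parent_visits = max(sum(visits), 1)
        scores = [
            puct_bonus(p, n, parent_visits, c_puct)
            for p, n in zip(priors, visits)
        ]
        best = max(range(len(priors)), key=scores.__getitem__)
        visits[best] += 1
    return visits


def max_moves_with_prior_at_least(threshold):
    """Upper bound on how many moves can have prior >= threshold if priors sum to 1."""
    return int(1 // threshold)


if __name__ == "__main__":
    priors = [0.99, 0.005, 0.005]
    print(simulate(priors, 10))    # few simulations: only the 0.99 move is visited
    print(simulate(priors, 400))   # many simulations: the other moves get visits too
    print(max_moves_with_prior_at_least(0.5))  # at most 2 moves can have prior >= 0.5
```

With only the bonus term in play, the 0.99 move soaks up every visit for roughly the first couple of hundred simulations before the 0.005 moves are tried at all. In a real search the Q values would pull visits away sooner or later depending on what the value head says.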
