A Simple Alpha(Go) Zero Tutorial

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 05, 2018 6:37 pm

If you are not training a policy network (P=1), then C=1 should be fine.

Henk · Post by **Henk** » Fri Jan 05, 2018 7:13 pm

Daniel Shawul wrote: If you are not training a policy network (P=1), then C=1 should be fine.

And if you are training a policy network?

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 05, 2018 7:29 pm

Well, P(s,a) depends on the branching factor of the particular game. Say in chess you have 20 moves on average; then a uniform P(s,a) would be 1/20, and your C would have to be scaled up accordingly to get the same level of exploration. There is a theoretically optimal C=sqrt(2) for pure UCT (i.e. no biases for moves), which uses the sqrt(log(n)/n_i) formula instead of sqrt(n)/(1+n_i). A0 actually tunes C along with its other hyperparameters. I found that a very low exploration coefficient with minimax-style updates works better for chess, which is more tactical than Go.
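To make the comparison concrete, here is a minimal Python sketch of the two exploration terms. The function names are hypothetical, and it assumes the AlphaGo Zero form C * P(s,a) * sqrt(N) / (1 + N_i) for PUCT, where N is the parent's visit count and N_i the child's. It shows that with a uniform prior of 1/b, C has to be multiplied by b to give the same bonus as P=1.

```python
import math

def uct_exploration(c, parent_visits, child_visits):
    """Classic UCT (UCB1) bonus: C * sqrt(ln(N) / N_i).

    C = sqrt(2) is the textbook choice for rewards in [0, 1].
    Assumes child_visits >= 1 (unvisited children are expanded first).
    """
    return c * math.sqrt(math.log(parent_visits) / child_visits)

def puct_exploration(c, prior, parent_visits, child_visits):
    """PUCT bonus in the AlphaGo Zero style: C * P(s,a) * sqrt(N) / (1 + N_i)."""
    return c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def c_for_uniform_prior(c_without_prior, branching_factor):
    """Hypothetical helper: the C that gives a uniform prior P = 1/b
    the same bonus that P = 1 gets with c_without_prior."""
    return c_without_prior * branching_factor

# Usage: C = 1 with P = 1 versus a uniform prior over 20 chess moves.
b = 20
c_uniform = c_for_uniform_prior(1.0, b)              # 20.0
print(puct_exploration(1.0, 1.0, 100, 4))            # 2.0
print(puct_exploration(c_uniform, 1.0 / b, 100, 4))  # same bonus as the P = 1 case
print(uct_exploration(math.sqrt(2), 100, 4))         # UCT bonus, for comparison
```

Note that the two terms also scale differently with visits: the UCT bonus grows with sqrt(log N), the PUCT bonus with sqrt(N), so a C tuned for one does not carry over to the other.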

trulses · Post by **trulses** » Tue Jan 09, 2018 2:35 pm

Henk wrote:
Daniel Shawul wrote: If you are not training a policy network (P=1), then C=1 should be fine.
And if you are training a policy network?

This number should also depend on the number of simulations you're running: you're only going to get asymptotic behavior with some significant number of simulations. To extract useful information from a very low number of simulations, you might need to lower the exploration constant quite a bit. If your searches are extremely shallow, they might as well be greedy; if you don't lower the constant, the policy labels that come out will just look like uniform noise.
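The effect of a large constant at low simulation counts can be seen on a toy example. The sketch below is hypothetical and deliberately simplified: a single node whose children have fixed, noise-free values, a uniform prior, PUCT selection of the form Q + C * P * sqrt(N) / (1 + N_i), unvisited children scored with Q = 0, and ties broken toward the lower index. It returns the visit fractions N(s,a)/N(s) that would become the policy label.

```python
import math

def toy_visit_policy(true_values, n_sims, c_puct):
    """Run n_sims PUCT selections at one node and return N(s,a)/N(s).

    Hypothetical toy: each child always returns its fixed true value,
    the prior is uniform, and unvisited children get Q = 0.
    """
    k = len(true_values)
    prior = 1.0 / k
    visits = [0] * k
    value_sum = [0.0] * k
    for _ in range(n_sims):
        parent = sum(visits)

        def score(a):
            q = value_sum[a] / visits[a] if visits[a] else 0.0
            u = c_puct * prior * math.sqrt(parent) / (1 + visits[a])
            return q + u

        best = max(range(k), key=score)  # max() keeps the first of equal scores
        visits[best] += 1
        value_sum[best] += true_values[best]
    return [v / n_sims for v in visits]

values = [0.0, 0.5, -0.5]
print(toy_visit_policy(values, 20, c_puct=0.1))  # [0.05, 0.95, 0.0]
print(toy_visit_policy(values, 20, c_puct=5.0))  # visits spread over all three moves
```

With the small constant, 20 simulations already put almost all the mass on the best move; with the large one, the same 20 simulations are spent mostly on exploration and the label comes out much flatter.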

However, if you are using "some significant number" of simulations, just ignore everything I just said. In any event, monitor the policy labels (this is N(s,a)/N(s) if you're following the paper) and see if you're happy with them; that's the easiest way to tune this constant.
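The policy labels can be computed directly from the root's visit counts. A minimal sketch, assuming the children are stored in a dict of move to visit count and that N(s) is the sum of the child visits. The optional temperature follows the AlphaGo Zero paper, where pi(a|s) is proportional to N(s,a)^(1/tau); tau = 1 gives plain N(s,a)/N(s). The function name and move strings are illustrative only.

```python
def policy_target(visit_counts, tau=1.0):
    """Turn root visit counts into a policy label.

    pi(a|s) is proportional to N(s,a) ** (1 / tau); with tau = 1 this is
    N(s,a) / N(s). Assumes at least one child has been visited.
    """
    weights = {move: n ** (1.0 / tau) for move, n in visit_counts.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}

counts = {"e2e4": 6, "d2d4": 3, "c2c4": 1}
print(policy_target(counts))           # {'e2e4': 0.6, 'd2d4': 0.3, 'c2c4': 0.1}
print(policy_target(counts, tau=0.5))  # sharper: e2e4 gets 36/46 of the mass
```

Printing these labels for a handful of positions while varying C is a cheap way to check whether they look sensible or like uniform noise.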
