A Simple Alpha(Go) Zero Tutorial
Moderators: hgm, Dann Corbit, Harvey Williamson

 Posts: 368
 Joined: Sat May 05, 2012 12:48 pm
 Location: Bergheim
Re: A Simple Alpha(Go) Zero Tutorial
Great find. Thank you.
For follow-up, note the links (from above):
https://github.com/suragnair/alphazerogeneral
Downloadable paper:
https://github.com/suragnair/alphazero ... riteup.pdf
Re: A Simple Alpha(Go) Zero Tutorial
BeyondCritics wrote: http://web.stanford.edu/~surag/posts/alphazero.html
"It assumes basic familiarity with machine learning and reinforcement learning concepts, and should be accessible if you understand neural network basics and Monte Carlo Tree Search."
I guess "simple" is in the mind of the beholder.
Re: A Simple Alpha(Go) Zero Tutorial
I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time.
Are there faster alternatives?
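A small aside on the formula: assuming the expression is meant as -1 + 2 / (1 + exp(-sum)), it is algebraically the same as tanh(sum / 2), so a library tanh can stand in for the explicit Exp call. The sketch below only checks that identity numerically; the function names are made up for illustration, and whether tanh is actually faster than exp depends on the language and math library.

```python
import math

def value_via_exp(s: float) -> float:
    # The expression from the post, assuming the minus signs shown here.
    # Note: math.exp(-s) raises OverflowError for very negative s (below about -709).
    return -1.0 + 2.0 / (1.0 + math.exp(-s))

def value_via_tanh(s: float) -> float:
    # Same function, since (1 - e^-s) / (1 + e^-s) = tanh(s / 2).
    return math.tanh(0.5 * s)

for s in (-5.0, -1.0, 0.0, 0.3, 4.0):
    assert abs(value_via_exp(s) - value_via_tanh(s)) < 1e-12
```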

 Posts: 4073
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
Re: A Simple Alpha(Go) Zero Tutorial
ReLU is used after the convolution steps, which leads to faster convergence and also faster computation, but you are bound to sigmoid or tanh in the fully connected layers.
 hgm
 Posts: 25600
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
Re: A Simple Alpha(Go) Zero Tutorial
Henk wrote: I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time. Are there faster alternatives?
Just tabulate the function, so that it requires only an array access.
Re: A Simple Alpha(Go) Zero Tutorial
Daniel Shawul wrote: ReLU is used after the convolution steps, which leads to faster convergence and also faster computation, but you are bound to sigmoid or tanh in the fully connected layers.
I haven't started implementing the convolution steps yet. It might be that these layers make it much slower still, so better not to optimize yet.
Re: A Simple Alpha(Go) Zero Tutorial
hgm wrote: Just tabulate the function, so that it requires only an array access.
Henk wrote: I'm using -1 + 2 / (1 + Exp(-sum)) in the output layer to get v(s) values in [-1, 1], but Exp is now consuming most of the processing time. Are there faster alternatives?
The argument is a double. Or do you mean making it discrete: first convert it to an integer, then do a lookup to get an approximation?
 hgm
 Posts: 25600
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
Re: A Simple Alpha(Go) Zero Tutorial
For inference, 8-bit integers seem to be enough for cell outputs. This at least is what the Google gen1 TPUs use. Only for the backpropagation during training is better precision needed. Of course this means that the weight x output products can be 16 bit, and a number of those will be summed to act as input to the sigmoid layer.
But it should not cause any problems to do a piecewise linear approximation of the sigmoid. E.g. quantize the input into 256 intervals, and tabulate both the function and its derivative in each interval.
The coarsest approximation of the sigmoid would be to just clip f(x) = x at -1 and +1. Even that might work.
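The two suggestions above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input range [-8, 8], the names X_MIN, X_MAX, VALUE, SLOPE, squash_table and squash_clip are all made up here, the squashing function is taken to be tanh(x / 2) (the [-1, 1] variant of the sigmoid), and the per-interval slope is the secant slope across the interval rather than the exact derivative. In Python a table lookup is not necessarily faster than math.tanh; the point is the technique, which pays off mainly in compiled code.

```python
import math

X_MIN, X_MAX = -8.0, 8.0      # assumed input range; outside it the output is clipped
N_INTERVALS = 256             # number of intervals, as suggested in the post
STEP = (X_MAX - X_MIN) / N_INTERVALS

def squash_exact(x: float) -> float:
    # Reference function: sigmoid rescaled to [-1, 1], i.e. tanh(x / 2).
    return math.tanh(0.5 * x)

# Per interval: function value at the left edge and the secant slope across it.
VALUE = [squash_exact(X_MIN + i * STEP) for i in range(N_INTERVALS)]
SLOPE = [
    (squash_exact(X_MIN + (i + 1) * STEP) - squash_exact(X_MIN + i * STEP)) / STEP
    for i in range(N_INTERVALS)
]

def squash_table(x: float) -> float:
    """Piecewise linear approximation using the two tables."""
    if x <= X_MIN:
        return -1.0
    if x >= X_MAX:
        return 1.0
    i = min(int((x - X_MIN) / STEP), N_INTERVALS - 1)  # guard against float rounding
    return VALUE[i] + SLOPE[i] * (x - (X_MIN + i * STEP))

def squash_clip(x: float) -> float:
    """Coarsest variant: f(x) = x clipped at -1 and +1."""
    return max(-1.0, min(1.0, x))
```

With these assumed settings the table version stays within about 1e-3 of tanh(x / 2); the largest deviation is the small jump where the output is clipped to -1 or +1 at the ends of the range.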
Re: A Simple Alpha(Go) Zero Tutorial
In Monte Carlo Tree Search I'm using PUCT, but I don't know what would be a reasonable value for the exploration constant C (the degree of exploration). Would it be more like 0.9, or 0.1, or something else?
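To make the role of C concrete, here is a minimal sketch of PUCT selection in the form usually written for AlphaGo Zero style search, U(s, a) = Q(s, a) + C * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)). The function names and the dictionary layout are hypothetical, and the numbers are toy values rather than a recommendation for C; the sketch only shows how the constant shifts the balance between the value estimate Q and the prior-weighted exploration term.

```python
import math

def puct_score(q: float, prior: float, n_parent: int, n_child: int, c_puct: float) -> float:
    # Q(s, a) plus the exploration bonus scaled by c_puct.
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)

def select_action(stats: dict, c_puct: float):
    """stats maps action -> (Q, P, N); returns the action with the highest PUCT score."""
    n_parent = sum(n for _, _, n in stats.values())
    return max(
        stats,
        key=lambda a: puct_score(stats[a][0], stats[a][1], n_parent, stats[a][2], c_puct),
    )

# Toy example: "a" is well explored with a decent value, "b" is unvisited with a high prior.
stats = {"a": (0.5, 0.2, 10), "b": (0.1, 0.8, 0)}
print(select_action(stats, 0.1))  # small C: the value estimate dominates
print(select_action(stats, 1.0))  # larger C: the prior-driven bonus dominates
```

In this toy position C = 0.1 keeps the search on the already explored move, while C = 1.0 sends it to the unvisited move with the high prior, so a sensible value depends on how the Q values are scaled and how much the priors are trusted.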