how alphazero gathers data?

OfekShochat · Post by **OfekShochat** » Tue Oct 27, 2020 1:39 pm

I wonder how AlphaZero (or lc0) gathers training data from self-play games. Is it random positions from the games?
Thanks

Daniel Shawul · Post by **Daniel Shawul** » Tue Oct 27, 2020 1:58 pm

Only a fraction of positions, say 50%, is sampled from a game to avoid the bad effect of correlated positions on the value head.
Recent versions of A0 even go as far as sampling only 12% of the positions.
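To make the sampling idea concrete, here is a minimal sketch of keeping only a random fraction of one game's positions. The function name `sample_positions` and its parameters are made up for illustration; this is not code from A0 or lc0, and real pipelines may sample differently (for example from a replay buffer spanning many games).

```python
import random

def sample_positions(game_positions, fraction=0.5, rng=None):
    """Keep a random subset of one game's positions for training.

    game_positions: list of positions (any objects) in game order.
    fraction: share of positions to keep, e.g. 0.5 or 0.12.
    """
    rng = rng or random.Random()
    n = len(game_positions)
    # Keep at least one position from a non-empty game.
    k = min(n, max(1, round(n * fraction)))
    # Sample indices without replacement, then restore game order.
    kept = sorted(rng.sample(range(n), k))
    return [game_positions[i] for i in kept]
```

With an 80-move game and `fraction=0.5` this keeps 40 positions. Neighbouring positions in a game share the same final result, so thinning them out reduces the number of near-duplicate training examples with identical value targets.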

The self-play games are played with a temperature of 1 up to move 15, i.e. moves are sampled randomly in proportion to their visit ratio in the MCTS search. From move 15 onwards the best move is selected, i.e. temperature 0. The first phase guarantees that you get a different game each time, much like random sampling from a big opening book. The second phase is there for accurate evaluation of endgames, although lc0 uses a low temperature there too.
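The temperature schedule can be sketched roughly as follows. This assumes the search hands back a dict of root visit counts per move. The names `select_move` and `temp_cutoff` are illustrative, and treating move 15 itself as already greedy is my reading of "from move 15 onwards", not something taken from either engine's code.

```python
import random

def select_move(visit_counts, move_number, temp_cutoff=15, temperature=1.0, rng=None):
    """Pick a self-play move from root visit counts.

    visit_counts: dict mapping move -> MCTS visit count at the root
                  (at least one count must be positive).
    move_number:  1-based move number in the game.
    """
    rng = rng or random.Random()
    moves = list(visit_counts)
    if move_number >= temp_cutoff or temperature == 0:
        # Temperature 0: always play the most-visited move.
        return max(moves, key=lambda m: visit_counts[m])
    # Temperature T: sample with probability proportional to N^(1/T).
    # With T = 1 this is simply proportional to the visit counts.
    weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

For example, with `{"e4": 60, "d4": 30, "c4": 10}` the early-game call picks e4 about 60% of the time, while any call at move 15 or later always returns e4.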

hgm · Post by **hgm** » Tue Oct 27, 2020 1:59 pm

These engines are trained by playing very fast games against themselves, with very small search trees. The search algorithm they use is such that branches that have a very bad score for one player compared to the tree root are hardly searched, and most effort goes into the reasonable branches, or the difficult-to-judge branches (which need a lot of search before it is obvious that they are good or bad).

The neural network they contain is then adjusted in the direction of better predictions for how large the sub-tree for each move will be, and (with another output) to better predict the result of the entire game in which the searched position occurred.
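As a rough illustration of those two training targets, the sketch below turns one searched position into a policy target (the share of the search each move received) and a value target (the final game result, seen from the side to move). The function name, the `"white"`/`"black"` strings, and the +1/0/-1 result encoding are assumptions made for this example, not taken from any engine.

```python
def make_training_target(visit_counts, game_result, side_to_move):
    """Build (policy_target, value_target) for one searched position.

    visit_counts: dict mapping move -> MCTS visit count at the root.
    game_result:  +1 if White won, 0 for a draw, -1 if Black won
                  (encoding assumed for this sketch).
    side_to_move: "white" or "black".
    """
    total = sum(visit_counts.values())
    # Policy target: fraction of root visits spent on each move,
    # i.e. the relative size of each move's sub-tree.
    policy = {move: n / total for move, n in visit_counts.items()}
    # Value target: the game outcome from the mover's point of view.
    value = game_result if side_to_move == "white" else -game_result
    return policy, value
```

The network's policy output is then trained toward `policy` and its value output toward `value`, typically with a cross-entropy loss and a squared-error loss respectively.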

OfekShochat · Post by **OfekShochat** » Tue Oct 27, 2020 2:39 pm

Thanks! I'm actually making an NN MCTS engine, so that helps!
