I wonder how AlphaZero (or lc0) gathers data from self-play games. Is it random positions from the games?
Thanks
how alphazero gathers data?
Moderator: Ras
-
OfekShochat
- Posts: 50
- Joined: Thu Oct 15, 2020 10:19 am
- Full name: Ofek Shochat
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: how alphazero gathers data?
Only a fraction of the positions in a game, say 50%, is sampled, to avoid the bad effect of correlated positions on the value head.
Recent versions of A0 even go as far as sampling only 12% of the positions.
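To make the subsampling concrete, here is a minimal sketch of picking a random fraction of positions from one finished game. The function name `sample_training_positions` and its parameters are made up for illustration and are not taken from A0 or lc0; a real pipeline may pick positions differently.

```python
import random

def sample_training_positions(game_positions, fraction=0.5, rng=None):
    """Keep a random subset of the positions of one self-play game.

    `fraction` is the share of positions to keep (e.g. 0.5 or 0.12).
    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    if not game_positions:
        return []
    # Always keep at least one position from a non-empty game.
    k = max(1, round(len(game_positions) * fraction))
    # Sample without replacement, then restore game order.
    indices = sorted(rng.sample(range(len(game_positions)), k))
    return [game_positions[i] for i in indices]
```

Keeping fewer positions per game means the training set has to be drawn from more games, which is the point: neighbouring positions from the same game all share one result.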
The selfplay games are played with a temperature of 1 up to move 15, i.e. moves are sampled at random in proportion to their share of the visits in the MCTS search.
From move 15 onwards the best move is selected, i.e. temperature 0. The first phase guarantees that you get a different game each time, much like random sampling from a big opening book. The second phase is there for accurate evaluation of endgames, although lc0 uses a low temperature there too.
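A rough sketch of that move selection rule, assuming the search hands back a dict of visit counts per move with at least one visit in total. The name `select_move`, the `temperature_moves` parameter and the exact cut-off (sampling strictly before move 15, greedy from move 15 on) are my assumptions for illustration, not the actual A0 or lc0 code.

```python
import random

def select_move(visit_counts, move_number, temperature_moves=15, rng=None):
    """Pick a move from MCTS visit counts.

    Before `temperature_moves`: temperature 1, i.e. sample a move with
    probability proportional to its visit count.
    From `temperature_moves` onwards: temperature 0, i.e. take the most
    visited move.
    """
    rng = rng or random.Random()
    moves = list(visit_counts)
    if move_number < temperature_moves:
        weights = [visit_counts[m] for m in moves]
        return rng.choices(moves, weights=weights, k=1)[0]
    return max(moves, key=lambda m: visit_counts[m])
```

For example, with `{"e4": 70, "d4": 25, "a3": 5}` the early-game branch plays `d4` about a quarter of the time, while the late-game branch always plays `e4`.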
Last edited by Daniel Shawul on Tue Oct 27, 2020 2:00 pm, edited 1 time in total.
-
hgm
- Posts: 28429
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: how alphazero gathers data?
These engines are trained by playing very fast games against themselves, with very small search trees. The search algorithm they use is such that branches with a very bad score for one player compared to the tree root are hardly searched, and most effort goes into the reasonable branches, or the difficult-to-judge branches (which need a lot of search before it is obvious whether they are good or bad).
The neural network they contain is then adjusted towards better predicting how large the sub-tree for each move will be and (with another output) the result of the entire game in which the searched position occurred.
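As a minimal sketch of what those two training targets could look like for one searched position: the sub-tree sizes become a normalised visit distribution, and the game result becomes a single number. The function name, the result encoding (+1 white win, 0 draw, -1 black win) and the choice to express the value from the side to move's point of view are assumptions for illustration, not details given above.

```python
def make_training_target(visit_counts, game_result, white_to_move):
    """Build (policy_target, value_target) for one searched position.

    visit_counts: dict move -> number of visits in that move's sub-tree.
    game_result: +1 white won, 0 draw, -1 black won (assumed encoding).
    white_to_move: True if white is to move in this position.
    """
    total = sum(visit_counts.values())
    # Policy target: fraction of the search effort spent on each move.
    policy_target = {m: n / total for m, n in visit_counts.items()}
    # Value target: final result seen from the side to move (assumed).
    value_target = game_result if white_to_move else -game_result
    return policy_target, value_target
```

The network's policy output would then be trained towards `policy_target` and its value output towards `value_target`.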
-
OfekShochat
- Posts: 50
- Joined: Thu Oct 15, 2020 10:19 am
- Full name: Ofek Shochat
Re: how alphazero gathers data?
Thanks! I'm actually making an NN MCTS engine, so that helps!