[Moderation] This thread was split off from the original LCZero update thread ( http://talkchess.com/forum/viewtopic.ph ... &start=260 ), and meant to continue the discussion, because the other was getting unmanageably long.
lucasart wrote:
Sorry if that's a dumb newbie question. I'm not familiar at all with NN.
But wouldn't it save a lot of time to:
* first train the net on high-quality games (i.e. Stockfish level)
* then you'd start with a reasonably strong net to improve with reinforcement learning
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and then improve it with self-play using reinforcement learning (it's more complicated than that, since they had separate NNs for move selection and position evaluation).
The AlphaGo Zero and AlphaZero approaches started from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-level human play has biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to reach the top level.
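The self-play loop described above can be sketched very roughly as follows. This is a toy illustration, not LCZero's or AlphaZero's actual code: the "game" and "net" here are placeholder stand-ins (a real system plays games with MCTS guided by a deep network and trains it by gradient descent on the game outcomes and visit counts).

```python
import random

def play_game(net):
    """Self-play one game: both sides choose moves using the current net.
    Stubbed here: real systems run MCTS guided by the net's policy/value."""
    positions = [("pos", ply) for ply in range(10)]  # placeholder encodings
    result = random.choice([1, -1, 0])               # win / loss / draw
    return positions, result

def train(net, games):
    """Shift the net's value estimate for each visited position toward the
    observed game result (a crude stand-in for a gradient step)."""
    for positions, result in games:
        for pos in positions:
            old = net.get(pos, 0.0)
            net[pos] = old + 0.01 * (result - old)
    return net

net = {}  # placeholder for the network's parameters, starting "from scratch"
for iteration in range(100):
    games = [play_game(net) for _ in range(10)]  # generate self-play data
    net = train(net, games)                      # update the net on it
```

The key point of the "zero" approach is that the loop starts from a blank net: the only training signal is the outcome of games the net plays against itself.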
If bias is a difference from perfect play, then random play has a bigger bias than top human play.
Fixing the errors of random play seems to me a harder task than fixing the errors of top humans.
"Bias" does not mean the same thing as "imperfection". A bias is a systematic difference from imperfection, i.e. a difference that has a directional component. This is pertinent because it could cause the network to get stuck in a local minimum some distance away from the global optimum.
CMCanavessi wrote:In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.
I'll upload the full pgn in a minute
TSCP is rated around 1700 Elo on the CCRL 40/4 list. The error margins are huge, but an 8:2 score corresponds to at least a 200 Elo advantage for TSCP.
That would mean LeelaZero is more than 2000 Elo behind SF9.
Still an extremely long way to go.
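For reference, the Elo difference implied by a match score follows from the standard logistic Elo model (the helper name below is my own):

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score (fraction of points won),
    using the standard logistic model: score = 1 / (1 + 10**(-diff/400))."""
    return -400 * math.log10(1 / score - 1)

# TSCP scored 8 points out of 10 against Leela:
print(round(elo_diff(8 / 10)))  # about 241
```

So the point estimate for an 80% score is roughly 240 Elo; "at least 200" is a reasonable reading given the tiny 10-game sample.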