Michael Sherwin wrote:How do you know that?
It is what the paper says. Only keep the sub-tree after the moves played in the game. If you start a new game, the tree from the last move of the previous game is entirely useless, as the initial position is not in it. And there were no moves before the start of the game from which you could take a sub-tree.
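The sub-tree reuse being described can be sketched roughly like this (a hypothetical Node class for illustration, not AlphaZero's actual code):

```python
class Node:
    """Minimal MCTS tree node (hypothetical, for illustration only)."""
    def __init__(self):
        self.children = {}   # move -> child Node
        self.visits = 0
        self.value_sum = 0.0

def advance_root(root, move_played):
    """Keep only the sub-tree under the move actually played.

    Everything else in the tree describes positions that can no
    longer occur in this game, so it is discarded. At the start of
    a new game there is no previous move, hence no sub-tree to
    reuse, and the search starts from a fresh root.
    """
    return root.children.get(move_played, Node())
```

The point being argued is in that last line: the kept sub-tree is only the part reachable after the game's moves, so nothing carries over between games.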
And I can tell you that it makes no sense. It is far superior to maintain the entire tree and use it over and over.
It makes perfect sense to me, because the purpose of the training games is to measure the quality of the NN response, so that you can tweak it afterwards to start preferring moves that are good for winning. Not to play good games despite a weak NN, based on statistics. That would mask the failures of the NN to a large extent, so that you wouldn't know what to tweak to make it better.
The NN would become saturated and start losing valuable data. Also the reinforcement learning in the tree would be thin if the tree were deleted after every game. There is no way that A0 could have achieved that result against SF without a lot of deep retained learning.
You cannot possibly know that. They say it is possible and that they did it. Who do you think I should believe?
And that can't all be held in the NN. It would have to retain all the data from all the training games to do what it did.
I have no idea why you think that. All that is needed to beat Stockfish is to play better Chess. Stockfish at a longer TC would be able to convincingly beat Stockfish at a faster TC (time odds), without any need for a learn file.
Besides, if every game's tree was lost and every game changed the NN, then the NN would end up based on the very last few games, and all the first games would lose all effect on the NN and would have been worthless.
Not at all. It depends on the learning-rate parameter, and with so many games to learn from this was definitely set quite low. So effectively each game changes the NN only very little, and it takes very long (i.e. many games) to completely erase the effects of earlier games. And of course it is very good that it completely forgets the games played when its Elo was still much below what it eventually gets. The WDL statistics of those are very unreliable, because the quality of play sucks. They were only good for discovering the coarsest concepts, like that it is better to have a Queen than a Knight, but the large number of blunders that made it possible to discover this by frequently losing a Queen for a Knight would mask the more subtle evaluation terms with noise. Once the coarse terms are learned, they will not be forgotten; in absence of evidence to the contrary they would just stay at their optimal value.
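A toy calculation makes the learning-rate point concrete. Under a simple exponential-moving-average update w &lt;- w + lr*(target - w) (a stand-in for SGD, not AlphaZero's actual optimizer), the influence of a game seen n updates ago decays as (1 - lr)**n:

```python
def influence_after(n_updates, lr):
    """Fraction of a game's original contribution that survives
    after n_updates further EMA-style updates with rate lr.
    Toy model, not AlphaZero's training code."""
    return (1 - lr) ** n_updates

# High learning rate: an old game is wiped out almost immediately.
print(influence_after(100, 0.5))    # vanishingly small

# Low learning rate: after 100 more games, about 90% of the
# contribution still remains, so early lessons persist.
print(influence_after(100, 0.001))  # roughly 0.905
```

So with a low rate the network is not "based on the very last few games"; the coarse concepts learned early decay only very slowly, while the noisy statistics from weak early play are gradually overwritten.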