yanquis1972 wrote: ↑Wed Jul 11, 2018 4:41 pm
crem wrote: ↑Wed Jul 11, 2018 12:29 pm
Unlike previous training attempts, where the learning rate was reduced frequently (every 20 nets or so) and slowly (by dividing by 3), test10 tries to replicate what DeepMind is believed to have done (change the LR only 2 times in total, dividing by 10 each time). With that in mind, test10 has not reduced the LR at all yet.
The first LR reduction will happen around network id10098 (though testing will probably restart or move to the main server before that). After the LR change, progress should be fast again.
In general, squeezing everything out of one LR before switching to the next is known to improve the final quality of the next stage (at the cost of training speed).
just bc i'm curious, is that known from previous test runs, or from someone else's work in the field?
is the final idea (i assume it is, since you say this run may not be reset) to start at 256? one thing i didn't get about the test runs was the lack of experimentation with the appropriate time to promote to a larger net... was the idea that the 64x6 learning parameters could be carried over to the big nets?
That's common NN training knowledge; e.g. there are arxiv papers about it.
As we never finished any of the test runs, we cannot really compare final results. Actually, intuition kind of suggests the opposite should be true ("Why change stepwise if constantly and smoothly lowering the LR seems more natural?", "Every time learning slows down, we reduce the LR and it helps, so we should keep doing that.").
But it seems this is an area where intuition is wrong.
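For reference, the step-wise schedule described above can be sketched like this (the milestones and base LR here are purely illustrative, not the actual test10 values):

```python
def step_lr(base_lr, step, milestones, factor=0.1):
    """Step-decay schedule: multiply the learning rate by `factor`
    each time training passes one of the milestone steps.
    Two milestones with factor=0.1 gives the 'divide by 10 twice'
    schedule attributed to DeepMind above."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= factor
    return lr

# Illustrative values only:
print(step_lr(0.1, 50_000, [100_000, 200_000]))   # before first drop
print(step_lr(0.1, 150_000, [100_000, 200_000]))  # after first drop
print(step_lr(0.1, 250_000, [100_000, 200_000]))  # after second drop
```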
As for the final idea, I guess the current plan is to generate games at 256 blocks, but to train networks of other sizes (64, 128 and 196) in parallel from the same games.
Most likely there will be a reset, as many things will change: cpuct in training games will be 1.7 rather than 1.2, the new network file format will have more metadata, and there are some changes to the training process that will make it more similar to AlphaZero's training.
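For readers unfamiliar with cpuct: it is the exploration constant in the PUCT formula used for move selection during search. A minimal sketch of the standard AlphaZero-style formula (my own illustrative function names, not Lc0's actual code):

```python
import math

def puct_score(q, prior, parent_visits, child_visits, cpuct):
    """PUCT selection score: value estimate q plus an exploration
    bonus scaled by cpuct. A larger cpuct weights the policy prior
    more heavily relative to the observed value, i.e. more exploration."""
    u = cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# With everything else equal, raising cpuct from 1.2 to 1.7
# increases the exploration term proportionally:
print(puct_score(0.0, 0.3, 100, 10, 1.2))
print(puct_score(0.0, 0.3, 100, 10, 1.7))
```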