Re: Lc0 saturation.
Posted: Mon Aug 20, 2018 8:31 pm
What was the size of the AlphaZero net?
20x256 (20 residual blocks of 256 filters each).
Why not? The 20x256 net did the job they wanted from it, which was to beat a somewhat recent Stockfish under their test conditions. Why should they have expended far more resources to train an even bigger and stronger net? Google can find better uses for those TPUs; they're not that invested in board games except as a means to an end (i.e. commercially useful AI applications). They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment required to train such a net by reinforcement learning.
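For a rough sense of how much a bigger tower raises that investment, here is a back-of-the-envelope parameter count for the three residual towers mentioned (filters x blocks, as written above). This is only a sketch: it assumes an AlphaZero-style block of two 3x3 convolutions per residual block and ignores batch norm, the input convolution and the policy/value heads, so treat the numbers as ballpark figures.

Code:

    # Rough parameter counts for the residual towers discussed above.
    # Assumes two 3x3 convolutions per residual block; batch norm and
    # the network heads are ignored, so these are ballpark only.
    def tower_params(blocks: int, filters: int) -> int:
        per_conv = 3 * 3 * filters * filters   # 3x3 kernel, filters -> filters channels
        return blocks * 2 * per_conv           # two such convolutions per block

    for name, blocks, filters in [("192x15", 15, 192), ("256x20", 20, 256), ("256x40", 40, 256)]:
        print(f"{name}: ~{tower_params(blocks, filters) / 1e6:.1f}M parameters")

Doubling the number of blocks roughly doubles the weight count, and the cost of every self-play evaluation grows with it, which is why a 256x40 run is a much larger commitment than 256x20.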
jkiliani wrote: ↑Mon Aug 20, 2018 11:45 pm
    Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment required to train such a net by reinforcement learning.

Why not go directly for 256x40 then?
Dann Corbit wrote: ↑Mon Aug 20, 2018 11:50 pm
    Why not go directly for 256x40 then?

That will happen at some point, but the prevailing sentiment right now is to be a bit more sure about the self-play and training parameters before "going all in". In addition, just today a bug was discovered that prevents all recent networks (including test10) from using the 50-move rule input plane. Once we are confident that everything works fine, I think a "big net" run will be in the cards.
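For context, the 50-move rule counter is one of the auxiliary planes the network sees alongside the piece positions, so a net that ignores it cannot judge how close a position is to a 50-move draw. A minimal sketch of what encoding such a plane looks like follows; the plane shape and lack of normalization here are illustrative assumptions, not Lc0's exact input format.

Code:

    import numpy as np

    def rule50_plane(halfmove_clock: int) -> np.ndarray:
        # Illustrative only: one 8x8 input plane filled with the halfmove
        # clock (moves since the last capture or pawn move), so a
        # convolutional net can condition on proximity to a 50-move draw.
        # Lc0's real input has many more planes and its own conventions.
        return np.full((8, 8), float(halfmove_clock), dtype=np.float32)

    # Example: 30 halfmoves since the last capture or pawn move.
    plane = rule50_plane(30)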
It seems that each new stage (test run) throws away the old networks and starts over.
Strictly speaking, they didn't publish the article: "publishing" means the work has been peer-reviewed and accepted by a journal or a conference, and so far it exists only as a preprint.