Re: Lc0 saturation.
Posted: Mon Aug 20, 2018 8:31 pm
What was the size of the AlphaZero net?
20x256 (20 residual blocks of 256 filters each).
Why not? The 20x256 net did the job they wanted from it, which was to beat a somewhat recent Stockfish under their test conditions. Why should they have expended far more resources to train an even bigger and stronger net? Google can find better uses for those TPUs; they're not that invested in board games except as a means to an end (i.e. commercially useful AI applications). They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment required to train such a net by reinforcement learning.
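For a rough sense of how much a bigger tower raises that investment, here is a back-of-the-envelope parameter count for the three residual towers mentioned (filters x blocks, as written above). This is only a sketch: it assumes an AlphaZero-style block of two 3x3 convolutions per residual block and ignores batch norm, the input convolution and the policy/value heads, so treat the numbers as ballpark figures.

Code:

    # Rough parameter counts for the residual towers discussed above.
    # Assumes two 3x3 convolutions per residual block; batch norm and
    # the network heads are ignored, so these are ballpark only.
    def tower_params(blocks: int, filters: int) -> int:
        per_conv = 3 * 3 * filters * filters   # 3x3 kernel, filters -> filters channels
        return blocks * 2 * per_conv           # two such convolutions per block

    for name, blocks, filters in [("192x15", 15, 192), ("256x20", 20, 256), ("256x40", 40, 256)]:
        print(f"{name}: ~{tower_params(blocks, filters) / 1e6:.1f}M parameters")

Doubling the number of blocks roughly doubles the weight count, and the cost of every self-play evaluation grows with it, which is why a 256x40 run is a much larger commitment than 256x20.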
jkiliani wrote: ↑Mon Aug 20, 2018 11:45 pm
    Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment required to train such a net by reinforcement learning.

Why not go directly for 256x40 then?
Dann Corbit wrote: ↑Mon Aug 20, 2018 11:50 pm
    Why not go directly for 256x40 then?

That will happen at some point, but the prevailing sentiment right now is to be a bit more sure about the self-play and training parameters before "going all in". In addition, just today a bug was discovered that prevents all recent networks (including test10) from using the 50-move rule input plane. Once we are confident that everything works fine, I think a "big net" run will be in the cards.
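For context, the 50-move rule counter is one of the auxiliary planes the network sees alongside the piece positions, so a net that ignores it cannot judge how close a position is to a 50-move draw. A minimal sketch of what encoding such a plane looks like follows; the plane shape and lack of normalization here are illustrative assumptions, not Lc0's exact input format.

Code:

    import numpy as np

    def rule50_plane(halfmove_clock: int) -> np.ndarray:
        # Illustrative only: one 8x8 input plane filled with the halfmove
        # clock (moves since the last capture or pawn move), so a
        # convolutional net can condition on proximity to a 50-move draw.
        # Lc0's real input has many more planes and its own conventions.
        return np.full((8, 8), float(halfmove_clock), dtype=np.float32)

    # Example: 30 halfmoves since the last capture or pawn move.
    plane = rule50_plane(30)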
It seems that each new stage (test run) throws away the old networks and starts over.
Strictly speaking, they didn't publish the article: "publishing" means the work has been peer-reviewed and accepted by a journal or a conference, and so far it exists only as a preprint.