Lc0 saturation.

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Lc0 saturation.

Post by Dann Corbit »

What was the size of the AlphaZero net?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Lc0 saturation.

Post by jp »

Dann Corbit wrote: Mon Aug 20, 2018 8:31 pm What was the size of the AlphaZero net?
20x256.

(So Lc0 10xxx nets are 20x256 too. Older Lc0 9xxx & 4xxx nets are 6x64.)
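For readers not used to the notation: "20x256" conventionally means a residual tower of 20 blocks with 256 convolutional filters per block. A minimal sketch of that shape in Python/PyTorch follows; it is only an illustration, not the actual Lc0 or AlphaZero code, and the 112 input planes and the omission of the policy/value heads are assumptions made for brevity.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, filters):
        super().__init__()
        self.conv1 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)
        self.conv2 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection around the two convolutions

def residual_tower(blocks=20, filters=256, input_planes=112):
    # 112 board-encoding planes is an assumed Lc0-style input; adjust as needed
    layers = [nn.Conv2d(input_planes, filters, 3, padding=1, bias=False),
              nn.BatchNorm2d(filters),
              nn.ReLU()]
    layers += [ResidualBlock(filters) for _ in range(blocks)]
    return nn.Sequential(*layers)

net = residual_tower(20, 256)  # the "20x256" tower; policy and value heads omitted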
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Lc0 saturation.

Post by Werewolf »

JJJ wrote: Mon Aug 20, 2018 5:32 pm I heard a bigger net was coming anyway. I don't think Lczero will reach the level of the big 3 with this size of net, but it might with the next one. Anyway, in one year, Lczero will be the best engine in the world.
Then why did DeepMind stop at 20x256?
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: Lc0 saturation.

Post by jkiliani »

Werewolf wrote: Mon Aug 20, 2018 11:12 pm
JJJ wrote: Mon Aug 20, 2018 5:32 pm I heard a bigger net was coming anyway. I don't think Lczero will reach the level of the big 3 with this size of net, but it might with the next one. Anyway, in one year, Lczero will be the best engine in the world.
Then why did DeepMind stop at 20x256?
Why not? The 20x256 net did the job they wanted from it, which was to beat a somewhat recent Stockfish under test conditions. Why should they have expended far more resources to train an even bigger and stronger net? Google can find better uses for those TPUs; they're not that invested in board games except as a means to an end (i.e. commercially useful AI applications). They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Lc0 saturation.

Post by Werewolf »

The alternative, less happy, theory is that as the net gets bigger the gains are cancelled out by lower nps.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: Lc0 saturation.

Post by jkiliani »

Werewolf wrote: Mon Aug 20, 2018 11:42 pm The alternative, less happy, theory is that as the net gets bigger the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
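To put rough numbers on that trade-off (my own back-of-the-envelope estimate, not figures from DeepMind or the Leela project): each residual block contains two 3x3 convolutions, so the tower's weight count grows roughly with blocks × filters², and with it both the cost of every evaluation and the amount of self-play needed for training.

def tower_params(blocks, filters):
    # two 3x3 convolutions per residual block; batch norm and the heads are ignored
    return blocks * 2 * 9 * filters * filters

for blocks, filters in [(15, 192), (20, 256), (40, 256)]:
    print(f"{filters}x{blocks}: ~{tower_params(blocks, filters) / 1e6:.1f}M weights")

This prints roughly 10.0M, 23.6M and 47.2M weights for 192x15, 256x20 and 256x40 respectively, which is one way to see why the bigger runs need a much larger initial investment.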
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Lc0 saturation.

Post by Dann Corbit »

jkiliani wrote: Mon Aug 20, 2018 11:45 pm
Werewolf wrote: Mon Aug 20, 2018 11:42 pm The alternative, less happy, theory is that as the net gets bigger the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
Why not go directly for 256x40 then?
It seems that each new stage throws away the old stuff.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: Lc0 saturation.

Post by jkiliani »

Dann Corbit wrote: Mon Aug 20, 2018 11:50 pm
jkiliani wrote: Mon Aug 20, 2018 11:45 pm
Werewolf wrote: Mon Aug 20, 2018 11:42 pm The alternative, less happy, theory is that as the net gets bigger the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
Why not go directly for 256x40 then?
It seems that each new stage throws away the old stuff.
That will happen at some point, but the prevailing sentiment right now is to be a bit more sure about the self-play and training parameters before "going all in". In addition, just today a bug was discovered that prevents all recent (including test10) networks from using the 50-move rule input plane. Once we are confident that everything works fine, I think a "big net" run will be in the cards.
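For anyone wondering what the 50-move rule input plane is: the halfmove clock is fed to the net as one more 8x8 plane stacked with the piece planes, so if that plane is accidentally zeroed the net simply cannot see how close the position is to a 50-move draw. The sketch below is an illustrative guess at the layout and scaling, not the exact Lc0 encoding.

import numpy as np

def rule50_plane(halfmove_clock):
    # one constant-valued 8x8 plane carrying the 50-move counter (the /99 scaling is assumed)
    return np.full((8, 8), halfmove_clock / 99.0, dtype=np.float32)

extra_planes = np.stack([rule50_plane(30)])  # concatenated with the piece/history planes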
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Lc0 saturation.

Post by Dann Corbit »

Re: "I think a "big net" run will be in the cards."

Pun intentional?
;-)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Lc0 saturation.

Post by Milos »

jkiliani wrote: Mon Aug 20, 2018 11:24 pm They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.
They didn't publish the article; publishing means that it has been peer-reviewed and accepted by a journal or a conference.
That crappy PR stunt preprint was obviously never accepted for publication anywhere. It was just uploaded to arxiv.org.