Lc0 saturation.

Re: Lc0 saturation.

Post by Dann Corbit » Mon Aug 20, 2018 6:31 pm

What was the size of the AlphaZero net?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Re: Lc0 saturation.

Post by jp » Mon Aug 20, 2018 6:34 pm

Dann Corbit wrote:
Mon Aug 20, 2018 6:31 pm
What was the size of the AlphaZero net?
20x256.

(So Lc0 10xxx nets are 20x256 too. Older Lc0 9xxx & 4xxx nets are 6x64.)
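
For anyone unfamiliar with the shorthand: "20x256" means a residual tower of 20 blocks with 256 convolution filters each, sitting on top of the board input planes. The sketch below shows what such a tower looks like, assuming a PyTorch-style formulation; the class names, the 112 input planes, and the omission of the policy and value heads are illustrative assumptions, not the actual Lc0 or AlphaZero code.

[code]
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """One residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, filters):
        super().__init__()
        self.conv1 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)
        self.conv2 = nn.Conv2d(filters, filters, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)  # skip connection

class Tower(nn.Module):
    """A "blocks x filters" residual body; policy/value heads omitted."""
    def __init__(self, blocks=20, filters=256, input_planes=112):
        super().__init__()
        self.stem = nn.Conv2d(input_planes, filters, 3, padding=1, bias=False)
        self.body = nn.Sequential(*[ResBlock(filters) for _ in range(blocks)])

    def forward(self, x):  # x: (batch, input_planes, 8, 8)
        return self.body(F.relu(self.stem(x)))

net = Tower()  # the "20x256" body discussed above
[/code]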

Re: Lc0 saturation.

Post by Werewolf » Mon Aug 20, 2018 9:12 pm

JJJ wrote:
Mon Aug 20, 2018 3:32 pm
I heard a bigger net was coming anyway. I don't think Lczero will reach the level of the big 3 with this size of net, but it might with the next one. Anyway, in one year, Lczero will be the best engine in the world.
Then why did DeepMind stop with 20x256?

Re: Lc0 saturation.

Post by jkiliani » Mon Aug 20, 2018 9:24 pm

Werewolf wrote:
Mon Aug 20, 2018 9:12 pm
JJJ wrote:
Mon Aug 20, 2018 3:32 pm
I heard a bigger net was coming anyway. I don't think Lczero will reach the level of the big 3 with this size of net, but it might with the next one. Anyway, in one year, Lczero will be the best engine in the world.
Then why did DeepMind stop with 20x256?
Why not? The 20x256 net did the job they wanted from it, which was to beat a somewhat recent Stockfish under test conditions. Why should they have expended far more resources to train an even bigger and stronger net? Google can find better uses for those TPUs; they're not that invested in board games except as a means to an end (i.e. commercially useful AI applications). They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.

Re: Lc0 saturation.

Post by Werewolf » Mon Aug 20, 2018 9:42 pm

The alternative, less happy theory is that as the net gets bigger, the gains are cancelled out by lower nps.
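
To put rough numbers on that trade-off: the cost of a single net evaluation is dominated by the 3x3 convolutions, which scale roughly as blocks × filters², so nps falls roughly in proportion as the net grows. The figures below are a back-of-envelope estimate under that assumption, not measured Lc0 node rates.

[code]
def relative_cost(blocks, filters, ref_blocks=15, ref_filters=192):
    """Eval cost relative to a 192x15 net, assuming cost ~ blocks * filters^2."""
    return (blocks * filters ** 2) / (ref_blocks * ref_filters ** 2)

for name, blocks, filters in [("192x15", 15, 192),
                              ("256x20", 20, 256),
                              ("256x40", 40, 256)]:
    print(f"{name}: ~{relative_cost(blocks, filters):.1f}x the eval cost of 192x15")

# Roughly: 256x20 evaluates ~2.4x slower than 192x15, and 256x40 ~4.7x slower,
# so the bigger net has to gain enough accuracy to outweigh the lost nodes.
[/code]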

Re: Lc0 saturation.

Post by jkiliani » Mon Aug 20, 2018 9:45 pm

Werewolf wrote:
Mon Aug 20, 2018 9:42 pm
The alternative, less happy theory is that as the net gets bigger, the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
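
For a sense of what that initial investment looks like, here is a rough estimate of how the body's parameter count grows with the sizes mentioned above (two 3x3 convolutions plus batch-norm per residual block; the input stem and the policy/value heads are ignored). These are estimates derived from the block structure, not the exact Lc0 network definitions.

[code]
def body_params(blocks, filters):
    """Approximate parameter count of a residual body of the given size."""
    conv = 3 * 3 * filters * filters   # weights of one 3x3 convolution
    bn = 2 * filters                   # batch-norm scale and shift
    return blocks * (2 * conv + 2 * bn)

for name, blocks, filters in [("192x15", 15, 192),
                              ("256x20", 20, 256),
                              ("256x40", 40, 256)]:
    print(f"{name}: ~{body_params(blocks, filters) / 1e6:.0f}M parameters")

# 192x15: ~10M, 256x20: ~24M, 256x40: ~47M -- more weights to fit means more
# self-play games before the bigger net overtakes the smaller one.
[/code]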

Re: Lc0 saturation.

Post by Dann Corbit » Mon Aug 20, 2018 9:50 pm

jkiliani wrote:
Mon Aug 20, 2018 9:45 pm
Werewolf wrote:
Mon Aug 20, 2018 9:42 pm
The alternative, less happy theory is that as the net gets bigger, the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
Why not go directly for 256x40 then?
It seems that each new stage throws away the old stuff.

Re: Lc0 saturation.

Post by jkiliani » Mon Aug 20, 2018 10:19 pm

Dann Corbit wrote:
Mon Aug 20, 2018 9:50 pm
jkiliani wrote:
Mon Aug 20, 2018 9:45 pm
Werewolf wrote:
Mon Aug 20, 2018 9:42 pm
The alternative, less happy, theory is that as the net gets bigger the gains are cancelled out by lower nps.
Test10 with its 256x20 architecture represented a huge improvement over the 192x15 main net. DeepMind reported that for AlphaGo Zero, 256x40 ended up far stronger than 256x20. So far, all available data indicates that larger net sizes work; they just increase the initial investment of training such a net by reinforcement learning.
Why not go directly for 256x40 then?
It seems that each new stage throws away the old stuff.
That will happen at some point, but the prevailing sentiment right now is to be a bit more certain about the self-play and training parameters before "going all in". In addition, just today a bug was discovered that prevents all recent (including test10) networks from using the 50-move rule input plane. Once we are confident that everything works fine, I think a "big net" run will be in the cards.
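
For context on that bug: one of the network's input planes carries the halfmove clock, so the net can see how close a position is to a 50-move draw; with the bug, recent nets effectively never received that information. A minimal sketch of such an encoding is below, assuming the counter is broadcast over the 8x8 board and scaled to [0, 1]; the exact scaling and plane ordering used by Lc0 may differ.

[code]
import numpy as np

def rule50_plane(halfmove_clock: int) -> np.ndarray:
    """One 8x8 input plane filled with the scaled halfmove clock.

    Dividing by 99 is an assumption for illustration; the real Lc0
    encoding may use a different constant.
    """
    return np.full((8, 8), halfmove_clock / 99.0, dtype=np.float32)

print(rule50_plane(40)[0, 0])  # every square holds the same value, ~0.404
[/code]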

Re: Lc0 saturation.

Post by Dann Corbit » Mon Aug 20, 2018 10:22 pm

Re: "I think a "big net" run will be in the cards."

Pun intentional?
;-)

Re: Lc0 saturation.

Post by Milos » Mon Aug 20, 2018 11:52 pm

jkiliani wrote:
Mon Aug 20, 2018 9:24 pm
They published their article and made their point, and it is now up to the Leela dev team and other chess programmers to take this as far as possible.
They didn't publish the article: "publishing" means that it has been peer-reviewed and accepted by a journal or a conference.
That crappy PR stunt preprint was obviously never accepted for publication anywhere; it was just uploaded to arxiv.org.
