There is likely a problem with the training pipeline, i.e. something in the setup (always-promote networks, the cyclic learning rate schedule with its current values, the step counts used, etc.) is not working very well. There's a lot of discussion going on as to the likely reasons why, and we can expect to see procedural changes soon (which everyone hopes will work).

mirek wrote: Oh no, any explanation for why this is happening? Is it already a reason for concern, or is it still OK? And at which point, if the trend continues, will it become a reason for concern? Shouldn't the larger net just have skyrocketed in performance? Could it be that, if trained on the larger net all the way from the beginning, the progress would have been higher?

Laskos wrote: Here is a recent result from Carlos Canavessi up to ID138; it also shows a significant regression.
LCZero: Progress and Scaling. Relation to CCRL Elo
Moderators: hgm, Rebel, chrisw
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
- Posts: 213
- Joined: Thu Dec 16, 2010 4:39 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Indeed, no progress anymore, whereas AlphaZero was progressing smoothly.
I'm aware Google is indirectly supporting this project by authorizing people to create several clients targeting a freely available TPU (in shared mode).
However, it might be more useful to have some DeepMind insider involved with the still-controversial AlphaZero chess results authorized to review and comment on the Leela code, to pinpoint possible implementation errors.
It has now been almost six months since that much-hyped AlphaZero preprint appeared on arXiv, and there is still no update.
Time for DeepMind/Google to open up this technology somewhat, to make it really credible, IMHO.
Per ardua ad astra
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
We can't rely on DeepMind to solve our problems for us. The experience with them is that they don't give out any information beyond what they publish. The differences between AlphaZero and AlphaGo Zero in the training pipeline are actually substantial enough that a number of things from the AlphaGo Zero setup could be tried. I'm confident we'll find a solution to these roadblocks ourselves soon enough; Leela Zero (Go) is doing just fine, for example, and following in its footsteps is still a viable path if the current setup has problems.

melajara wrote: Indeed, no progress anymore when AlphaZero was progressing smoothly.
- Posts: 1142
- Joined: Thu Dec 28, 2017 4:06 pm
- Location: Argentina
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
The gauntlet is currently at 54/127, way lower than with the 12x networks I tested before. We'll see how Leela evolves from here.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
In Dutch we say "meten is weten" (to measure is to know).
Before you do anything, you should obtain reliable Elo measurements...
For classical chess engines, self-play Elo correlates very well with Elo against foreign opponents. From the test results posted here, it is not clear that this is also true for LC0.
I would propose that the matches be played against some version of SF and that the results be considered with proper error bars. Since the matches are not used for validation, less frequent matches could be organized at higher resolution.
Several TCs should be tried to get an idea of the scaling behaviour of LC0 (it is also not clear that this is the same as for classical engines).
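As a sketch of what "results with proper error bars" could look like: the snippet below is my own illustration (not any particular tool's exact method), converting a W/D/L match result into an Elo difference with an approximate 95% margin, assuming the usual logistic Elo curve and a normal approximation of the score.

```python
import math

def elo_with_error(wins, draws, losses):
    """Elo difference implied by a W/D/L result, with an
    approximate 95% margin (normal approximation)."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n                  # score fraction
    # Per-game variance of the score around its mean
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    sd = math.sqrt(var / n)                       # standard error of p

    def to_elo(q):
        q = min(max(q, 1e-9), 1 - 1e-9)           # clamp away from 0 and 1
        return -400 * math.log10(1 / q - 1)       # logistic Elo curve

    lo, hi = to_elo(p - 1.96 * sd), to_elo(p + 1.96 * sd)
    return to_elo(p), (hi - lo) / 2

elo, margin = elo_with_error(27, 14, 21)          # a 62-game example
```

On a 62-game sample like the ones posted later in the thread, the margin comes out near +/- 80 Elo, which is why several hundred games are needed before differences of a few tens of Elo mean anything.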
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I don't know what they bootstrap and tune to get new nets there, but the latest ID140 is again among the weakest, significantly weaker than ID124: a 100+ Elo loss compared to ID124 (the second "bignet", which improved greatly over ID123, the first "bignet" based solely on "smallnet" weights). And that after more than a million games since ID124. My tests are now at short TC and on CPU, but they seem reasonably representative.

Michel wrote: In Dutch we say "meten is weten" (to measure is to know).
Some questions were raised about the openings: does LC0 really play the openings better than later parts of the game? In the past, with v0.4, I was able to test EPD suites, and the "smallnet" showed excellent results on a positional opening suite, close to top engines, while showing deplorable results on tactical test-suites, worse than very weak engines. With v0.6 and the "bignets" I am unable to test EPD suites in Polyglot or GUIs; it seems the PV is not output in standard form (at fixed time, at least). But to the question of whether LC0 plays better in the openings (general play, i.e. positional + tactical), I can answer:
I took the ID124 "bignet" with the v0.6 client, one of the best nets, and performed 2 tests: one from a 3-mover balanced opening suite and one from an 8-mover balanced opening suite (a 5-move difference). I pitted ID124 at short TC (1s/move) against Jabba 1.0, a stable standard engine of similar strength (somewhat stronger at this TC).
3-mover result:
Code:
Games Completed = 500 of 500 (Avg game length = 103.231 sec)
Settings = Gauntlet/64MB/1000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 14275 sec elapsed, 0 sec remaining
1. LCZero CPU ID124 172.5/500 116-271-113 (L: m=271 t=0 i=0 a=0) (D: r=82 i=13 f=13 s=2 a=3) (tpm=948.5 d=12.49 nps=230)
2. Jabba 1.0 327.5/500 271-116-113 (L: m=116 t=0 i=0 a=0) (D: r=82 i=13 f=13 s=2 a=3) (tpm=802.5 d=9.11 nps=0)
8-mover result:
Code:
Games Completed = 500 of 500 (Avg game length = 90.792 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\8moves_v7.epd(21067)
Time = 12713 sec elapsed, 0 sec remaining
1. LCZero CPU ID124 142.5/500 89-304-107 (L: m=304 t=0 i=0 a=0) (D: r=88 i=11 f=5 s=2 a=1) (tpm=949.2 d=12.49 nps=180)
2. Jabba 1.0 357.5/500 304-89-107 (L: m=89 t=0 i=0 a=0) (D: r=88 i=11 f=5 s=2 a=1) (tpm=803.0 d=9.10 nps=0)
So, although Jabba 1.0 is generally stronger at this TC, LC0, when left on its own against Jabba for just 5 more opening moves, gains a whopping 50 Elo points (with about 20 Elo points standard deviation on the difference). It is remarkable, as those 5 opening moves are not all positional; there are some tactics involved too, at which LC0 is notoriously weak. So, yes, LC0 performs (significantly) better in the openings than in later parts of the game.
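The 50-Elo gap and its roughly 20-Elo standard deviation can be reproduced from the two gauntlet tables. This is a sketch under my own assumptions (logistic Elo model, delta-method error propagation), not necessarily how the numbers above were computed:

```python
import math

def perf_elo(wins, draws, losses):
    """Performance Elo vs the opponent and its standard deviation
    (delta method on the logistic Elo curve)."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    sd_p = math.sqrt(var / n)                           # SE of the score
    elo = -400 * math.log10(1 / p - 1)
    sd_elo = sd_p * 400 / (math.log(10) * p * (1 - p))  # curve slope * SE
    return elo, sd_elo

# LC0 ID124 vs Jabba 1.0, W/D/L taken from the two gauntlets above
e3, s3 = perf_elo(116, 113, 271)   # 3-mover suite: 172.5/500
e8, s8 = perf_elo(89, 107, 304)    # 8-mover suite: 142.5/500
gap = e3 - e8                      # roughly 48 Elo
sd_gap = math.hypot(s3, s8)        # roughly 20 Elo
```

The independent-errors combination (`math.hypot`) is what gives the "about 20 Elo" standard deviation of the difference.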
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Again, no real progress from ID160 to ID173, over 700,000 games. The official graph also shows the stall, although it is often not representative of real progress.
The scaling in time (and hardware) of LC0 is much better than that of A/B engines. So, if one talks about LC0's CCRL Elo, one should specify the time control and hardware. For example, on 1 CPU core at 1s/move it is about 2000 CCRL Elo points, but on a GPU like an Nvidia 1080 Ti at LTC (say, 2 minutes/move) it might be 2750 CCRL Elo points. I tried to see how this better scaling shows up in two cases, using two very different test-suites:
Scaling:
1/ Tactical middlegame ECM200:
==========
LC0 ID160:
2s:
score=56/200 [averages on correct positions: depth=10.6 time=0.46 nodes=123]
20s:
score=75/200 [averages on correct positions: depth=13.2 time=3.15 nodes=1107]
+19
==========
GreKo 6.5 (2330 CCRL)
2s:
score=110/200 [averages on correct positions: depth=5.8 time=0.19 nodes=454689]
20s:
score=143/200 [averages on correct positions: depth=7.3 time=1.91 nodes=4718200]
+33
==========
Predateur 2.2.1 (1786 CCRL)
2s:
score=88/200 [averages on correct positions: depth=7.0 time=0.30 nodes=923547]
20s:
score=107/200 [averages on correct positions: depth=8.0 time=2.15 nodes=6568689]
+19
==========
LC0 doesn't seem to scale better than standard A/B engines on a tactical middlegame test-suite.
2/ Positional opening suite (200 positions)
==========
LC0 ID160
2s:
score=92/200 [averages on correct positions: depth=8.3 time=0.16 nodes=41]
20s:
score=117/200 [averages on correct positions: depth=11.0 time=2.18 nodes=694]
+25
==========
GreKo 6.5 (2330 CCRL)
2s:
score=72/200 [averages on correct positions: depth=4.8 time=0.17 nodes=325872]
20s:
score=78/200 [averages on correct positions: depth=6.9 time=1.64 nodes=3262982]
+6
==========
Andscacs 0.93 (3308 CCRL)
2s:
score=113/200 [averages on correct positions: depth=9.9 time=0.23 nodes=924945]
20s:
score=126/200 [averages on correct positions: depth=13.0 time=2.26 nodes=8718487]
+13
==========
LC0 seems to scale significantly better than standard A/B engines on a positional opening test-suite.
=================================================================
The result somewhat puzzles me. It seems that letting LC0 analyze (search) for a longer time, or on stronger hardware, improves its positional understanding more than its tactical ability. Can somebody explain what happens, or are these results meaningless? It would seem that monstrous hardware wouldn't help very much with LC0's serious tactical deficiency; it would help, but it seems to help the positional understanding even more. If A0 really was so strong tactically, then positionally it might have been a monster, if my results here mean something. Well, on the other hand, the newer nets might improve dramatically on tactics (it doesn't seem to be the case so far, but who knows), so this is speculation.
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Here it seems to be suggested that id170
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
is at a similar level to Fruit 2.1 at 1min+1s.
The error bars are quite big, but it seems there was real progress from id160 to id170.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Hmm, interesting. He has a good GPU (though probably still not a 1080 Ti), but with these nps and this time control (1'+1'') I expected a performance of about 2450 CCRL Elo. Isn't his 2685 CCRL Elo too high? I think that in TCEC conditions, i.e. at TCEC time control on a 1080 Ti driven by 4 cores, if that performance were real it would come close to 3000 CCRL Elo points. But in TCEC itself, an old but still strong LC0 performs 300-400 Elo points weaker than engines at the 3000-3050 CCRL Elo level.

Michel wrote: Here it seems to suggest that id170
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
is of a similar level as Fruit 2.1 at 1min+1s.
The error bars are quite big but it seems there was real progress from id160 to id170.
His error margins (2SD?) are some 50 Elo points; mine would be about 30 if I combine my results for 160 and 163 and compare them with 170 and 173 combined. All in all, there might be progress, but I believe not by much.
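For intuition on how such margins scale with game count: for a roughly even match, the 2-SD Elo margin shrinks like 1/sqrt(games). A quick sketch (the draw ratio is a parameter I'm assuming, not taken from any specific test):

```python
import math

def even_match_margin(games, draw_ratio=0.4):
    """Approximate 2-SD Elo margin for an evenly matched pair,
    given a total game count and an assumed draw ratio."""
    # In an even match, wins and losses each occur with probability
    # (1 - d)/2 and deviate from the 0.5 mean score by +/- 0.5,
    # so the per-game score variance is (1 - d) * 0.25.
    var = (1 - draw_ratio) * 0.25
    sd_p = math.sqrt(var / games)
    slope = 400 / (math.log(10) * 0.25)   # Elo per unit score near 50%
    return 2 * sd_p * slope               # quadrupling games halves this
```

With a ~60-game sample this gives a margin of about +/- 78 Elo; getting it down to +/- 30 needs roughly seven times as many games, which is why combining several test runs tightens the estimate so much.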
- Posts: 1627
- Joined: Thu Mar 09, 2006 12:35 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
For ID 170, watching its games, it is a big step forward!

Laskos wrote: Hmm, interesting. He has a good GPU (still probably not 1080 Ti), but with these nps and time control (1'+ 1'') I expected a 2450 CCRL Elo performance or so. Isn't his 2685 CCRL Elo too much?
Look for example here also.
2700 CCRL Elo seems very likely on a good GPU.
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Code:
v7 slowmover125 id157 laser 1.0 2728 ccrl 40/1 Score of lczero v7 id157 slowmover 125 vs Laser-1_0: 11 - 19 - 13 [0.407]
Elo difference: -65.40 +/- 89.61
v7 slowmover125 id160 laser 1.0 40/1 Score of lczero v7 id160 slowmover 125 vs Laser-1_0: 5 - 10 - 6 [0.381]
Elo difference: -84.34 +/- 135.12
v7 slowmover125 id162 laser 1.0 40/1 Score of lczero v7 id162 slowmover 125 vs Laser-1_0: 32 - 40 - 29 [0.460]
Elo difference: -27.58 +/- 57.77
v7 slowmover120 id164 laser 1.0 40/1 Score of lczero v7 id164 slowmover 120 vs Laser-1_0: 26 - 25 - 8 [0.508]
Elo difference: 5.89 +/- 83.91
v7 slowmover120 id170 laser 1.0 40/1 Score of lczero v7 id170 slowmover 120 vs Laser-1_0: 27 - 21 - 14 [0.548]
Elo difference: 33.73 +/- 77.54
v7 slowmover120 id171 laser 1.0 4cpu 2800 ccrl 40/1 Score of lczero v7 id171 slowmover 120 vs Laser-1_0 4cpu: 19 - 30 - 10 [0.407]
Elo difference: -65.54 +/- 83.56
After his son's birth they asked him:
"Is it a boy or a girl?"
"YES!" he replied.....
"Is it a boy or girl?"
YES! He replied.....