LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

JJJ wrote:Thank you, Kai, for running your tests and confirming the progress. Any graph soon, to give a better picture?
I collected some data I have; each dot represents an average of 2 versions around that ID. The red lines are 1 standard deviation in the results, up and down.
The first data point is for ID124 and ID125, which were still based on the "smallnet" weights of ID122. ID123 was the first "bignet", but it was weak. The dip (regression) until ID143 was due to a bug in v0.5 and v0.6; the progress since ID143 or so came with the introduction of v0.7. From ID160 to ID170-172 I thought that it was flattening, but that is not so.

Image
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by duncan »

Laskos wrote:Now (just tested ID184 in 200 games, it came as the strongest, but the error margins are pretty large), combining IDs 160 and 163 compared to combined 182 and 184, the progress is clear and significant, 50-60 +/- 30 Elo points. In just 3 days. The good work is going on, and there seem to be no additional large bugs with v0.7.

I wanted to see the scaling with STS 1-15 (1500 positions), the "Strategic Test Suite". It still contains some tactics, and in my view is computer over-analyzed, but it is a good guide to the scaling in positional strength. So I performed tests at 1s/position and 4s/position with LC0 on 4 cores and two standard A/B engines of comparable strength under these conditions, Greko 6.5 (2330 CCRL) and Fruit 2.1 (2685 CCRL).

Greko 6.5
1s/position:
score=835/1500 [averages on correct positions: depth=4.4 time=0.10 nodes=231938]
4s/position:
score=888/1500 [averages on correct positions: depth=5.1 time=0.32 nodes=733864]
+53 points improvement

Fruit 2.1
1s/position:
score=1047/1500 [averages on correct positions: depth=4.2 time=0.08 nodes=173742]
4s/position:
score=1102/1500 [averages on correct positions: depth=5.1 time=0.28 nodes=637360]
+55 points improvement


LC0 ID182
1s/position:
score=767/1500 [averages on correct positions: depth=9.2 time=0.19 nodes=46]
4s/position:
score=949/1500 [averages on correct positions: depth=10.6 time=0.63 nodes=170]
+182 points improvement


LC0 is a completely different animal; it scales much better on STS than standard A/B engines.
So to draw level with Fruit 2.1, it would need about 18 sec?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

duncan wrote:
So to draw level with Fruit 2.1, it would need about 18 sec?
Yes, about 20s, on this hardware (CPU LC0). With a good GPU, it might be 2s on this suite.
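
As a rough check of the "about 20s" figure, here is a minimal sketch that assumes each engine's STS score grows linearly in log2 of the time per position, using only the two measured points (1s and 4s) per engine. The log-linear model and the function names are assumptions for illustration; two points per engine cannot confirm the shape of the curve.

```python
# STS scores (out of 1500) at 1s and 4s per position, from the results above.
STS = {
    "Fruit 2.1": (1047, 1102),
    "LC0 ID182": (767, 949),
}

def points_per_doubling(score_1s, score_4s):
    # Going from 1s to 4s is two doublings of thinking time.
    return (score_4s - score_1s) / 2.0

def crossover_time(a, b):
    """Time per position (s) at which engine b catches engine a,
    assuming score(t) = score_1s + slope * log2(t) for both engines."""
    a1, a4 = STS[a]
    b1, b4 = STS[b]
    slope_gap = points_per_doubling(b1, b4) - points_per_doubling(a1, a4)
    if slope_gap <= 0:
        raise ValueError("engine b never catches engine a under this model")
    return 2.0 ** ((a1 - b1) / slope_gap)

t = crossover_time("Fruit 2.1", "LC0 ID182")
print(f"LC0 would draw level with Fruit 2.1 at roughly {t:.0f} s/position")
```

Under this model the crossover lands at roughly 21 s per position on the same hardware, in line with the estimate above.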
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by whereagles »

Great job by Kai, as always. But what's up with that new pic? You look like a guy who just finished filming an A-team blow-up scene :D
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Milos »

The problem is that your testing is too unreliable. For example, I tested 2 points extensively (1000 games under proper testing conditions: LC0 at 500 playouts vs. SF at fixed depth 8, not some puny engine, questionable test sets, or ridiculously small TCs), and the difference between ID150 and ID160 is exactly 5%, i.e. 35 Elo.
In your case it is close to 100 Elo, so your data is most certainly largely inflated.
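
The "5%, i.e. 35 Elo" conversion follows from the usual logistic Elo model. A minimal sketch, assuming the 5% is measured from a roughly even 50% score (the helper name is hypothetical):

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# A 55% score where the earlier net scored 50% against the same opponent.
print(round(elo_from_score(0.55), 1))
```

This gives about 35 Elo for a 55% score. Far from 50% the same 5% step is worth more Elo, so the conversion depends on where the baseline score sits.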
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

whereagles wrote:Great job by Kai, as always. But what's up with that new pic? You look like a guy who just finished filming an A-team blow-up scene :D
:lol:

That IIRC was taken by my girlfriend last Summer-early Autumn in Corfu town. I have to admit, like you, some people did look at me with some caution :).

Meanwhile, I tentatively checked where the improvement from ID160 to ID187 comes from.

20s/position on 4 CPU cores in all cases below.

Positional opening test suite (200 positions)
ID160: 117/200
ID187: 118/200
No significant improvement, and both are at the level of standard engines rated above 3200 CCRL Elo.

Tactical middlegame test suite ECM200 (200 positions)
ID160: 75/200
ID187: 88/200
Significant improvement, but still below the level of standard engines rated 2100 CCRL Elo.

It seems LC0 now learns tactics.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Milos wrote:
The problem is that your testing is too unreliable. For example, I tested 2 points extensively (1000 games under proper testing conditions: LC0 at 500 playouts vs. SF at fixed depth 8, not some puny engine, questionable test sets, or ridiculously small TCs), and the difference between ID150 and ID160 is exactly 5%, i.e. 35 Elo.
In your case it is close to 100 Elo, so your data is most certainly largely inflated.
That's correct. I use one weak engine as an opponent, and 1 core at 1s/move, which would be 100-200 playouts per move. Also, data points generally have only 400 games (2 different IDs with 200 games each). The possibility of rating difference inflation is there. I am looking mainly for trends rather than for absolute Elo. Observe also that the red lines are 1SD, not 2SD. Also, one particular ID might be off the general curve.
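
To get a feel for how wide a 1SD bar is with 400 games per data point, here is a rough sketch using the delta method on the logistic Elo formula. The 20% draw ratio and the even score are illustrative assumptions, not measured values, and the function name is hypothetical.

```python
import math

def elo_std_error(n_games, score=0.5, draw_ratio=0.2):
    """Approximate 1SD error (in Elo) of a match result over n_games games."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    wins = score - draw_ratio / 2.0  # fraction of games won
    if wins < 0 or wins + draw_ratio > 1:
        raise ValueError("score and draw_ratio are inconsistent")
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = wins + draw_ratio / 4.0 - score ** 2
    score_se = math.sqrt(var / n_games)
    # Slope of Elo(score) = 400 * log10(score / (1 - score)).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return slope * score_se

for n in (200, 400, 1000):
    print(n, round(elo_std_error(n), 1))
```

Under these assumptions a 400-game data point carries a 1SD error of roughly 15 to 16 Elo, and the error shrinks only with the square root of the number of games.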
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Guenther »

Michel wrote:Here it seems to suggest that id170

https://docs.google.com/spreadsheets/d/ ... edit#gid=0

is of a similar level as Fruit 2.1 at 1min+1s.

The error bars are quite big but it seems there was real progress from id160 to id170.

This list has a lot of untrustworthy conditions, I would have liked to copy all of them, but the copy function for the sheet is disabled...

Just a few:

- very bad adjudication rules (draw after 100 moves, win adjudication after +3 from both sides - hint: LCZero is/was known for being able to lose from very high positive score)
- no real calculations available just a general hint to +-50 error bars
- just a vague guess who played whom
- no games available
- no benchmarks available
- cerebellum book (10 plies) as start position, which is known for extremely low variation
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by JJJ »

I'm not sure the FIDE estimate is accurate either, since no human could defeat a recent version of LCZero, so probably none can defeat the ID 190 version.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

The latest progress, to ID196:

Image

However, I ran into a problem with ID195 and ID196 (I combined them for the last data point):

Games Completed = 200 of 200 (Avg game length = 106.079 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 5628 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID195         	120.0/200	98-58-44  	(L: m=58 t=0 i=0 a=0)	(D: r=33 i=4 f=3 s=2 a=2)	(tpm=954.4 d=12.54 nps=183)
 2.  Jabba 1.0                	80.0/200	58-98-44  	(L: m=98 t=0 i=0 a=0)	(D: r=33 i=4 f=3 s=2 a=2)	(tpm=803.9 d=9.21 nps=0)

Games Completed = 200 of 200 (Avg game length = 100.590 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 5358 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID196         	96.0/200	81-89-30  	(L: m=89 t=0 i=0 a=0)	(D: r=27 i=1 f=0 s=1 a=1)	(tpm=952.2 d=12.59 nps=106)
 2.  Jabba 1.0                	104.0/200	89-81-30  	(L: m=81 t=0 i=0 a=0)	(D: r=27 i=1 f=0 s=1 a=1)	(tpm=803.4 d=9.02 nps=0)
The standard deviation of the _difference_ between the two results is about 9 points (game points, not Elo points). Here the difference between two consecutive IDs is 120 - 96 = 24 points. Either my results for consecutive versions show a 2.7 standard deviation fluke (in their self-games they even show progress), or their nets can possibly play significantly worse while showing progress. I combined these two inconsistent results as an average for the last data point.
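
The 9-point and 2.7 SD figures can be reproduced from the win-loss-draw records in the two LittleBlitzer reports above. A minimal sketch, treating games as independent and the two matches as independent of each other (function names are hypothetical):

```python
import math

def total_and_sd(wins, losses, draws):
    """Total points and the standard deviation of that total for one match."""
    n = wins + losses + draws
    total = wins + 0.5 * draws
    mean = total / n
    # Per-game variance with win = 1, draw = 0.5, loss = 0.
    var = (wins + 0.25 * draws) / n - mean ** 2
    return total, math.sqrt(n * var)

pts195, sd195 = total_and_sd(98, 58, 44)  # LCZero CPU ID195 vs Jabba 1.0
pts196, sd196 = total_and_sd(81, 89, 30)  # LCZero CPU ID196 vs Jabba 1.0

sd_diff = math.hypot(sd195, sd196)  # SD of the difference of the two totals
z = (pts195 - pts196) / sd_diff
print(f"difference = {pts195 - pts196:.0f} points, SD = {sd_diff:.1f}, z = {z:.1f}")
```

This gives a 24-point gap against a standard deviation of about 8.9 points, i.e. about 2.7 SD, matching the figures quoted above.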