LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

George Tsavdaris wrote:
Laskos wrote: Hmm, interesting. He has a good GPU (still probably not 1080 Ti), but with these nps and time control (1'+ 1'') I expected a 2450 CCRL Elo performance or so. Isn't his 2685 CCRL Elo too much?
For ID 170, watching its games is a big step forward!

Look for example here also.
2700 CCRL ELO seems very likely on good GPU.
https://docs.google.com/spreadsheets/d/ ... edit#gid=0

Build            Net    Opponent        Opp. CCRL  TC    W - L - D     Score    Elo difference
v7 slowmover125  id157  Laser 1.0       2728       40/1  11 - 19 - 13  [0.407]  -65.40 +/- 89.61
v7 slowmover125  id160  Laser 1.0                  40/1   5 - 10 -  6  [0.381]  -84.34 +/- 135.12
v7 slowmover125  id162  Laser 1.0                  40/1  32 - 40 - 29  [0.460]  -27.58 +/- 57.77
v7 slowmover120  id164  Laser 1.0                  40/1  26 - 25 -  8  [0.508]   +5.89 +/- 83.91
v7 slowmover120  id170  Laser 1.0                  40/1  27 - 21 - 14  [0.548]  +33.73 +/- 77.54
v7 slowmover120  id171  Laser 1.0 4cpu  2800       40/1  19 - 30 - 10  [0.407]  -65.54 +/- 83.56
OK, I may be messing something up, having a poor GPU and running the games on a 4-core i7 CPU. Also, I hadn't looked at the TCEC conditions. First, I was comparing 4-core CCRL ratings of standard engines with LC0 on a 1080 Ti GPU (which might be even stronger than the LC0 that runs on the TCEC CPU). But TCEC runs standard engines on 43 cores, not 4. Second, the TCEC time control is now rapid instead of LTC.

So, yes, in absolute ratings on a good GPU I am probably off, which would mean that LC0 scales even better than I measured. On my CPU (400-800 nps) at 10s/move (so, 4000-8000 playouts per move), it is rated at about 2400 CCRL Elo (against 2 standard engines of similar strength). With a 1080 Ti, LC0 can reach 4k-8k nps, which at 1'+1'' can mean 10000+ playouts per move.
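A rough sketch of the playouts arithmetic above, for anyone who wants to plug in their own numbers. The 40-move horizon used to turn a 1'+1'' clock into seconds per move is my assumption, not something stated in the posts, and the function names are only illustrative.

```python
def seconds_per_move(base_s: float, inc_s: float, assumed_moves: int = 40) -> float:
    """Crude time budget per move: base time spread evenly over an ASSUMED
    number of moves (40 here, not taken from the thread), plus the increment."""
    return base_s / assumed_moves + inc_s


def playouts_per_move(nps: float, secs: float) -> float:
    """Playouts searched for one move at a given speed and time per move."""
    return nps * secs


# CPU case from the post: 400-800 nps at 10 s/move
cpu_low = playouts_per_move(400, 10)
cpu_high = playouts_per_move(800, 10)

# 1080 Ti case from the post: 4k-8k nps at 1'+1''
spm = seconds_per_move(60, 1)
gpu_low = playouts_per_move(4000, spm)
gpu_high = playouts_per_move(8000, spm)

print(f"CPU: {cpu_low:.0f}-{cpu_high:.0f} playouts per move")
print(f"GPU: {spm:.1f} s/move, {gpu_low:.0f}-{gpu_high:.0f} playouts per move")
```

Under that 40-move assumption the GPU case comes out at 10000 playouts or more per move, in line with the "10000+" figure; a different horizon shifts the number proportionally.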

But I think I can still measure the relative improvement, and it doesn't seem to be very large since ID160 (say 40 Elo points or so).
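For reference, the Elo figures in the quoted results can be rechecked from the W - L - D counts. This is a minimal sketch using the usual logistic Elo formula; the error margin recipe (shift the score by about 1.96 standard errors and take half the width of the resulting Elo interval) is my assumption about how such margins are typically produced, not something confirmed in the thread. The names `elo_from_score` and `match_elo` are hypothetical helpers.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def match_elo(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Return (score, elo, margin) for a W-L-D result.

    ASSUMED recipe for the margin: per-game variance of the score, standard
    error of the mean, then half the width of the Elo interval at +/- z
    standard errors. Only valid while the shifted scores stay inside (0, 1).
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d
    var = w * (1.0 - mu) ** 2 + d * (0.5 - mu) ** 2 + l * (0.0 - mu) ** 2
    se = math.sqrt(var / n)
    low, high = mu - z * se, mu + z * se
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return mu, elo_from_score(mu), margin


# id170 vs Laser 1.0 from the quoted list: 27 wins, 21 losses, 14 draws
score, elo, margin = match_elo(27, 21, 14)
print(f"score {score:.3f}, Elo difference {elo:+.2f} +/- {margin:.2f}")
```

With 60-odd games per net the margin is well above the differences between neighboring nets, which is why combining several nets gives a more usable estimate of progress.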
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Laskos wrote:
Michel wrote:Here it seems to suggest that id170

https://docs.google.com/spreadsheets/d/ ... edit#gid=0

is of a similar level as Fruit 2.1 at 1min+1s.

The error bars are quite big but it seems there was real progress from id160 to id170.
Hmm, interesting. He has a good GPU (still probably not 1080 Ti), but with these nps and time control (1'+ 1'') I expected a 2450 CCRL Elo performance or so. Isn't his 2685 CCRL Elo too much? I think in TCEC conditions, or at TCEC time control on 1080 Ti driven by 4 cores, if that performance was true, it would go close to 3000 CCRL Elo points. But in TCEC itself, an old, but still strong LC0 performs 300-400 Elo points weaker than 3000-3050 CCRL Elo level engines.

His error margins (2SD?) are some 50 Elo points; mine would be 30, combining my results for 160 and 163 and comparing them to the combined 170 and 173. All in all, there might be progress, but I believe not by much.
Yes, the list you showed seems reliable as far as absolute strength goes; I didn't realize it. On my CPU I mimicked the playouts played by LC0 and the nodes played by Fruit 2.1 in his conditions (it takes some time with my meager CPU means). The result came out as +3 -2 =2 for LC0 ID174, which puts LC0 in the same ballpark as Fruit 2.1 in his conditions. As for improvement, the latest nets (173 and 176) do indeed seem some 40 Elo points above 160 and 163, almost within error margins, but still significant, albeit slow, progress.

The most interesting thing to see was how, in the above conditions, LC0 outplayed Fruit 2.1, only to give away a favorable or winning position by a blunder and end up with a draw or even a loss. It often disregards material and often applies pressure, sometimes unsoundly or barely soundly according to Stockfish 9 analysis, but this probably works in self-play games, since under such pressure a very precise tactical line is needed to defend. Funny engine to watch.

Here is a draw with LC0 as Black sacking a knight for an attack and sustained pressure on the kingside, with a possible winning line 29... Qg3 instead of the 29... Be3 played by LC0.

[pgn][White "Fruit 2.1"]
[Black "LCZero CPU ID174 4 cores"]
[Result "1/2-1/2"]
[SetUp "1"]
[FEN "rnbqk1nr/pp1pppbp/6p1/2p5/3PPP2/8/PPP3PP/RNBQKBNR w KQkq - 51 0"]

1. d5 d6 2. Nf3 Nf6 3. Nc3 O-O 4. Be2 e6 5. dxe6 Bxe6 6. Ng5 Nc6 7. O-O Qd7 8. Nxe6 fxe6 9. Be3 Rad8 10. Qd2 Qe7 11. Bf3 Nd4 12. Bxd4 cxd4 13. Qxd4 Nd5 14. Qxa7 Nxf4 15. Kh1 Be5 16. Qe3 Kg7 17. Rad1 Rc8 18. Qd2 b6 19. Nb5 Nh5 20. g3 Nxg3+ 21. hxg3 Bxg3 22. Kg1 Bf4 23. Qd3 Qh4 24. Rf2 Rf6 25. Rg2 Rc5 26. Nd4 Rg5 27. Rxg5 Qxg5+ 28. Kf1 e5 29. Nb5 Be3 30. Ke2 Rxf3 31. Kxf3 Qf4+ 32. Kg2 Qf2+ 33. Kh1 d5 34. Qf1 Qh4+ 35. Kg2 Qg4+ 36. Kh1 Qh4+ 37. Kg2 Qg4+ 38. Kh1 dxe4 39. Nc3 Qh4+ 40. Kg2 Qg4+ 41. Kh1 Qh5+ 42. Kg2 Qg4+ 43. Kh1 1/2-1/2[/pgn]


Here is a slowly outplayed Fruit 2.1, with one blunder in the endgame by LC0 as shown by SF9 analysis, although it might have been a draw anyway (the variation can be seen in the PGN):

[pgn][Round "?"]
[White "Fruit 2.1"]
[Black "LCZero CPU ID174 4 cores"]
[Result "1/2-1/2"]
[Annotator "Kai"]
[SetUp "1"]
[FEN "r1bqkb1r/pppp1ppp/2n2n2/4p3/4P3/2P2N2/PP1P1PPP/RNBQKB1R w KQkq - 0 1"]
[PlyCount "196"]

1. Bb5 Nxe4 2. O-O Nf6 3. Re1 Be7 4. Bxc6 bxc6 5. Nxe5 O-O 6. d4 c5 7. Qb3 a5
8. Bg5 h6 9. Bxf6 Bxf6 10. dxc5 a4 11. Qd5 Rb8 12. Nd2 Rxb2 13. Ndc4 Rb5 14.
Rab1 Rxb1 15. Rxb1 d6 16. Nc6 Be6 17. Qe4 Qd7 18. cxd6 cxd6 19. Nb6 Qc7 20. c4
Kh8 21. a3 g6 22. Nd5 Qxc6 23. Nxf6 Qxc4 24. Qe3 Kg7 25. Rc1 Qb5 26. Ne4 Qd5
27. Qc3+ Qe5 28. Nxd6 Qxc3 29. Rxc3 Rd8 30. Rc6 Rb8 31. Rc1 Rb3 32. Ra1 Rb6 33.
Ne8+ Kf8 34. Nf6 Rc6 35. f3 Bb3 36. Ng4 Kg7 37. Ne3 Rd6 38. Nf1 Rc6 39. Ne3 h5
40. Kf2 f5 41. g3 Kf7 ({-0.08 Stockfish 9 64 BMI2:} 41... Bf7 42. Ke2 Rc3 43.
Kd2 Rb3 44. h4 Be8 45. Rc1 Rxa3 46. f4 Rb3 47. Rc7+ Kf8 48. Rc3 Rb2+ 49. Kd3
Rb4 50. Nc2 Bb5+ 51. Kd2 Re4 52. Rc5 Bc4 53. Ra5 Re2+ 54. Kc3 Bb3 55. Ra8+ Ke7
56. Nd4 Re3+ 57. Kb4 Rxg3 58. Ka3 Kf6 { eval -1.38/36}) 42. Ke2 Ke6 43. Re1
Ke5 44. Kd3 Rd6+ 45. Kc3 Kf6 46. h4 Kf7 47. Rb1 Rf6 48. f4 Rd6 49. Kb4 Rd3 50.
Re1 Ke6 51. Kc5 Kf7 52. Kb4 Rd4+ 53. Kb5 Re4 54. Kc5 Kg7 55. Re2 Kg8 56. Kc6
Kf7 57. Kc5 Bc2 58. Kd5 Bd3 59. Re1 Ke7 60. Kc6 Kf8 61. Kc5 Kf7 62. Kd5 Bc4+
63. Kc5 Bb3 64. Re2 Ke6 65. Re1 Kd7 66. Re2 Kc8 67. Re1 Kd7 68. Re2 Ke6 69. Re1
Kf6 70. Re2 Kg7 71. Kd6 Rd4+ 72. Kc5 Rd3 73. Kb4 Kg8 74. Kb5 Kf7 75. Re1 Rc3
76. Kb4 Rc8 77. Kb5 Re8 78. Kc5 Rd8 79. Kb5 Rc8 80. Kb4 Rc7 81. Rb1 Rc6 82. Kb5
Re6 83. Re1 Re7 84. Kc5 Rd7 85. Kb5 Re7 86. Kb4 Rb7+ 87. Ka5 Rd7 88. Kb4 Rd3
89. Kb5 Ke6 90. Kc5 Kd7 91. Kb5 Kd6 92. Kb4 Kc6 93. Re2 Kb6 94. Nf1 Rf3 95. Re1
Ba2 96. Rc1 Bb3 97. Re1 Ba2 98. Ra1 Bb3 1/2-1/2[/pgn]

Funny engine, like a very strong, romantic human.
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by duncan »

Laskos wrote: Yes, the list you showed seems reliable as far as absolute strength goes; I didn't realize it.
So, any guess why its strength does not show in the current TCEC?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

duncan wrote:
Laskos wrote: Yes, the list you showed seems reliable as far as absolute strength goes; I didn't realize it.
So, any guess why its strength does not show in the current TCEC?
TCEC generally runs standard engines on 43 cores; LC0 is also on 43 cores and not on a GPU, and the games are now rapid. It is hard to derive from TCEC "ratings" a relation to the 1-core or 4-core CCRL ratings of standard engines, with LC0 on a strong GPU, never mind FIDE ratings.

With LC0 there is an additional complication: it scales very differently from standard engines, so even its CCRL 40/4 and 40/40 ratings will look different.
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Uri Blass »

Laskos wrote:
duncan wrote:
Laskos wrote: Yes, the list you showed seems reliable as far as absolute strength goes; I didn't realize it.
So, any guess why its strength does not show in the current TCEC?
TCEC generally runs standard engines on 43 cores; LC0 is also on 43 cores and not on a GPU, and the games are now rapid. It is hard to derive from TCEC "ratings" a relation to the 1-core or 4-core CCRL ratings of standard engines, with LC0 on a strong GPU, never mind FIDE ratings.

With LC0 there is an additional complication: it scales very differently from standard engines, so even its CCRL 40/4 and 40/40 ratings will look different.
43 cores is better hardware than what people usually use for LC0 and, if I understand correctly, better than 1 GPU (I remember reading that 1 GPU may be equivalent to 10 CPUs).

30+10 is a longer time control than what people usually use for LC0 and, if I understand correctly, should be equivalent to a longer time control than 40/40 CCRL for 1 GPU.

If with all these advantages it has scored only 0.5/26 in TCEC so far (I do not count the win by crash), then it seems that LC0 does not scale well (or at least the version TCEC used).
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by IQ »

Uri Blass wrote: 43 cores is better hardware than what people usually use for LC0 and, if I understand correctly, better than 1 GPU (I remember reading that 1 GPU may be equivalent to 10 CPUs).

30+10 is a longer time control than what people usually use for LC0 and, if I understand correctly, should be equivalent to a longer time control than 40/40 CCRL for 1 GPU.

If with all these advantages it has scored only 0.5/26 in TCEC so far (I do not count the win by crash), then it seems that LC0 does not scale well (or at least the version TCEC used).
The main reason is that they used a very, very old net, ID125. The performance in TCEC was within the margins of what was expected for that net. Leela has improved a lot since then, but it will take a long time and a larger net for it to become competitive in TCEC division 4. It will slowly get there...

Scaling (getting stronger with longer time controls in relation to its competitors) cannot be determined from the TCEC result alone. But judging from other results, Leela does indeed scale well - but overall strength is just not there yet.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Uri Blass wrote:
Laskos wrote:
duncan wrote:
Laskos wrote: Yes, the list you showed seems reliable as far as absolute strength goes; I didn't realize it.
So, any guess why its strength does not show in the current TCEC?
TCEC generally runs standard engines on 43 cores; LC0 is also on 43 cores and not on a GPU, and the games are now rapid. It is hard to derive from TCEC "ratings" a relation to the 1-core or 4-core CCRL ratings of standard engines, with LC0 on a strong GPU, never mind FIDE ratings.

With LC0 there is an additional complication: it scales very differently from standard engines, so even its CCRL 40/4 and 40/40 ratings will look different.
43 cores is better hardware than what people usually use for LC0 and, if I understand correctly, better than 1 GPU (I remember reading that 1 GPU may be equivalent to 10 CPUs).

30+10 is a longer time control than what people usually use for LC0 and, if I understand correctly, should be equivalent to a longer time control than 40/40 CCRL for 1 GPU.

If with all these advantages it has scored only 0.5/26 in TCEC so far (I do not count the win by crash), then it seems that LC0 does not scale well (or at least the version TCEC used).
I will reply pretty hastily; correct me if I am wrong. From the NPS I saw for LC0 v0.7 on TCEC, it performs no better than it would with a 1080 Ti GPU. And that's considering only NPS, not the 43 threads (which might hurt performance compared to a single GPU).

We know from the list given by Michel that ID125 is at about 2550 CCRL 40/4 level on a strong GPU at 1'+1'' (or its TCEC level at 1'+1''). I took several CCRL 40/4 single-core performances of the TCEC participants there; they were in the 2900-3000 ballpark, with 2 of unknown 3000+ strength (newer versions not listed in CCRL, stronger than the older versions rated around 3000). So even a conservative estimate is about 2950 single-core CCRL 40/4 Elo on average. On 43 cores, with a per-core benchmark 1.5 times stronger than CCRL's, there are 7 doublings, or about 350 Elo points or more. So the TCEC engines (aside from LC0) in TCEC conditions would come in at 3300+ on the CCRL 40/4 list on average. LC0 ID125 is 2550 CCRL 40/4 at 1'+1'', I repeat.

Against 3300+ CCRL 40/4 Elo engines, LC0 is scoring 1 point out of 27 games, which is about 560 Elo points weaker. So it performs at 2750+ CCRL 40/4 Elo level (relative to that 3300+ average).

So, from the ID125 level of 2550 Elo (1'+1'') it does improve to 2750+ Elo (TCEC 30'+10''), although the errors are large (mainly from that 1/27 score). The improvement (better scaling than A/B engines) seems to be at least 200 Elo points at a 30x longer time control, which is pretty normal (about 40+ Elo points per doubling; I got 64 Elo points, but the errors are large everywhere). Scaling seems fine.
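The two key numbers in this estimate can be rechecked in a few lines. This is only a sketch of the arithmetic under the figures given in the post (1 point out of 27 games, 200 Elo gained over a 30x longer time control); it uses the standard logistic Elo formula, and the helper name `elo_from_score` is mine.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


# 1 point out of 27 games against the TCEC field
gap = elo_from_score(1 / 27)

# 2550 Elo at 1'+1'' versus 2750+ at TCEC 30'+10'': 200 Elo over 30x the time
doublings = math.log2(30)
per_doubling = 200 / doublings

print(f"Elo gap implied by 1/27: {gap:.0f}")
print(f"30x time is {doublings:.2f} doublings, so {per_doubling:.1f} Elo per doubling")
```

Both results sit close to the figures quoted in the post (about 560 Elo weaker, and 40+ Elo per doubling), though a 1/27 score is so lopsided that the gap estimate carries a very wide error.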
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

I just tested ID184 in 200 games; it came out as the strongest, but the error margins are pretty large. Comparing the combined IDs 160 and 163 with the combined 182 and 184, the progress is clear and significant: 50-60 +/- 30 Elo points, in just 3 days. The good work is going on, and there seem to be no additional large bugs in v0.7.

I wanted to see the scaling with STS 1-15 (1500 positions), the "Strategic Test Suite". It still contains some tactics and, in my view, is computer over-analyzed, but it is a good guide to the scaling in positional strength. So I ran tests at 1s/position and 4s/position with LC0 on 4 cores and two standard A/B engines of comparable strength in these conditions, Greko 6.5 (2330 CCRL) and Fruit 2.1 (2685 CCRL).

Greko 6.5
1s/position:
score=835/1500 [averages on correct positions: depth=4.4 time=0.10 nodes=231938]
4s/position:
score=888/1500 [averages on correct positions: depth=5.1 time=0.32 nodes=733864]
+53 points improvement

Fruit 2.1
1s/position:
score=1047/1500 [averages on correct positions: depth=4.2 time=0.08 nodes=173742]
4s/position:
score=1102/1500 [averages on correct positions: depth=5.1 time=0.28 nodes=637360]
+55 points improvement


LC0 ID182
1s/position:
score=767/1500 [averages on correct positions: depth=9.2 time=0.19 nodes=46]
4s/position:
score=949/1500 [averages on correct positions: depth=10.6 time=0.63 nodes=170]
+182 points improvement


LC0 is a completely different animal, it scales much better on STS than standard A/B engines.
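To put the three STS results side by side, here is a small sketch that computes the gain from 1s to 4s per position and the gain per doubling of time. The scores are copied from the results above; treating 1s to 4s as exactly two doublings is the only assumption.

```python
import math

TOTAL = 1500  # positions in STS 1-15

# (score at 1s/position, score at 4s/position), copied from the results above
sts = {
    "Greko 6.5": (835, 888),
    "Fruit 2.1": (1047, 1102),
    "LC0 ID182": (767, 949),
}

doublings = math.log2(4 / 1)  # 1s -> 4s is two doublings of time

gains = {}
for name, (s1, s4) in sts.items():
    gain = s4 - s1
    gains[name] = gain
    print(f"{name}: {100 * s1 / TOTAL:.1f}% -> {100 * s4 / TOTAL:.1f}%, "
          f"+{gain} points, {gain / doublings:.1f} per doubling")
```

The per-doubling gain for LC0 comes out more than three times that of either A/B engine on this suite, which is the point being made, although STS points are not Elo and the three engines start from different scores.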
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by JJJ »

Thank you, Kai, for running your tests and confirming the progress. Any graph coming soon to give a better picture?
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Dann Corbit »

That is a really interesting result.
I wonder if the marvelous scaling will hold with future generations.