LCZero: Progress and Scaling. Relation to CCRL Elo

Damir · Post by **Damir** » Fri Apr 06, 2018 6:56 pm

It would be nice for Leela to contain a resign button on the opponent side...

Laskos · Post by **Laskos** » Fri Apr 06, 2018 8:21 pm

MonteCarlo wrote: Thanks for the update and that position, Kai!

The biggest problem in that position is just that black's mate (in the line 136.Kh4 g5+ 137.Kxh5 Qf3+ 138.Kh6 Qh1+ 139.Qh3 Qxh3#) has a series of only winning moves, and the net assigns all of them but the last a very low probability of being played (4%, 4.5%, and 5%), so it just doesn't really get explored.

Combined with the fact that even after those 3 moves have been played, the raw net gives 139.Qh3+ an expected score of 55%, we can really see that the net has a lot of room for improvement.

It's essentially the same as a case of bad pruning in a traditional engine; it's ordering some moves far too low on the list, and as a result is missing a critical line (it's just that this example is much easier than the ones we're used to seeing from traditional engines, because the line is so shallow).

It's also similar to those traditional cases in that once 136... g5+ is forced, it takes only 3 seconds on a single thread on my slow laptop for it to find the loss.

If the net were even slightly better with its assignment of probabilities to any of g5+, Qf3+, and Qh1+, it would find this quickly.
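To make the effect of a low prior concrete, here is a minimal sketch of AlphaZero-style PUCT selection at a single node. It is an assumption that LCZero's search behaves this way; the thread does not give its exact formula. The constant `c_puct`, the move labels, and the fixed evaluations are all hypothetical illustration values.

```python
import math

def puct_select(priors, visits, value_sums, c_puct=1.5):
    """Pick the child maximising Q + U (AlphaZero-style PUCT, assumed here)."""
    total = sum(visits.values())
    best_move, best_score = None, -math.inf
    for move, prior in priors.items():
        n = visits[move]
        q = value_sums[move] / n if n else 0.0  # unvisited children start at 0
        u = c_puct * prior * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

def simulate(priors, shallow_eval, n_sims=800):
    """Run n_sims selections at one node with fixed (hypothetical) evaluations."""
    visits = {m: 0 for m in priors}
    value_sums = {m: 0.0 for m in priors}
    for _ in range(n_sims):
        move = puct_select(priors, visits, value_sums)
        visits[move] += 1
        value_sums[move] += shallow_eval[move]
    return visits

# Hypothetical shallow evaluations in [-1, 1]: the winning check does not
# look better than the alternative until the mate is actually seen.
shallow_eval = {"g5+": 0.0, "other": 0.2}

low = simulate({"g5+": 0.04, "other": 0.96}, shallow_eval)   # prior as reported
high = simulate({"g5+": 0.30, "other": 0.70}, shallow_eval)  # a better prior

print("visits with 4% prior: ", low)
print("visits with 30% prior:", high)
```

In this toy setting the 4% move receives only a handful of the 800 visits, while the same move with a larger prior is explored several times more often. With three such moves in a row, the deficit compounds at every ply.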

We'll just have to see how it develops. It might be that the net eventually learns to do really well on move ordering so that these examples go away, or it might not. That latter case is when we'd have to start considering making changes to the search.

Even having said all of that, there are still some known bugs that haven't yet been removed (it's actually kind of funny: the net has seemingly learned to correct for them, but presumably at some cost to what else it's able to represent).

Time will tell.

Thanks for analysing the position. That 55% after 139.Qh3 is funny, so there is a lot of room for improvement of the network (value sub-component, at least). Maybe lots of learning ahead, let's see.

In any case, the achievement is already impressive for a project slightly over a month old; I didn't expect that. After two weeks or so I was skeptical that it would go much further. For fun, I had LC0 play 10 games at 10s/move against Fruit 2.1, a 2700 Elo level engine, at 1s/move. That would correspond to an equal time control on a machine equipped with a couple of top GPUs. Fruit won +6 -3 =1, so on a machine well equipped with GPUs, LC0 is already almost competitive with Fruit 2.1. Very impressive. Here is a nice game which LC0 won as Black:

[pgn][White "Fruit 2.1 (2685)"]
[Black "LCZero CPU 4 threads"]
[Result "0-1"]
[SetUp "1"]
[FEN "rnbqk1nr/pp1pppbp/2p3p1/8/4P3/3P1N2/PPP2PPP/RNBQKB1R w KQkq - 51 0"]

1. Be2 Nf6 2. e5 Nd5 3. O-O d6 4. c4 dxe5 5. cxd5 cxd5 6. Nc3 Nc6 7. Qb3 e6 8. Bg5 f6 9. Bd2 g5 10. Qa3 g4 11. Ne1 f5 12. Nc2 Bf8 13. b4 d4 14. Nd1 a6 15. Re1 h5 16. f3 g3 17. hxg3 h4 18. g4 h3 19. Nf2 Qh4 20. Nxh3 Qg3 21. Nf2 Rh2 22. Bf1 Rh8 23. Be2 Ne7 24. Qa5 Nd5 25. gxf5 Be7 26. Rac1 Bd7 27. fxe6 Bxe6 28. Ng4 b5 29. Nh6 Bh4 30. Rf1 Bd8 31. Qa3 Nf4 32. Bxf4 Qxf4 33. Rfd1 Qxh6 34. Bf1 Bh4 35. g4 Bg3 36. Bg2 Bxg4 37. Qb3 Bxf3 38. Qe6+ Qxe6 39. Rf1 Bxg2 40. Kxg2 Rh2+ 41. Kg1 Qh3 42. Rf8+ Kxf8 43. Rf1+ Kg8 44. Rf8+ Rxf8 45. Ne3 Rh1# 0-1[/pgn]

Observe the positional pressure LC0 puts on Fruit while being down material (a minor piece) for a large part of the game. Under this pressure Fruit blundered (by its own "blunder check" analysis at 10s/move). Here is the position where it blundered (move 27.fxe6):

[D]r3k2r/1p1bb3/p3p3/Q2npP2/1P1p4/3P1Pq1/P1NBBNP1/2R1R1K1 w q - 0 27

Here, under pressure, Fruit played 27.fxe6. With 27.f4, Fruit might have saved the game. A nice, very peculiar style of play by LC0.

Laskos · Post by **Laskos** » Sat Apr 07, 2018 1:32 pm

Since the official progress list at http://lczero.org/networks is statistically inconsistent, I decided to test the "best" network so far, ID83 (4694), against the latest, ID99 (4635), an apparent regression after a whopping 700,000 games. I tested them not head to head, but on some more objective measures.

First, test suites:

Positional suite of openings (200 openings):
ID99: 109/200
ID83: 109/200

Tactical suite of middlegames (879 positions)
ID99: 188/879
ID83: 173/879

It seems LCZero did improve a bit tactically.

Matches of 200 games each against Predateur 2.2.1 stable engine, 0.25s/move.
ID99: 75.5/200
ID83: 69.0/200

It seems, although shown as a significant regression, about 25 Elo points were gained from ID83 to ID99, probably mostly due to improved tactics.
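The 25 Elo figure can be reproduced from the two match scores above. This is a small sketch assuming the standard logistic Elo model; the function name `elo_from_score` is just an illustrative helper.

```python
import math

def elo_from_score(points, games):
    """Elo difference versus the opponent implied by a match score
    (logistic model: expected score = 1 / (1 + 10**(-elo / 400)))."""
    p = points / games
    return -400.0 * math.log10(1.0 / p - 1.0)

# Scores against Predateur 2.2.1, 200 games each, as reported above.
elo_id99 = elo_from_score(75.5, 200)
elo_id83 = elo_from_score(69.0, 200)
gain = elo_id99 - elo_id83

print(f"ID99 vs Predateur: {elo_id99:.1f} Elo")
print(f"ID83 vs Predateur: {elo_id83:.1f} Elo")
print(f"ID99 - ID83:       {gain:.1f} Elo")
```

Both networks score below 50%, so both values are negative; the difference between them is the estimated gain.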

Dann Corbit · Post by **Dann Corbit** » Sat Apr 07, 2018 1:53 pm

Laskos wrote:The official list progress as seen here http://lczero.org/networks being statistically inconsistent, I decided to test the "best" up to now ID83 (4694) against the latest ID99 (4635), an apparent regression after a whopping 700,000 games. To test not directly, but on some more objective measures.

First, test suites:

Positional suite of openings (200 openings):
ID99: 109/200
ID83: 109/200

Tactical suite of middlegames (879 positions)
ID99: 188/879
ID83: 173/879

It seems LCZero did improve a bit tactically.

Matches of 200 games each against Predateur 2.2.1 stable engine, 0.25s/move.
ID99: 75.5/200
ID83: 69.0/200

It seems, although shown as a significant regression, about 25 Elo points were gained from ID83 to ID99, probably mostly due to improved tactics.

Within the error bars, I guess we do not know if regression or progression.

It is clear that progress has slowed.
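A rough check of the error bars for the two 200-game matches quoted above: the win/draw/loss split for those matches is not given, so this sketch uses the binomial variance p(1-p) per game, which is an upper bound (draws can only reduce it). It also assumes the two matches are independent; the helper names are illustrative.

```python
import math

def elo(p):
    """Elo difference implied by an expected score p (logistic model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_se_upper_bound(points, games):
    """Upper bound on the standard error of the Elo estimate (delta method)."""
    p = points / games
    se_p = math.sqrt(p * (1.0 - p) / games)          # binomial bound on SE of score
    slope = 400.0 / (math.log(10) * p * (1.0 - p))   # d(Elo)/dp
    return slope * se_p

gain = elo(75.5 / 200) - elo(69.0 / 200)
se_diff = math.sqrt(elo_se_upper_bound(75.5, 200) ** 2
                    + elo_se_upper_bound(69.0, 200) ** 2)

print(f"gain = {gain:.1f} Elo, standard error of the difference <= {se_diff:.1f} Elo")
```

Under these assumptions the measured gain is smaller than the bound on one standard error of the difference, which is consistent with not being able to tell regression from progression on 200 games per network.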

AdminX · Post by **AdminX** » Sat Apr 07, 2018 1:57 pm

Fun Draw with Chess Genius

[pgn]
[Event "07:51"]
[Site "Play.LCZero.Org"]
[Date "2018.04.07"]
[Round "?"]
[White "LCZero ID 82 (Hard)"]
[Black "Chess Genius Android SG6 (1 Sec / Ponder Off)"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 f5 3. exf5 e4 4. Nd4 Qf6 5. Nb5 Qe5 6. Qe2 Nc6 7. N1c3 Nf6 8. f4 Qxf4 9. d4 Qxf5 10. Nxc7+ Kd8 11. Nxa8 Nxd4 12. Qf2 Qxf2+ 13. Kxf2 Nxc2 14. Rb1 Bc5+ 15. Ke2 Nd4+ 16. Kd1 d5 17. Be3 Ng4 18. Kd2 Nxe3 19. Kxe3 Rf8 20. Nxd5 Nb3+ 21. Kxe4 Bf5+ 22. Ke5 Bd4+ 23. Kd6 Bc5+ 24. Ke5 Bd4+ 25. Kd6 Bc5+ 26. Ke5 1/2-1/2
[/pgn]

Laskos · Post by **Laskos** » Sat Apr 07, 2018 8:38 pm

Dann Corbit wrote:
Laskos wrote:The official list progress as seen here http://lczero.org/networks being statistically inconsistent, I decided to test the "best" up to now ID83 (4694) against the latest ID99 (4635), an apparent regression after a whopping 700,000 games. To test not directly, but on some more objective measures.

First, test suites:

Positional suite of openings (200 openings):
ID99: 109/200
ID83: 109/200

Tactical suite of middlegames (879 positions)
ID99: 188/879
ID83: 173/879

It seems LCZero did improve a bit tactically.

Matches of 200 games each against Predateur 2.2.1 stable engine, 0.25s/move.
ID99: 75.5/200
ID83: 69.0/200

It seems, although shown as a significant regression, about 25 Elo points were gained from ID83 to ID99, probably mostly due to improved tactics.
Within the error bars, I guess we do not know if regression or progression.

It is clear that progress has slowed.

This is already significant, and a high Elo gain, although ID101 still shows a regression of some 30 Elo points compared to ID83 in their "ratings" at http://lczero.org/networks.

Code:

Games Completed = 800 of 1000 (Avg game length = 9.345 sec)
Settings = Gauntlet/64MB/100ms per move/M 2500cp for 3 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 11990 sec elapsed, 2997 sec remaining
 1.  Predateur 2.2.1              515.5/800  484-253-63  (L: m=253 t=0 i=0 a=0)  (D: r=51 i=9 f=1 s=1 a=1)  (tpm=154.3 d=13.91 nps=2970004)
 2.  LCZero CPU ID101 4 threads   158.5/400  144-227-29  (L: m=227 t=0 i=0 a=0)  (D: r=27 i=2 f=0 s=0 a=0)  (tpm=65.4 d=12.36 nps=1726)
 3.  LCZero CPU ID83 4 threads    126.0/400  109-257-34  (L: m=257 t=0 i=0 a=0)  (D: r=24 i=7 f=1 s=1 a=1)  (tpm=65.4 d=12.30 nps=1905)

Highly controlled conditions.

62 Elo points advantage of ID101 over ID83 in just 3.5 days. Standard deviation of the difference is 23 Elo points. They have to be careful with their self-games ratings.
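For reference, the 62 Elo figure and a standard deviation in the low twenties can be recovered from the win-loss-draw counts in the gauntlet table. This is a sketch under the logistic Elo model with a delta-method error estimate and the two gauntlets treated as independent; the exact method used for the numbers above is not stated, so small differences are expected.

```python
import math

def elo_with_se(wins, losses, draws):
    """Elo versus the opponent and its standard error from W-L-D counts."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se_p = math.sqrt(var / n)
    elo = -400.0 * math.log10(1.0 / p - 1.0)
    slope = 400.0 / (math.log(10) * p * (1.0 - p))  # d(Elo)/dp
    return elo, slope * se_p

# W-L-D against Predateur 2.2.1, from the table above (400 games each).
elo101, se101 = elo_with_se(144, 227, 29)
elo83, se83 = elo_with_se(109, 257, 34)

diff = elo101 - elo83
se_diff = math.sqrt(se101 ** 2 + se83 ** 2)

print(f"ID101 - ID83 = {diff:.1f} Elo, standard deviation ~ {se_diff:.1f} Elo")
```

The difference is roughly two and a half standard deviations, which is why this result carries more weight than the earlier 200-game matches.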

CMCanavessi · Post by **CMCanavessi** » Sat Apr 07, 2018 8:42 pm

I've made a graph with my results and gauntlets which clearly shows that there's still no stall in the growth of its strength:

Elo progression:

Gauntlet score progression:

gladius · Post by **gladius** » Sat Apr 07, 2018 8:50 pm

Laskos wrote:
Dann Corbit wrote:
Laskos wrote:The official list progress as seen here http://lczero.org/networks being statistically inconsistent, I decided to test the "best" up to now ID83 (4694) against the latest ID99 (4635), an apparent regression after a whopping 700,000 games. To test not directly, but on some more objective measures.

First, test suites:

Positional suite of openings (200 openings):
ID99: 109/200
ID83: 109/200

Tactical suite of middlegames (879 positions)
ID99: 188/879
ID83: 173/879

It seems LCZero did improve a bit tactically.

Matches of 200 games each against Predateur 2.2.1 stable engine, 0.25s/move.
ID99: 75.5/200
ID83: 69.0/200

It seems, although shown as a significant regression, about 25 Elo points were gained from ID83 to ID99, probably mostly due to improved tactics.
Within the error bars, I guess we do not know if regression or progression.

It is clear that progress has slowed.
This is already significant, and a high Elo gain, although ID101 still shows some 30 Elo points regression compared to ID83 in their "ratings" http://lczero.org/networks .
Code:
Games Completed = 800 of 1000 (Avg game length = 9.345 sec)
Settings = Gauntlet/64MB/100ms per move/M 2500cp for 3 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 11990 sec elapsed, 2997 sec remaining
 1.  Predateur 2.2.1              515.5/800  484-253-63  (L: m=253 t=0 i=0 a=0)  (D: r=51 i=9 f=1 s=1 a=1)  (tpm=154.3 d=13.91 nps=2970004)
 2.  LCZero CPU ID101 4 threads   158.5/400  144-227-29  (L: m=227 t=0 i=0 a=0)  (D: r=27 i=2 f=0 s=0 a=0)  (tpm=65.4 d=12.36 nps=1726)
 3.  LCZero CPU ID83 4 threads    126.0/400  109-257-34  (L: m=257 t=0 i=0 a=0)  (D: r=24 i=7 f=1 s=1 a=1)  (tpm=65.4 d=12.30 nps=1905)
Highly controlled conditions.

62 Elo points advantage of ID101 over ID83 in just 3.5 days. Standard deviation of the difference is 23 Elo points. They have to be careful with their self-games ratings.

Yeah, the self-game ratings are not representative currently, because the networks all end up playing the same opening. In one way it's cool to see the network's opening choices evolve over time (it loves playing the Sicilian now, but pretty much one specific line). We are going to add some variety to the move choices in the opening to hopefully fix this, and that should make things more representative.

Laskos · Post by **Laskos** » Sun Apr 08, 2018 3:17 am

gladius wrote:
Laskos wrote:
62 Elo points advantage of ID101 over ID83 in just 3.5 days. Standard deviation of the difference is 23 Elo points. They have to be careful with their self-games ratings.
Yeah, the self-game ratings are not representative currently, because the networks all end up playing the same opening. In one way it's cool to see the network's opening choices evolve over time (it loves playing the Sicilian now, but pretty much one specific line). We are going to add some variety to the move choices in the opening to hopefully fix this, and that should make things more representative.

I don't think LCZero is strong enough yet to decide which openings are the best, so these must be local optima, which should be avoided. It is strong positionally in the openings, but not yet there, and anyway, it has to be trained on a wide variety of openings, so noise should be added in that part.

The match finished with an even more impressive result for ID101 against ID83: +74 Elo points, with a standard deviation of 20 Elo points.

Laskos · Post by **Laskos** » Sun Apr 08, 2018 3:19 am

CMCanavessi wrote: I've made a graph with my results and gauntlets which clearly shows that there's still no stall in the growth of its strength:

Elo progression:

Gauntlet score progression:

Thanks, very interesting, I hope you make the people involved aware of your work.
