LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

CMCanavessi wrote: Mon May 07, 2018 7:15 pm Kai, can you try some nets in the 231-236 range? Particularly 231, 232 and 236. Those are the ones that several of us consider the strongest.
Yes, I also found that strength probably decreased a bit after the nets in the 240 region, but recent ones are recovering somewhat. It's hard to say clearly, since error margins are hard to squash, but I will test 236 in 800 games out of curiosity.

I tested ID258 on a positional opening suite; it comes out at roughly the 3300 CCRL Elo level. I adjusted the time per position to mimic a GTX 1060 GPU, although I have only a good CPU (6s for LC0 and 1s for the other engines, per position). CCRL numbers are from the 40/4' rating list.

Openings200 positional test-suite (200 positions)

Engine           (CCRL Elo)  Score
Komodo 11.3.1    (3513)      126
Stockfish 9      (3561)      117
Deep Shredder 13 (3328)      116

LCZero ID258                 111

Andscacs 0.93    (3318)      103
Texel 1.07       (3211)       98
Fruit 2.1        (2684)       82
OTOH, on the WAC tactical-shots suite, LC0 performs so miserably, below the 1800 level, that it's hard even to compare it to a regular AB engine.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by CMCanavessi »

To all the guys testing tactics, remember that the results will be pointless if you don't provide at least 8 moves, as Leela needs that to fill her history planes. If you just feed a FEN position to her, she won't perform at all.
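As a rough illustration of the point above, here is a minimal sketch of how a test harness might send a position together with its move history over UCI, rather than a bare FEN. The `position startpos moves ...` and `position fen ... moves ...` forms are standard UCI; the helper name `uci_position` and the opening line are only assumptions for the example, not anything taken from LC0 itself.

```python
def uci_position(moves, start_fen=None):
    """Build a UCI 'position' command that carries the move history.

    moves: list of moves in UCI long algebraic notation (e.g. "e2e4").
    start_fen: optional FEN the moves start from; None means the initial position.
    """
    base = f"position fen {start_fen}" if start_fen else "position startpos"
    return f"{base} moves {' '.join(moves)}" if moves else base


# Hypothetical example: 8 full moves (16 plies) of a Ruy Lopez line,
# so the engine sees the history and not just the final position.
history = [
    "e2e4", "e7e5", "g1f3", "b8c6", "f1b5", "a7a6", "b5a4", "g8f6",
    "e1g1", "f8e7", "f1e1", "b7b5", "a4b3", "d7d6", "c2c3", "e8g8",
]
print(uci_position(history))
```

For a tactical suite distributed as FENs only, this would mean finding or constructing a game fragment that leads to each test position, which is extra work the suites were not designed for.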
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Nay Lin Tun »

CMCanavessi wrote: Tue May 08, 2018 3:18 am To all the guys testing tactics, remember that the results will be pointless if you don't provide at least 8 moves, as Leela needs that to fill her history planes. If you just feed a FEN position to her, she won't perform at all.
Leela's blunders are sometimes really odd, even 1-ply blunders. 31. Nd5 ??
https://lichess.org/6G7xZ5JO
JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by JohnS »

Nay Lin Tun wrote: Tue May 08, 2018 4:08 am
Leela's blunders are sometimes really odd, even 1-ply blunders. 31. Nd5 ??
https://lichess.org/6G7xZ5JO
Still she is certainly learning the openings well. Checking with Megabase 2018, the first new move was 12...Qb6. That's a solid performance.
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Nay Lin Tun »

I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67 :mrgreen: :shock:
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Jhoravi »

Nay Lin Tun wrote: Tue May 08, 2018 5:05 am I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67 :mrgreen: :shock:
"I came from zero.. my ultimate goal is to go back to where I came from"
--Leela
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Nay Lin Tun wrote: Tue May 08, 2018 5:05 am I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67 :mrgreen: :shock:
The plot seems a bit funny :).

I also tested ID236 and ID261; they don't come out as the strongest. Here is my plot for the 15x192 nets, with several data points. The red lines mark one standard deviation. These are fast games on CPU against a standard engine, so they might not be very representative of GPU and longer time controls. All with the v0.8 binary.

Image

Something seems not to be working well: the total improvement from ID227 to ID261 is meager, only slightly above the 2 SD error margin.
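For readers who want to put numbers on the error margins discussed here, the following is a minimal sketch of how an Elo difference and its one-standard-deviation margin can be estimated from a match result. It assumes the usual logistic Elo model and a first-order (delta-method) error propagation; the function name and the win/draw/loss counts are hypothetical and are not taken from the tests in this thread.

```python
import math


def elo_with_margin(wins, draws, losses):
    """Return (elo_diff, one_sd_margin) for a match result.

    Assumes the logistic Elo model. The margin propagates the standard
    error of the mean score through the Elo formula to first order.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se_score = math.sqrt(var / n)
    elo = 400.0 * math.log10(score / (1.0 - score))
    # d(elo)/d(score) = 400 / (ln(10) * score * (1 - score))
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return elo, slope * se_score


# Hypothetical 800-game result, for illustration only.
elo, sd = elo_with_margin(wins=320, draws=200, losses=280)
print(f"{elo:+.1f} Elo, one SD = {sd:.1f}")
```

Under these assumptions, an even 800-game match with a quarter of the games drawn gives one SD of about 10.6 Elo, which shows why small differences between neighbouring nets are hard to resolve.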
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Daniel Shawul wrote: Mon May 07, 2018 7:28 pm
Laskos wrote: Mon May 07, 2018 6:40 pm
Laskos wrote: Mon May 07, 2018 5:13 pm
Yes, v0.8.
And, now ID258 has 110/300. Seems consistent with worsening on WAC.
Interesting: on the positional opening suite, the trend is exactly the opposite. Here are the results for the first 15x192 net compared to the last one:

LCZero v0.8
6s/position 4 CPU threads (equivalent to 1s on GTX 1060)

WAC300 tactical:
ID227: 120/300
ID258: 110/300
performance below that of 1800 Elo AB engines, worsening

Openings200 positional:
ID227: 98/200
ID258: 111/200
performance above that of 3200 Elo AB engines, improving


There seems to be some conflict between these two aspects, at least in the net+search part.
It is going to be a massive heartbreak for many who believe the NN is going to solve tactics :)

Hardware + cherry-picking seems to be the only explanation left so far ...

syzygy also gets it: judge only based on the evidence presented so far on tactics -- which is none.

Daniel
I am curious how A0 was at tactics, especially WAC-type tactics. These things do not occur that often in games, and the games presented are too few to give a picture. LC0 still manages to be at the 3000 Elo level in CCRL conditions with a good GPU, from normal openings. But give it tactically involved openings, and it performs much worse.
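To gauge how much weight the suite scores quoted above can carry (120/300 vs 110/300 on WAC300, 98/200 vs 111/200 on Openings200), here is a rough sketch of a two-proportion z-score. It assumes each score is a count of positions solved and treats the two runs as independent samples, which is only an approximation since both nets saw the same positions. The function name is hypothetical.

```python
import math


def two_proportion_z(solved_a, solved_b, total):
    """z-score for the change in solve rate from run A to run B.

    Assumes both runs used `total` positions and treats them as
    independent samples (pooled-variance two-proportion test).
    """
    p_a = solved_a / total
    p_b = solved_b / total
    pooled = (solved_a + solved_b) / (2 * total)
    se = math.sqrt(pooled * (1.0 - pooled) * (2.0 / total))
    return (p_b - p_a) / se


# Scores quoted in the thread: ID227 (first) vs ID258 (second).
print(f"WAC300:      z = {two_proportion_z(120, 110, 300):+.2f}")
print(f"Openings200: z = {two_proportion_z(98, 111, 200):+.2f}")
```

Under the independence assumption, both differences come out below 2 SD in magnitude. A paired comparison on per-position results would be sharper, but those are not given in the thread.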
mirek
Posts: 52
Joined: Sat Mar 24, 2018 4:18 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by mirek »

I wonder if it's just an accident that the regression started around the time v0.8 was released?
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

mirek wrote: Tue May 08, 2018 11:21 am I wonder if it's just an accident that the regression started around the time v0.8 was released?
There are multiple competing hypotheses about what caused this regression. At the moment we're trying a reset of the training to an earlier checkpoint (although with the new games) to see whether there's an overfitting problem with the value head. Also, several people suspect FPU reduction, or a problem with the neural net cache key. We'll investigate the issue and hope to find the cause soon.