LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos · Post by **Laskos** » Mon May 07, 2018 7:34 pm

CMCanavessi wrote: ↑Mon May 07, 2018 7:15 pm Kai, can you try some nets in the 231-236 range? Particularly 231, 232 and 236. Those are the ones that several of us consider the strongest.

Yes, I also found that strength probably decreased a bit after the nets in the 240 region, though recent ones are recovering somewhat. It is hard to say clearly, since error margins are hard to squash, but I will test 236 over 800 games out of curiosity.

I compared ID258 on a positional opening suite; it comes out at roughly the 3300 CCRL Elo level. I adjusted the time per position to mimic a GTX 1060 GPU, although I only have a good CPU (6s for LC0 and 1s for the rest of the engines, per position). CCRL numbers are from the 40/4' rating list.


Openings200 positional test-suite (200 positions)

Engine           (CCRL Elo)  Score

Komodo 11.3.1    (3513)      126
Stockfish 9      (3561)      117
Deep Shredder 13 (3328)      116

LCZero ID258                 111

Andscacs 0.93    (3318)      103
Texel 1.07       (3211)       98
Fruit 2.1        (2684)       82

OTOH, on the WAC tactical shots suite, LC0 performs so miserably, below the 1800 level, that it's hard even to compare it to a regular AB engine.
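One rough way to arrive at a figure like "some 3300 CCRL Elo" is to fit a straight line through the suite scores and CCRL ratings of the reference engines in the table above and read off the rating at LC0's score. The sketch below does that with ordinary least squares. The linear fit and the function names `fit_line` and `estimate_elo` are assumptions made for illustration, not necessarily how the estimate in the post was made.

```python
# (suite score, CCRL 40/4' Elo) pairs copied from the Openings200 table above.
REFERENCE = {
    "Komodo 11.3.1": (126, 3513),
    "Stockfish 9": (117, 3561),
    "Deep Shredder 13": (116, 3328),
    "Andscacs 0.93": (103, 3318),
    "Texel 1.07": (98, 3211),
    "Fruit 2.1": (82, 2684),
}


def fit_line(points):
    """Ordinary least-squares fit of Elo = slope * score + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_elo(score, points):
    """Read the fitted line at a given suite score (assumes linearity)."""
    slope, intercept = fit_line(points)
    return slope * score + intercept


if __name__ == "__main__":
    lc0_score = 111  # LCZero ID258 on Openings200
    print(round(estimate_elo(lc0_score, list(REFERENCE.values()))))
```

With only six reference engines and Fruit 2.1 far below the rest, the fit is crude; it only shows that a score of 111 is consistent with a rating in the low 3300s.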

CMCanavessi · Post by **CMCanavessi** » Tue May 08, 2018 3:18 am

To all the guys testing tactics, remember that the results will be pointless if you don't provide at least 8 moves, as Leela needs that to fill her history planes. If you just feed a FEN position to her, she won't perform at all.
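In UCI terms, the difference is between sending a bare `position fen ...` and sending a start position followed by a `moves` list. A minimal sketch of building such a command is below. The helper names `uci_position_command` and `has_enough_history` are hypothetical, and counting the 8 moves as 8 entries in the move list is an assumption about what the post means.

```python
MIN_HISTORY = 8  # figure taken from the post above; the unit (list entries) is an assumption


def uci_position_command(moves, start_fen=None):
    """Build a UCI 'position' command from a start position plus a move list.

    moves: moves in UCI long algebraic notation, e.g. ["e2e4", "e7e5"].
    start_fen: FEN to start from, or None for the standard start position.
    """
    base = "position startpos" if start_fen is None else f"position fen {start_fen}"
    if moves:
        return base + " moves " + " ".join(moves)
    return base


def has_enough_history(moves, minimum=MIN_HISTORY):
    """True if the move list is long enough to fill the history described above."""
    return len(moves) >= minimum


if __name__ == "__main__":
    line = ["e2e4", "e7e5", "g1f3", "b8c6", "f1b5", "a7a6", "b5a4", "g8f6"]
    print(uci_position_command(line))
    print(has_enough_history(line))
```

A test suite distributed as bare FEN strings would produce commands with no `moves` part at all, which is the case the post warns about.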

Nay Lin Tun · Post by **Nay Lin Tun** » Tue May 08, 2018 4:08 am

CMCanavessi wrote: ↑Tue May 08, 2018 3:18 am To all the guys testing tactics, remember that the results will be pointless if you don't provide at least 8 moves, as Leela needs that to fill her history planes. If you just feed a FEN position to her, she won't perform at all.

Leela's blunders are sometimes really odd, even 1-ply blunders. 31. Nd5 ??
https://lichess.org/6G7xZ5JO

JohnS · Post by **JohnS** » Tue May 08, 2018 4:42 am

Nay Lin Tun wrote: ↑Tue May 08, 2018 4:08 am
Leela's blunders are sometimes really odd, even 1-ply blunders. 31. Nd5 ??
https://lichess.org/6G7xZ5JO

Still she is certainly learning the openings well. Checking with Megabase 2018, the first new move was 12...Qb6. That's a solid performance.

Nay Lin Tun · Post by **Nay Lin Tun** » Tue May 08, 2018 5:05 am

I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67

Jhoravi · Post by **Jhoravi** » Tue May 08, 2018 6:32 am

Nay Lin Tun wrote: ↑Tue May 08, 2018 5:05 am I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67

"I came from zero.. my ultimate goal is to go back to where I came from"
--Leela

Laskos · Post by **Laskos** » Tue May 08, 2018 9:59 am

Nay Lin Tun wrote: ↑Tue May 08, 2018 5:05 am I got this Elo graph from someone who comes from the "Future" https://ibb.co/cxzk67

The plot seems a bit funny.

I also tested ID236 and ID261; they don't come out as the strongest. My plot is here for the 15x192 net, with several datapoints. Red lines are one-standard-deviation lines. These are fast games on CPU against a standard engine, so they might not be very representative of GPU and longer time controls. All with the v0.8 binary.

Something seems not to be working well: the total improvement from ID227 to ID261 is meager, slightly above the 2SD error margins.
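For readers who want to reproduce error bars of this kind, here is a minimal sketch of turning a win/draw/loss record into an Elo difference with a one-standard-deviation band. It uses the standard logistic Elo formula and the per-game score variance. The function names and the example record are hypothetical and are not the data behind the plot.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins, draws, losses):
    """Return (elo, elo_minus_1sd, elo_plus_1sd) for a match result.

    The standard error of the mean score comes from the per-game variance
    of the outcomes 1, 0.5 and 0. Scores of exactly 0 or 1 are not handled.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2
           + draws * (0.5 - s) ** 2
           + losses * (0.0 - s) ** 2) / n
    se = math.sqrt(var / n)
    return elo_from_score(s), elo_from_score(s - se), elo_from_score(s + se)


if __name__ == "__main__":
    # Hypothetical 800-game record, for illustration only.
    elo, low, high = elo_with_error(wins=320, draws=200, losses=280)
    print(f"{elo:+.1f} Elo (1 SD band: {low:+.1f} to {high:+.1f})")
```

Because the standard error shrinks only with the square root of the number of games, halving the band takes four times as many games, which is why the margins are hard to squash.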

Laskos · Post by **Laskos** » Tue May 08, 2018 10:15 am

Daniel Shawul wrote: ↑Mon May 07, 2018 7:28 pm
Laskos wrote: ↑Mon May 07, 2018 6:40 pm
Laskos wrote: ↑Mon May 07, 2018 5:13 pm
Yes, v0.8.
And, now ID258 has 110/300. Seems consistent with worsening on WAC.
Interesting, in the positional opening suite, the trend is exactly the opposite. I show the results for the first 15x192 net compared to the last one:

LCZero v0.8
6s/position 4 CPU threads (equivalent to 1s on GTX 1060)

WAC300 tactical:
ID227: 120/300
ID258: 110/300
performance below 1800 Elo points AB engines, worsening

Openings200 positional:
ID227: 98/200
ID258: 111/200
performance above 3200 Elo points AB engines, improving

There seems to be some conflict between these two aspects, at least in the net+search part.
It is going to be a massive heartbreak for many who believe the NN is going to solve tactics

Hardware + cherry-picking seems to be the only explanation left so far ...

syzygy also gets it: judge only based on the evidence presented so far on tactics -- which is none.

Daniel

I am curious how A0 was at tactics. Especially WAC type of tactics. These things occur not that often in games, and the games presented are too few to have a picture. LC0 still manages to be at 3000 Elo level in CCRL conditions with good GPU, from normal openings. But give it tactically involved openings, and it performs very much lower.
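As a rough gauge of how much the ID227 versus ID258 suite scores quoted above differ relative to sampling noise, one can compute a two-proportion z-score. This is only a sketch: it treats the two runs as independent samples, which is an assumption, since both nets were run on the same positions and a paired test would be more appropriate. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(solved_a, solved_b, total):
    """z-score for the difference between two solve rates on suites of equal size.

    Uses the pooled-proportion standard error and assumes independent samples.
    """
    p_a = solved_a / total
    p_b = solved_b / total
    pooled = (solved_a + solved_b) / (2 * total)
    se = math.sqrt(pooled * (1.0 - pooled) * (2.0 / total))
    return (p_a - p_b) / se


if __name__ == "__main__":
    # Scores taken from the quoted results above.
    print(f"WAC300      ID227 vs ID258: z = {two_proportion_z(120, 110, 300):.2f}")
    print(f"Openings200 ID258 vs ID227: z = {two_proportion_z(111, 98, 200):.2f}")
```

The z-score is the difference measured in standard deviations, so it can be read against the same 1SD and 2SD yardsticks used for the game results.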

mirek · Post by **mirek** » Tue May 08, 2018 11:21 am

I wonder if it's just an accident that the regression started around the time v8 was released?

jkiliani · Post by **jkiliani** » Tue May 08, 2018 12:30 pm

mirek wrote: ↑Tue May 08, 2018 11:21 am I wonder if it's just an accident that the regression started around the time v8 was released?

There are multiple competing hypotheses about what caused this regression. At the moment we're trying a reset of the training to an earlier checkpoint (although with the new games) to see whether there's an overfitting problem with the value head. Also, several people suspect FPU reduction, or a problem with the neural net cache key. We'll investigate the issue and hope to find the cause soon.
