LCZero: Progress and Scaling. Relation to CCRL Elo

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

CMCanavessi wrote: I get about 1 kn/s, and the time control is 1 min + 1 sec, so it's very similar to the match games Leela plays between networks.

The scaling, I don't know; I haven't made any tests.
LC0 scales significantly better than standard engines of similar strength.

I am a bit puzzled by their ratings here http://lczero.org/networks . I am not even sure how they decide to promote a network based on such "ratings". Although my current methodology is very flawed (only against one standard engine of similar strength, and at a very short time control), I have the following:

My matches are 500 games each.

ID101 compared to ID83
Their rating in self-games: -30 Elo points
My rating: +75 Elo points (20 Elo points standard deviation)

ID109 compared to ID101
Their rating in self-games: +100 Elo points
My rating: +0 Elo points (20 Elo points standard deviation)

I use a varied but solid short opening suite, 3moves_GM. They seem to overfit on certain openings, finding local optima, playing most of their games on them and training less on many other viable openings. I don't know whether they will soon find it problematic to decide what to promote. 300,000 games for +0 Elo against a standard engine is not that good.
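For reference, a minimal Python sketch (my own illustration, not the actual tooling) of how such a 500-game score maps to an Elo difference and its error bar. The 20% draw ratio below is an assumption; the resulting standard deviation depends on it.

Code: Select all

import math

def elo_from_score(score):
    """Convert a match score fraction (0..1, draws count 0.5) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_stddev(score, draw_ratio, games):
    """Approximate standard deviation of the Elo estimate for a match.

    Per-game score variance shrinks with more draws; the factor
    400 / (ln 10 * s * (1 - s)) is the local slope of the score-to-Elo curve.
    """
    var_per_game = score * (1.0 - score) - draw_ratio / 4.0
    se_score = math.sqrt(var_per_game / games)
    return se_score * 400.0 / (math.log(10) * score * (1.0 - score))

# Example: ~302.5/500 points, assuming a 20% draw ratio.
print(round(elo_from_score(0.605)))         # about +74 Elo
print(round(elo_stddev(0.605, 0.20, 500)))  # about 14 Elo with these assumptions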
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

Laskos wrote: I use a varied but solid short opening suite, 3moves_GM. They seem to overfit on certain openings, finding local optima, playing most of their games on them and training less on many other viable openings. I don't know whether they will soon find it problematic to decide what to promote. 300,000 games for +0 Elo against a standard engine is not that good.
Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
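Roughly, the noise mixing looks like this (a simplified sketch, not the actual Lc0 code; the alpha and epsilon values are the AlphaZero paper defaults and may not match what Lc0 uses):

Code: Select all

import numpy as np

def apply_root_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into the root move priors (AlphaZero style).

    priors : policy probabilities of the legal root moves.
    The perturbation only changes the chosen move when two moves are
    already close in quality, which is why game variety stays limited.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors, dtype=float) + epsilon * noise

# A sharp position where one move dominates the policy:
priors = [0.70, 0.15, 0.10, 0.05]
print(apply_root_dirichlet_noise(priors))  # the 0.70 move usually stays on top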

I wrote a solution to this issue in https://github.com/glinscott/leela-chess/pull/267. Once this is used in match games, openings will be much more varied again. It will probably happen in a couple of days, after the next forced version upgrade.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote: Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
Is this "problem" affecting at all the strengthening/training part of Lc0? I guess not right? It's just affecting the measuring ELO progress i guess.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
Is this "problem" affecting the strengthening/training part of Lc0 at all? I guess not, right? It's just affecting the measurement of Elo progress, I guess.
This only affects strength measurement, not training. For training, we always use move selection proportional to visit count, which leads to a great variety of positions but also a lot of blunders (which don't hurt the training).

Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans. The now-merged code provides greater variety in move selection while avoiding most of the blunders that happen in training games.
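To illustrate the two selection modes (a simplified sketch; the function names are illustrative, not Lc0's actual code):

Code: Select all

import numpy as np

def pick_training_move(visits, rng=None):
    """Training games: sample a move proportionally to its root visit count.

    Gives a great variety of positions, but occasionally picks a weak,
    low-visit move (a blunder that does not hurt training).
    """
    rng = np.random.default_rng() if rng is None else rng
    visits = np.asarray(visits, dtype=float)
    return int(rng.choice(len(visits), p=visits / visits.sum()))

def pick_match_move(visits):
    """Match games (before the change): always play the most-visited move."""
    return int(np.argmax(visits))

visits = [620, 250, 90, 40]        # hypothetical root visit counts
print(pick_match_move(visits))     # always move 0
print(pick_training_move(visits))  # usually 0, sometimes 1-3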
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote:
George Tsavdaris wrote:
jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
An opening book that isn't really integrated into the engine (i.e., one that just starts from an arbitrary FEN without replaying all the opening moves) would not work that well with Leela, since she needs the move history. If you would like an opening suite for testing purposes, you could implement it and open a pull request at https://github.com/glinscott/leela-chess/.

Personally, I find that letting the neural net choose its own openings from the promising candidate moves measures its capabilities better, but I know that not everyone shares that opinion. If you want an opening book, though, you might have to code the option yourself (or wait until someone else does, I suppose).
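As an illustration of what "replaying the opening moves" means in practice, here is a minimal sketch using the python-chess library (the engine command is a placeholder): because the whole move list is sent over UCI, Leela can rebuild the position history her network input needs, which a bare FEN cannot provide.

Code: Select all

import chess
import chess.engine

# Placeholder command; point it at your lc0 binary and weights.
ENGINE_CMD = ["./lc0"]

def play_from_opening(opening_san, think_time=1.0):
    """Replay an opening line move by move, then ask the engine for a move.

    python-chess sends the full move list ("position startpos moves ..."),
    so a history-dependent engine like Leela sees the preceding positions
    rather than a bare FEN.
    """
    board = chess.Board()
    for san in opening_san:
        board.push_san(san)
    with chess.engine.SimpleEngine.popen_uci(ENGINE_CMD) as engine:
        result = engine.play(board, chess.engine.Limit(time=think_time))
    return result.move

print(play_from_opening(["e4", "c5", "Nf3", "d6", "d4", "cxd4"]))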
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Robert Pope »

jkiliani wrote:
George Tsavdaris wrote:
jkiliani wrote: This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
An opening book that isn't really integrated into the engine (i.e., one that just starts from an arbitrary FEN without replaying all the opening moves) would not work that well with Leela, since she needs the move history. If you would like an opening suite for testing purposes, you could implement it and open a pull request at https://github.com/glinscott/leela-chess/.

Personally, I find that letting the neural net choose its own openings from the promising candidate moves measures its capabilities better, but I know that not everyone shares that opinion. If you want an opening book, though, you might have to code the option yourself (or wait until someone else does, I suppose).
Many, if not most, tournament managers like cutechess can play out the actual opening moves, not just send a FEN.

And the argument about letting an engine choose its own openings applies to any engine, not just neural-net ones.
lantonov
Posts: 216
Joined: Sun Apr 13, 2014 5:19 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by lantonov »

Laskos wrote: I am a bit puzzled by their ratings here http://lczero.org/networks.
I was puzzled too, before M. van der Bergh explained in https://groups.google.com/d/msg/lczero/ ... IqeuQsBwAJ that the rating measurements done for the various networks are not reliable.
As I understand it, each network is matched against the highest-rated previous network, and therefore the points on the network graph on the main page are not additive.
Additionally, the openings in matches are too uniform to allow reliable Elo comparisons.