LCZero: Progress and Scaling. Relation to CCRL Elo

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

CMCanavessi wrote: I get about 1 kn/s, and the time control is 1 min + 1 sec, so it's very similar to the match games Leela plays between networks.

The scaling, I don't know; I haven't made any tests.
LC0 scales significantly better than standard engines of similar strength.

I am a bit puzzled by their ratings here http://lczero.org/networks . I am not even sure how they decide to promote a network based on such "ratings". Although my current methodology is very flawed (only against one standard engine of similar strength, and at a very short time control), I have the following:

My matches are 500 games each.

ID101 compared to ID83
Their rating in self-games: -30 Elo points
My rating: +75 Elo points (20 Elo points standard deviation)

ID109 compared to ID101
Their rating in self-games: +100 Elo points
My rating: +0 Elo points (20 Elo points standard deviation)

I use a varied but solid short opening suite, 3moves_GM. They seem to overfit on certain openings, finding local optima, playing most of their games on them and training less on many other viable openings. I don't know whether they will soon find it problematic to decide what to promote. 300,000 games for +0 Elo against a standard engine is not that good.
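For reference, a minimal Python sketch (my own illustration, not the actual tooling) of how such a 500-game score maps to an Elo difference and its error bar. The 20% draw ratio below is an assumption; the resulting standard deviation depends on it.

Code: Select all

import math

def elo_from_score(score):
    """Convert a match score fraction (0..1, draws count 0.5) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_stddev(score, draw_ratio, games):
    """Approximate standard deviation of the Elo estimate for a match.

    Per-game score variance shrinks with more draws; the factor
    400 / (ln 10 * s * (1 - s)) is the local slope of the score-to-Elo curve.
    """
    var_per_game = score * (1.0 - score) - draw_ratio / 4.0
    se_score = math.sqrt(var_per_game / games)
    return se_score * 400.0 / (math.log(10) * score * (1.0 - score))

# Example: ~302.5/500 points, assuming a 20% draw ratio.
print(round(elo_from_score(0.605)))         # about +74 Elo
print(round(elo_stddev(0.605, 0.20, 500)))  # about 14 Elo with these assumptions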
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

Laskos wrote: I use a varied but solid short opening suite, 3moves_GM. They seem to overfit on certain openings, finding local optima, playing most of their games on them and training less on many other viable openings. I don't know whether they will soon find it problematic to decide what to promote. 300,000 games for +0 Elo against a standard engine is not that good.
Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
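Roughly, the noise mixing looks like this (a simplified sketch, not the actual Lc0 code; the alpha and epsilon values are the AlphaZero paper defaults and may not match what Lc0 uses):

Code: Select all

import numpy as np

def apply_root_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into the root move priors (AlphaZero style).

    priors : policy probabilities of the legal root moves.
    The perturbation only changes the chosen move when two moves are
    already close in quality, which is why game variety stays limited.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors, dtype=float) + epsilon * noise

# A sharp position where one move dominates the policy:
priors = [0.70, 0.15, 0.10, 0.05]
print(apply_root_dirichlet_noise(priors))  # the 0.70 move usually stays on top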

I wrote a solution to this issue in https://github.com/glinscott/leela-chess/pull/267. Once this is used in match games, openings will be much more varied again. It will probably happen in a couple of days, after the next forced version upgrade.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote: Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
Is this "problem" affecting at all the strengthening/training part of Lc0? I guess not right? It's just affecting the measuring ELO progress i guess.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: Lc0 currently has the problem that randomness in match games is provided only by the relatively small perturbations of the root node scores from Dirichlet noise. This can only shift the PV when two moves are already similar in evaluated quality, and it sometimes leads to long opening lines being played in almost every game within a particular match.
Is this "problem" affecting the strengthening/training part of Lc0 at all? I guess not, right? It's just affecting the measurement of Elo progress, I guess.
This only affects strength measurement, not training. For training, we always use move selection proportional to visit count, which leads to a great variety of positions but also a lot of blunders (which don't hurt the training).

Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans. The now-merged code provides greater variety in move selection while avoiding most of the blunders that happen in training games.
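To illustrate the two selection modes (a simplified sketch; the function names are illustrative, not Lc0's actual code):

Code: Select all

import numpy as np

def pick_training_move(visits, rng=None):
    """Training games: sample a move proportionally to its root visit count.

    Gives a great variety of positions, but occasionally picks a weak,
    low-visit move (a blunder that does not hurt training).
    """
    rng = np.random.default_rng() if rng is None else rng
    visits = np.asarray(visits, dtype=float)
    return int(rng.choice(len(visits), p=visits / visits.sum()))

def pick_match_move(visits):
    """Match games (before the change): always play the most-visited move."""
    return int(np.argmax(visits))

visits = [620, 250, 90, 40]        # hypothetical root visit counts
print(pick_match_move(visits))     # always move 0
print(pick_training_move(visits))  # usually 0, sometimes 1-3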
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by George Tsavdaris »

jkiliani wrote:
George Tsavdaris wrote:
jkiliani wrote: Apart from Elo measurement, it does however have another side effect: it makes Leela predictable, which is bad for an engine used as a sparring partner by humans.
It's no different from other engines. All engines will play the same move if they think for the same time, and the only non-deterministic part of them is the randomness introduced by parallel search (using multiple CPUs/cores).
This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

George Tsavdaris wrote:
jkiliani wrote: This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
An opening book that isn't really integrated into the engine (i.e., one that just starts from an arbitrary FEN without replaying all the opening moves) would not work that well with Leela, since she needs the move history. If you would like an opening suite for testing purposes, you could implement it and open a pull request at https://github.com/glinscott/leela-chess/.

Personally, I find that letting the neural net choose its own openings from the promising candidate moves measures its capabilities better, but I know that not everyone shares that opinion. If you want an opening book, though, you might have to code the option yourself (or wait until someone else does, I suppose).
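As an illustration of what "replaying the opening moves" means in practice, here is a minimal sketch using the python-chess library (the engine command is a placeholder): because the whole move list is sent over UCI, Leela can rebuild the position history her network input needs, which a bare FEN cannot provide.

Code: Select all

import chess
import chess.engine

# Placeholder command; point it at your lc0 binary and weights.
ENGINE_CMD = ["./lc0"]

def play_from_opening(opening_san, think_time=1.0):
    """Replay an opening line move by move, then ask the engine for a move.

    python-chess sends the full move list ("position startpos moves ..."),
    so a history-dependent engine like Leela sees the preceding positions
    rather than a bare FEN.
    """
    board = chess.Board()
    for san in opening_san:
        board.push_san(san)
    with chess.engine.SimpleEngine.popen_uci(ENGINE_CMD) as engine:
        result = engine.play(board, chess.engine.Limit(time=think_time))
    return result.move

print(play_from_opening(["e4", "c5", "Nf3", "d6", "d4", "cxd4"]))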
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Robert Pope »

jkiliani wrote:
George Tsavdaris wrote:
jkiliani wrote: This may be true, but determinism is less than ideal for Leela in particular, since it has only the neural net search and no opening book. Without some move randomisation, there wouldn't be a sensible way to measure its strength.
There is: predefined opening variations. There are countless opening suites out there, and it's more objective from version to version since you test the same positions every time.
An opening book that isn't really integrated into the engine (i.e., one that just starts from an arbitrary FEN without replaying all the opening moves) would not work that well with Leela, since she needs the move history. If you would like an opening suite for testing purposes, you could implement it and open a pull request at https://github.com/glinscott/leela-chess/.

Personally, I find that letting the neural net choose its own openings from the promising candidate moves measures its capabilities better, but I know that not everyone shares that opinion. If you want an opening book, though, you might have to code the option yourself (or wait until someone else does, I suppose).
Many, if not most, tournament managers like cutechess can play out the actual opening moves, not just send a FEN.

And the argument about letting an engine choose its own openings applies to any engine, not just neural-net ones.
lantonov
Posts: 216
Joined: Sun Apr 13, 2014 5:19 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by lantonov »

Laskos wrote: I am a bit puzzled by their ratings here http://lczero.org/networks.
I was puzzled too, before M. van der Bergh explained in https://groups.google.com/d/msg/lczero/ ... IqeuQsBwAJ that the rating measurements done for the various networks are not reliable.
As I understand it, each network is matched against the highest-rated previous network, and therefore the points on the network graph on the main page are not additive.
Additionally, the openings in matches are too uniform to allow reliable Elo comparisons.