lczero rating

Posted: Mon Apr 02, 2018 5:31 pm
by stavros
Hi, what is the current Elo rating (CCRL Elo) of LCZero? Also, is there any saturation so far?

Re: lczero rating

Posted: Mon Apr 02, 2018 5:43 pm
by Laskos
stavros wrote: Hi, what is the current Elo rating (CCRL Elo) of LCZero? Also, is there any saturation so far?
About 2000 CCRL Elo points at blitz (40/4') on a good GPU or a 4-8 core i7 CPU. Some saturation already seems to be appearing, but this can still be corrected by changing some parameters (noise, temperature). After that, another network will be necessary, starting from zero.
Here is the progress plot as a function of the number of games:
http://lczero.org/

It still might improve, though. LC0 seems to play terrible endgames (not in any way at 2000 Elo level) from what I saw with my own eyes; something should be done in this respect.

Also, positionally it can be strong in openings, much above 2000 Elo level. But tactically it is much below 2000 Elo, and "much" can mean 1000 or so Elo points. Maybe some improvement to the MCTS rollouts can be improvised; Daniel Shawul posted some interesting results.
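
For readers unfamiliar with the two parameters mentioned above, here is a minimal sketch of how "temperature" and "noise" usually enter AlphaZero-style self-play. It is not LCZero's actual code: the function names are hypothetical, and the defaults `epsilon=0.25` and `alpha=0.3` are illustrative values only, not settings confirmed for any LCZero network.

```python
import random


def visit_probs(visits, temperature):
    """Turn root visit counts N_i into move probabilities proportional to N_i^(1/T).

    A high temperature flattens the distribution (more exploration);
    a low temperature concentrates it on the most-visited move.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = [n ** (1.0 / temperature) for n in visits]
    total = sum(weights)
    return [w / total for w in weights]


def add_root_noise(priors, epsilon=0.25, alpha=0.3, rng=random):
    """Mix Dirichlet(alpha) noise into the root priors: (1 - eps) * p + eps * noise.

    epsilon and alpha are assumed, illustrative values.
    """
    # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    return [(1 - epsilon) * p + epsilon * g / total for p, g in zip(priors, gammas)]


if __name__ == "__main__":
    print(visit_probs([1, 3], temperature=1.0))
    print(visit_probs([1, 3], temperature=0.5))
    print(add_root_noise([0.5, 0.3, 0.2]))
```

Lowering the temperature or the noise weight makes self-play games less random, which is one way such parameters could change how training progresses.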

Re: lczero rating

Posted: Mon Apr 02, 2018 6:15 pm
by jkiliani
Laskos wrote:
stavros wrote: Hi, what is the current Elo rating (CCRL Elo) of LCZero? Also, is there any saturation so far?
About 2000 CCRL Elo points at blitz (40/4') on a good GPU or a 4-8 core i7 CPU. Some saturation already seems to be appearing, but this can still be corrected by changing some parameters (noise, temperature). After that, another network will be necessary, starting from zero.
Here is the progress plot as a function of the number of games:
http://lczero.org/

It still might improve, though. LC0 seems to play terrible endgames (not in any way at 2000 Elo level) from what I saw with my own eyes; something should be done in this respect.

Also, positionally it can be strong in openings, much above 2000 Elo level. But tactically it is much below 2000 Elo, and "much" can mean 1000 or so Elo points. Maybe some improvement to the MCTS rollouts can be improvised; Daniel Shawul posted some interesting results.
It will not be necessary to start from zero once the network stalls. Instead, a larger neural net can simply be trained from the existing self-play games; afterward, that net can continue to improve.

By the way, I don't find LC0's endgame terrible at all, at least not in match games. It's not optimised to win quickly, sure, but I rarely see it giving away a certain win. Taking too long to convert a won position might be unappealing to humans, but is no sign of weakness as long as the position IS won in the end.

Re: lczero rating

Posted: Mon Apr 02, 2018 6:24 pm
by jpqy

Re: lczero rating

Posted: Mon Apr 02, 2018 6:36 pm
by Laskos
jkiliani wrote:
It will not be necessary to start from zero once the network stalls. Instead, a larger neural net can simply be trained from the existing self-play games; afterward, that net can continue to improve.

By the way, I don't find LC0's endgame terrible at all, at least not in match games. It's not optimised to win quickly, sure, but I rarely see it giving away a certain win. Taking too long to convert a won position might be unappealing to humans, but is no sign of weakness as long as the position IS won in the end.
Yes, endgames are usually converted, if in silly ways, but I also saw some elementary misses.

Compare opening positional versus middlegame tactical abilities:

On positional opening suite:
Openings200beta07 (200 positions, 20s/position)

[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile 
      
Komodo 10.2 64-bit             :     145       200   72.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64           :     144       200   72.0      2.4     20.0  openings200beta07.epd    
Stockfish 8 64 BMI2            :     141       200   70.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64 Tactical  :     139       200   69.5      2.3     20.0  openings200beta07.epd      
Deep Shredder 13 x64           :     128       200   64.0      2.7     20.0  openings200beta07.epd    
Houdini 4 Pro x64              :     126       200   63.0      1.8     20.0  openings200beta07.epd    
Andscacs 0.88n                 :     123       200   61.5      2.4     20.0  openings200beta07.epd 
Houdini 4 Pro x64 Tactical     :     120       200   60.0      1.6     20.0  openings200beta07.epd 
Nirvanachess 2.3               :     119       200   59.5      1.8     20.0  openings200beta07.epd 
Fire 5 x64                     :     110       200   55.0      3.0     20.0  openings200beta07.epd    
Texel 1.06 64-bit              :     110       200   55.0      1.6     20.0  openings200beta07.epd    
Fritz 15       (3227 CCRL)     :     102       200   51.0      1.9     20.0  openings200beta07.epd  

LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd 
  
Fruit 2.1      (2685 CCRL)     :      91       200   45.5      1.5     20.0  openings200beta07.epd  
Sjaak II 1.3.1 (2194 CCRL)     :      75       200   37.5      4.0     20.0  openings200beta07.epd    
BikJump v2.01  (2098 CCRL)     :      74       200   37.0      1.6     20.0  openings200beta07.epd


On tactical middlegame suite:
ECM (879 positions, 1s/position)

BikJump v2.01    (2098 CCRL Elo)
score=574/879 [averages on correct positions: depth=4.6 time=0.19 nodes=467671]

Predateur 2.2.1  (1786 CCRL Elo)
score=486/879 [averages on correct positions: depth=6.1 time=0.13 nodes=409596]

LCZero (ID69)
score=171/879 [averages on correct positions: depth=13.5 time=0.25 nodes=318]

LC0's tactical abilities are terrible, to the point that they are hard to even estimate Elo-wise on the CCRL scale by comparison with stable standard engines.
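
To put the two suites on the same footing, the raw scores reported above can be converted to solve rates. This is only arithmetic on the posted numbers; the dictionary layout and the `solve_rate` helper are hypothetical names chosen for the sketch.

```python
# (solved, total) pairs copied from the results posted above.
suite_scores = {
    "Openings200beta07": {
        "LCZero ID69": (98, 200),
        "BikJump v2.01": (74, 200),
    },
    "ECM": {
        "LCZero ID69": (171, 879),
        "BikJump v2.01": (574, 879),
        "Predateur 2.2.1": (486, 879),
    },
}


def solve_rate(solved, total):
    """Percentage of suite positions solved."""
    return 100.0 * solved / total


if __name__ == "__main__":
    for suite, results in suite_scores.items():
        print(suite)
        for engine, (solved, total) in results.items():
            print(f"  {engine:<16} {solve_rate(solved, total):5.1f}%")
```

On the opening suite LCZero solves 49.0% against BikJump's 37.0%, while on ECM it solves about 19.5% against BikJump's 65.3% and Predateur's 55.3%.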

Re: lczero rating

Posted: Mon Apr 02, 2018 7:17 pm
by stavros
Correct me if I am wrong, but even Google's AlphaZero saw its progress saturate after 700,000 steps: https://arxiv.org/pdf/1712.01815.pdf#page=4
I can't imagine LCZero matching the latest top engines.
The latest SF dev + Cerebellum book is already close to AlphaZero.

Re: lczero rating

Posted: Mon Apr 02, 2018 8:12 pm
by CMCanavessi
After Leela finally beat TSCP I had to get a newer, stronger opponent, and this time it was Vice 1.1, which is around 300 Elo stronger than TSCP.

I matched Leela ID 69 vs Vice 1.1 in a 40/40 10-game match (one of the games went on for more than 7 hours!!) and the end result was a surprising 5.5-4.5, in favor of Vice. I would have expected Vice to dominate much more, but looks like Leela is learning tricks really fast.


Here's the 10-game match PGN: http://www.mediafire.com/file/l1sltwy5k ... 20Vice.pgn
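
As a rough illustration of what a 4.5/10 score means in rating terms, the standard logistic Elo formula can be applied to the match result above. This is a sketch, not a rating claim: `elo_diff` is a hypothetical helper, and a 10-game sample carries very wide error bars.

```python
import math


def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


if __name__ == "__main__":
    leela_points, games = 4.5, 10  # Leela ID 69 vs Vice 1.1, from the post above
    print(f"Leela relative to Vice: {elo_diff(leela_points / games):+.0f} Elo")
```

With 4.5 points out of 10 the formula gives roughly -35 Elo for Leela relative to Vice, but with so few games the true difference could easily be far from that in either direction.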

Re: lczero rating

Posted: Mon Apr 02, 2018 8:50 pm
by George Tsavdaris
stavros wrote: Correct me if I am wrong, but even Google's AlphaZero saw its progress saturate after 700,000 steps: https://arxiv.org/pdf/1712.01815.pdf#page=4
I can't imagine LCZero matching the latest top engines.
The latest SF dev + Cerebellum book is already close to AlphaZero.
What is "steps"?

Re: lczero rating

Posted: Mon Apr 02, 2018 8:57 pm
by George Tsavdaris
jkiliani wrote: It will not be necessary to start from zero once the network stalls. Instead, a larger neural net can simply be trained from the existing self-play games; afterward, that net can continue to improve.
What is the ratio of the time spent generating self-play games to the time spent training on them? If it is 10:1, for example, then no harm is done by creating a bigger NN and training it again, once you have the self-play games.

BUT since these self-play games were played by a smaller (and weaker) NN, doesn't training a bigger NN on them make for a non-optimal procedure?

Re: lczero rating

Posted: Mon Apr 02, 2018 9:24 pm
by stavros
George Tsavdaris wrote:
stavros wrote: Correct me if I am wrong, but even Google's AlphaZero saw its progress saturate after 700,000 steps: https://arxiv.org/pdf/1712.01815.pdf#page=4
I can't imagine LCZero matching the latest top engines.
The latest SF dev + Cerebellum book is already close to AlphaZero.
What is "steps"?
From: https://arxiv.org/pdf/1712.01815.pdf#page=4

"We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks. Further details of the training procedure are provided in the Methods."
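
To make "steps" concrete: each step is one mini-batch update, so the figures in the quote translate into a total number of training samples. The sketch below is just arithmetic on the quoted numbers; the variable names are my own, and the TPU ratio is only a device count, since the two groups are different TPU generations and it says nothing direct about time spent.

```python
# Figures taken from the quoted passage.
steps = 700_000        # training steps (one mini-batch each)
batch_size = 4_096     # positions per mini-batch
selfplay_tpus = 5_000  # first-generation TPUs generating self-play games
training_tpus = 64     # second-generation TPUs training the network

# Total positions fed to the optimiser (positions may be sampled more than once).
positions_sampled = steps * batch_size

# Device-count ratio only; the generations differ, so this is not a compute ratio.
device_ratio = selfplay_tpus / training_tpus

if __name__ == "__main__":
    print(f"positions sampled: {positions_sampled:,}")
    print(f"self-play : training devices = {device_ratio:.1f} : 1")
```

That comes to about 2.87 billion sampled positions, with roughly 78 self-play devices for every training device.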