LCZero: Progress and Scaling. Relation to CCRL Elo

hgm · Post by **hgm** » Tue Apr 03, 2018 8:37 pm

whereagles wrote:

Leela is black.. 2.36% chance to win on a lone king??

Close enough to 0 to make no difference even at 3500 Elo.

George Tsavdaris · Post by **George Tsavdaris** » Tue Apr 03, 2018 9:15 pm

whereagles wrote:

Leela is black.. 2.36% chance to win on a lone king??

How is this possible?? I mean, how do the rollouts result in black wins, so that black has wins in its statistics??

Nay Lin Tun · Post by **Nay Lin Tun** » Tue Apr 03, 2018 9:26 pm

@Kai, is it possible to share your opening test suite?

Daniel Shawul · Post by **Daniel Shawul** » Tue Apr 03, 2018 9:26 pm

She averages her tree like no other, and sucks because of that

What is happening is that white will sometimes give up its queen in the tree, and the score then becomes close to a draw. When that is averaged with the non-losing lines, you get this non-existent winning probability.

Tried it on scorpioMCTS with averaging and minimax backups

With averaging like leela's, a score of -627 is like 2% winning chance

Code:

15 -570 5 118404  Ka6-b6 Kd3-c4 Kb6-b7 Qc2-b2
15 -545 7 158627  Ka6-b5 Qc2-g2 Kb5-a4 Qg2-g4 Ka4-b5 Qg4-c4 Kb5-a5 Qc4-c2 Ka5-b4 Qc2-f2 Kb4-b3
16 -569 7 167639  Ka6-b5 Qc2-g2 Kb5-a4 Qg2-d5 Ka4-b4 Qd5-c4 Kb4-a5 Qc4-c2 Ka5-b4 Qc2-f2 Kb4-b3
16 -595 8 177007  Ka6-b5 Qc2-g2 Kb5-a4 Qg2-f2 Ka4-b5 Qf2-f5 Kb5-b4
16 -611 9 219766  Ka6-b5 Qc2-g2 Kb5-a4 Qg2-g7 Ka4-b3 Qg7-b7 Kb3-a4 Qb7-f3
17 -627 9 227292  Ka6-b5 Qc2-g2 Kb5-a4 Kd3-c3 Ka4-b5 Qg2-d5

With minimaxing a score of -2023 is a 0% winning chance

Code:

2 -2023 0 554  Ka6-b7 Qc2-b2 Kb7-c8
3 -2034 0 3588  Ka6-b5 Qc2-c4 Kb5-a5 Qc4-c5 Ka5-a4
4 -2044 0 12593  Ka6-b6 Qc2-a4 Kb6-c7 Qa4-f4 Kc7-d7
5 -2056 0 23814  Ka6-b6 Qc2-b2 Kb6-c5 Qb2-e5 Kc5-b4
6 -2063 2 47803  Ka6-b6 Kd3-c3 Kb6-a6 Qc2-a2 Ka6-b6
7 -2064 3 91732  Ka6-b6 Kd3-c3 Kb6-c6 Qc2-a2 Kc6-b6
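
The difference between the two backup rules can be sketched on a toy tree. This is only an illustration, not Scorpio's or Leela's actual code: the tree, the leaf values, and the function names `backup_average` and `backup_minimax` are made up, and a real MCTS averages weighted by visit counts rather than uniformly as done here. Leaf values are black's expected score (0 = loss, 0.5 = draw, 1 = win), with black to move at the root.

```python
from statistics import mean

# Hypothetical toy tree. A node is either a leaf value (black's expected
# score) or a list of child nodes. Black moves at the root, white replies.
# In each white node most replies win for white (0.0), but one reply
# gives up the queen and leads to a draw (0.5).
tree = [
    [0.0, 0.0, 0.0, 0.5],  # after black's first candidate move
    [0.0, 0.5],            # after black's second candidate move
]


def backup_average(node):
    """Back up values by plain (unweighted) averaging over children."""
    if isinstance(node, (int, float)):
        return float(node)
    return mean([backup_average(child) for child in node])


def backup_minimax(node, black_to_move=True):
    """Back up values by minimax: black maximizes, white minimizes."""
    if isinstance(node, (int, float)):
        return float(node)
    values = [backup_minimax(child, not black_to_move) for child in node]
    return max(values) if black_to_move else min(values)


print("averaging:", backup_average(tree))
print("minimax:  ", backup_minimax(tree))
```

With averaging, white's queen blunders leak into the root value and black appears to have some chances; with minimax, white is assumed to pick its best reply and the root value is a plain loss.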

Laskos · Post by **Laskos** » Tue Apr 03, 2018 9:40 pm

Nay Lin Tun wrote: @Kai, is it possible to share your opening test suite?

Sure, the link in this post should work:
http://www.talkchess.com/forum/viewtopi ... 5&start=14

tmokonen · Post by **tmokonen** » Tue Apr 03, 2018 9:55 pm

George Tsavdaris wrote: How is this possible?? I mean, how do the rollouts result in black wins, so that black has wins in its statistics??

L0 takes both wins and draws into consideration, so there's a small residual score from rollouts that result in draws.

Uri Blass · Post by **Uri Blass** » Tue Apr 03, 2018 11:30 pm

George Tsavdaris wrote:

whereagles wrote:

Leela is black.. 2.36% chance to win on a lone king??

How is this possible?? I mean, how do the rollouts result in black wins, so that black has wins in its statistics??

I think it means an expected outcome of 2.36%, which could be, for example, a 4.72% probability of a draw and a 95.28% probability of a loss.
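
As a quick check of that arithmetic, here is a minimal sketch using the usual convention that a win counts 1, a draw 0.5 and a loss 0. The function name `expected_score` is just an illustrative choice, not anything taken from Leela's code.

```python
def expected_score(p_win, p_draw, p_loss):
    """Expected score with win = 1, draw = 0.5, loss = 0."""
    total = p_win + p_draw + p_loss
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return p_win + 0.5 * p_draw


# No wins at all: 4.72% draws and 95.28% losses.
only_draws = expected_score(0.0, 0.0472, 0.9528)

# The same expected score read (wrongly) as a pure win probability.
only_wins = expected_score(0.0236, 0.0, 0.9764)

print(f"{only_draws:.4f} {only_wins:.4f}")
```

Both distributions give the same expected score, which is why a single percentage cannot tell wins and draws apart.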

MonteCarlo · Post by **MonteCarlo** » Wed Apr 04, 2018 11:36 pm

Indeed. Leela's output is an expected score, not actually a win%.

The wording on the site has since been changed, it seems, although now it includes this "50%=draw" bit in the legend, which caused some debate in the discord.

Probably should just say "50%=equal chances" or some such thing ("expected score" is pretty self-explanatory, so could probably do away with the legend altogether), but not a big deal.

Last net to pass is actually fairly reasonable now. It'll be interesting to see where we are a week from now (it's not even been a week since the last big bug was fixed).

Laskos · Post by **Laskos** » Thu Apr 05, 2018 12:19 am

Large improvement from ID69 to ID83.

Now, in gauntlets of only 100 games against Zurichess Bern (2232 CCRL Elo) and BikJump v2.01 (2098 CCRL Elo), it performs at about the 2200 Elo level at 1s/move and at about the 2300 Elo level at 10s/move, on a full 4-core i7 CPU.
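
For readers who want to turn a gauntlet score into a rating estimate themselves, the standard logistic Elo model can be sketched as below. This is an assumption about the general method, not necessarily how the figures above were computed; the function names and the sample score are hypothetical, and with only 100 games the error margins are wide.

```python
import math


def elo_diff_from_score(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def expected_score_from_elo(diff):
    """Expected score fraction for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Hypothetical example: scoring 60% against an opponent rated 2098.
opponent_elo = 2098
performance = opponent_elo + elo_diff_from_score(0.60)
print(round(performance))
```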

On my positional opening test suite of 200 positions, it is firmly settled amongst strong engines (20s/position):

Code:

[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1]

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile

Komodo 10.2 64-bit             :     145       200   72.5      2.0     20.0  openings200beta07.epd
Houdini 5.01 Pro x64           :     144       200   72.0      2.4     20.0  openings200beta07.epd
Stockfish 8 64 BMI2            :     141       200   70.5      2.0     20.0  openings200beta07.epd
Houdini 5.01 Pro x64 Tactical  :     139       200   69.5      2.3     20.0  openings200beta07.epd
Deep Shredder 13 x64           :     128       200   64.0      2.7     20.0  openings200beta07.epd
Houdini 4 Pro x64              :     126       200   63.0      1.8     20.0  openings200beta07.epd
Andscacs 0.88n                 :     123       200   61.5      2.4     20.0  openings200beta07.epd
Houdini 4 Pro x64 Tactical     :     120       200   60.0      1.6     20.0  openings200beta07.epd
Nirvanachess 2.3               :     119       200   59.5      1.8     20.0  openings200beta07.epd
Fire 5 x64     (3341 CCRL)     :     110       200   55.0      3.0     20.0  openings200beta07.epd
Texel 1.06     (3162 CCRL)     :     110       200   55.0      1.6     20.0  openings200beta07.epd

LCZero  *************  ID83    :     109       200   54.5      1.1     20.0  openings200beta07.epd

Fritz 15       (3227 CCRL)     :     102       200   51.0      1.9     20.0  openings200beta07.epd

LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd

Fruit 2.1      (2685 CCRL)     :      91       200   45.5      1.5     20.0  openings200beta07.epd
Sjaak II 1.3.1 (2194 CCRL)     :      75       200   37.5      4.0     20.0  openings200beta07.epd
BikJump v2.01  (2098 CCRL)     :      74       200   37.0      1.6     20.0  openings200beta07.epd

It improved significantly positionally from ID69 (in only 3 days).

Tactically it seems very weak. On the ECM tactical middlegame suite of 879 positions, it performs very badly (1s/position):

Code:

Bik Jump 2.01    (2098 CCRL Elo)
score=574/879 [averages on correct positions: depth=4.6 time=0.19 nodes=467671]

Predateur 2.2.1  (1786 CCRL Elo)
score=486/879 [averages on correct positions: depth=6.1 time=0.13 nodes=409596]

LCZero (ID83)
score=173/879 [averages on correct positions: depth=13.6 time=0.24 nodes=312]

LCZero (ID69)
score=171/879 [averages on correct positions: depth=13.5 time=0.25 nodes=318]

And it doesn't seem to improve at all. The estimated CCRL Elo level on this tactical suite is about that of Stockfish at depth=3, or maybe 1400 CCRL Elo points. Something has to be done with its MCTS search, maybe along the lines outlined by Daniel Shawul.

MonteCarlo · Post by **MonteCarlo** » Thu Apr 05, 2018 4:27 am

Thanks for the update Kai!

On the one hand, it's quite possible that a fundamental change to its MCTS implementation will be required at some point if it wants to compete at the highest level, and the work Daniel Shawul has done with Scorpio could prove quite useful in that case (well, it's fantastic work in any case; it's just in this case that it would benefit LC0).

On the other hand, unless you subscribe to some form of conspiracy theory around the A0 results, we're nowhere near the limits of this sort of approach, so I wouldn't worry too much about that just yet.

Right now there are still a bunch of bugs being worked out, the network is still rather small, and the project is rather young (barely a month old, and it's barely been a week since the last major bug was discovered and fixed).

Some patience is required. It might turn out that switching to a new implementation of MCTS is required; it might also turn out that the NN at some level gives good enough prior probabilities for moves that even MCTS with averaging is good tactically.

We'll just have to give it some time
