Something goes wrong with lc0 since yesterday?

Laskos
Posts: 9242
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Something goes wrong with lc0 since yesterday?

Post by Laskos » Fri Aug 31, 2018 4:09 am

CMCanavessi wrote:
Thu Aug 30, 2018 8:32 pm
There's an interesting ongoing match between lc0 on 1 P100 and SFdev on 57 threads here: http://tcecbeta.chessdb.cn/bonusbeta/live.html

So far after 11 games, SF is +1.
Not counting the disconnect at the start of the 15th game, which was counted as a loss for SF, the score after 14 games is +2 -0 =12 for SF dev. Still, a very good result for Leela, as the P100 is maybe only 2 times faster than a 1080 Ti due to fp16 (or less than 2 times?). So, maybe Leela still scales well at big time x hardware?

As the 1xxxx run seems to have stopped right when I returned from vacation, I did some tests. My results seem to indicate that not all is smooth there. First, real games (no adjudications, as I saw Leela drawing completely won endgames and losing completely drawn endgames) against SF8. Lc0 on a GTX 1060, SF8 on one 3.8 GHz i7 core.
Time control: 2 minutes + 2 seconds increment.

ID10774
Score of lc0_v16 10774 (GTX 1060) vs SF 8 (1 core): 10 - 5 - 25 [0.563]
Elo difference: 43.66 +/- 66.31

40 of 40 games finished.

And a much newer, higher-rated ID:

ID11199
Score of lc0_v17 11199 (GTX 1060) vs SF 8 (1 core): 5 - 8 - 27 [0.463]
Elo difference: -26.11 +/- 61.76

40 of 40 games finished.

I expected a different result. Say, not 160 Elo points higher (as shown in their self-play rating), but some 50. Yes, too few games, but the likelihood that ID11199 is not stronger than ID10774 is 93%. The likelihood that it is not 50 Elo points stronger is 99.5%. So there seems to be not much progress, if not a regression, with the successive lowerings of the learning rate (LR), which is unexpected and bad.
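The Elo figures in the two match reports above follow from the win/loss/draw counts alone, and the 93% figure can be approximated the same way. A minimal sketch, assuming the usual logistic Elo model and a normal approximation for the score difference between the two independent 40-game matches; the function names are illustrative and not taken from any testing tool:

```python
import math

def score_stats(wins, losses, draws):
    """Return (mean score, variance of the mean) for a W/L/D result."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of an outcome worth 1, 0.5 or 0 points.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    return mean, var / n

def elo_diff(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def prob_not_stronger(new, old):
    """P(new net is not stronger than old), normal approximation (assumed)."""
    m_new, v_new = score_stats(*new)
    m_old, v_old = score_stats(*old)
    z = (m_old - m_new) / math.sqrt(v_new + v_old)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

id10774 = (10, 5, 25)   # wins, losses, draws vs SF 8
id11199 = (5, 8, 27)

print(round(elo_diff(score_stats(*id10774)[0]), 2))
print(round(elo_diff(score_stats(*id11199)[0]), 2))
print(round(prob_not_stronger(id11199, id10774), 3))
```

With these inputs the first two values agree with the reported 43.66 and -26.11, and the third comes out at roughly 0.94, in line with the 93% quoted above. The exact method behind the 93% and 99.5% figures is not stated in the post, so this is only one plausible way to get there.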
==================

Scaling:

I have neither big hardware nor the time to play LTC games. But test suites can show, albeit indirectly, some scaling results at long time control in a shorter total amount of time.

I first tried the STS1500 test suite. Leela performs pretty miserably on it, weaker than Fruit 2.1 even at 20s/position, probably because, although it is called "Strategic", it still contains too many tactics. The result on STS shows Lc0 performing about 800 Elo points weaker than in real games. So I set that suite aside and, to check the scaling, took my own very positional suite of 200 positions, Openings200.epd. On it, Leela performs close to the strength shown in real games.

I am confident that from 0.1s/move to 1s/move, Leela does scale much better than any regular alpha-beta (AB) engine.
Openings200.epd results comparing Leela ID11199 on GTX 1060 to SF dev (one i7 core):

Lc0 ID11199:

0.1s
score=51/200 [averages on correct positions: depth=1.9 time=0.06 nodes=45]

1.0s
score=105/200 [averages on correct positions: depth=3.8 time=0.30 nodes=392]

+54
Improvement by 54 solved positions.


SF dev 1 core:

0.1s
score=92/200 [averages on correct positions: depth=8.4 time=0.02 nodes=41226]

1.0s
score=121/200 [averages on correct positions: depth=10.4 time=0.17 nodes=307632]

+29
Improvement by 29 solved positions.

The scaling at STC does indeed seem to be much better for Leela compared to Stockfish.


My concern in the past was the scaling from 4s/move to 20s/move. I seem to get pretty bad results for Leela in real games, but over too few games (I cannot play many LTC games). Let's see the scaling on this test suite.
Openings200.epd results comparing Leela ID11199 on GTX 1060 to SF dev (one i7 core):

Lc0 ID11199:

4.0s
score=125/200 [averages on correct positions: depth=4.9 time=1.30 nodes=2839]

20.0s
score=137/200 [averages on correct positions: depth=5.5 time=3.23 nodes=8414]

+12
Improvement by 12 solved positions.


SF dev 1 core:

4.0s
score=131/200 [averages on correct positions: depth=13.2 time=0.67 nodes=1125191]

20.0s
score=153/200 [averages on correct positions: depth=14.4 time=1.93 nodes=2993124]

+22
Improvement by 22 solved positions.

So, from 4s/position to 20s/position (scaling at longer time control), scaling seems to be better for Stockfish than Leela.
I am not sure if this result is in any way conclusive. After all, this is just a test suite, not real games.
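The scaling comparison in this post boils down to the gain in solved positions for each increase in time per position (0.1s to 1s, and 4s to 20s). A small sketch that tabulates those gains from the scores reported above; the dictionary layout and names are only for illustration:

```python
# Solved positions out of 200 on Openings200.epd, as reported above.
scores = {
    "Lc0 ID11199": {0.1: 51, 1.0: 105, 4.0: 125, 20.0: 137},
    "SF dev 1 core": {0.1: 92, 1.0: 121, 4.0: 131, 20.0: 153},
}

# Time steps compared in the post: short (0.1s -> 1s) and longer (4s -> 20s).
steps = [(0.1, 1.0), (4.0, 20.0)]

def gains(results):
    """Gain in solved positions for each (from, to) time step."""
    return {step: results[step[1]] - results[step[0]] for step in steps}

for engine, results in scores.items():
    for (t0, t1), gain in gains(results).items():
        print(f"{engine}: {t0}s -> {t1}s: +{gain}")
```

This only restates the differences already given (+54 and +12 for Lc0, +29 and +22 for SF dev); it says nothing about how the suite results map to Elo in real games.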

George Tsavdaris
Posts: 1557
Joined: Thu Mar 09, 2006 11:35 am

Re: Something goes wrong with lc0 since yesterday?

Post by George Tsavdaris » Fri Aug 31, 2018 11:54 am

Laskos wrote:
Fri Aug 31, 2018 4:09 am
CMCanavessi wrote:
Thu Aug 30, 2018 8:32 pm
There's an interesting ongoing match between lc0 on 1 P100 and SFdev on 57 threads here: http://tcecbeta.chessdb.cn/bonusbeta/live.html

So far after 11 games, SF is +1.
Not counting the disconnect at the start of the 15th game, which was counted as loss for SF, after 14 games: +2 -0 =12 for SF dev.
I guess he also counted Leela's win from 2 days earlier in a similar 10-game match, where Leela on 4xP100 against Stockfish dev on 43 cores stood at +1 -0 =9 in favor of Leela(!!). That came after a very good win for Leela in which Stockfish (playing Black) was completely blind to g4 here, played Rf7, and went on to lose after Leela played g4, g5, g6:

[d]rn1q1rk1/1b5p/pp2p2B/3p1p1Q/4p3/2P4P/P1P1B1P1/R4R1K b - - 0 19

So all in all, against Stockfish on 43 cores (the 1st day) and 57 threads (the 2nd day), Leela scored +1 -2 =21, a -14 ± 50 Elo performance!
I find this incredible to be honest because of the huge hardware for Stockfish! :shock:
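The -14 ± 50 figure can be checked from the combined +1 -2 =21 result. A rough sketch, assuming the logistic Elo model and a 95% normal-approximation interval on the mean score; match tools may compute the margin slightly differently, and the function name is made up for illustration:

```python
import math

def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (Elo difference, approximate 95% half-width) for a W/L/D result."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of an outcome worth 1, 0.5 or 0 points.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def elo(score):
        return -400.0 * math.log10(1.0 / score - 1.0)

    # Convert both ends of the score interval to Elo and halve the width.
    half_width = (elo(mean + z * se) - elo(mean - z * se)) / 2.0
    return elo(mean), half_width

# Leela's combined result against Stockfish dev over both days.
diff, margin = elo_with_margin(wins=1, losses=2, draws=21)
print(f"{diff:+.0f} +/- {margin:.0f}")
```

Under these assumptions the result lands at about -14 with a margin close to 50, so the quoted performance is consistent with the 24-game score.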

I will have to test these 111xx and 112xx nets myself to see, but with so many nets I have to be careful which to choose, since big deviations may happen from one net to, e.g., net+5.
So far I have these CCRL 40/4 performances for the same gauntlet:

Lc0v16 Test10 10520 after 100 games had 3480±48 CCRL 40/4 Elo
Lc0v16 Test10 10800 after 100 games had 3397±44 CCRL 40/4 Elo
Lc0v16 Test10 10815 after 100 games had 3489±53 CCRL 40/4 Elo
Lc0v17 Test10 11089 after 100 games had 3460±46 CCRL 40/4 Elo

For example:

  Program          CCRL Elo   Error(cl 95%)          Games            Score 
Lc0v17 11089       3459.8       ±45.8            100 (+49,=45,-6)     71.5 %

   vs.                           :  games (  +,  =, -),   (%) :    Diff,    SD, CFS (%)
   Stockfish 8                   :     20 (  6, 12, 2),  60.0 :   +36.8,  23.4,   94.3
   Fire 7.1                      :     20 (  8, 12, 0),  70.0 :  +118.8,  23.4,  100.0
   Booot 6.3.1                   :     20 ( 12,  6, 2),  75.0 :  +197.8,  23.4,  100.0
   Andscacs 9.3                  :     20 ( 13,  5, 2),  77.5 :  +252.8,  23.4,  100.0
   Ethereal10.81-x64-pext        :     20 ( 10, 10, 0),  75.0 :  +259.8,  23.4,  100.0
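As a quick consistency check on the table above, the per-opponent results can be summed and turned back into score percentages. A small sketch using only the numbers from the table; the helper name `score_pct` is an illustrative choice:

```python
# (wins, draws, losses) for Lc0v17 11089 per opponent, copied from the table.
results = {
    "Stockfish 8": (6, 12, 2),
    "Fire 7.1": (8, 12, 0),
    "Booot 6.3.1": (12, 6, 2),
    "Andscacs 9.3": (13, 5, 2),
    "Ethereal10.81-x64-pext": (10, 10, 0),
}

def score_pct(wins, draws, losses):
    """Score percentage with a draw counted as half a point."""
    return 100.0 * (wins + 0.5 * draws) / (wins + draws + losses)

# Column-wise totals over all opponents: (wins, draws, losses).
totals = tuple(sum(column) for column in zip(*results.values()))

for name, wdl in results.items():
    print(f"{name}: {score_pct(*wdl):.1f} %")
print(f"Total {totals}: {score_pct(*totals):.1f} %")
```

The totals reproduce the header line (+49,=45,-6) and the 71.5 % overall score.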

Laskos
Posts: 9242
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Something goes wrong with lc0 since yesterday?

Post by Laskos » Fri Aug 31, 2018 10:44 pm

George Tsavdaris wrote:
Fri Aug 31, 2018 11:54 am

I will have to test these 111xx and 112xx nets myself to see, but with so many nets I have to be careful which to choose, since big deviations may happen from one net to, e.g., net+5.
So far I have these CCRL 40/4 performances for the same gauntlet:

Lc0v16 Test10 10520 after 100 games had 3480±48 CCRL 40/4 Elo
Lc0v16 Test10 10800 after 100 games had 3397±44 CCRL 40/4 Elo
Lc0v16 Test10 10815 after 100 games had 3489±53 CCRL 40/4 Elo
Lc0v17 Test10 11089 after 100 games had 3460±46 CCRL 40/4 Elo
I tested Lc0 ID11199 on a GTX 1060 against SF8 on 1 core at LTC (3600'' + 36''), in two games from the standard opening position with colors reversed. They are about equal at 120'' + 2''. The result was one draw for Lc0 as White and one loss as Black. As White, Lc0 seems to have missed a possible win in the middlegame and endgame. Generally, after 30+ million training games, endgames are still very bad, as I will show. First, the 2 LTC games:



I adjudicated it as an obvious draw, avoiding maybe 2 hours of shuffling and then exchanging a pawn. Lc0 at times showed a +3.50 advantage, and even SF8 showed a 1.60 advantage for Lc0 in the middlegame, but Lc0 failed to convert, mostly in the endgame.

The reverse game was a clear win for SF8:



So, not a very good result for Lc0 at LTC.

Now, about endgames. Starting from 5-man positions that are not-so-easy wins, at 60'+ 1'' tc, Lc0 converted 4/20 of them against SF8, the rest being draws, while SF8 converted 19/20 against Lc0, with 1 draw. Really poor performance by Lc0 in late endgames. Here is the PGN:
http://s000.tinyupload.com/?file_id=057 ... 1538372275

PS: No Syzygy TBs were used.
