leela is official(?) better than sf9

Javier Ros
Posts: 181
Joined: Fri Oct 12, 2012 10:48 am
Location: Seville (SPAIN)
Full name: Javier Ros

Re: leela is official(?) better than sf9

Post by Javier Ros » Thu Sep 13, 2018 11:32 am

Laskos wrote:
Thu Sep 13, 2018 8:01 am

Wow, that's VERY interesting! So you get about equal results against SF8 on 4 cores from the positions used in the AlphaZero paper, and a +0 -9 =11 result, or some -170 Elo points, from my balanced 3-mover positions? My positions are balanced and collected from GM and IM human games, so they are not some nutty positions from the random 2-mover openings of the SF framework. I have about 900 3-mover positions in my suite, and I am quite sure it's a good set to test on. The conclusion would be: the AlphaZero team chose on purpose openings suitable to AlphaZero to boost the performance significantly, by more than 100 Elo points. That's remarkable, as I didn't expect some regular 3-movers to change the result so dramatically. One has to play diverse openings because, first, other engines might not reply the way Lc0 plays and, second, other engines might use a book. In fact the CCCC results, which, like mine, put Lc0 at the level of Fire 7.1 from the standard initial position, show that other engines do not play the openings as Lc0 would like them to play.

But anyway, I don't understand the total nodes issue, the depth issue (this is not only about initial depth, but the depths reached after say 10s), NPS issue. Your NPS seem very stable from the beginning, I even saw peaks in NPS at 1-2s in a Youtube video, larger than after 10s, while my NPS are increasing slowly, being almost twice higher after 10s search than after less than 1s search. Hash issue as observed by Milos seemed quite plausible to me.

Interesting, and my advice would be to not use AlphaZero openings, they seem to be chosen not quite fairly against SF8.
I don't know what the hell is going on with the number of nodes, whether it's the fault of the engine or of Arena, but it's not mine.

"Hash issue as observed by Milos seemed quite plausible to me." I have shown that repeating the games doesn't help lc0 much, as happened accidentally in my last test. You can see it from this score:

Lc01711261 - Stockfish_8_x64_bmi2 : 5,5/20 0-9-11 (=0 0= =0 =0 =0 == 00 =0 == 0=) 28%
Stockfish_8_x64_bmi2 - Lc01711261 : 14,5/20 9-0-11 (=1 1= =1 =1 =1 == 11 =1 == 1=) 73%

First two games Lc0 0.5-Sto8 1.5, were replayed in games 3-4 with the same result Lc0 0.5-Sto8 1.5, but different colours,
Games 5-6, Lc0 0.5-Sto8 1.5, replayed in games 7-8 with the same result Lc0 0.5-Sto8 1.5 and colours,
Games 9-10, Lc0 0.5-Sto8 1.5, replayed in games 11-12 Lc0 1-Sto8 1, here lc0 improves,
Games 13-14, Lc0 0-Sto8 2, replayed in games 15-16 Lc0 0.5-Sto8 1.5, here lc0 improves,
Games 17-18, Lc0 1-Sto8 1, replayed in games 19-20 Lc0 0.5-Sto8 1.5, here Stockfish8 improves.
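
The replay comparison above can be reproduced from the result string itself. Here is a minimal Python sketch, assuming (as described in this post) that the games are grouped in two-game mini-matches and that each mini-match is immediately followed by its replay. The helper names `pair_scores` and `compare_replays` are mine and have nothing to do with Arena.

```python
# Lc0's results as posted: one character per game ("=" draw, "0" loss, "1" win),
# grouped in pairs of games.
LC0_RESULTS = "=0 0= =0 =0 =0 == 00 =0 == 0="

POINTS = {"1": 1.0, "=": 0.5, "0": 0.0}


def pair_scores(results: str) -> list[float]:
    """Return Lc0's score (out of 2) for each two-game mini-match."""
    return [sum(POINTS[ch] for ch in pair) for pair in results.split()]


def compare_replays(scores: list[float]) -> list[tuple[float, float]]:
    """Pair each mini-match with its replay: (1st, 2nd), (3rd, 4th), ...

    Assumption: mini-matches alternate original / replay, as in the post.
    """
    return list(zip(scores[0::2], scores[1::2]))


scores = pair_scores(LC0_RESULTS)
for first, replay in compare_replays(scores):
    if replay == first:
        change = "same"
    elif replay > first:
        change = "lc0 improves"
    else:
        change = "Stockfish improves"
    print(f"first {first:.1f}/2 -> replay {replay:.1f}/2 ({change})")
print(f"total for lc0: {sum(scores):.1f}/20")
```

Only one of the five replays went worse for lc0 and two went better, but by half a point each, so the repeats changed little.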

"The conclusion would be: AlphaZero team choose on purpose openings suitable to AlphaZero to boost significantly the performance, by more than 100 Elo points."

No, the conclusion is that some initial positions are more likely than others to lead to tactical play, where lc0 fails. So far we know nothing about the tactical abilities of AlphaZero. We would need to see the complete 100 games and the 1200 games from the other initial positions. These 12 initial positions were chosen because they were the most played, as they say in their paper:

"12 most popular human openings (played more than 100,000 times in an online database)"

From my experience with lc0, the more complicated and tactical the initial positions, the better Stockfish scores; from the starting position or from quiet positional positions, the better lc0 scores.

leela is official(?) better than sf9??

Positionally probably; tactically of course not, and overall not either.

But it is very interesting to see how, with its low NPS, lc0 is able to stop Stockfish's deep search in some games.

A recommendation for immature Stockfish fanboys: they just have to pick balanced positions with a lot of tactics so that Stockfish scores much better than lc0, even 100%, I can assure you.

Now I am testing lc011261 against Stockfish 8, same conditions as before, starting from the 10 Nunn positions

After 12 games of the 20, the score is

   Engine                 Score    Lc
1: Stockfish_8_x64_bmi2   8,0/12   ===1==1101=1
2: Lc01711261             4,0/12   ············

You can see some games at youtube:

https://youtu.be/3fogVPj0RRk

https://youtu.be/Fh-QF07oiXQ


The love relationship between a chess engine tester and his computer can be summarized in one sentence:
Until heat do us part.


Re: leela is official(?) better than sf9

Post by Javier Ros » Thu Sep 13, 2018 11:34 am

jp wrote:
Thu Sep 13, 2018 10:18 am
Javier Ros wrote:
Mon Sep 10, 2018 5:42 pm
Using the following set of 12 positions of AlphaZero

[ECO "A10"][PlyCount "1"] 1. c4 *
[ECO "D06"][PlyCount "3"] 1. d4 d5 2. c4 *
[ECO "A46"][PlyCount "3"] 1. d4 Nf6 2. Nf3 *
[ECO "A50"][PlyCount "4"] 1. d4 Nf6 2. c4 e6 *
[ECO "E61"][PlyCount "5"] 1. d4 Nf6 2. c4 g6 3. Nc3 *
[ECO "C01"][PlyCount "4"] 1. e4 e6 2. d4 d5 *
[ECO "B50"][PlyCount "4"] 1. e4 c5 2. Nf3 d6 *
[ECO "B30"][PlyCount "4"] 1. e4 c5 2. Nf3 Nc6 *
[ECO "B40"][PlyCount "4"] 1. e4 c5 2. Nf3 e6 *
[ECO "C68"][PlyCount "6"] 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 *
[ECO "B10"][PlyCount "2"] 1. e4 c6 *
[ECO "A05"][PlyCount "2"] 1. Nf3 Nf6 *
So you mean the 12 positions in the diagrams in their Table 2.
Don't know about others, but in my web browser it didn't show these in your original post, so I couldn't see the positions you meant.

frankp wrote:
Thu Sep 13, 2018 8:45 am
I thought there was no opening book at all.
Which meant, as with the other aspects of the match, SF8 was not playing optimally.
There was never an opening book but in the Table 2 games they started from positions in Table 2, where they claimed win/draw/loss: w 242/353/5, b 48/533/19. What they called their match was 100 games from the initial position.
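
For anyone who wants to feed the 12 quoted lines into a GUI or a script, here is a small Python sketch that parses them and checks that the declared PlyCount matches the number of moves. The regular expression and the function name `parse_opening` are illustrative assumptions that fit only this exact one-line layout. It is not a general PGN parser.

```python
import re

# The 12 opening lines exactly as quoted in this thread.
OPENINGS = """\
[ECO "A10"][PlyCount "1"] 1. c4 *
[ECO "D06"][PlyCount "3"] 1. d4 d5 2. c4 *
[ECO "A46"][PlyCount "3"] 1. d4 Nf6 2. Nf3 *
[ECO "A50"][PlyCount "4"] 1. d4 Nf6 2. c4 e6 *
[ECO "E61"][PlyCount "5"] 1. d4 Nf6 2. c4 g6 3. Nc3 *
[ECO "C01"][PlyCount "4"] 1. e4 e6 2. d4 d5 *
[ECO "B50"][PlyCount "4"] 1. e4 c5 2. Nf3 d6 *
[ECO "B30"][PlyCount "4"] 1. e4 c5 2. Nf3 Nc6 *
[ECO "B40"][PlyCount "4"] 1. e4 c5 2. Nf3 e6 *
[ECO "C68"][PlyCount "6"] 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 *
[ECO "B10"][PlyCount "2"] 1. e4 c6 *
[ECO "A05"][PlyCount "2"] 1. Nf3 Nf6 *
""".splitlines()

# Hypothetical pattern for this layout only: two tags, the moves, then "*".
LINE_RE = re.compile(
    r'\[ECO "(?P<eco>[A-E]\d{2})"\]\[PlyCount "(?P<plies>\d+)"\]\s*(?P<moves>.*?)\s*\*$'
)


def parse_opening(line: str) -> tuple[str, int, list[str]]:
    """Return (ECO code, declared ply count, list of moves) for one line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised opening line: {line!r}")
    # Drop move-number tokens such as "1." and keep only the moves themselves.
    moves = [tok for tok in match["moves"].split() if not tok.rstrip(".").isdigit()]
    return match["eco"], int(match["plies"]), moves


parsed = [parse_opening(line) for line in OPENINGS]
for eco, plies, moves in parsed:
    status = "ok" if plies == len(moves) else "PlyCount mismatch"
    print(f"{eco}: {' '.join(moves)} ({plies} plies, {status})")
```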




Re: leela is official(?) better than sf9

Post by Javier Ros » Thu Sep 13, 2018 1:51 pm

Last 8 games of the test and final score:


Lc01711261 - Stockfish_8_x64_bmi2 : 7,5/20 2-7-11 (===0==0010=0=01==0==) 38%
Stockfish_8_x64_bmi2 - Lc01711261 : 12,5/20 7-2-11 (===1==1101=1=10==1==) 63%



Laskos
Posts: 9324
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: leela is official(?) better than sf9

Post by Laskos » Thu Sep 13, 2018 2:29 pm

Javier Ros wrote:
Thu Sep 13, 2018 11:32 am
Laskos wrote:
Thu Sep 13, 2018 8:01 am

Wow, that's VERY interesting! So you get about equal results against SF8 on 4 cores from the positions used in the AlphaZero paper, and a +0 -9 =11 result, or some -170 Elo points, from my balanced 3-mover positions? My positions are balanced and collected from GM and IM human games, so they are not some nutty positions from the random 2-mover openings of the SF framework. I have about 900 3-mover positions in my suite, and I am quite sure it's a good set to test on. The conclusion would be: the AlphaZero team chose on purpose openings suitable to AlphaZero to boost the performance significantly, by more than 100 Elo points. That's remarkable, as I didn't expect some regular 3-movers to change the result so dramatically. One has to play diverse openings because, first, other engines might not reply the way Lc0 plays and, second, other engines might use a book. In fact the CCCC results, which, like mine, put Lc0 at the level of Fire 7.1 from the standard initial position, show that other engines do not play the openings as Lc0 would like them to play.

But anyway, I don't understand the total nodes issue, the depth issue (this is not only about initial depth, but the depths reached after say 10s), NPS issue. Your NPS seem very stable from the beginning, I even saw peaks in NPS at 1-2s in a Youtube video, larger than after 10s, while my NPS are increasing slowly, being almost twice higher after 10s search than after less than 1s search. Hash issue as observed by Milos seemed quite plausible to me.

Interesting, and my advice would be to not use AlphaZero openings, they seem to be chosen not quite fairly against SF8.
I don't know what the hell is going on with the number of nodes, whether it's the fault of the engine or of Arena, but it's not mine.

"Hash issue as observed by Milos seemed quite plausible to me." I have shown that repeating the games doesn't help lc0 much, as happened accidentally in my last test. You can see it from this score:

Lc01711261 - Stockfish_8_x64_bmi2 : 5,5/20 0-9-11 (=0 0= =0 =0 =0 == 00 =0 == 0=) 28%
Stockfish_8_x64_bmi2 - Lc01711261 : 14,5/20 9-0-11 (=1 1= =1 =1 =1 == 11 =1 == 1=) 73%

First two games Lc0 0.5-Sto8 1.5, were replayed in games 3-4 with the same result Lc0 0.5-Sto8 1.5, but different colours,
Games 5-6, Lc0 0.5-Sto8 1.5, replayed in games 7-8 with the same result Lc0 0.5-Sto8 1.5 and colours,
Games 9-10, Lc0 0.5-Sto8 1.5, replayed in games 11-12 Lc0 1-Sto8 1, here lc0 improves,
Games 13-14, Lc0 0-Sto8 2, replayed in games 15-16 Lc0 0.5-Sto8 1.5, here lc0 improves,
Games 17-18, Lc0 1-Sto8 1, replayed in games 19-20 Lc0 0.5-Sto8 1.5, here Stockfish8 improves.

"The conclusion would be: AlphaZero team choose on purpose openings suitable to AlphaZero to boost significantly the performance, by more than 100 Elo points."

No, the conclusion is that some initial positions are more likely than others to lead to tactical play, where lc0 fails. So far we know nothing about the tactical abilities of AlphaZero. We would need to see the complete 100 games and the 1200 games from the other initial positions. These 12 initial positions were chosen because they were the most played, as they say in their paper:

"12 most popular human openings (played more than 100,000 times in an online database)"

From my experience with lc0, the more complicated and tactical the initial positions, the better Stockfish scores; from the starting position or from quiet positional positions, the better lc0 scores.
The fact is that, playing from the 12 positions provided by the DeepMind team, Lc0 scores way better against SF8 than when playing from the usual opening suites. I do think that A0, using the same MCTS with pure averaging, was often weak tactically, though maybe not as bad as Lc0. I do think that, given even a short, diverse 3-mover set of reasonable opening positions, A0 would have fared significantly worse against SF8. They used these 12 positions for the 1200-game match and the single standard opening position for the 100-game match. The results of both might be very skewed; this is usually called "bad practice" when measuring strength. 100 games from the same position? They are highly correlated and there is no way to avoid it, whatever the parallelization or randomization.
I was surprised to find a difference of 100+ Elo points between our results, and now I am leaning to the opinion that a book should have been given to SF8, even a general one like performance.bin. Never mind the sharp, deep books of the Playchess engine room; they would probably massacre Lc0 and be very dangerous even for A0.
So I don't know whether they intentionally chose opening positions advantageous for A0, but the fact is that regular testing shows a significantly worse performance for Lc0 than the performance from their positions.





Re: leela is official(?) better than sf9

Post by Javier Ros » Thu Sep 13, 2018 10:48 pm

I have repeated the test using the 10 Nunn positions, this time with tablebases support, 5 men syzygy for both engines.

Lc0 version 0.17 with net 11261 on Asus Geforce GTX 1060 3 Gb, ponder Off, Hash tables 1 Gb
Stockfish 9 with 6 threads on i7 4790S (4 cores, 8 threads), ponder Off, Hash tables 1 Gb.
Game in 5 minutes plus 3 sec

Stockfish_9_x64_syzygy - Lc01711261syzygy : 11,5/20 5-2-13 (===1010==1===1===1==) 58%

Lc01711261syzygy - Stockfish_9_x64_syzygy : 8,5/20 2-5-13 (===0101==0===0===0==) 43%

First 12 games; the remaining 8 are in the following message.


Re: leela is official(?) better than sf9

Post by Javier Ros » Thu Sep 13, 2018 10:52 pm

Games 13-20


Re: leela is official(?) better than sf9

Post by Javier Ros » Fri Sep 14, 2018 1:50 pm

I have done the test using the 12 AlphaZero positions, this time with tablebases support, 5 men syzygy for both engines.

Lc0 version 0.17 with net 11261 on Asus Geforce GTX 1060 3 Gb, ponder Off, Hash tables 1 Gb
Stockfish 9 with 6 threads on i7 4790S (4 cores, 8 threads), ponder Off, Hash tables 1 Gb.
Game in 5 minutes plus 3 sec

Stockfish_9_x64_syzygy - Lc01711261syzygy : 14,5/24 6-1-17 (=11===1==10=========11==) 60%

Lc01711261syzygy - Stockfish_9_x64_syzygy : 9,5/24 1-6-17 (=00===0==01=========00==) 40%

If we compare with the test on the 10 Nunn positions, where lc0 got 43%, one may think that lc0 has scored better with the more complex positions, against my theory. But one must take into account how complex Leela Chess Zero is, and that predicting its tactical failures is not an easy task because they also depend on randomness.

On the other hand, I'm currently running another test with the 30 Noomen Sharp Gambit 2015 positions

http://rebel13.nl/download/books.html

in order to clarify my hypothesis, that the greater the tactical complexity of the starting positions, the lower the performance of lc0.

After 27 games played of 120, the score is

Stockfish_9_x64_bmi2 - Lc01711261 : 20,0/27 15-2-10 (=1===11==111=101==10111111=) 74% +182
Lc01711261 - Stockfish_9_x64_bmi2 : 7,0/27 2-15-10 (=0===00==000=010==01000000=) 26% -182

Although the test has not finished, this 74% is absolutely clear!!
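
To put a number on how clear a partial result like 15-2-10 is, here is a rough Python sketch that converts a win-loss-draw record into a score, an Elo difference under the usual logistic model, and an approximate 95% interval. The function names are hypothetical, and the interval assumes independent games, which is optimistic for games played in pairs from the same openings.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    # Clamp so that a 0% or 100% score does not divide by zero.
    score = min(max(score, 0.001), 0.999)
    return 400.0 * math.log10(score / (1.0 - score))


def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, low, high) using a normal approximation.

    Assumption: games are independent, which paired openings only
    approximately satisfy.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    return score, score - margin, score + margin


# Stockfish 9 after 27 games: 15 wins, 2 losses, 10 draws.
score, low, high = score_interval(wins=15, draws=10, losses=2)
print(
    f"score {score:.1%}, Elo {elo_diff(score):+.0f}, "
    f"rough 95% range {elo_diff(low):+.0f} to {elo_diff(high):+.0f}"
)
```

With so few games the range is wide, but its lower end stays above 50%, which supports reading the result as a real Stockfish advantage on this suite.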

First 15 games, next 12 in following message.

Last edited by Javier Ros on Fri Sep 14, 2018 1:55 pm, edited 1 time in total.

Re: leela is official(?) better than sf9

Post by Javier Ros » Fri Sep 14, 2018 1:52 pm

Games 16-27 of the above message.



Re: leela is official(?) better than sf9

Post by Javier Ros » Fri Sep 14, 2018 10:21 pm

Score after 58 games played

Stockfish_9_x64_bmi2 - Lc01711261 : 40,0/58 25-3-30 (=1===11==111=101==10111111=0=1=======1=1=======1=1=1=1=111) 69% +139
Lc01711261 - Stockfish_9_x64_bmi2 : 18,0/58 3-25-30 (=0===00==000=010==01000000=1=0=======0=0=======0=0=0=0=000) 31% -139


Games 28-46


Re: leela is official(?) better than sf9

Post by Javier Ros » Sat Sep 15, 2018 9:28 am

Laskos wrote:
Wed Sep 12, 2018 7:36 pm
Javier Ros wrote:
Wed Sep 12, 2018 6:38 pm
Laskos wrote:
Wed Sep 12, 2018 6:02 pm

I don't understand the discrepancy. Are you sure your SF9 uses 6 threads and not 1? This result would indicate 1 thread is used, but maybe I am doing something very wrong. OTOH, the CCCC result, which I expected to come close to mine because the effective CPU speed-up is about 8 compared to my 4 i7 cores and the GPU speed-up is again about 8 compared to my GTX 1060, shows, as in my case, that Lc0 is at the level of Fire 7.1. I use adjudication only for very long games, above 100 moves with evals in the [-20cp, 20cp] range (usually very drawish long endgames). I use no win adjudication.
http://talkchess.com/forum3/download/file.php?id=97

I am sure Stockfish uses 6 threads of the 8 threads (4 cores), you can see the 74% in the task manager for Stockfish 9 and 0% for lc0 due to Ponder Off and also the NPS counter is 8.7 million.

I use adjudication by Arena at -9.00

I have read in this link that the net 11120 is better than the rest of 11xxx and I am testing it right now. At this moment Stockfish 6.5-lc017 11120 6.5

https://groups.google.com/forum/#!topic ... IUjoNZAxVw

Version  NN    ELO Perf  W   D    L    Fire  Komodo  SF    Score  Games  % Fire  % Komodo  % SF   % Total
0.16     600   3428      56  186  118  61    48.5    39.5  149    360    50.83   40.42     32.92  41.39
0.16     643   3428      56  186  118  58.5  52      38.5  149    360    48.75   43.33     32.08  41.39
0.16     809   3428      56  188  116  56    51.5    42.5  150    360    46.67   42.92     35.42  41.67
0.16     695   3435      69  169  122  66    45.5    42    153.5  360    55      37.92     35     42.64
0.16     776   3435      61  187  112  59.5  54      41    154.5  360    49.58   45        34.17  42.92
0.16     928   3442      67  179  114  61.5  54.5    40.5  156.5  360    51.25   45.42     33.75  43.47
0.16     840   3456      64  198  98   67    53.5    42.5  163    360    55.83   44.58     35.42  45.28
0.17     1066  3463      77  184  99   70.5  51      47.5  169    360    58.75   42.5      39.58  46.94
0.17     1120  3470      77  189  94   68.5  54      49    171.5  360    57.08   45        40.83  47.64
0.17     1186  3449      68  183  109  59    57      43.5  159.5  360    49.17   47.5      36.25  44.31
In this Arena output, is the third row the nodes searched and the fourth the speed? If yes, then why do you have 36,000 nodes searched after 1 second at 2,600 nodes per second? Second thing: my NPS, both in Arena (3.51) and from the command line, is about 2,000 at 1 second and increases slowly to 4,000 by 30s in a similar position. My NPS increases pretty slowly; is that the case with yours too? Maybe I have some issues with drivers or the CUDA/CUDNN software? I refreshed them a month or so ago.
I have done another experiment with lc0 ver. 0.17 11261 on another computer with GTX 1070, i7 7700H 2.8 GHz and Cutechess Gui, trying to observe the NPS and total amount of nodes of lc0 as you said.
As you can see in the following videos

https://youtu.be/nJdAT9w4LVI

With lc0 playing Black, the node counter starts from 0 on the first move, from 8683 on the second, from 13790 on the third, from 5923 on the 4th, from 19525 on the 5th ...

https://youtu.be/xmPkY3mdVKY
another example with the same behaviour

I think this is due to the cache of lc0, which in my case has the default installation values.
Perhaps something is wrong with your installation of lc0 or the CUDA drivers; I have installed them only once and never updated them.