M ANSARI wrote: ↑Sun Jul 28, 2019 8:05 am
You have to realize that GPU graphics cards with the ability to do AI are in their first generation. So if you look at CPU power, this would be sort of like running SF on a single-core 386 CPU. Of course a GPU card like the 2080 Ti is expensive today, but so was a 386 when it first came out. My guess is that GPUs that can do AI will quickly get much more powerful and much cheaper. There is no doubt that AI will transform everything in our lives, and maybe it will be a transformation similar to when humanity discovered electricity. Lc0 is only competitive once you have reasonably good hardware. I don't think you need a 2080 Ti card for that, and most likely the new 2070 Super cards are very competitive with SF. The pricing of cards that can do AI will probably change exponentially, with much more powerful cards coming out at a fraction of today's prices. Also, Lc0 will probably patch many of its weaknesses (tactical and endgame weakness) via software … remember Lc0 is only a little over a year old.
The "Tensor Cores" of GPUs are an interesting trick, but the fundamental SIMD-architecture of GPUs is well-researched and well-discussed by the graphics community for the last 20 years. Modern GPUs represent the sum of decades of research and development.
Case in point: a 2080 Ti has roughly 11 trillion operations/second of compute on 616 GB/s of main-memory bandwidth. True, the tensor ops allow 100+ trillion 16-bit floating-point multiplications per second (i.e., neural-network operations), but that is only useful when your algorithm is dominated by neural nets. And frankly, I'm not convinced that the whole tensor-op methodology is working out too well.
Take a nice CPU, like the AMD Ryzen 3950X: 16 cores (with 256-bit AVX2) at 4.7 GHz. AVX2 gives 8x 32-bit operations per core x 16 cores x 4.7 GHz == 0.6 trillion operations/second, on only ~50 GB/s of main-memory bandwidth. Actually, most chess engines avoid AVX2 and instead stick with 64-bit operations. If you're using 64-bit bitboards with traditional 64-bit operations (roughly one per core per cycle), your CPU algorithm only has access to 0.075 trillion operations/second.
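To make that arithmetic concrete, here is a minimal C++ sketch. The clock speed, core count, and one-instruction-per-core-per-cycle assumption are illustrative estimates, not benchmarks; the intrinsics just show what "8x 32-bit operations" per AVX2 instruction means.

```cpp
// Back-of-the-envelope throughput arithmetic for the figures above.
// Assumes one AVX2 instruction (or one 64-bit scalar op) retired per
// core per cycle -- a rough upper bound, not a measurement.
#include <cstdio>
#include <immintrin.h>

int main() {
    const double cores      = 16.0;    // Ryzen 3950X core count
    const double clock_hz   = 4.7e9;   // boost clock, optimistic
    const double avx2_lanes = 8.0;     // 256-bit register / 32-bit ints

    double avx2_ops   = cores * clock_hz * avx2_lanes; // ~0.6e12 ops/s
    double scalar_ops = cores * clock_hz;              // ~0.075e12 ops/s
    printf("AVX2:   %.2f trillion ops/s\n", avx2_ops / 1e12);
    printf("Scalar: %.3f trillion ops/s\n", scalar_ops / 1e12);

    // One AVX2 instruction operating on 8 lanes of 32-bit integers:
    __m256i a = _mm256_set1_epi32(1);
    __m256i b = _mm256_set1_epi32(2);
    __m256i c = _mm256_add_epi32(a, b); // eight 32-bit adds in one instruction
    int out[8];
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), c);
    printf("lane 0 = %d\n", out[0]);
    return 0;
}
```

Compile with -mavx2; the point is simply that even the best case for the CPU is an order of magnitude or two below the GPU's raw numbers.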
In any case, a CPU is operating with roughly 10% of the main-memory bandwidth and about 1% of the raw compute power. The real questions people need to be asking themselves are:
1: Why are CPUs able to play chess so well, despite the hugely deficient compute and memory resources?
2: Why do you need so many trillions of operations before neural nets become useful? Leela Zero runs on a machine capable of 100 trillion operations/second. Shouldn't we expect it to perform better?
3: Are there other algorithms yet to be discovered that take advantage of the CPU algorithms and port them to the massively improved compute and memory available on GPUs?
-------
GPU algorithms must take advantage of parallel compute resources. It's hard to think in parallel, especially if you've been doing sequential programming for years. But when I analyze the parallel algorithms from the CPU world (YBWC, ABDADA, etc.), they are simply insufficient for GPU translation. All of them were designed for low-core-count machines (maybe 20 or 50 cores) and will fall apart on the 4000+ cores of a 2080 Ti. Every thread visits every node in YBWC or ABDADA; in most cases a "visit" means pinging the transposition table to share work done by other threads, and you will run out of main-memory bandwidth very quickly when 4000+ cores hammer the TT that hard. A back-of-the-envelope sketch of the bandwidth problem follows below.
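Here is a rough estimate of shared-TT traffic under illustrative assumptions: the per-thread probe rate and one 64-byte cache-line read per probe are guesses chosen to show the shape of the problem, not measurements of any engine.

```cpp
// Rough estimate of transposition-table traffic if thousands of GPU
// threads probe a shared TT. Probe rate and entry size are assumed
// for illustration only.
#include <cstdio>

int main() {
    const double threads         = 4000.0;  // ~one thread per GPU core
    const double probes_per_sec  = 1.0e6;   // assumed TT probes per thread/s
    const double bytes_per_probe = 64.0;    // one cache-line read per probe

    double traffic = threads * probes_per_sec * bytes_per_probe; // bytes/s
    printf("TT read traffic: %.0f GB/s (vs. ~616 GB/s on a 2080 Ti)\n",
           traffic / 1e9);
    return 0;
}
```

At those assumed rates, probes alone consume a large fraction of the card's 616 GB/s before any TT writes, board updates, or neural-net evaluations happen at all.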
Other GPU programmers have proven that elements of a chess engine can be ported to a GPU. With over 20 billion nodes/second of perft on an ancient 780 Ti GPU, part of the GPU-programming problem has already been solved. The only remaining part is the search algorithm.
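For reference, perft is just a full-width node count to a fixed depth. The sketch below shows the CPU-side definition; Position, Move, generate_moves, make_move and unmake_move are hypothetical placeholders standing in for a real move generator, and the GPU work cited above essentially parallelizes this traversal across thousands of threads.

```cpp
// Minimal recursive perft sketch. The types and helper functions below
// are hypothetical placeholders for a real board representation and
// move generator.
#include <cstdint>
#include <vector>

struct Position;  // hypothetical board state
struct Move;      // hypothetical move encoding

std::vector<Move> generate_moves(const Position& pos); // hypothetical
void make_move(Position& pos, const Move& m);          // hypothetical
void unmake_move(Position& pos, const Move& m);        // hypothetical

// Count every node reachable in exactly `depth` plies from `pos`.
uint64_t perft(Position& pos, int depth) {
    if (depth == 0) return 1;
    uint64_t nodes = 0;
    for (const Move& m : generate_moves(pos)) {
        make_move(pos, m);
        nodes += perft(pos, depth - 1);
        unmake_move(pos, m);
    }
    return nodes;
}
```

There is no alpha-beta pruning and no evaluation here, which is exactly why perft parallelizes so cleanly; the open question is whether a pruning search can be mapped onto thousands of cores equally well.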