Stockfish vs. Lc0: IMHO disappointing result for Lc0

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mvanthoor »

Today I ran a short 20-game match between Stockfish 10 and Lc0. Specs of the match:

Stockfish 10 x64 BMI2 on Intel i7-6700K, 4 threads, 8GB hashtable
Lc0 0.21.3, w42850 on GTX 1070. 4 threads, everything else default.
Syzygy 5 men tablebase, 8 move Performance.bin opening book.
Adjudication by the GUI on overwhelming material advantage, or by Syzygy on a win/draw/loss result in the endgame.

The result was +4 -1 =15 in favor of Stockfish 10.

To be honest, after all the hype surrounding Lc0, I find the result disappointing; I'd expected the reverse. When looking into networks, I found https://www.sp-cc.de/lc0-testing.htm, and the network I used is stronger than the ones used there (+60 Elo).

I haven't looked into things such as Leela Ratio yet. I'm not trying to match one engine against another on equal hardware. I wanted to know how much stronger or weaker Lc0, running on a GTX 1070, is compared to Stockfish running to the specifications of CCRL 40/4.

I ran the match at a time control of 40 moves in 85 seconds because, on my computer, that is the setting equivalent to CCRL 40/4. I wanted to know where a full-power Lc0 on a GTX 1070 would fall in the CCRL 40/4 list. Stockfish has a rating of 3547 there, and the result of +4 -1 =15 corresponds to a rating advantage of +52 for Stockfish over Lc0, putting Lc0 at 3495. That is only 9 points above the 3486 that Lc0 attains in the CCRL 40/4 list (albeit with a different network), despite the GTX 1070 being a much more powerful card than the GTX 1050. That seems disappointing.
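For anyone who wants to check the +52 figure: a minimal sketch, assuming the standard logistic Elo model (expected score = 1 / (1 + 10^(-diff/400))). The function name is made up for illustration; the 3547 rating and the +4 -1 =15 result are the numbers from this post.

```python
import math


def elo_diff_from_score(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result under the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    stockfish_rating = 3547  # CCRL 40/4 rating quoted in the post
    diff = elo_diff_from_score(wins=4, draws=15, losses=1)
    print(f"Stockfish advantage: {diff:+.1f} Elo")
    print(f"Implied Lc0 rating:  {stockfish_rating - diff:.0f}")
```

This only converts a score into a point estimate; it says nothing about how reliable a 20-game sample is.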

Also, the games are not very interesting. Often, after 30-35 moves or so, everything has been traded down to an endgame. Also, it's often Stockfish preventing a draw by threefold repetition (because of the default contempt probably), and even so, many games ended in threefold repetition. In some games, Leela makes exceedingly weird moves, and lost game 1 in 21 moves because of a blunder. With regard to Stockfish, I can mostly understand what it's trying to do with a move, but with Lc0, I'm often left guessing. Because Lc0 "only" searches 10K nodes or so in the endgame, while Stockfish is often already into the 10+ million, Stockfish reaches the endgame database much faster. I often see Leela struggling to look beyond 12 ply or so, while Stockfish is soaring into the 40 ply range, reaching the endgame database from the late middle game.

Of course, my expectation wasn't for Lc0 to blow Stockfish out of the water with a 20-0 result, but I did expect it to win with a +2 score or so. Could/should I be using a different network (I've seen some networks that were smaller, faster, and had a higher ELO-rating than the 42850 I used)? Are my expectations wrong, and is a GTX 1070 just not powerful enough?

I don't play a lot of games. I always pick a midrange card; in this case I picked the GTX 1070 in 2016 because of The Witcher 3, but if I don't acquire a newer game that needs a lot more power, this card is likely to also be in my next computer. I do need a lot of CPU power for some of my tasks, so the 6700K will probably be replaced by a 12-core machine, at least. If Stockfish already wins by +4 -1 running on an old i7-6700K against Lc0 on a GTX 1070, I shudder to think how it would decimate Lc0 @ GTX 1070 when running on one of the new Zen3 CPUs with 12 or 16 cores, if I should get a new computer (but not a new graphics card).

PS: I found the JH.T6.532 net used for the CCRL rating. I'll rerun a longer test. The match will be run at 40 moves in 85 seconds repeating to comply with the CCRL 40/4 list, and Lc0 0.21.3 JH.T6.532 will run full-out on a GTX 1070. That should give an approximation of Elo-difference between the GTX 1050 and 1070, at least for this particular net.

PS2: I have also put the current list into a spreadsheet with a filter on the Elo field. The strongest network I've found is 10968, from August 22, 2018. So it's an old one... how can it be so strong? Were all the other networks much weaker, meaning that old network Elo can't be related to new networks? I'll run a test using that network as well.
Last edited by mvanthoor on Sat Jul 27, 2019 10:24 pm, edited 1 time in total.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Chessqueen
Posts: 5577
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by Chessqueen »

You are using inferior hardware for Lc0. The GTX 1070 is too slow compared to what Stockfish was using in your test; either get an RTX 2080 Ti or slower hardware for Stockfish to make the match even :shock:
Do NOT worry and be happy, we all live a short life :roll:
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mvanthoor »

Chessqueen wrote: Sat Jul 27, 2019 10:19 pm You are using inferior hardware for Lc0. The GTX 1070 is too slow compared to what Stockfish was using in your test; either get an RTX 2080 Ti or slower hardware for Stockfish to make the match even :shock:
Truly? So you recommend I get a €1200 graphics card just to make Lc0 capable of defeating Stockfish 10 running on a now 4-year-old quad-core? That sounds incredibly uneconomical. Maybe a GTX 1080 Ti (which is about twice as fast as the GTX 1070, if I remember correctly) would put it on par with the old 6700K, but even though it's older, that GPU still runs €350-400 on the second-hand market.

As I said: I don't game a lot. I only buy a new graphics card if there's a new game I really, really want to play in top quality, such as the Witcher 3. As long as nothing else I want to play comes along, my graphics card will be in the next computer as long as possible. (I've had my GTX 560 Ti for 6 years, and if The Witcher 3 hadn't been released, I would have still had it today.)

I *do* have a use for a lot of CPU power though (at least sometimes), so me getting a 12 or even 16 core CPU in two years or so is highly likely. I'll have to find a chess GUI in which I can set different time controls for each engine to match the CCRL settings, and then test Stockfish 11 or 12 against 10. Version 10 will be obliterated. Lc0, running on the GTX 1070, probably won't even be able to draw a single game.

If I need a €1200+ GPU to make an NN engine a match for an "old" A/B engine running on a €350-500 CPU, I'll be sticking with the A/B engines for some time to come.
Chessqueen
Posts: 5577
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by Chessqueen »

mvanthoor wrote: Sat Jul 27, 2019 10:40 pm
Chessqueen wrote: Sat Jul 27, 2019 10:19 pm You are using inferior hardware for Lc0. The GTX 1070 is too slow compared to what Stockfish was using in your test; either get an RTX 2080 Ti or slower hardware for Stockfish to make the match even :shock:
Truly? So you recommend I get a €1200 graphics card just to make Lc0 capable of defeating Stockfish 10 running on a now 4-year-old quad-core? That sounds incredibly uneconomical. Maybe a GTX 1080 Ti (which is about twice as fast as the GTX 1070, if I remember correctly) would put it on par with the old 6700K, but even though it's older, that GPU still runs €350-400 on the second-hand market.
So you went and got yourself superior hardware to run Stockfish against Lc0, and you probably spent as much or more on the hardware your Stockfish is using. For what reason? To show that Stockfish is better than Lc0, when we all know what Lc0 with an RTX 2080 Ti did against Stockfish in the TCEC Season 15 superfinal, where both sides' hardware was similar or close to it? Sorry, but you clearly have a mismatch of GPU vs. CPU. :mrgreen: :shock: :mrgreen:
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mvanthoor »

Chessqueen wrote: Sat Jul 27, 2019 10:48 pm So you went and got yourself superior hardware to run Stockfish against Lc0, and you probably spent as much or more on the hardware your Stockfish is using. For what reason? To show that Stockfish is better than Lc0, when we all know what Lc0 with an RTX 2080 Ti did against Stockfish in the TCEC Season 15 superfinal, where both sides' hardware was similar or close to it? Sorry, but you clearly have a mismatch of GPU vs. CPU. :mrgreen: :shock: :mrgreen:
No, man... I didn't get any superior hardware for Stockfish. I buy what I need. In 2016, when I bought this computer, I bought a 6700K (for somewhere around €350) because that CPU had the best price/performance for what I wanted to do at the time. I got a GTX 1070 (for around €425 or so) because that card could run The Witcher 3 at the highest settings, and any newer games for the foreseeable future.

In a new computer, I'll obviously get a new CPU, probably a 12 or 16 core (between €350 and €500 at this point in time), but the GTX 1070 graphics card possibly stays if there isn't a newer game I want to play.

As I said, if I'd need to buy a graphics card costing €1200+ to be able to make Lc0 (or any other NN-engine) a match for a €400 CPU, then it's not worth it for me. Then I'll happily stick with the A/B CPU-based engines.

It's not about Stockfish or Lc0 being "better" than the other. I don't care. I use what fits my needs.

If a commercial engine overtakes Stockfish by 50 or even 100 Elo, I'll still be using Stockfish because it's open source. If another open-source engine overtakes Stockfish but doesn't have level settings, I'll probably use it for analysis or blunder checks, but I'll still be using Stockfish to play against because of the level settings.

I'm not a Stockfish lover or Lc0 hater. I had just hoped that Lc0, on *THIS* computer, would be able to show me some of the hype. As it runs on a GTX 1070, I had at least expected it to be significantly stronger than the GTX 1050 used in CCRL 40/4, but it doesn't seem to be.

I thought it strange, because the GTX 1070 is about 3 times as fast as the GTX 1050. I hoped that the old "60 Elo extra if you double the speed" rule would still hold for NN engines, so I'd expected Lc0 to achieve around 3575 Elo in this test. (It gets 3486 on the 1050. The 1070 is 3 times as fast, so if the rule held, I'd have expected Lc0 to gain about 90 Elo over the 1050.) I'll test it with the same network tonight to make sure.
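As a side note on the arithmetic: a "60 Elo per doubling" rule counts doublings, so a 3x speedup is log2(3), about 1.58 doublings, which comes out a little above the 90 Elo estimated here. A minimal sketch, assuming the rule of thumb holds at all (it is unverified for NN engines) and taking the 3x speed ratio and the 3486 baseline from this post:

```python
import math


def elo_gain_from_speedup(speedup: float, elo_per_doubling: float = 60.0) -> float:
    """Elo gain predicted by a fixed 'Elo per speed doubling' rule of thumb."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return elo_per_doubling * math.log2(speedup)


if __name__ == "__main__":
    baseline = 3486  # Lc0 on the GTX 1050 in CCRL 40/4, as quoted in the post
    gain = elo_gain_from_speedup(3.0)  # assumed 3x speed ratio, 1070 vs 1050
    print(f"Predicted gain:   {gain:.0f} Elo")
    print(f"Predicted rating: {baseline + gain:.0f}")
```

Either way the prediction lands well above the 3495 measured in the 20-game match.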

It seems Lc0 needs a much more expensive graphics card to overtake Stockfish on even an old CPU, which makes running Lc0 uneconomical for me. Nothing more, nothing less.
Last edited by mvanthoor on Sat Jul 27, 2019 11:30 pm, edited 1 time in total.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by zullil »

mvanthoor wrote: Sat Jul 27, 2019 11:08 pm

I'm not a Stockfish lover or Lc0 hater. I had just hoped that Lc0, on *THIS* computer, would be able to show me some of the hype. It seems it can't, and I'm somewhat disappointed in that. It probably needs a graphics card much more expensive than I'm willing to pay for, so running Lc0 is not economical for me. Nothing more or less.
First, ignore Chessqueen. Most of us do. :wink:

20 games is hardly definitive. Maybe it was all just chance? If not, let Stockfish run on fewer threads. Don't know enough about lc0 to suggest possible tweaks, but perhaps someone else does.
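To put a number on "hardly definitive", here is a rough sketch of a 95% confidence interval for the +4 -1 =15 result reported at the top of the thread. It assumes independent games and a normal approximation, which is crude for only 20 games; the function names are illustrative.

```python
import math


def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Normal-approximation confidence interval for the mean score per game."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the outcomes 1, 0.5 and 0 around the mean score.
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    return mean - z * std_err, mean + z * std_err


def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    low, high = score_interval(wins=4, draws=15, losses=1)
    print(f"Score interval: {low:.3f} .. {high:.3f}")
    print(f"Elo interval:   {score_to_elo(low):+.0f} .. {score_to_elo(high):+.0f}")
```

Under these assumptions the interval straddles zero, so the 20 games on their own do not show which engine is stronger on this hardware.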
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mwyoung »

mvanthoor wrote: Sat Jul 27, 2019 11:08 pm
I'm not a Stockfish lover or Lc0 hater. I had just hoped that Lc0, on *THIS* computer, would be able to show me some of the hype. It seems it can't, and I'm somewhat disappointed in that. It probably needs a graphics card much more expensive than I'm willing to pay for, so running Lc0 is not economical for me. Nothing more or less.
You need to remember that Lc0 is not an A/B engine. Put simply, Lc0 understands positional play first and tactics last. A/B engines such as Stockfish understand tactics very fast, and use the tactics to increase positional understanding.

So the programs scale much differently.

On my system, for example, I run the latest Stockfish versions on a 2950X (16 cores) and Lc0 on a 2080 Ti. Stockfish will crush Lc0 at short time controls. At about 5m/40 moves Lc0 is equal to Stockfish, and crushes all other A/B engines. As the time control becomes longer, Lc0's superiority increases. Is there a point where Lc0 stops scaling better than A/B engines? I have not seen it yet.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mvanthoor »

zullil wrote: Sat Jul 27, 2019 11:24 pm First, ignore Chessqueen. Most of us do. :wink:
Oh. OK.
20 games is hardly definitive. Maybe it was all just chance? If not, let Stockfish run on fewer threads. Don't know enough about lc0 to suggest possible tweaks, but perhaps someone else does.
No, 20 games isn't definitive. Lc0 didn't play horribly bad (except for the enormous blunder in game 1) and it didn't get killed outright. If I'd run a thousand games, the result might be closer. Maybe I'll get to that at some point (but I need this computer, so I'll need to do it in parts.)

This computer runs the Crafty 19.17 benchmark in 17 seconds, which means that for CCRL 40/4, the setting needs to be 40 moves in 85 seconds. Running the matches at this time control would (should) ensure that Stockfish, on the 6700K, plays around the 3547 CCRL 40/4 rating.
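The scaling used here can be sketched as follows. The 4-minute base control comes from the "40/4" name; the 48-second reference benchmark time is NOT stated anywhere in this thread and is only inferred from the numbers above (17 s maps to 85 s, a factor of 5, and 240 / 5 = 48). Treat both the constant and the function name as assumptions.

```python
# Assumed reference values: 40 moves in 4 minutes on the reference machine,
# and a reference Crafty 19.17 benchmark time inferred from 85 s / 17 s = 5.
REFERENCE_CONTROL_SECONDS = 240.0
REFERENCE_BENCHMARK_SECONDS = 48.0  # inferred, not confirmed by the thread


def adapted_control_seconds(benchmark_seconds: float) -> float:
    """Scale the 40-move time control by how fast this machine runs the benchmark."""
    if benchmark_seconds <= 0:
        raise ValueError("benchmark time must be positive")
    return REFERENCE_CONTROL_SECONDS * benchmark_seconds / REFERENCE_BENCHMARK_SECONDS


if __name__ == "__main__":
    # 17 seconds is the benchmark time reported for the i7-6700K in this post.
    print(f"40 moves in {adapted_control_seconds(17.0):.0f} seconds")
```

A faster machine gets a proportionally shorter control, so the engine should do roughly the same amount of work per move as on the reference hardware.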

What I want to know is how strong Lc0 @ GTX 1070 is, against Stockfish @ CCRL-equivalent, so I'm not interested in trying to equalize hardware between the two (*).

I'll test it using the CCRL 40/4 net JH.T6.532, so I can see if there's a significant difference between the GTX 1050 and 1070.

(*)
How would I equalize hardware anyway?

NPS is not reliable. If I run SF10 on 7 threads (using three extra hyperthreading threads), it'll achieve a much higher NPS score than when I run it at 4 threads, but the Elo difference is within the margin of error; it's only +5 Elo, if that.

I can't go and limit Stockfish to 5600 NPS (which is Lc0's speed on the GTX 1070), because Lc0 will kill it. Even on the old 6700K SF's speed runs into the millions of NPS in the starting position.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by mvanthoor »

mwyoung wrote: Sat Jul 27, 2019 11:38 pm You need to remember that Lc0 is not an A/B engine. Put simply, Lc0 understands positional play first and tactics last. A/B engines such as Stockfish understand tactics very fast, and use the tactics to increase positional understanding.
I know. It *was* baffling to see Lc0 at least holding up to Stockfish most of the time with only 5-6K NPS against Stockfish's millions of nodes per second.

It was a HIARCS deja-vu. That program was the same. (When version 14 was still competitive.) It was a lot slower than any competitor, often only achieving 25% or even less NPS than the fastest programs, but it was still up there in the rating lists at the time.
So the programs scale much differently.

On my system, for example, I run the latest Stockfish versions on a 2950X (16 cores) and Lc0 on a 2080 Ti. Stockfish will crush Lc0 at short time controls. At about 5m/40 moves Lc0 is equal to Stockfish, and crushes all other A/B engines. As the time control becomes longer, Lc0's superiority increases. Is there a point where Lc0 stops scaling better than A/B engines? I have not seen it yet.
Yes. I was expecting as much.

It's the same with old table-top chess computers against humans. If you played them at short time controls, they'd crush you, because they'd NEVER miss a tactic, or NEVER fail to punish a tactical mistake. (Not even in the 90's, especially if the tactic was less than 6-7 moves deep). As time control increased, and you (as a human) could look into stuff those computers didn't know much about, such as weak squares, bad pawn structure, bad bishops and so on, your chance of defeating them increased dramatically.

I have an old CXG Sphinx Titan lying around here. When I set it at a 5-second time control and then play "honestly" and move within 5 seconds myself, I'll still lose, and that thing is 30 years old. If I set it at a 60-second time control, I will always win (even if I move within 60 seconds every time).

It was the same with HIARCS; with the longer time controls, there were diminishing returns for the faster engines; they couldn't go any deeper, or maybe just one move. That wasn't enough to outplay HIARCS's positional understanding.

It could be the same with Lc0. I'll have to test a longer time control some day, probably CCRL 40/40. Even so, analyses are often done at faster time controls, maybe 10 seconds per move or thereabouts. That would mean that, at this point in time, the A/B engines are better/more economical (with regard to hardware costs) than NN engines.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Stockfish vs. Lc0: IMHO disappointing result for Lc0

Post by jp »

mvanthoor wrote: Sat Jul 27, 2019 11:55 pm
mwyoung wrote: Sat Jul 27, 2019 11:38 pm You need to remember that Lc0 is not an A/B engine. Put simply, Lc0 understands positional play first and tactics last. A/B engines such as Stockfish understand tactics very fast, and use the tactics to increase positional understanding.
I know. It *was* baffling to see Lc0 at least holding up to Stockfish most of the time with only 5-6K NPS against Stockfish's millions of nodes per second.
NPS on its own is a very bad measure of how much computing the engines are doing, even if it meant the same thing for both. It's even worse because, as others here have pointed out, Leela's definition of a "node" differs from the traditional engines'.