Lc0 wins, but Stockfish is still the best?


Javier Ros
Posts: 200
Joined: Fri Oct 12, 2012 12:48 pm
Location: Seville (SPAIN)
Full name: Javier Ros

Re: Lc0 wins, but Stockfish is still the best?

Post by Javier Ros »

lkaufman wrote: Wed Mar 27, 2019 9:45 pm
Laskos wrote: Tue Mar 26, 2019 9:42 am
mwyoung wrote: Tue Mar 26, 2019 6:07 am
This gives Lc0 a huge hardware price advantage. On my testing system. I run a 16 core CPU for $799. At the same time Lc0 plays on a $1500 RTX 2080 ti. Is Lc0 better, yes but Lc0 needs every bit of the 2080 ti. And at almost twice the price to best Stockfish.
You can easily find an RTX 2080ti in $1200-$1250 range. Doesn't your CPU need a pretty large cooler to run consistently, without throttling on some very heavy and long CPU loads? That's another $70-$100 to almost $900 for CPU. GPUs don't need that. Also, RTX 2080ti doesn't seem to be the best deal around as GPU goes. I have my RTX 2070 for $550, 2.7 times cheaper than your expensive RTX 2080ti.

Let's compare apples to apples: Take the net 41687 (each net has its own speed curve, it is the latest I downloaded), put it on infinite from the standard opening position with "go". The engine should be Lc0 v0.21.1. Observe the speed after 10 million nodes. Mine is:

info depth 19 seldepth 57 time 232129 nodes 10456908 score cp 44 hashfull 246 nps 45047

I am curious what your 2.7 factor more expensive RTX 2080ti will show. If the scaling is good, two RTX 2070 would be a bit cheaper than one RTX 2080ti and I bet they would be faster.

As for CPU, I will buy this Summer or Autumn the third generation Ryzen 9 3850X, a 16 core CPU even stronger than your Threadripper 2950X, but for only $500.

All in all, Leela with latest T40 nets is very competitive with SF_dev strength-wise on fairly equal price-wise home-grade hardware.
I tried to replicate your test on my new laptop with RTX 2080. When I run it directly (without GUI) it doesn't use fp 16 and only gets about 12kNPS. So I run it with fp 16 in Fritz GUI on opening position, but still the NPS peaks at about 30knps and drops back to only 22knps by the time it has reached 10 million nodes. Since you got 45knps, shouldn't I get more than that on my 2080? Either I'm doing something wrong, or my GPU is not running full speed. I suppose that being a laptop might make it have to run slower, but I got the model that supposedly ran at the highest speed, not one that was slower but lighter. Any idea which is the explanation, or how I can find out what is wrong?

I think it's due to the thermal regulation of the GPU. You should try a cooling pad; I have one with five fans under my laptop with a GTX 1070, and I have managed to keep the GPU temperature under 70 degrees so that the thermal throttling does not kick in.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 wins, but Stockfish is still the best?

Post by Laskos »

lkaufman wrote: Wed Mar 27, 2019 9:45 pm
I tried to replicate your test on my new laptop with RTX 2080. When I run it directly (without GUI) it doesn't use fp 16 and only gets about 12kNPS. So I run it with fp 16 in Fritz GUI on opening position, but still the NPS peaks at about 30knps and drops back to only 22knps by the time it has reached 10 million nodes. Since you got 45knps, shouldn't I get more than that on my 2080? Either I'm doing something wrong, or my GPU is not running full speed. I suppose that being a laptop might make it have to run slower, but I got the model that supposedly ran at the highest speed, not one that was slower but lighter. Any idea which is the explanation, or how I can find out what is wrong?
Surely it is GPU throttling (including GPU temperature control); laptops are not made for RTX 2070-2080 Ti GPUs. Your laptop under full RTX 2080 GPU load and partial CPU load is something like 10-15 household light bulbs with no proper cooling and ventilation. But to start with:

Put your weights_run1_41687.pb.gz in the same folder as the Leela engine and all the needed DLLs. Click on Lc0 v0.21.1, and an engine window opens.
Copy and paste the following into that window:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz

Press Enter, then type 'go' and press Enter again.

Look at the NPS; I guess that because of the throttling its evolution will be different from mine.
As for the speeds you reported, both 30knps and 22knps are bad, although I don't know what net you are using. But the fact that it went from 30knps to 22knps almost certainly indicates throttling. Your RTX 2080 should in theory be some 20% faster than my 2070.

I have seen so many weird speeds on the spreadsheets being circulated, often without knowing which net was used or other variables, and often showing such low speeds, that I suspect many people in fact have problems with their GPUs. I have the latest driver (2 weeks old for my GPU), and installed GPU-Z and MSI Afterburner long ago (back when I had a GTX 1060). Getting a very stable desktop system is easy. I swapped the power supply unit for a stronger one and the case for a larger and more open one, and put in 2 case fans, a larger one close to the GPU and a smaller one close to the CPU. Now my temperatures, both GPU and CPU, never go above 60 degrees Celsius under any heavy and prolonged load. The case fans alone decreased temperatures by 10 degrees or so. All that refurbishing cost under $100. I don't know what you could do with the laptop, maybe put it in the fridge on heavier and longer runs :). More seriously, maybe put some fans under it, but it may look funny :). Throttling also mangles the stability of the testing conditions; I hate it, as it can give contradictory results.

So, for now, use GPU-Z to check temperatures (above 80 is probably bad), GPU load and core frequencies (I forgot the base clock for the RTX 2080; it should be close to 1600 MHz or so).
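For anyone who would rather script the test than paste into the engine window, here is a minimal Python sketch of the same procedure. The option names and values are the ones from the post; everything else (the function names, `go infinite` instead of a bare `go`, stopping at the first info line past 10 million nodes) is an assumption for illustration, and `run_benchmark` has not been tried against every Lc0 build.

```python
import os
import re
import subprocess

# Options copied from the post; the weights path is relative to the engine folder.
OPTIONS = {
    "Backend": "cudnn-fp16",
    "MinibatchSize": "512",
    "NNCacheSize": "10000000",
    "WeightsFile": r".\weights_run1_41687.pb.gz",
}

NODES_NPS_RE = re.compile(r"\bnodes (\d+)\b.*\bnps (\d+)\b")


def uci_setup_commands(options):
    """Return one 'setoption' line per option, followed by an infinite search."""
    lines = [f"setoption name {name} value {value}" for name, value in options.items()]
    lines.append("go infinite")
    return lines


def first_nps_at(lines, target_nodes):
    """Return the nps of the first info line whose node count reaches target_nodes."""
    for line in lines:
        match = NODES_NPS_RE.search(line)
        if match and int(match.group(1)) >= target_nodes:
            return int(match.group(2))
    return None


def run_benchmark(engine_path, target_nodes=10_000_000):
    """Hypothetical driver: start the engine, send the commands, stop at target_nodes."""
    proc = subprocess.Popen(
        [engine_path],
        cwd=os.path.dirname(os.path.abspath(engine_path)),  # so the relative weights path resolves
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    for command in uci_setup_commands(OPTIONS):
        proc.stdin.write(command + "\n")
    proc.stdin.flush()
    nps = first_nps_at(proc.stdout, target_nodes)
    if proc.poll() is None:  # engine still running: end the search and exit cleanly
        proc.stdin.write("stop\nquit\n")
        proc.stdin.flush()
    proc.wait()
    return nps
```

Calling `run_benchmark(r"C:\lc0\lc0.exe")` (a made-up path) would return the nps reported once the search passes 10 million nodes, which is the figure being compared in this thread.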
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Lc0 wins, but Stockfish is still the best?

Post by lkaufman »

Laskos wrote: Wed Mar 27, 2019 10:37 pm
Sure GPU throttling (including GPU temperature control), laptops are not made for RTX 2070-2080ti GPUs. Your laptop with full RTX 2080 GPU load and partial CPU load is something like 10-15 house illumination light bulbs with no proper cooling and ventilation. But to start with:

Put your weights_run1_41687.pb.gz in the same folder as the Leela engine and all the needed DLLs. Click on Lc0 v0.21.1, and an engine window opens.
Copy and paste the following into that window:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz

Press Enter, then type 'go' and press Enter again.

Look at the NPS, and I guess because of the throttling its evolution will be different from mine.
As for speeds you reported, both 30knps and 22knps are bad, although I don't know what net you are using. But the fact that it went from 30knps to 22knps shows an almost certain throttling. Your RTX 2080 should in theory be some 20% faster than my 2070.

I saw so many weird speeds on some spreadsheets circulated, often not knowing what net was used and other variables, and often showing so many low speeds, that I suspect many people in fact have problems with their GPUs. I have the latest driver (2 weeks old for my GPU), and installed GPU-Z and MSI Afterburner long ago (from the times of GTX 1060 GPU I had). To have a very stable desktop system is easy. I changed the power supply unit for stronger one, the case for a larger and more open one and put 2 case fans, one larger close to the GPU and one smaller close to CPU. Now my temperatures, both GPU and CPU, never go above 60 degrees Celsius, under any heavy and prolonged load. Case fans alone decreased temperatures by 10 degrees or so. That all was for under $100 refurbishing. I don't know what you could do with the laptop, maybe on heavier and longer runs put it in the fridge :). More seriously, maybe put some fans under it, but it may look funny :). Throttling mangles also the stability of the testing conditions, I hate it, it can give contradictory results.

So, for now, use GPU-Z to check temperatures (above 80 is probably bad), GPU load and Core frequencies (forgot the base for RTX 2080, should be close to 1600 MHz or so).
Thanks, it seems that the NNCacheSize was the problem, I used the very low default value. When I followed your instructions above, using the same network you give, I got 54.5 KNPS at the 10 million node mark, just one percentage point better than your "in theory" prediction! So nothing is wrong, the laptop is just as good as a desktop should be!
Komodo rules!
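The "one percentage point" remark can be checked with a few lines of arithmetic. This is only a sketch: it assumes the baseline is the 45047 nps figure from Kai's RTX 2070 run and reads 54.5 KNPS as 54500 nps; the variable names are made up for illustration.

```python
baseline_nps = 45047    # Kai's RTX 2070 info line
laptop_nps = 54500      # the reported 54.5 KNPS, taken as 54500 nps
predicted_gain = 0.20   # "some 20% faster" in theory

actual_gain = laptop_nps / baseline_nps - 1
difference_points = (actual_gain - predicted_gain) * 100

print(f"actual gain: {actual_gain * 100:.1f}%")
print(f"above prediction by {difference_points:.1f} percentage points")
```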
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Lc0 wins, but Stockfish is still the best?

Post by mwyoung »

lkaufman wrote: Wed Mar 27, 2019 9:45 pm
I tried to replicate your test on my new laptop with RTX 2080. When I run it directly (without GUI) it doesn't use fp 16 and only gets about 12kNPS. So I run it with fp 16 in Fritz GUI on opening position, but still the NPS peaks at about 30knps and drops back to only 22knps by the time it has reached 10 million nodes. Since you got 45knps, shouldn't I get more than that on my 2080? Either I'm doing something wrong, or my GPU is not running full speed. I suppose that being a laptop might make it have to run slower, but I got the model that supposedly ran at the highest speed, not one that was slower but lighter. Any idea which is the explanation, or how I can find out what is wrong?
Use MSI Afterburner to see what temperature you are running at, and the GPU load %. Since it is a laptop, I don't know how well it will cool the 2080. I don't get any slowdown on my system from the opening position; the speed increases. Maybe the laptop version is not the same as a desktop RTX. But slowing down over time usually means a thermal issue.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 wins, but Stockfish is still the best?

Post by Laskos »

lkaufman wrote: Wed Mar 27, 2019 10:55 pm Thanks, it seems that the NNCacheSize was the problem, I used the very low default value. When I followed your instructions above, using the same network you give, I got 54.5 KNPS at the 10 million node mark, just one percentage point better than your "in theory" prediction! So nothing is wrong, the laptop is just as good as a desktop should be!
Wow, very good laptop! Still, on long runs check the temperature and base clock; this was, after all, only a 4-minute full-load run. I frankly cannot imagine a laptop on your desk managing to stay stable at full load for 8 hours in these conditions. If issues occur on very long runs, I think you can use MSI Afterburner to underclock it a bit for stability in long runs (and stability in testing too).
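For a long unattended run, the same readings that GPU-Z and MSI Afterburner show can also be logged from a script. The sketch below swaps in NVIDIA's `nvidia-smi` command-line tool, which is not mentioned in this thread. The 80-degree limit is the one suggested earlier in the thread, while the 90% load and 1500 MHz clock thresholds are placeholder guesses to adjust per GPU, and all function names are hypothetical.

```python
import subprocess

QUERY = "temperature.gpu,utilization.gpu,clocks.sm"


def parse_sample(csv_line):
    """Parse one 'temp, load, clock' line as printed with --format=csv,noheader,nounits."""
    temp_c, util_pct, sm_mhz = (int(field.strip()) for field in csv_line.split(","))
    return {"temp_c": temp_c, "util_pct": util_pct, "sm_mhz": sm_mhz}


def looks_throttled(sample, max_temp_c=80, min_sm_mhz=1500):
    """Flag a busy GPU that is hot or clocked low (min_sm_mhz is a placeholder guess)."""
    busy = sample["util_pct"] >= 90
    return busy and (sample["temp_c"] > max_temp_c or sample["sm_mhz"] < min_sm_mhz)


def read_sample():
    """Ask nvidia-smi for a single reading from the first GPU it lists."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(output.splitlines()[0])
```

Calling `looks_throttled(read_sample())` once a minute during an 8-hour run and writing the result to a file would show whether the clocks sag as the temperature climbs.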
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Lc0 wins, but Stockfish is still the best?

Post by lkaufman »

mwyoung wrote: Wed Mar 27, 2019 11:01 pm
Use MSI Afterburner to see what temperature you are running at, and the GPU load %. Since it is a laptop, I don't know how well it will cool the 2080. I don't get any slowdown on my system from the opening position; the speed increases. Maybe the laptop version is not the same as a desktop RTX. But slowing down over time usually means a thermal issue.
My nps also climbs over time following Kai's instructions; the slowdown was only with the tiny default NNCache, which seems plausible to me.
Komodo rules!
FICGS
Posts: 15
Joined: Wed Mar 13, 2013 7:20 pm

Re: Lc0 wins, but Stockfish is still the best?

Post by FICGS »

Well, Stockfish still being the best or not, how much time before all centaur players use a neural network based chess engine? Not so long IMHO (less than 5 years?). Interesting years to come.
Play chess online on the FICGS applications & website - Correspondence chess tournaments & championship
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Lc0 wins, but Stockfish is still the best?

Post by mwyoung »

lkaufman wrote: Thu Mar 28, 2019 12:00 am
My nps also climbs over time following Kai's instructions; the slowdown was only with the tiny default NNCache, which seems plausible to me.
Nice,
Enjoy the RTX power Larry....
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Lc0 wins, but Stockfish is still the best?

Post by lkaufman »

FICGS wrote: Thu Mar 28, 2019 12:13 am Well, Stockfish still being the best or not, how much time before all centaur players use a neural network based chess engine? Not so long IMHO (less than 5 years?). Interesting years to come.
Well, it is already apparent to me that in the earlier part of the game, assuming the best GPU and best CPU under $1000 for each, Lc0 or another NN is already superior to Stockfish in the majority of positions, especially if you want to use the engines in multiPV mode, when the choice isn't even close. But there will always be positions that are handled better by the alpha-beta algorithm, so presumably a good centaur with decent hardware will be using both already.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 wins, but Stockfish is still the best?

Post by Laskos »

lkaufman wrote: Wed Mar 27, 2019 10:55 pm
Thanks, it seems that the NNCacheSize was the problem, I used the very low default value. When I followed your instructions above, using the same network you give, I got 54.5 KNPS at the 10 million node mark, just one percentage point better than your "in theory" prediction! So nothing is wrong, the laptop is just as good as a desktop should be!
With a bit of overclocking, which proved to be stable after 8 hours of heavy load, and temperatures going only 1 degree Celsius higher (peaking at 61 degrees Celsius, but usually lower), I got the following after 10 million nodes:

info depth 19 seldepth 56 time 229984 nodes 10812659 score cp 44 hashfull 252 nps 47014

I will leave my PC under heavy load for some four days to check the stability in a longish test, as I will be away on vacation (hope to not destroy the apartment lol); I will record all the parameters.
I am curious how that uber-duper RTX 2080 Ti GPU featured in this thread compares with my cheapo RTX 2070 (really, I bought one of the first and cheapest RTX 2070s available).
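To put the two RTX 2070 info lines from this thread side by side, a small Python sketch can pull out the fields and recompute the speed. The regular expression and function names are illustrative assumptions; the only inputs are the two lines posted above, with `time` taken to be in milliseconds as in the UCI protocol.

```python
import re

INFO_RE = re.compile(r"\btime (\d+)\b.*\bnodes (\d+)\b.*\bnps (\d+)\b")


def parse_info(line):
    """Return (time_ms, nodes, nps) from a UCI 'info' line."""
    match = INFO_RE.search(line)
    if match is None:
        raise ValueError("not an info line with time, nodes and nps")
    time_ms, nodes, nps = map(int, match.groups())
    return time_ms, nodes, nps


def recomputed_nps(time_ms, nodes):
    """Nodes per second from the raw counters, rounded down."""
    return nodes * 1000 // time_ms


stock = "info depth 19 seldepth 57 time 232129 nodes 10456908 score cp 44 hashfull 246 nps 45047"
overclocked = "info depth 19 seldepth 56 time 229984 nodes 10812659 score cp 44 hashfull 252 nps 47014"

stock_time, stock_nodes, stock_nps = parse_info(stock)
oc_time, oc_nodes, oc_nps = parse_info(overclocked)

# The reported nps agrees with nodes divided by elapsed seconds in both lines.
print(recomputed_nps(stock_time, stock_nodes), stock_nps)
print(recomputed_nps(oc_time, oc_nodes), oc_nps)

overclock_gain = oc_nps / stock_nps - 1
print(f"overclock gain: {overclock_gain * 100:.1f}%")
```

On these two lines the overclock comes out a little over 4% faster at the 10 million node mark.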