Dual RTX 2060 for Leela

corres
Posts: 1802
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres » Fri Apr 19, 2019 9:33 pm

Hugo wrote:
Fri Apr 19, 2019 9:14 pm
cards are
Asus GeForce RTX 2070 ROG Strix OC
Gainward GeForce RTX 2060 Phoenix GS
both running @ 1900 MHz without any tool or setting.
For my Network 40 tests, I had to downclock the 2060 to 1600 MHz with the MSI tool to get a Leela Ratio of 1.1.
Laskos parameters: I haven't looked at them yet.
Multiplexing: GPU load was worse than with round robin.
demux: I haven't tested it yet, but I will.
C.K.
Both of your cards are well overclocked, as I see.
The "Laskos parameter" is NNCacheSize = 2000000.
The default value is only 200000.
And what about your motherboard and processor?
These are also important.

corres
Posts: 1802
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres » Fri Apr 19, 2019 11:06 pm

corres wrote:
Fri Apr 19, 2019 6:14 pm
Laskos wrote:
Fri Apr 19, 2019 5:24 pm
corres wrote:
Fri Apr 19, 2019 4:44 pm
I would like to know your opinion about the Elo effect of the "Laskos parameters".
You mean these:
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
The second one (NNCache) is better than the default in all cases (if you have a decent amount of RAM), and can be significantly better Elo-wise. The first one is debatable between 256 and 512. In my experience 512 might be a tiny bit better (though I have the impression that something like 400 is even better in test suites), but the issue here is only about 5-10 Elo points, a small one.
I am following your work, but I would also like to hear an independent opinion.
If Crem agrees with you, he should raise the default value of NNCacheSize.
Why he does not raise it is the question.
This is the answer (even if it is indirect):
Leela parameters for CCCC:
-minibatchsize = 640
-nncachesize = 20000000 (!)

I ran a test with these parameters and backend = multiplexing (the best for me), go nodes 5000000.
The result: max nps = 53856
That is a ~5% improvement.
Maybe the next version of Leela will use these values as defaults.
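
For anyone wanting to repeat this run, here is a minimal sketch that only builds the UCI command lines for it. The option names are the ones used in this thread; the helper names `uci_setoptions` and `benchmark_script` are made up for the example, and actually sending the lines to an engine is left out on purpose.

```python
def uci_setoptions(options):
    """Turn a dict of option name -> value into UCI 'setoption' lines."""
    return [f"setoption name {name} value {value}"
            for name, value in options.items()]


def benchmark_script(options, nodes):
    """Return the full command list: all options first, then a fixed-node search."""
    return uci_setoptions(options) + [f"go nodes {nodes}"]


if __name__ == "__main__":
    # Values taken from the post above (CCCC-style settings, multiplexing backend).
    settings = {
        "Backend": "multiplexing",
        "MinibatchSize": 640,
        "NNCacheSize": 20000000,
    }
    for line in benchmark_script(settings, 5000000):
        print(line)
```

Keeping the node limit as an explicit argument makes it harder to compare runs that were measured at different search lengths by accident.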

chrisw
Posts: 2209
Joined: Tue Apr 03, 2012 2:28 pm

Re: Dual RTX 2060 for Leela

Post by chrisw » Fri Apr 19, 2019 11:32 pm

corres wrote:
Fri Apr 19, 2019 10:05 am
I installed two RTX 2060 cards (Gigabyte Windforce OC) in my Ryzen 7 1800X (8 x 4000 MHz) PC and ran some tests.
I used Leela version 0.21.1 for tests.
1. test: Net 11250
1a. test: Default parameters
GPU1
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 22533 (depth 10 time 15406 nodes 347152 hashfull 986)
GPU2
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 20313 (depth 10 time 19277 nodes 391592 hashfull 1000)
Note: GPU2 is in the second (SLI) slot, which is a PCIe 2.0 (x4) slot with 1/8 of the bandwidth.
DUAL GPU
setoption name threads value 4
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 1000000
Result: max nps = 41481 (depth 10 time 1102 nodes 456797 hashfull 1000)

npsGPU1 + npsGPU2 = 42846 so the effectiveness of the dual GPU is about 97%.

1b. test: Parameters found by Laskos
GPU1
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 28646 (depth 13 time 143931 nodes 4036742 hashfull 919)
GPU2
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 25143 (depth 13 time 145956 nodes 3669798 hashfull 839)
Note: as above
DUAL GPU
setoption name threads value 4
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 5000000
Result: max nps = 51646 (depth 13 time 73566 nodes 3780545 hashfull 876)

npsGPU1 + npsGPU2 = 53789 so the effectiveness of DUAL GPU is about 96%.
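
The effectiveness figures in tests 1a and 1b are simply the dual-GPU nps divided by the sum of the two single-GPU results. A small sketch of that arithmetic, using the numbers reported above (the dual result of test 1a is read here as 41481 nps, which is an assumption about the intended figure; the function name is hypothetical):

```python
def dual_gpu_efficiency(nps_gpu1, nps_gpu2, nps_dual):
    """Ratio of measured dual-GPU nps to the sum of the single-GPU nps values."""
    return nps_dual / (nps_gpu1 + nps_gpu2)


if __name__ == "__main__":
    # Test 1a: default parameters.
    eff_1a = dual_gpu_efficiency(22533, 20313, 41481)
    # Test 1b: MinibatchSize 512, NNCacheSize 2000000.
    eff_1b = dual_gpu_efficiency(28646, 25143, 51646)
    print(f"1a: {eff_1a:.1%}")
    print(f"1b: {eff_1b:.1%}")
```

Note that the single-GPU runs and the dual run did not stop at the same node count, so the ratio is only a rough indicator.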

(continued)
What is a node in lc0? This is not a stupid question, btw. Could be every node in the tree. Could be every node in the tree plus every node outside the tree. If it’s a tree node, does it count 1 for every time the search goes through, or just the once when it is visited the first time? Could be the count of discrete NN lookups. I kind of have been assuming it’s a “normal” computer chess node, ticked up for every “move”, but maybe not?

cma6
Posts: 145
Joined: Thu May 29, 2014 3:58 pm

RTX 2070 @ 49 Nps

Post by cma6 » Sat Apr 20, 2019 1:52 am

@Laskos
"I am getting the following with these above settings, just after 10 million nodes mark, but with my stable over-clocked RTX 2070:
info depth 19 seldepth 57 time 220147 nodes 10789318 score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4"

49K nps is amazing and for a single RTX 2070? Exactly what model and cooling are you using, if I may ask, and O/C at what speed?
Thanks.

corres
Posts: 1802
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres » Sat Apr 20, 2019 8:47 am

chrisw wrote:
Fri Apr 19, 2019 11:32 pm
What is a node in lc0? This is not a stupid question, btw. Could be every node in the tree. Could be every node in the tree plus every node outside the tree. If it’s a tree node, does it count 1 for every time the search goes through, or just the once when it is visited the first time? Could be the count of discrete NN lookups. I kind of have been assuming it’s a “normal” computer chess node, ticked up for every “move”, but maybe not?
Only the developers of Leela can give a correct answer to your questions.
Even for AB engines there is no standard defining what the GUI should display as "number of nodes" or "nodes per second". These are practical questions that only the developers can decide. Sometimes their decision is driven by commercial considerations; think of Rybka, for example.

smatovic
Posts: 964
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Dual RTX 2060 for Leela

Post by smatovic » Sat Apr 20, 2019 9:04 am

corres wrote:
Sat Apr 20, 2019 8:47 am
chrisw wrote:
Fri Apr 19, 2019 11:32 pm
What is a node in lc0? This is not a stupid question, btw. Could be every node in the tree. Could be every node in the tree plus every node outside the tree. If it’s a tree node, does it count 1 for every time the search goes through, or just the once when it is visited the first time? Could be the count of discrete NN lookups. I kind of have been assuming it’s a “normal” computer chess node, ticked up for every “move”, but maybe not?
Only the developers of Leela can give a correct answer to your questions.
Even for AB engines there is no standard defining what the GUI should display as "number of nodes" or "nodes per second". These are practical questions that only the developers can decide. Sometimes their decision is driven by commercial considerations; think of Rybka, for example.
I asked Ankan once; he told me that for nps, LC0 counts the NN eval calls made by the search.
This includes Policy/Value and NN cache hits.

--
Srdja

Milos
Posts: 3387
Joined: Wed Nov 25, 2009 12:47 am

Re: Dual RTX 2060 for Leela

Post by Milos » Sat Apr 20, 2019 12:15 pm

smatovic wrote:
Sat Apr 20, 2019 9:04 am
corres wrote:
Sat Apr 20, 2019 8:47 am
chrisw wrote:
Fri Apr 19, 2019 11:32 pm
What is a node in lc0? This is not a stupid question, btw. Could be every node in the tree. Could be every node in the tree plus every node outside the tree. If it’s a tree node, does it count 1 for every time the search goes through, or just the once when it is visited the first time? Could be the count of discrete NN lookups. I kind of have been assuming it’s a “normal” computer chess node, ticked up for every “move”, but maybe not?
Only the developers of Leela can give a correct answer to your questions.
Even for AB engines there is no standard defining what the GUI should display as "number of nodes" or "nodes per second". These are practical questions that only the developers can decide. Sometimes their decision is driven by commercial considerations; think of Rybka, for example.
I asked Ankan once; he told me that for nps, LC0 counts the NN eval calls made by the search.
This includes Policy/Value and NN cache hits.

--
Srdja
Correct, whenever a leaf node is reached, LC0 increases the node counter. That's why one gets a larger nps with increasing NN cache size.
The interesting thing, though, is that it is called the NN cache, but in essence it is used in the same way as the hash table of A/B engines.
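
A toy illustration of the counting rule described above, assuming the node counter goes up for every leaf evaluation request, whether it is served from the cache or by the network. The LRU cache, the uniformly random position stream, and the GPU throughput figure are all invented for the example; this is not lc0's actual implementation.

```python
import random
from collections import OrderedDict


def simulate(cache_size, requests=100_000, distinct_positions=20_000,
             gpu_evals_per_sec=20_000, seed=1):
    """Return (reported_nps, hit_rate) for a toy LRU evaluation cache."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = misses = 0
    for _ in range(requests):
        key = rng.randrange(distinct_positions)  # stand-in for a position hash
        if key in cache:
            hits += 1
            cache.move_to_end(key)               # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)        # evict least recently used
    # Assumption: only cache misses cost GPU time; hits are treated as free.
    seconds = misses / gpu_evals_per_sec
    nodes = hits + misses                        # every leaf request counts as a node
    return nodes / seconds, hits / nodes


if __name__ == "__main__":
    for size in (2_000, 20_000):
        nps, hit_rate = simulate(size)
        print(f"cache {size:>6}: reported nps {nps:8.0f}, hit rate {hit_rate:.0%}")
```

With the GPU throughput held fixed, the larger cache reports a higher nps purely because more of the counted nodes are cache hits.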

Laskos
Posts: 9545
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: RTX 2070 @ 49 Nps

Post by Laskos » Sat Apr 20, 2019 12:23 pm

cma6 wrote:
Sat Apr 20, 2019 1:52 am
@Laskos
"I am getting the following with these above settings, just after 10 million nodes mark, but with my stable over-clocked RTX 2070:
info depth 19 seldepth 57 time 220147 nodes 10789318 score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4"

49K nps is amazing and for a single RTX 2070? Exactly what model and cooling are you using, if I may ask, and O/C at what speed?
Thanks.
It's a Gainward GeForce RTX 2070 8GB GDDR6 256-bit (426018336-4269), a non-factory-OC-ed GPU with a 1620MHz base core clock and boost to 1920MHz IIRC. I OC-ed the base core frequency by 160MHz (some 10%) and set the power limit to 108%. It settles at 1800-1900MHz core speeds over long runs, but only since I installed case fans and good ventilation inside, without ever reaching 70C+ temperatures.

The speed is not as spectacular as it seems. That particular net is among the fast ones, and even within the test40 run, speeds can vary by 25% between nets. It's important to have the newest drivers; I have installed the newest one from mid-April. So, all in all, the OC gives me some additional 4k NPS and the newest drivers another 3-4k NPS over the first drivers. But my speeds are not spectacular; I guess folks showing much lower speeds have some cooling issues.

Here is the result with that fast net:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz

info depth 19 seldepth 57 time 252838 nodes 12746584 score cp 44 hashfull 276 nps 50414 tbhits 0 pv d2d4


And here is the result with a much slower net, one of the latest:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_42033.pb.gz

info depth 20 seldepth 58 time 285902 nodes 11813408 score cp 36 hashfull 354 nps 41319 tbhits 0 pv d2d4


So, it's important to specify the net ID and the conditions. I guess my OC-ed RTX 2070 is some 10-12% slower than a well-working non-OC RTX 2080 with fairly new drivers, and quite close to a (non-throttling) RTX 2080 with old drivers.
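
To compare such runs, the nps can be recomputed from the `nodes` and `time` fields of the info line (time is in milliseconds). A minimal sketch using the two info lines above; `parse_info` is a hypothetical helper that only handles the integer fields appearing in these lines.

```python
def parse_info(line):
    """Extract the integer fields from a UCI 'info' line into a dict."""
    tokens = line.split()
    fields = {}
    for key in ("depth", "seldepth", "time", "nodes", "hashfull", "nps", "tbhits"):
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def nps_from(fields):
    """Nodes per second, with 'time' given in milliseconds."""
    return fields["nodes"] * 1000 // fields["time"]


FAST_NET = ("info depth 19 seldepth 57 time 252838 nodes 12746584 "
            "score cp 44 hashfull 276 nps 50414 tbhits 0 pv d2d4")
SLOW_NET = ("info depth 20 seldepth 58 time 285902 nodes 11813408 "
            "score cp 36 hashfull 354 nps 41319 tbhits 0 pv d2d4")

if __name__ == "__main__":
    fast, slow = parse_info(FAST_NET), parse_info(SLOW_NET)
    print("net 41687:", nps_from(fast), "nps")
    print("net 42033:", nps_from(slow), "nps")
    print("speed ratio:", round(nps_from(fast) / nps_from(slow), 2))
```

The recomputed values agree with the nps the engine printed, and the ratio between the two nets is in line with the spread mentioned above.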

corres
Posts: 1802
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: RTX 2070 @ 49 Nps

Post by corres » Sat Apr 20, 2019 2:44 pm

Laskos wrote:
Sat Apr 20, 2019 12:23 pm
cma6 wrote:
Sat Apr 20, 2019 1:52 am
@Laskos
"I am getting the following with these above settings, just after 10 million nodes mark, but with my stable over-clocked RTX 2070:
info depth 19 seldepth 57 time 220147 nodes 10789318 score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4"

49K nps is amazing and for a single RTX 2070? Exactly what model and cooling are you using, if I may ask, and O/C at what speed?
Thanks.
It's a Gainward GeForce RTX 2070 8GB GDDR6 256-bit (426018336-4269), a non-factory-OC-ed GPU with a 1620MHz base core clock and boost to 1920MHz IIRC. I OC-ed the base core frequency by 160MHz (some 10%) and set the power limit to 108%. It settles at 1800-1900MHz core speeds over long runs, but only since I installed case fans and good ventilation inside, without ever reaching 70C+ temperatures.

The speed is not as spectacular as it seems. That particular net is among the fast ones, and even within the test40 run, speeds can vary by 25% between nets. It's important to have the newest drivers; I have installed the newest one from mid-April. So, all in all, the OC gives me some additional 4k NPS and the newest drivers another 3-4k NPS over the first drivers. But my speeds are not spectacular; I guess folks showing much lower speeds have some cooling issues.

Here is the result with that fast net:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz

info depth 19 seldepth 57 time 252838 nodes 12746584 score cp 44 hashfull 276 nps 50414 tbhits 0 pv d2d4


And here is the result with a much slower net, one of the latest:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_42033.pb.gz

info depth 20 seldepth 58 time 285902 nodes 11813408 score cp 36 hashfull 354 nps 41319 tbhits 0 pv d2d4


So, it's important to specify the net ID and the conditions. I guess my OC-ed RTX 2070 is some 10-12% slower than a well-working non-OC RTX 2080 with fairly new drivers, and quite close to a (non-throttling) RTX 2080 with old drivers.
Beyond the above, the nps value also depends on the number of nodes searched during the measurement.
Obviously you did not get this high nps value with the command "go nodes 5000000" but with (let's say) "go nodes 15000000".
I would be curious to know what nps value you would get with the command "go nodes 5000000".
For comparison, everybody should use the same procedure to get a meaningful nps number.

Laskos
Posts: 9545
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: RTX 2070 @ 49 Nps

Post by Laskos » Sat Apr 20, 2019 3:08 pm

corres wrote:
Sat Apr 20, 2019 2:44 pm
Laskos wrote:
Sat Apr 20, 2019 12:23 pm
cma6 wrote:
Sat Apr 20, 2019 1:52 am
@Laskos
"I am getting the following with these above settings, just after 10 million nodes mark, but with my stable over-clocked RTX 2070:
info depth 19 seldepth 57 time 220147 nodes 10789318 score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4"

49K nps is amazing and for a single RTX 2070? Exactly what model and cooling are you using, if I may ask, and O/C at what speed?
Thanks.
It's a Gainward GeForce RTX 2070 8GB GDDR6 256-bit (426018336-4269), a non-factory-OC-ed GPU with a 1620MHz base core clock and boost to 1920MHz IIRC. I OC-ed the base core frequency by 160MHz (some 10%) and set the power limit to 108%. It settles at 1800-1900MHz core speeds over long runs, but only since I installed case fans and good ventilation inside, without ever reaching 70C+ temperatures.

The speed is not as spectacular as it seems. That particular net is among the fast ones, and even within the test40 run, speeds can vary by 25% between nets. It's important to have the newest drivers; I have installed the newest one from mid-April. So, all in all, the OC gives me some additional 4k NPS and the newest drivers another 3-4k NPS over the first drivers. But my speeds are not spectacular; I guess folks showing much lower speeds have some cooling issues.

Here is the result with that fast net:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz

info depth 19 seldepth 57 time 252838 nodes 12746584 score cp 44 hashfull 276 nps 50414 tbhits 0 pv d2d4


And here is the result with a much slower net, one of the latest:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_42033.pb.gz

info depth 20 seldepth 58 time 285902 nodes 11813408 score cp 36 hashfull 354 nps 41319 tbhits 0 pv d2d4


So, it's important to specify the net ID and the conditions. I guess my OC-ed RTX 2070 is some 10-12% slower than a well-working non-OC RTX 2080 with fairly new drivers, and quite close to a (non-throttling) RTX 2080 with old drivers.
Beyond the above, the nps value also depends on the number of nodes searched during the measurement.
Obviously you did not get this high nps value with the command "go nodes 5000000" but with (let's say) "go nodes 15000000".
I would be curious to know what nps value you would get with the command "go nodes 5000000".
For comparison, everybody should use the same procedure to get a meaningful nps number.
Sure, and I asked you in some of the first posts in this thread to repeat my procedure and to compare numbers, but you didn't do that. Re-read that post.
