I installed two RTX 2060 cards (Gigabyte Windforce OC) in my Ryzen 7 1800X (8 x 4000 MHz) PC and ran some tests.
I used Leela version 0.21.1 for the tests.
1. test: Net 11250
1a. test: Default parameters
GPU1
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 22533 (depth 10 time 15406 nodes 347152 hashfull 986)
GPU2
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 20313 (depth 10 time 19277 nodes 391592 hashfull 1000)
Note: GPU2 is in the second (SLI) slot, which is a PCIe 2.0 (x4) slot with 1/8 of the bandwidth.
DUAL GPU
setoption name threads value 4
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 1000000
Result: max nps = 41481 (depth 10 time 1102 nodes 456797 hashfull 1000)
npsGPU1 + npsGPU2 = 42846 so the effectiveness of the dual GPU is about 97%.
1b. test: Parameters found by Laskos
GPU1
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 28646 (depth 13 time 143931 nodes 4036742 hashfull 919)
GPU2
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 25143 (depth 13 time 145956 nodes 3669798 hashfull 839)
Note: as above
DUAL GPU
setoption name threads value 4
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 5000000
Result: max nps = 51646 (depth 13 time 73566 nodes 3780545 hashfull 876)
npsGPU1 + npsGPU2 = 53789 so the effectiveness of DUAL GPU is about 96%.
(continued)
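The "effectiveness" figures above are the dual-GPU nps divided by the sum of the two single-GPU nps values. A minimal sketch of that arithmetic, using the test 1a and 1b numbers from this post (the function name `dual_gpu_efficiency` is only illustrative):

```python
def dual_gpu_efficiency(single_nps, dual_nps):
    """Return dual-GPU nps as a fraction of the summed single-GPU nps."""
    total = sum(single_nps)
    if total <= 0:
        raise ValueError("single-GPU nps values must sum to a positive number")
    return dual_nps / total


# Figures from test 1a (default parameters) and test 1b (tuned parameters).
tests = {
    "1a": ([22533, 20313], 41481),
    "1b": ([28646, 25143], 51646),
}

for name, (single, dual) in tests.items():
    eff = dual_gpu_efficiency(single, dual)
    print(f"test {name}: sum = {sum(single)}, efficiency = {eff:.1%}")
```

The same function can be reused for any later test by passing the two single-GPU results and the dual-GPU result.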
Dual RTX 2060 for Leela
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Dual RTX 2060 for Leela
corres wrote: ↑Fri Apr 19, 2019 12:05 pm ...
Thanks, looks good! That is probably the most cost-efficient set-up. For $800, the price of an RTX 2080, you get speeds significantly above a 2080 Ti ($1300) and 40% above a 2080. I will build a similar 2 x RTX 2070 set-up in the future, but the 2060 seems the most cost-efficient solution. I am curious whether 2x scales well strength-wise, but I guess that if NPS is good, the effective speed-up is not far away.
Last edited by Laskos on Fri Apr 19, 2019 1:37 pm, edited 1 time in total.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Dual RTX 2060 for Leela
corres wrote: ↑Fri Apr 19, 2019 12:05 pm ...
2. test: Net 41800 (TCEC)
2a. Test: Default parameters
GPU1
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 23595 (depth 14 time 26760 nodes 631425 hashfull 1000)
GPU2
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 21505 (depth 14 time 29358 nodes 631355 hashfull 1000)
Note: as above
DUAL GPU
setoption name threads value 4
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 1000000
Result: max nps = 43489 (depth 14 time 14529 nodes 631862 hashfull 1000)
npsGPU1 + npsGPU2 = 45100 so the effectiveness of DUAL GPU is about 96%.
2b. Test: Parameters found by Laskos
GPU1
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 34213 (depth 17 time 87071 nodes 2978872 hashfull 469)
GPU2
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 31108 (depth 17 time 95821 nodes 2980840 hashfull 471)
Note: as above
DUAL GPU
setoption name threads value 4
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 5000000
Result: max nps = 53130 (depth 17 time 56260 nodes 2989143 hashfull 476)
npsGPU1 + npsGPU2 = 65321 so the effectiveness of DUAL GPU is about 81%.
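The dual-GPU runs above all follow one command pattern, so the sequence can be generated instead of typed by hand. This is only a sketch: `dual_gpu_commands` is a hypothetical helper, the option names and values are copied from the commands in this post, and the function just builds the text lines without talking to the engine.

```python
def dual_gpu_commands(gpu_ids, backend="cudnn-fp16", threads=4,
                      minibatch=None, nncache=None, nodes=1000000,
                      multi_backend="multiplexing"):
    """Build the UCI command lines for a multi-GPU run as used in this thread."""
    # One "(backend=...,gpu=N)" group per GPU, joined by commas.
    sub = ",".join(f"(backend={backend},gpu={g})" for g in gpu_ids)
    lines = [f"setoption name threads value {threads}"]
    if minibatch is not None:
        lines.append(f"setoption name minibatchsize value {minibatch}")
    if nncache is not None:
        lines.append(f"setoption name nncachesize value {nncache}")
    lines.append(f"setoption name backend value {multi_backend}")
    lines.append(f"setoption name backendoptions value {sub}")
    lines.append(f"go nodes {nodes}")
    return lines


# Test 2b, dual GPU: tuned parameters, 5 million nodes.
for line in dual_gpu_commands([0, 1], minibatch=512, nncache=2000000,
                              nodes=5000000):
    print(line)
```

Calling `dual_gpu_commands([0, 1])` with the defaults gives the sequence used for the default-parameter dual-GPU runs.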
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Dual RTX 2060 for Leela
corres wrote: ↑Fri Apr 19, 2019 1:37 pm ...
Hello, can you use these for each card:
Backend value cudnn-fp16
MinibatchSize value 512
NNCacheSize value 10000000
then
WeightsFile value .\weights_run1_41687.pb.gz
and then observe the speeds reached immediately after the 10 million nodes mark?
Also, could you try first "multiplexing" and then another try with "roundrobin"?
EDIT: also, I am not sure about the number of CPU threads. They seem to have used 3 threads in TCEC on dual GPU; I am not sure what the reason was. Maybe MCTS parallelization is crappy.
Last edited by Laskos on Fri Apr 19, 2019 1:55 pm, edited 1 time in total.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Dual RTX 2060 for Leela
Laskos wrote: ↑Fri Apr 19, 2019 1:32 pm ...
I agree.
But there are some issues with dual GPU:
It needs more room in the PC case, it draws more power (RTX 2080 ~220 Watts, RTX 2080 Ti ~250 Watts, and dual RTX 2060 ~320 Watts), and naturally it produces more heat.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Dual RTX 2060 for Leela
corres wrote: ↑Fri Apr 19, 2019 1:52 pm ...
Yes, but probably better OC-able with good cooling. Use 2-3 case fans; I used 2 for my smaller set-up and reduced temperatures on both CPU and GPU by some 10-14 °C, so I managed to OC my 2070 for a 10% NPS speed-up with the GPU temperature never going above 70 °C. The case should be spacious enough (I changed mine).
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Dual RTX 2060 for Leela
Laskos wrote: ↑Fri Apr 19, 2019 1:49 pm ...
I am getting the following with the above settings, just after the 10 million nodes mark, but with my stable overclocked RTX 2070:
setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_41687.pb.gz
info depth 19 seldepth 57 time 220147 nodes 10789318 score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4
This is about a 3-4 minute search, i.e. a tournament time control. I am curious what your dual 2060 set-up shows (although the Lc0 engine seems to have problems digesting speeds above 70-80k NPS).
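To read off the speed "just after the 10 million nodes mark" from an engine log, one could scan the UCI `info` lines for the first one whose `nodes` field passes the mark. A minimal sketch, assuming the log is available as a list of text lines; `parse_info` and `nps_after` are illustrative names, not part of Lc0:

```python
INT_FIELDS = {"depth", "seldepth", "time", "nodes", "hashfull", "nps", "tbhits"}


def parse_info(line):
    """Extract the integer fields of a UCI 'info' line into a dict."""
    tokens = line.split()
    fields = {}
    for key, value in zip(tokens, tokens[1:]):
        if key == "pv":
            break  # everything after 'pv' is a list of moves
        if key in INT_FIELDS and value.isdigit():
            fields[key] = int(value)
    return fields


def nps_after(lines, node_mark=10_000_000):
    """Return nps from the first info line with at least node_mark nodes."""
    for line in lines:
        if not line.startswith("info"):
            continue
        fields = parse_info(line)
        if fields.get("nodes", 0) >= node_mark and "nps" in fields:
            return fields["nps"]
    return None  # the search never reached the mark


# The info line quoted above.
sample = ("info depth 19 seldepth 57 time 220147 nodes 10789318 "
          "score cp 44 hashfull 250 nps 49009 tbhits 0 pv d2d4")
print(nps_after([sample]))
```

For this line the reported nps agrees with nodes divided by the elapsed time (the `time` field is in milliseconds).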
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Dual RTX 2060 for Leela
Laskos wrote: ↑Fri Apr 19, 2019 1:49 pm ...
"Roundrobin" is a trick to invert the GPUs, and it makes sense only if you use different GPUs.
If nncachesize 10000000 is used instead of nncachesize 2000000, the max nps will grow only minimally.
I do not understand .\weights_run1_41687.pb.gz. Obviously 41687 is another net file, but what is the benefit of it?
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Dual RTX 2060 for Leela
corres wrote: ↑Fri Apr 19, 2019 3:24 pm ...
No, it is just to have the same net, as every net shows different speed behavior. I am not sure about your description of the "roundrobin" option.
-
- Posts: 177
- Joined: Wed May 23, 2018 9:29 pm
Re: Dual RTX 2060 for Leela
It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.
I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
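As a rough illustration of why identical GPUs matter more for roundrobin than for multiplexing, here is a toy throughput model. It is not Lc0's implementation: it assumes that strict alternation gives every GPU an equal share of the work, so the slowest card sets the pace, while a shared work queue lets each card run at its own rate. It also ignores CPU and search overhead, which in practice kept the measured dual-GPU nps below the sum.

```python
def roundrobin_nps(rates):
    """Toy model: work alternates strictly, so the slowest GPU sets the pace."""
    return len(rates) * min(rates)


def shared_queue_nps(rates):
    """Toy model: each GPU takes work as soon as it is free, so rates add up."""
    return sum(rates)


# Single-GPU nps from test 2b in this thread (nearly identical cards).
similar = [34213, 31108]
# Hypothetical mismatched pair, for contrast only.
mismatched = [30000, 10000]

for label, rates in (("similar", similar), ("mismatched", mismatched)):
    print(label, roundrobin_nps(rates), shared_queue_nps(rates))
```

In this model the two schemes are close for the nearly identical cards and far apart for the mismatched pair.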