Dual RTX 2060 for Leela


corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

chrisw wrote: Wed Apr 24, 2019 5:32 pm
corres wrote: Tue Apr 23, 2019 9:05 am
Albert Silver wrote: Mon Apr 22, 2019 9:36 pm Just a sidenote: the NNcache can lead to somewhat misleading nodes per second counts. If you benchmark with a large value, and it does not fill it before the end, you will get significant NPS results, but they will not represent your actual speed throughout the game. Once filled, the NPS will drop a lot, and that is actually the speed you will get for the rest of a game.
During the tests there were some cases when hashfull reached 1000. In those cases the nps stopped growing, but it did not drop.
I tried NNCacheSize=20000000 but lc0 got fried.
I do not know why the developers of Lc0 set such a low default value for NNCacheSize.
Do you have any information about the units of NNCacheSize and MinibatchSize?
And where do they reside: in RAM or in VRAM?
The developers of Lc0 hold back some information from us, which is a pity.
I would assume that what is happening (I haven't implemented hash yet) is that if a new node matches a hash table entry, the new node is given the win and visit counts from the hash and backs that (wins, visits) entry up to the root. That root move gets a higher visit count as a result, even though it didn't actually make the visits.
But the line that put the original data (wins, visits) into the hash was also backed up at the time, and its (wins, visits) will also be represented at the root by its root move.
Then Lc0 computes the node count by adding up the root move visits. Hence the double counting. Anyway, if this theory is correct, a node count without double counting could be computed at the point of summing the root visits, by subtracting the visits given from the hash: nodecount = oldnodecount minus hashhits.
So why, in this scenario, does the double counting stop when the hash is full? The hash entries are still there, and you'd assume hits from other nodes will still take place. Mystery.

Another thought, when backpropagating (wins,visits), if a node on the backprop path is in the hash table, presumably the hash entry should get updated at the same time.
This sounds good, but I would be curious to know the opinion of the authors of Leela.
They are very taciturn people.
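
A minimal sketch of the correction chrisw proposes, assuming his theory is right: sum the visits over the root moves and subtract the visits that were copied from hash entries. The function name and the numbers are hypothetical and are not taken from the Lc0 source.

```python
def corrected_node_count(root_move_visits, hash_hit_visits):
    """Sum the root-move visits, then subtract visits credited from hash hits.

    root_move_visits: visit count per root move (made-up numbers below).
    hash_hit_visits: total visits copied from hash entries rather than
    actually searched (an assumption of the theory, not an Lc0 counter).
    """
    old_node_count = sum(root_move_visits)
    return old_node_count - hash_hit_visits


# Hypothetical example: three root moves, 150 visits came from hash hits.
visits = [600, 300, 100]
print(corrected_node_count(visits, 150))
```

If the theory is wrong, or if Lc0 counts nodes differently, the subtraction above would not apply.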
Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: Dual RTX 2060 for Leela

Post by Hugo »

I am having trouble getting any performance gain out of two RTX 2060 cards.
I tried 3 different backends, but the results were all the same.
Speed is ~45000+ nps after a minute or two from the start position.
But in 5m+3s games I cannot see any - ANY! - better results yet.
Tried multiplexing, roundrobin and demux: 15-20 games.
Letting demux run overnight.

C.K.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

Hugo wrote: Wed Apr 24, 2019 9:21 pm I am having trouble getting any performance gain out of two RTX 2060 cards.
I tried 3 different backends, but the results were all the same.
Speed is ~45000+ nps after a minute or two from the start position.
But in 5m+3s games I cannot see any - ANY! - better results yet.
Tried multiplexing, roundrobin and demux: 15-20 games.
Letting demux run overnight.
C.K.
I do not understand you.
Compared to what did you not get any performance gain?
Two RTX 2060 cards surely give more performance than a single RTX 2060 card.
Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: Dual RTX 2060 for Leela

Post by Hugo »

crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.

I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
So for two RTX 2060 cards that do not have the same clock speed (100 MHz difference), you recommend backend=multiplexing?
What about MinibatchSize? Leave it at the default?
Are there other parameters that increase the performance of 2 cards?

Thanks for any hint.
C.K.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

Hugo wrote: Wed Apr 24, 2019 10:58 pm
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.
I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
So for two RTX 2060 cards that do not have the same clock speed (100 MHz difference), you recommend backend=multiplexing?
What about MinibatchSize? Leave it at the default?
Are there other parameters that increase the performance of 2 cards?
Thanks for any hint.
C.K.
Which backend is best depends on the system.
I use multiplexing with BackendOptions backend=cudnn-fp16,(gpu=0),(gpu=1) and Threads 4.
I would keep all other parameters at their defaults,
except perhaps NNCacheSize 2000000, which is enough!
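
As a rough sketch, the settings described in this post can be written out as UCI commands. The helper below is hypothetical; it only assembles strings using the option spellings that appear in the benchmark posts of this thread, and it does not talk to the engine.

```python
def dual_gpu_uci_commands(gpus=(0, 1), sub_backend="cudnn-fp16",
                          threads=4, nncache_size=2000000):
    """Return the setoption lines for a multiplexing setup (strings only)."""
    # One "(gpu=N)" group per card, as in the backend options given above.
    gpu_list = ",".join(f"(gpu={g})" for g in gpus)
    return [
        f"setoption name threads value {threads}",
        f"setoption name nncachesize value {nncache_size}",
        "setoption name backend value multiplexing",
        f"setoption name backendoptions value backend={sub_backend},{gpu_list}",
    ]


for line in dual_gpu_uci_commands():
    print(line)
```

The printed lines can be pasted into the engine console before a "go nodes ..." command.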
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

Hugo,
I am curious to know your results.
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Dual RTX 2060 for Leela

Post by M ANSARI »

Laskos wrote: Fri Apr 19, 2019 1:32 pm
corres wrote: Fri Apr 19, 2019 12:05 pm I installed two RTX 2060 (Gigabyte Windforce OC) into my Ryzen7 1800x 8x4000 MHz PC and I made some tests.
I used Leela version 0.21.1 for tests.
1. test: Net 11250
1a. test: Default parameters
GPU1
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 22533 (depth 10 time 15406 nodes 347152 hashfull 986)
GPU2
setoption name backend value cudnn-fp16
go nodes 1000000
Result: max nps = 20313 (depth 10 time 19277 nodes 391592 hashfull 1000)
Note: GPU2 is in the second (SLI) slot, which is a PCIe 2.0 (x4) slot with 1/8 of the bandwidth.
DUAL GPU
setoption name threads value 4
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 1000000
Result: max nps = 41481 (depth 10 time 1102 nodes 456797 hashfull 1000)

npsGPU1 + npsGPU2 = 42846 so the effectiveness of the dual GPU is about 97%.

1b. test parameters found by Laskos
GPU1
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 28646 (depth 13 time 143931 nodes 4036742 hashfull 919)
GPU2
setoption name backend value cudnn-fp16
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
go nodes 5000000
Result: max nps = 25143 (depth 13 time 145956 nodes 3669798 hashfull 839)
Note: as above
DUAL GPU
setoption name threads value 4
setoption name minibatchsize value 512
setoption name nncachesize value 2000000
setoption name backend value multiplexing
setoption name backendoptions value (backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
go nodes 5000000
Result: max nps = 51646 (depth 13 time 73566 nodes 3780545 hashfull 876)

npsGPU1 + npsGPU2 = 53789 so the effectiveness of the dual GPU is about 96%.

(continued)
Thanks, looks good! That is probably the most cost-efficient setup. For $800, the price of an RTX 2080, you get speeds significantly above a 2080 Ti ($1300) and 40% above a 2080. In the future I will build a similar 2 x RTX 2070 system, but the 2060 seems the most cost-efficient solution. I am curious whether 2x scales well strength-wise, but I guess that if the NPS is good, the effective speed-up is not far off.

If this scaling holds up then 2 x 2080 Ti should really rock!
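
The scaling figure in the quoted benchmark is the dual-GPU nps divided by the sum of the two single-GPU results. A small sketch that recomputes it from the numbers in the post (the function name is made up for illustration):

```python
def dual_gpu_efficiency(nps_gpu1, nps_gpu2, nps_dual):
    """Dual-GPU nps as a fraction of the sum of the single-GPU nps."""
    return nps_dual / (nps_gpu1 + nps_gpu2)


# (GPU1 nps, GPU2 nps, dual-GPU nps) taken from the quoted benchmark.
runs = {
    "1a (default parameters)": (22533, 20313, 41481),
    "1b (minibatchsize 512, nncachesize 2000000)": (28646, 25143, 51646),
}

for name, (gpu1, gpu2, dual) in runs.items():
    print(f"{name}: {dual_gpu_efficiency(gpu1, gpu2, dual):.1%}")
```

This only measures nps scaling on these two runs; it says nothing about playing strength.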
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

M ANSARI wrote: Thu Apr 25, 2019 7:52 am
Laskos wrote: Fri Apr 19, 2019 1:32 pm
Thanks, looks good! That is probably the most cost-efficient setup. For $800, the price of an RTX 2080, you get speeds significantly above a 2080 Ti ($1300) and 40% above a 2080. In the future I will build a similar 2 x RTX 2070 system, but the 2060 seems the most cost-efficient solution. I am curious whether 2x scales well strength-wise, but I guess that if the NPS is good, the effective speed-up is not far off.

If this scaling holds up then 2 x 2080 Ti should really rock!
The effectiveness depends on the cards and on the type of net.
With newer nets the effectiveness is lower.
If somebody who possesses stronger cards repeated my tests with identical parameters, we would know the effectiveness of stronger cards with newer nets too.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Dual RTX 2060 for Leela

Post by Laskos »

M ANSARI wrote: Thu Apr 25, 2019 7:52 am
Laskos wrote: Fri Apr 19, 2019 1:32 pm
Thanks, looks good! That is probably the most cost-efficient setup. For $800, the price of an RTX 2080, you get speeds significantly above a 2080 Ti ($1300) and 40% above a 2080. In the future I will build a similar 2 x RTX 2070 system, but the 2060 seems the most cost-efficient solution. I am curious whether 2x scales well strength-wise, but I guess that if the NPS is good, the effective speed-up is not far off.

If this scaling holds up then 2 x 2080 Ti should really rock!
Yes, but that's for $2500. My impression (might be wrong) is that the 2060 and 2070 overclock better. I got a 10% overclock on my 2070 just from good ventilation and case fans, with the stock GPU cooler. Temperatures are never above 70 °C. I guess I can go further, but the dust sucked in by all the fans is an issue long-term. The best value seems to be 2 x 2060 or 2 x 2070, depending on which overclocks better, for not that much money. 2500 bucks is expensive to me.
Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: Dual RTX 2060 for Leela

Post by Hugo »

Yes, but that's for $2500. My impression (might be wrong) is that the 2060 and 2070 overclock better. I got a 10% overclock on my 2070 just from good ventilation and case fans, with the stock GPU cooler. Temperatures are never above 70 °C. I guess I can go further, but the dust sucked in by all the fans is an issue long-term. The best value seems to be 2 x 2060 or 2 x 2070, depending on which overclocks better, for not that much money. 2500 bucks is expensive to me.
I already have a combo of an RTX 2060 and an RTX 2070 running at 1900 MHz.
I want to find out whether multi-GPU is a good option for Lc0.
ALL my pre-tests have shown absolutely no gain in game performance so far!
I tried multiplexing, roundrobin and demux: altogether about 90 games at 5m+3s, under the conditions of my single-GPU tests.
With two RTX 2060s and weights file 42000 I got ~42000-45000 nps.

My suspicion is growing that there could be a big difference between 45000 nps from a GPU combo and 45000 nps from an RTX 2080 Ti.

I am running a test with my 2060+2070 combo: 100 games at 5m+3s,
under the same conditions as my single-GPU Lc0 tests.


https://docs.google.com/spreadsheets/d/ ... sp=sharing

C.K.
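
When judging whether a match of this length shows a gain, it can help to convert the score into an Elo difference with an error margin. The sketch below uses the standard logistic Elo formula and a normal approximation for a rough 95% interval. The win/draw/loss numbers are invented for illustration; they are not results from this thread.

```python
import math


def score_to_elo(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses):
    """Return (elo, elo_low, elo_high) for a match, rough 95% interval."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    # Clamp so the logarithm stays defined for extreme scores.
    low = min(max(score - 1.96 * stderr, 1e-6), 1.0 - 1e-6)
    high = min(max(score + 1.96 * stderr, 1e-6), 1.0 - 1e-6)
    return score_to_elo(score), score_to_elo(low), score_to_elo(high)


# Hypothetical 100-game result: 30 wins, 45 draws, 25 losses.
elo, elo_low, elo_high = elo_with_margin(30, 45, 25)
print(f"Elo {elo:+.1f} (95% interval {elo_low:+.1f} to {elo_high:+.1f})")
```

With a result like the invented one above, the interval spans zero, so 100 games would not be enough to separate a small gain from noise.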