Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 5:08 pm
by Laskos
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.

I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
Thanks, very helpful.
So, in his case, if there are no throttling issues, he should use either roundrobin or demux.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 6:22 pm
by corres
Laskos wrote: Fri Apr 19, 2019 5:08 pm
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.

I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
Thanks, very helpful.
So, in his case, if there are no throttling issues, he should use either roundrobin or demux.
It is a pity, but nowadays there is no GPU without throttling...
If you want to switch off throttling you have to modify the GPU's BIOS, and that is a task for GPU developers only.
MSI Afterburner and similar tools are not good for this.
Even the best cooler cannot stop the die temperature from changing.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 6:44 pm
by corres
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.
I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
Thanks.
I have sometimes noticed that Leela's documentation is incomplete.
To gain some experience I ran the following tests:
1. Test: multiplexing
setoption name threads value 4
setoption name backend value multiplexing
setoption name backendoptions value backend=cudnn-fp16,(gpu=0),(gpu=1) // as you show
go nodes 1000000
Result: max nps = 43608 (depth 14...), about the same as before
2. Test: demux
setoption name threads value 4
setoption name backend value demux
setoption name backendoptions value backend=cudnn-fp16,(gpu=0),(gpu=1)
go nodes 1000000
Result: max nps = 38111 (depth 14...) (?)
3. Test: roundrobin
setoption name threads value 4
setoption name backend value roundrobin
setoption name backendoptions value backend=cudnn-fp16,(gpu=0),(gpu=1)
go nodes 1000000
Result: max nps = 42100 (depth 14...)
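
To compare the three results above on one scale, here is a minimal Python sketch. It uses only the nps figures reported in this post; the dictionary and the helper function are names made up for illustration.

```python
# Peak nps reported above for each backend (two GPUs, go nodes 1000000).
results = {"multiplexing": 43608, "demux": 38111, "roundrobin": 42100}


def relative_to_best(nps_by_backend):
    """Return each backend's nps as a percentage of the fastest backend."""
    best = max(nps_by_backend.values())
    return {name: 100.0 * nps / best for name, nps in nps_by_backend.items()}


# Print the backends from fastest to slowest.
for name, pct in sorted(relative_to_best(results).items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {pct:5.1f}% of best")
```

On these numbers demux is roughly 13% behind multiplexing and roundrobin roughly 3.5% behind. With a single run per backend, the smaller gap may well be within run-to-run noise.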

I would like to know your opinion about the Elo effect of the "Laskos parameters".

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 7:06 pm
by Laskos
corres wrote: Fri Apr 19, 2019 6:22 pm
Laskos wrote: Fri Apr 19, 2019 5:08 pm
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.

I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
Thanks, very helpful.
So, in his case, if there are no throttling issues, he should use either roundrobin or demux.
It is a pity, but nowadays there is no GPU without throttling...
If you want to switch off throttling you have to modify the GPU's BIOS, and that is a task for GPU developers only.
MSI Afterburner and similar tools are not good for this.
Even the best cooler cannot stop the die temperature from changing.
I think that as long as the temperature is not above 75-78C, throttling below the nominal frequency won't occur. What varies with temperature below 75C is the boost core frequency above nominal. Over prolonged constant runs, boost settles with smaller spikes. If the card really is throttling, you will see performance deteriorate over prolonged runs, with clocks going below the nominal frequency. Check your temperatures in MSI Afterburner, and whether the core clocks sometimes drop below nominal. I overclocked my GPU by as much as 200MHz and it stayed stable over days of constant full load, but I settled on a 160MHz overclock, since dust will worsen temperatures over time (there are many fans in my system, so dust inevitably gets sucked in). I guess that under these conditions my system will need a clean-up in 5-6 months.
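
The check described above (clock below nominal means throttling, boost variation above nominal does not) can be sketched in Python. This is only an illustration: the log samples are invented, the function name is hypothetical, and 1365 MHz is an example nominal clock that should be replaced by the base clock from your own card's specification. The samples could come from a log exported by MSI Afterburner or a similar monitoring tool.

```python
def below_nominal(samples, nominal_mhz, temp_limit_c=75):
    """Split (temp_c, core_mhz) samples into throttled and hot-but-fine ones.

    A sample counts as throttled when its core clock is under the nominal
    (base) frequency. Boost fluctuations above nominal are not throttling.
    """
    throttled = [s for s in samples if s[1] < nominal_mhz]
    hot = [s for s in samples if s[0] > temp_limit_c and s[1] >= nominal_mhz]
    return throttled, hot


# Hypothetical log: (temperature in C, core clock in MHz), one sample per interval.
log = [(68, 1905), (71, 1890), (74, 1875), (79, 1560), (81, 1340)]

# 1365 MHz is an assumed example value for the nominal clock.
throttled, hot = below_nominal(log, nominal_mhz=1365)
print("throttled samples:", throttled)
print("hot but still at or above nominal:", hot)
```

Only the last sample in this made-up log would count as throttling; the one before it is hot, but the clock is still above nominal.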

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 7:24 pm
by Laskos
corres wrote: Fri Apr 19, 2019 6:44 pm
I would like to know your opinion about the Elo effect of the "Laskos parameters".
You mean these:

setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000

?

The second one (NNCacheSize) is better than the default in all cases, provided you have a decent amount of RAM, and can be significantly better Elo-wise. The first one is debatable between 256 and 512. In my experience 512 might be a tiny bit better (though I have the impression that something like 400 is even better in test suites), but the difference at stake here is small, about 5-10 Elo points.
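
To put "5-10 Elo points" in perspective, the standard logistic Elo formula converts a rating difference into an expected score. A minimal Python sketch (the function name is just for illustration):

```python
def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (5, 10):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")
```

A 10 Elo edge corresponds to scoring about 51.4% instead of 50%, so it takes a large number of games to measure such a difference reliably.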

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 8:14 pm
by corres
Laskos wrote: Fri Apr 19, 2019 7:24 pm
corres wrote: Fri Apr 19, 2019 6:44 pm
I would like to know your opinion about the Elo effect of the "Laskos parameters".
You mean these:

setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000

?

The second one (NNCacheSize) is better than the default in all cases, provided you have a decent amount of RAM, and can be significantly better Elo-wise. The first one is debatable between 256 and 512. In my experience 512 might be a tiny bit better (though I have the impression that something like 400 is even better in test suites), but the difference at stake here is small, about 5-10 Elo points.
I follow your work, but I would like to hear an independent opinion too.
If crem agrees with you, he should raise the default value of NNCacheSize.
Why he does not raise it, that is the question.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 8:23 pm
by corres
Laskos wrote: Fri Apr 19, 2019 7:06 pm
corres wrote: Fri Apr 19, 2019 6:22 pm
Laskos wrote: Fri Apr 19, 2019 5:08 pm
crem wrote: Fri Apr 19, 2019 4:46 pm It's at least intended that roundrobin works the best when all GPUs are the same, and for multiplexing it's not that strict.

I wrote a blog post about backends recently: http://blog.lczero.org/2019/04/backend- ... l?m=1#more
Thanks, very helpful.
So, in his case, if there are no throttling issues, he should use either roundrobin or demux.
It is a pity, but nowadays there is no GPU without throttling...
If you want to switch off throttling you have to modify the GPU's BIOS, and that is a task for GPU developers only.
MSI Afterburner and similar tools are not good for this.
Even the best cooler cannot stop the die temperature from changing.
I think that as long as the temperature is not above 75-78C, throttling below the nominal frequency won't occur. What varies with temperature below 75C is the boost core frequency above nominal. Over prolonged constant runs, boost settles with smaller spikes. If the card really is throttling, you will see performance deteriorate over prolonged runs, with clocks going below the nominal frequency. Check your temperatures in MSI Afterburner, and whether the core clocks sometimes drop below nominal. I overclocked my GPU by as much as 200MHz and it stayed stable over days of constant full load, but I settled on a 160MHz overclock, since dust will worsen temperatures over time (there are many fans in my system, so dust inevitably gets sucked in). I guess that under these conditions my system will need a clean-up in 5-6 months.
Sorry, but I am not as optimistic as you are.
The important factor is the temperature of the die, not the temperature of the case.
There is only one case in which there is no throttling: when you switch it off.
CPU throttling is a serious issue as well. Luckily, CPU throttling can be switched off in the BIOS.
I always switch it off.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 9:58 pm
by Hugo
Hi

I combined an RTX 2060 and an RTX 2070 not long ago.

I do not change the minibatch size, as that is suspected to weaken the search.
So my parameters for a single GPU are:
--backend=cudnn-fp16
--threads=2
--nncache=20000000

Also, I do not use go nodes 1000000 because it is far too short, and the nps is still low at that point.
go nodes 5000000 takes around 1 minute and seems perfect to me.

So, using a Network 40, I got:
on RTX 2060 > go nodes 5000000 > nps ~30.000
on RTX 2070 > go nodes 5000000 > nps ~35.000

double GPU
--backend=roundrobin
--backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
--threads=4
--nncache=20000000

RTX 2060 + RTX 2070 > go nodes 5000000 > nps 65.000
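
As a quick check of how well the two cards scale together, here is a minimal Python sketch using only the figures above, reading the dot in "30.000" as a thousands separator. The variable and function names are made up for illustration, and the time estimate assumes the peak nps holds for the whole search, which is not quite the case in practice.

```python
# nps figures reported above (Network 40, go nodes 5000000).
single_gpu_nps = {"RTX 2060": 30_000, "RTX 2070": 35_000}
dual_gpu_nps = 65_000
nodes = 5_000_000


def scaling_efficiency(combined_nps, individual_nps):
    """Combined throughput as a fraction of the sum of single-GPU throughputs."""
    return combined_nps / sum(individual_nps)


efficiency = scaling_efficiency(dual_gpu_nps, single_gpu_nps.values())
seconds = nodes / dual_gpu_nps  # assumes constant peak nps for the whole search
print(f"scaling efficiency: {efficiency:.0%}")
print(f"time for {nodes} nodes at {dual_gpu_nps} nps: {seconds:.0f} s")
```

With the approximate single-GPU figures, roundrobin on the two cards reaches about the sum of the individual speeds.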

I noticed that the longer the search lasted, the GPU load was not always at its maximum. It was somewhere between 75% and 95%.

C.K.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 10:22 pm
by corres
Hugo wrote: Fri Apr 19, 2019 9:58 pm Hi

I combined an RTX 2060 and an RTX 2070 not long ago.

I do not change the minibatch size, as that is suspected to weaken the search.
So my parameters for a single GPU are:
--backend=cudnn-fp16
--threads=2
--nncache=20000000

Also, I do not use go nodes 1000000 because it is far too short, and the nps is still low at that point.
go nodes 5000000 takes around 1 minute and seems perfect to me.

So, using a Network 40, I got:
on RTX 2060 > go nodes 5000000 > nps ~30.000
on RTX 2070 > go nodes 5000000 > nps ~35.000

double GPU
--backend=roundrobin
--backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)
--threads=4
--nncache=20000000

RTX 2060 + RTX 2070 > go nodes 5000000 > nps 65.000

I noticed that the longer the search lasted, the GPU load was not always at its maximum. It was somewhere between 75% and 95%.

C.K.
My experience is the same.
Who are the manufacturers of your GPUs, and which models are they?
What is your experience with the "Laskos parameters"?
And with multiplexing/roundrobin/demux?
I think there are others who are interested in these questions.

Re: Dual RTX 2060 for Leela

Posted: Fri Apr 19, 2019 11:14 pm
by Hugo
corres wrote: Fri Apr 19, 2019 10:22 pm My experience is the same.
Who are the manufacturers of your GPUs, and which models are they?
What is your experience with the "Laskos parameters"?
And with multiplexing/roundrobin/demux?
I think there are others who are interested in these questions.
The cards are:
Asus GeForce RTX 2070 ROG Strix OC
Gainward GeForce RTX 2060 Phoenix GS
Both run at 1900 MHz without any tool or setting.
For my Network 40 tests I had to downclock the 2060 to 1600 MHz with the MSI tool to get a Leela Ratio of 1.1.

I have not looked at the Laskos parameters yet.

With multiplexing, the GPU load was worse than with roundrobin.
I haven't tested demux yet, but I will.

C.K.