Dual RTX 2060 for Leela

corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: RTX 2070 @ 49 Nps

Post by corres »

Laskos wrote: Sat Apr 20, 2019 5:08 pm Sure, and I asked you in some of the first posts in this thread to repeat my procedure and to compare numbers, but you didn't do that. Re-read that post.
There is no sense in an nps measurement that depends on arbitrary parameters and on a measurement method that somebody thinks is beneficial for his own system.
Only a standardized measurement makes real sense.
Moreover, I ran a test with nncachesize=20000000 and gained only ~5% more nps.
A test with nncachesize=10000000 obviously gives an even smaller gain.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: RTX 2070 @ 49 Nps

Post by Laskos »

corres wrote: Sat Apr 20, 2019 5:36 pm
Laskos wrote: Sat Apr 20, 2019 5:08 pm Sure, and I asked you in some of the first posts in this thread to repeat my procedure and to compare numbers, but you didn't do that. Re-read that post.
There is no sense in an nps measurement that depends on arbitrary parameters and on a measurement method that somebody thinks is beneficial for his own system.
Only a standardized measurement makes real sense.
Moreover, I ran a test with nncachesize=20000000 and gained only ~5% more nps.
A test with nncachesize=10000000 obviously gives an even smaller gain.
I am not thinking of it as being beneficial to anything; I just want us to do the _same_ meaningful measurement. On almost any system with an RTX 2xxx card, the conditions from my early post are pretty representative of LTC games, that's all. Using the default cache size, as you do, is on the other hand completely meaningless, and it seriously hurts the strength of Lc0 at LTC.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Dual RTX 2060 for Leela

Post by chrisw »

smatovic wrote: Sat Apr 20, 2019 11:04 am
corres wrote: Sat Apr 20, 2019 10:47 am
chrisw wrote: Sat Apr 20, 2019 1:32 am What is a node in lc0? This is not a stupid question, btw. Could be every node in the tree. Could be every node in the tree plus every node outside the tree. If it’s a tree node, does it count 1 for every time the search goes through, or just the once when it is visited the first time? Could be the count of discrete NN lookups. I kind of have been assuming it’s a “normal” computer chess node, ticked up for every “move”, but maybe not?
Only the developers of Leela can give a correct answer to your questions.
Even for AB engines there is no standard defining what the GUI should display as "number of nodes" or "nodes per second". These are practical questions that only the developers can decide. Sometimes their decision is driven by commercial considerations; think of Rybka, for example.
I asked Ankan once, he told me that for nps LC0 counts the NN eval calls made by search,
this includes Policy/Value and NN cache hits.

--
Srdja
So in-tree nodes are not counted twice. In AB it is normal to count those; every step in the tree counts. So, in reality, compared to an AB node counter, LC0 is undercounting by, let me work it out ... a factor of average branch length minus one. What’s the average branch length? It grows with more search, maybe five to ten? Guessing.
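
As a rough illustration of the counting difference discussed above, here is a minimal Python sketch on a toy set of playouts. It assumes, as relayed from Ankan, that Lc0 ticks its counter once per NN eval request (cache hits included), so once per playout, while an AB-style counter ticks for every move made on the way down. The path lengths are invented for illustration and are not measured from Lc0; the function names are hypothetical.

```python
from typing import Sequence


def eval_call_count(path_lengths: Sequence[int]) -> int:
    """Lc0-style count (as described above): one node per NN eval request,
    i.e. one per playout, however deep the playout went."""
    return len(path_lengths)


def per_move_count(path_lengths: Sequence[int]) -> int:
    """AB-style count: one tick for every move made on the way down.
    A path of n positions (root included) contains n - 1 moves."""
    return sum(n - 1 for n in path_lengths)


def undercount_factor(path_lengths: Sequence[int]) -> float:
    """Ratio of the AB-style count to the Lc0-style count."""
    return per_move_count(path_lengths) / eval_call_count(path_lengths)


# Hypothetical playouts: number of positions on each root-to-leaf path.
paths = [4, 6, 8, 10, 7]
average_length = sum(paths) / len(paths)

print("eval-call count:", eval_call_count(paths))
print("per-move count:", per_move_count(paths))
print("undercount factor:", undercount_factor(paths))
print("average branch length minus one:", average_length - 1)
```

Under these assumptions the ratio comes out as the average branch length (counted in positions, root included) minus one, which is the estimate given in the post. Whether real Lc0 searches average five to ten remains a guess, as stated.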
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: RTX 2070 @ 49 Nps

Post by corres »

Laskos wrote: Sat Apr 20, 2019 5:43 pm
corres wrote: Sat Apr 20, 2019 5:36 pm
Laskos wrote: Sat Apr 20, 2019 5:08 pm Sure, and I asked you in some of the first posts in this thread to repeat my procedure and to compare numbers, but you didn't do that. Re-read that post.
There is no sense in an nps measurement that depends on arbitrary parameters and on a measurement method that somebody thinks is beneficial for his own system.
Only a standardized measurement makes real sense.
Moreover, I ran a test with nncachesize=20000000 and gained only ~5% more nps.
A test with nncachesize=10000000 obviously gives an even smaller gain.
I am not thinking of it as being beneficial to anything; I just want us to do the _same_ meaningful measurement. On almost any system with an RTX 2xxx card, the conditions from my early post are pretty representative of LTC games, that's all. Using the default cache size, as you do, is on the other hand completely meaningless, and it seriously hurts the strength of Lc0 at LTC.
So we have different aims: I (and others) should like to know the power of own card relative to others.
Using the default parameters is very practical (no modifications needed) for test runs meant to give comparable results. This is and was my intent.
But what your real aim is, I do not know. You reported a (rather dubious) high nps number without the precise measurement commands showing how you got it. The super-cooled PC case and the 10% OC alone cannot explain your number. "Hugo" showed a similarly high nps number, but his is only 16000 nps higher, and he uses an OCed RTX 2070 and an OCed RTX 2060 together.
In the end, I think everybody measures what he wants and interprets it how he wants.
This is a very liberal method, but it has no connection to engineering requirements.
Last edited by corres on Sat Apr 20, 2019 7:37 pm, edited 2 times in total.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: RTX 2070 @ 49 Nps

Post by Laskos »

corres wrote: Sat Apr 20, 2019 7:34 pm
Laskos wrote: Sat Apr 20, 2019 5:43 pm
corres wrote: Sat Apr 20, 2019 5:36 pm
Laskos wrote: Sat Apr 20, 2019 5:08 pm Sure, and I asked you in some of the first posts in this thread to repeat my procedure and to compare numbers, but you didn't do that. Re-read that post.
There is no sense in an nps measurement that depends on arbitrary parameters and on a measurement method that somebody thinks is beneficial for his own system.
Only a standardized measurement makes real sense.
Moreover, I ran a test with nncachesize=20000000 and gained only ~5% more nps.
A test with nncachesize=10000000 obviously gives an even smaller gain.
I am not thinking of it as being beneficial to anything; I just want us to do the _same_ meaningful measurement. On almost any system with an RTX 2xxx card, the conditions from my early post are pretty representative of LTC games, that's all. Using the default cache size, as you do, is on the other hand completely meaningless, and it seriously hurts the strength of Lc0 at LTC.
So we have different aims: I (and others) should like to know the power of own card relative to others.
Using the default parameters is very practical (no modifications needed) for test runs meant to give comparable results. This is and was my intent.
But what your real aim is, I do not know. You reported a (rather dubious) high nps number without the precise measurement commands showing how you got it. The super-cooled PC case and the 10% OC alone cannot explain your number. "Hugo" showed a similarly high nps number, but his is only 16000 nps higher, and he uses an OCed RTX 2070 and an OCed RTX 2060 together.
In the end, I think everybody measures what he wants and interprets it how he wants.
This is a very liberal method, but it has no connection to engineering requirements.
I posted a not at all dubious NPS, with precise instructions on how to measure it. My aim is to determine the strength of my GPU as it relates to Leela at LTC. I used parameters adequate for these conditions, especially the cache size, which is ridiculously low by default. To test for possible throttling issues, which many probably have (you too, I guess), I would even propose a test of 15 minutes or longer with adequate parameters. On runs of, say, 10 seconds, even a cheap laptop with an RTX GPU will run smoothly.
I do not know what your aim is; you wrote "I (and others) should like to know the power of own card relative to others". I do not know what "others" or you want; you can just as well post "Call of Duty" benchmarks to "know the power of own card". I was talking about Leela in LTC game conditions.
I do not want to argue with you here; you are something of a newbie to this forum, and you use strange reasoning and wording in your posts.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Dual RTX 2060 for Leela

Post by corres »

corres wrote: Sat Apr 20, 2019 1:06 am
Based on the common test data posted earlier, we can make a list of RTX 2000 line GPUs.
The common parameters are:
NET: 11250
Backend: cudnn-fp16
Minibatchsize: 512
NNcachesize: 2000000
Other parameters are default
The list:
RTX 2060 OC max nps = 28646 (corres)
RTX 2070 non-OC max nps = 29357 (Laskos)
RTX 2080 max nps = 36300 (Albert Silver)
RTX 2080 Ti max nps = 43297 (Albert Silver)
DUAL RTX 2060 OC max nps = 53789 (corres)
RTX 2080 Ti + RTX 2080 max nps = 77435 (Albert Silver)
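
To make the list easier to compare, the short Python sketch below derives two ratios from the figures exactly as posted: each single card's speed relative to the RTX 2060 OC, and how close each dual-GPU setup comes to the sum of its parts. The helper names are hypothetical, and since the numbers were reported by different people, the ratios are only indicative.

```python
# max nps figures copied from the list above (single cards)
single = {
    "RTX 2060 OC": 28646,
    "RTX 2070 non-OC": 29357,
    "RTX 2080": 36300,
    "RTX 2080 Ti": 43297,
}

# dual setups: (reported max nps, the single cards they combine)
dual = {
    "DUAL RTX 2060 OC": (53789, ["RTX 2060 OC", "RTX 2060 OC"]),
    "RTX 2080 Ti + RTX 2080": (77435, ["RTX 2080 Ti", "RTX 2080"]),
}


def relative_speed(card: str, baseline: str = "RTX 2060 OC") -> float:
    """nps of `card` divided by nps of `baseline`."""
    return single[card] / single[baseline]


def scaling_efficiency(setup: str) -> float:
    """Combined nps divided by the sum of the component cards' nps."""
    combined, parts = dual[setup]
    return combined / sum(single[p] for p in parts)


for card in single:
    print(f"{card}: {relative_speed(card):.2f}x the RTX 2060 OC")

for setup in dual:
    print(f"{setup}: {scaling_efficiency(setup):.1%} of the summed single-card nps")
```

On these numbers both dual setups land somewhat below perfect additive scaling, roughly 94% for the two RTX 2060 cards and roughly 97% for the RTX 2080 Ti plus RTX 2080.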
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Dual RTX 2060 for Leela

Post by Laskos »

corres wrote: Sun Apr 21, 2019 7:15 pm
corres wrote: Sat Apr 20, 2019 1:06 am
Based on the common test data posted earlier, we can make a list of RTX 2000 line GPUs.
The common parameters are:
NET: 11250
Backend: cudnn-fp16
Minibatchsize: 512
NNcachesize: 2000000
Other parameters are default
The list:
RTX 2060 OC max nps = 28646 (corres)
RTX 2070 non-OC max nps = 29357 (Laskos)
RTX 2080 max nps = 36300 (Albert Silver)
RTX 2080 Ti max nps = 43297 (Albert Silver)
DUAL RTX 2060 OC max nps = 53789 (corres)
RTX 2080 Ti + RTX 2080 max nps = 77435 (Albert Silver)
You didn't specify the time or nodes at which those were measured, and I don't remember using an NN cache of 2000000. So it's probably another useless list for Leela, as are most of those circulating around. Again, could you specify all the parameters, as I have posted since the start of this thread:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\11250.txt.gz
go

My now OC-ed 2070 GPU with 2 threads of a 3.8GHz i7 CPU gives:

info depth 14 seldepth 43 time 283565 nodes 10050124 score cp 23 hashfull 426 nps 35442 tbhits 0 pv d2d4
after 10 million nodes

info depth 16 seldepth 50 time 409072 nodes 15183130 score cp 25 hashfull 606 nps 37116 tbhits 0 pv d2d4
after 15 million nodes

An LTC-like run of 5 minutes is probably better than very short runs, and hashfull should preferably be about half. An even longer TC (say 15 minutes) would be good for checking for throttling. The TCEC14 Leela machine (an i5) seemed to me to suffer from it; I guess many have problems over long runs.
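
For anyone repeating the measurement, a minimal Python sketch for reading the two `info` lines above may help. It relies only on the UCI conventions that `time` is in milliseconds and `hashfull` is in per mille; the function names are hypothetical. It recomputes the cumulative nps from `nodes` and `time`, and also the nps over the interval between the two snapshots, which is one simple way to see whether speed holds up during a long run.

```python
INT_FIELDS = ("depth", "seldepth", "time", "nodes", "hashfull", "nps", "tbhits")


def parse_info(line: str) -> dict:
    """Extract the integer fields of a UCI 'info' line into a dict."""
    tokens = line.split()
    fields = {}
    for i, tok in enumerate(tokens[:-1]):
        if tok in INT_FIELDS:
            fields[tok] = int(tokens[i + 1])
    return fields


def cumulative_nps(info: dict) -> int:
    """Nodes per second since the start of the search (time is in ms)."""
    return info["nodes"] * 1000 // info["time"]


def interval_nps(earlier: dict, later: dict) -> int:
    """Nodes per second between two snapshots of the same search."""
    nodes = later["nodes"] - earlier["nodes"]
    millis = later["time"] - earlier["time"]
    return nodes * 1000 // millis


# The two lines quoted in the post above.
line_10m = ("info depth 14 seldepth 43 time 283565 nodes 10050124 "
            "score cp 23 hashfull 426 nps 35442 tbhits 0 pv d2d4")
line_15m = ("info depth 16 seldepth 50 time 409072 nodes 15183130 "
            "score cp 25 hashfull 606 nps 37116 tbhits 0 pv d2d4")

a, b = parse_info(line_10m), parse_info(line_15m)
print("recomputed nps:", cumulative_nps(a), cumulative_nps(b))
print("hashfull (%):", a["hashfull"] / 10, b["hashfull"] / 10)
print("nps between snapshots:", interval_nps(a, b))
```

The recomputed values agree with the `nps` field the engine reports. On a long run, an interval figure that keeps falling from one snapshot to the next would be a hint of throttling, though other causes cannot be ruled out from these lines alone.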
glennsamuel32
Posts: 136
Joined: Sat Dec 04, 2010 5:31 pm
Location: 223

Re: RTX 2070 @ 49 Nps

Post by glennsamuel32 »

corres wrote: Sat Apr 20, 2019 7:34 pm So we have different aim: I (and others) should like to know the power of own card relative to others.
Check benchmarks of various configurations with the actual settings here...

https://docs.google.com/spreadsheets/d/ ... 1508569046
Judge without bias, or don't judge at all...
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Dual RTX 2060 for Leela

Post by Laskos »

Laskos wrote: Sun Apr 21, 2019 10:45 pm
corres wrote: Sun Apr 21, 2019 7:15 pm
corres wrote: Sat Apr 20, 2019 1:06 am
Based on the common test data posted earlier, we can make a list of RTX 2000 line GPUs.
The common parameters are:
NET: 11250
Backend: cudnn-fp16
Minibatchsize: 512
NNcachesize: 2000000
Other parameters are default
The list:
RTX 2060 OC max nps = 28646 (corres)
RTX 2070 non-OC max nps = 29357 (Laskos)
RTX 2080 max nps = 36300 (Albert Silver)
RTX 2080 Ti max nps = 43297 (Albert Silver)
DUAL RTX 2060 OC max nps = 53789 (corres)
RTX 2080 Ti + RTX 2080 max nps = 77435 (Albert Silver)
You didn't specify the time or nodes at which those were measured, and I don't remember using an NN cache of 2000000. So it's probably another useless list for Leela, as are most of those circulating around. Again, could you specify all the parameters, as I have posted since the start of this thread:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\11250.txt.gz
go

My now OC-ed 2070 GPU with 2 threads of a 3.8GHz i7 CPU gives:

info depth 14 seldepth 43 time 283565 nodes 10050124 score cp 23 hashfull 426 nps 35442 tbhits 0 pv d2d4
after 10 million nodes

info depth 16 seldepth 50 time 409072 nodes 15183130 score cp 25 hashfull 606 nps 37116 tbhits 0 pv d2d4
after 15 million nodes

An LTC-like run of 5 minutes is probably better than very short runs, and hashfull should preferably be about half. An even longer TC (say 15 minutes) would be good for checking for throttling. The TCEC14 Leela machine (an i5) seemed to me to suffer from it; I guess many have problems over long runs.
Also, I forgot to mention: the results in your list are for different engine versions, from v19.0 to v21.1, and depending on the net format that can make a significant difference. You are using the demoted net format of the 10xxx run, while the reference is now, and for a long time will be, the 40xxx format. The latest 40xxx nets are the best ones by a significant margin, even before the last LR drop. My inference is that test 40 is probably already above AlphaZero level; maybe I will open a thread about that.
So I think benchmarks should now use the v21.1 engine and a test 40 net. I proposed one at the beginning of this thread, but you stuck to your weird arguments and now you post useless performance lists.