My non-OC RTX 2070 is very fast with Lc0

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the later test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same on the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
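
The numbers in this post can be checked with a few lines of Python. This is only a sketch: `parse_uci_info` and `expected_speedup` are hypothetical helper names, not part of Lc0, and multiplying the core-count and clock advantages assumes ideal scaling, which the measured 23% gain suggests does not hold in practice.

```python
def parse_uci_info(line):
    """Collect the integer fields of a UCI 'info' line into a dict."""
    tokens = line.replace(",", " ").split()
    fields = {}
    # Walk adjacent token pairs; "score cp 25" ends up under the key "cp".
    for key, value in zip(tokens, tokens[1:]):
        if not key.isdigit() and value.lstrip("-").isdigit():
            fields[key] = int(value)
    return fields


def expected_speedup(extra_cores, extra_clock):
    """Ideal speed-up if throughput scaled with cores times clock (an assumption)."""
    return (1 + extra_cores) * (1 + extra_clock)


# The info line reported for the RTX 2070 in the post.
INFO_LINE = ("info depth 16 seldepth 43 time 95727 nodes 2810327 "
             "score cp 25 hashfull 643 nps 29357")

info = parse_uci_info(INFO_LINE)
# UCI reports time in milliseconds, so nps = nodes * 1000 / time.
computed_nps = info["nodes"] * 1000 // info["time"]

ideal = expected_speedup(0.28, 0.07)
print(computed_nps, info["nps"])
print(f"ideal speed-up: {ideal:.2f}x, measured: about 1.23x")
```

The recomputed nps matches the reported value, and the ideal figure comes out at about 1.37x, which is the 37% quoted in the post.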
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Albert Silver »

Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the later test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same on the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
It depends on your use. For pure play, no doubt this is true, but for actual NN building the 2080 Ti is king by quite a margin.

https://lambdalabs.com/blog/best-gpu-te ... benchmark/
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Albert Silver wrote: Fri Dec 07, 2018 2:59 am
Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the later test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same on the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
It depends on your use. For pure play, no doubt this is true, but for actual NN building the 2080 Ti is king by quite a margin.

https://lambdalabs.com/blog/best-gpu-te ... benchmark/
I wouldn't be so sure about that. A dual 2070 still has higher FP32 output than the 2080 Ti, costs less, and has the same TDP. Multi-GPU training losses depend a bit on the model being trained, but they are fairly low.
schack
Posts: 172
Joined: Thu May 27, 2010 3:32 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by schack »

Perhaps off-topic, but what's the best value card one might get to run Lc0? GTX 1060?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

schack wrote: Fri Dec 07, 2018 3:50 pm Perhaps off-topic, but what's the best value card one might get to run Lc0? GTX 1060?
NPS per $?
Overall, a dual RTX 2070. Cheaper: a single RTX 2070. Even cheaper: a discounted or second-hand GTX 1060. But wait for the (RTX?) 2060, though I am not even sure it will support fp16.
schack
Posts: 172
Joined: Thu May 27, 2010 3:32 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by schack »

Well, NPS per $ but also in terms of 'value card.' $800 on a video card is not in my budget. I'm also looking for less power-hungry options, and wondering how much difference 3 GB vs 6 GB of RAM (with the 1060) would make. I don't game, so I've never bought a serious video card.

Or maybe I should just rent time on a cloud server and not change my hardware, at least not for a while.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

schack wrote: Fri Dec 07, 2018 4:19 pm Well, NPS per $ but also in terms of 'value card.' $800 on a video card is not in my budget. I'm also looking for less power-hungry options, and wondering how much difference 3 GB vs 6 GB of RAM (with the 1060) would make. I don't game, so I've never bought a serious video card.

Or maybe I should just rent time on a cloud server and not change my hardware, at least not for a while.
A discounted or second-hand GTX 1060 6GB. Soon they will sell for $150 in many places. The Bitcoin craze and the like seem to be stagnating or even receding, so I guess "the future is bright" for discounts on these budget cards.

Edit:
I forgot about the upcoming (RTX?) 2060: it might come in at $350 and, if it supports fp16, be 3-4 times faster than the GTX 1060 6GB.
schack
Posts: 172
Joined: Thu May 27, 2010 3:32 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by schack »

Has anyone benchmarked the Tesla P100 with Lc0? You can rent time on Amazon AWS in a Windows environment, so you can set up the engine to run in the ChessBase Cloud, for about $1.20/hr with one P100.
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Werewolf »

Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the later test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same on the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Is this argument correct?
Earlier Joshua stated - and you agreed - that:
"You don't double fp32 to get fp16, since tensor cores run fp16 they have their own unique jump. Around 100 TFLOPS, but because they are hardcoded to one specific task it's not quite as optimized (doesn't support Winograd currently) so you end up getting 2-3x instead of something like 8-10x."

If this is the case the speed of the card is determined by its tensor core performance, not its CUDA cores.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Werewolf wrote: Mon Dec 10, 2018 7:43 pm
Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the later test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same on the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Is this argument correct?
Earlier Joshua stated - and you agreed - that:
"You don't double fp32 to get fp16, since tensor cores run fp16 they have their own unique jump. Around 100 TFLOPS, but because they are hardcoded to one specific task it's not quite as optimized (doesn't support Winograd currently) so you end up getting 2-3x instead of something like 8-10x."

If this is the case the speed of the card is determined by its tensor core performance, not its CUDA cores.
Isn't the number of tensor cores proportional to the number of CUDA cores (1:8 or so) anyway? And things are a bit more complicated for Lc0 than just counting cores and frequency, as can be seen from these results. The Lc0 speed of the 2080 Ti, among many other things, is affected by its low frequency. I am pretty sure the RTX 2080 Ti is not 1.8-2.0x as fast for Lc0 as the RTX 2070, as the number of cores, TFLOPS and bandwidth would suggest, but more like 1.5-1.6x, while being twice as expensive. And one can easily lose 10% of Lc0's raw speed by slightly misconfiguring the GPU usage for the specific task.
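
To put the price argument in numbers, here is a rough sketch of speed per unit of money relative to a single RTX 2070. The ratios are the ones quoted in this thread (a measured 23% gain at a 40% higher price for the 2080, an estimated 1.5-1.6x at twice the price for the 2080 Ti); `relative_value` is a hypothetical helper, and actual street prices and nps will vary by net and configuration.

```python
def relative_value(speed_ratio, price_ratio):
    """Speed per unit of money relative to the baseline card (baseline = 1.0)."""
    return speed_ratio / price_ratio


# (speed ratio, price ratio) versus a single RTX 2070, taken from the thread.
cards = {
    "RTX 2080, measured": (1.23, 1.40),
    "RTX 2080 Ti, low estimate": (1.5, 2.0),
    "RTX 2080 Ti, high estimate": (1.6, 2.0),
}

for name, (speed, price) in cards.items():
    print(f"{name}: {relative_value(speed, price):.2f}")
```

Under these figures every larger card lands below 1.0, that is, below the RTX 2070 in nps per dollar, which is the point being made above.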