I stumbled upon this article on the new Nvidia RTX GPUs

willmorton · Post by **willmorton** » Thu Oct 22, 2020 9:36 am

It seems Nvidia has canceled the 3070 16 GB and 3080 20 GB: https://www.techpowerup.com/273637/nvid ... 3070-16-gb

Laskos · Post by **Laskos** » Thu Oct 22, 2020 3:17 pm

willmorton wrote: ↑Thu Oct 22, 2020 9:36 am It seems Nvidia has canceled the 3070 16 GB and 3080 20 GB: https://www.techpowerup.com/273637/nvid ... 3070-16-gb

It all started as a rumor, not an announcement, and it might end as a rumor as well.

corres · Post by **corres** » Fri Oct 23, 2020 10:01 am

Laskos wrote: ↑Thu Oct 22, 2020 3:17 pm
willmorton wrote: ↑Thu Oct 22, 2020 9:36 am It seems Nvidia has canceled the 3070 16 GB and 3080 20 GB: https://www.techpowerup.com/273637/nvid ... 3070-16-gb
It all started as a rumor, not an announcement, and it might end as a rumor as well.

Who actually needs an NVIDIA GPU with 20 GB of RAM?
Even 6 GB is more than enough, at least for today's nets and those of the near future.

Laskos · Post by **Laskos** » Fri Oct 23, 2020 2:05 pm

corres wrote: ↑Fri Oct 23, 2020 10:01 am
Laskos wrote: ↑Thu Oct 22, 2020 3:17 pm
willmorton wrote: ↑Thu Oct 22, 2020 9:36 am It seems Nvidia has canceled the 3070 16 GB and 3080 20 GB: https://www.techpowerup.com/273637/nvid ... 3070-16-gb
It all started as a rumor, not an announcement, and it might end as a rumor as well.
Who actually needs an NVIDIA GPU with 20 GB of RAM?
Even 6 GB is more than enough, at least for today's nets and those of the near future.

Well, for Lc0 you don't need it, but some games might need it in the future, and some GPU compute applications want as much RAM as possible. Even if you care only about Lc0, the December arrival of larger-RAM GPUs (with AMD arriving too) could push the prices of the earlier, smaller-RAM 3080 and 3070 down a bit.

smatovic · Post by **smatovic** » Sat Oct 31, 2020 8:08 pm

Laskos wrote: ↑Wed Oct 21, 2020 1:12 pm
smatovic wrote: ↑Wed Oct 21, 2020 1:01 pm
Laskos wrote: ↑Wed Oct 21, 2020 12:56 pm Hmmm, the things changed a bit with the v26.3 engine and CUDA backend. The previous graph in the thread was for v26 engine and OpenCL backend. The new CUDA backend (it's the fastest one with RTX GPUs) shows the largest improvement with RTX 2xxx GPUs and smaller with RTX 3xxx. So that I saw a benchmark showing now RTX 3090 just 40% above RTX 2080Ti, meaning 3080 would stand at mere 10% or so above 2080Ti with v26.3 CUDA backend. Not that good, and I don't know whether there are foreseen backend improvements targeted towards 3xxx series.
10% would confirm the speedup via clock increase only with no benefits from architecture changes.

--
Srdja
Yeah, really disappointing. And not a better deal with the coming more affordable RTX 3070 which by extrapolation would be only 30% faster with Lc0 than my current 2070. Maybe a backend improvement targeting 3xxx series will be available sometime soon, I don't know, someone should ask Ankan.

Take a look at the FP32:FP16 (single precision:half precision) ratio of RTX 30xx series

https://en.wikipedia.org/wiki/List_of_N ... _30_series

Maybe this explains why Lc0 does not take off on these.

An RTX 2070 Super still seems to be a good choice compared to the RTX 3070 in FP16

https://en.wikipedia.org/wiki/List_of_N ... _20_series

--
Srdja

Laskos · Post by **Laskos** » Tue Nov 03, 2020 9:57 am

smatovic wrote: ↑Sat Oct 31, 2020 8:08 pm
Laskos wrote: ↑Wed Oct 21, 2020 1:12 pm
smatovic wrote: ↑Wed Oct 21, 2020 1:01 pm
Laskos wrote: ↑Wed Oct 21, 2020 12:56 pm Hmmm, the things changed a bit with the v26.3 engine and CUDA backend. The previous graph in the thread was for v26 engine and OpenCL backend. The new CUDA backend (it's the fastest one with RTX GPUs) shows the largest improvement with RTX 2xxx GPUs and smaller with RTX 3xxx. So that I saw a benchmark showing now RTX 3090 just 40% above RTX 2080Ti, meaning 3080 would stand at mere 10% or so above 2080Ti with v26.3 CUDA backend. Not that good, and I don't know whether there are foreseen backend improvements targeted towards 3xxx series.
10% would confirm the speedup via clock increase only with no benefits from architecture changes.

--
Srdja
Yeah, really disappointing. And not a better deal with the coming more affordable RTX 3070 which by extrapolation would be only 30% faster with Lc0 than my current 2070. Maybe a backend improvement targeting 3xxx series will be available sometime soon, I don't know, someone should ask Ankan.
Take a look at the FP32:FP16 (single precision:half precision) ratio of RTX 30xx series

https://en.wikipedia.org/wiki/List_of_N ... _30_series

Maybe this explains why Lc0 does not take off on these.

An RTX 2070 Super still seems to be a good choice compared to the RTX 3070 in FP16

https://en.wikipedia.org/wiki/List_of_N ... _20_series

--
Srdja

Yeah, I saw today some Lc0 benches on discord:

2080 Ti -----> 22.1 knps
3080 -----> 25.2 knps
3090 -----> 30.6 knps

My 2070 -----> 11.5 knps

As of now, with Lc0 v26.3 CUDA, the 3080 is only 14% faster than the 2080 Ti. Disappointing. The 3070 will be significantly below the 2080 Ti, by some 20%.

They seem to have capped the FP16 performance of the consumer 30xx series on purpose. Games run well, but ML suffers and is relegated to the uncapped (and very expensive) A100 and some new Quadro RTX cards. A bit mean of NVIDIA, but they have a monopoly in this field.

smatovic · Post by **smatovic** » Tue Nov 03, 2020 11:23 am

Maybe next-gen Nvidia Hopper will be a bigger jump; it's hard to predict. I see a 4x speedup as possible in theory (TSMC 5nm, unified FP32/INT32 cores in the SM design), but it is also possible that Nvidia will again cripple FP16 throughput on the consumer brand. And if it turns out that Tensor Cores do not pay off in gaming, who knows, maybe they will remove them from the consumer brand completely and focus on things like ray tracing. It's all up in the air.

--
Srdja

Laskos · Post by **Laskos** » Wed Nov 04, 2020 6:24 pm

Laskos wrote: ↑Tue Nov 03, 2020 9:57 am
smatovic wrote: ↑Sat Oct 31, 2020 8:08 pm
Laskos wrote: ↑Wed Oct 21, 2020 1:12 pm
smatovic wrote: ↑Wed Oct 21, 2020 1:01 pm
Laskos wrote: ↑Wed Oct 21, 2020 12:56 pm Hmmm, the things changed a bit with the v26.3 engine and CUDA backend. The previous graph in the thread was for v26 engine and OpenCL backend. The new CUDA backend (it's the fastest one with RTX GPUs) shows the largest improvement with RTX 2xxx GPUs and smaller with RTX 3xxx. So that I saw a benchmark showing now RTX 3090 just 40% above RTX 2080Ti, meaning 3080 would stand at mere 10% or so above 2080Ti with v26.3 CUDA backend. Not that good, and I don't know whether there are foreseen backend improvements targeted towards 3xxx series.
10% would confirm the speedup via clock increase only with no benefits from architecture changes.

--
Srdja
Yeah, really disappointing. And not a better deal with the coming more affordable RTX 3070 which by extrapolation would be only 30% faster with Lc0 than my current 2070. Maybe a backend improvement targeting 3xxx series will be available sometime soon, I don't know, someone should ask Ankan.
Take a look at the FP32:FP16 (single precision:half precision) ratio of RTX 30xx series

https://en.wikipedia.org/wiki/List_of_N ... _30_series

Maybe this explains why Lc0 does not take off on these.

An RTX 2070 Super still seems to be a good choice compared to the RTX 3070 in FP16

https://en.wikipedia.org/wiki/List_of_N ... _20_series

--
Srdja
Yeah, I saw today some Lc0 benches on discord:

2080 Ti -----> 22.1 knps
3080 -----> 25.2 knps
3090 -----> 30.6 knps

My 2070 -----> 11.5 knps

As of now, with Lc0 v26.3 CUDA, the 3080 is only 14% faster than the 2080 Ti. Disappointing. The 3070 will be significantly below the 2080 Ti, by some 20%.

They seem to have capped the FP16 performance of the consumer 30xx series on purpose. Games run well, but ML suffers and is relegated to the uncapped (and very expensive) A100 and some new Quadro RTX cards. A bit mean of NVIDIA, but they have a monopoly in this field.

The RTX 3070 benchmark has arrived:

3070 -----> 16.8 knps
2080 Ti -----> 21.3 knps
3080 -----> 25.2 knps
3090 -----> 30.6 knps

2080 Super -----> 15.5 knps
2070 Super -----> 13.2 knps
My 2070 -----> 11.5 knps

Pretty disappointing, less than 30% faster than 2070 Super. In games 3070 is on par with 2080 Ti.
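For anyone who wants to redo the percentages, here is a minimal sketch that computes relative speedups from the knps figures listed above. The table name `KNPS` and the helper `percent_faster` are made up for illustration; only the numbers come from the post.

```python
# Lc0 v26.3 CUDA throughput in knps, copied from the list in the post above.
KNPS = {
    "3090": 30.6,
    "3080": 25.2,
    "2080 Ti": 21.3,
    "3070": 16.8,
    "2080 Super": 15.5,
    "2070 Super": 13.2,
    "2070": 11.5,
}


def percent_faster(gpu, baseline, table=KNPS):
    """Return how much faster `gpu` is than `baseline`, in percent.

    A negative result means `gpu` is slower than `baseline`.
    """
    return (table[gpu] / table[baseline] - 1.0) * 100.0


if __name__ == "__main__":
    pairs = [("3070", "2070 Super"), ("3070", "2080 Ti"), ("3080", "2080 Ti")]
    for gpu, baseline in pairs:
        print(f"{gpu} vs {baseline}: {percent_faster(gpu, baseline):+.1f}%")
```

Note that the result depends on which 2080 Ti figure is used as the baseline: this table has 21.3 knps, while the earlier list had 22.1 knps.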

MMarco · Post by **MMarco** » Mon Nov 09, 2020 8:29 pm

Laskos wrote: 3070 -----> 16.8 knps
2080 Ti -----> 21.3 knps
3080 -----> 25.2 knps
3090 -----> 30.6 knps

2080 Super -----> 15.5 knps
2070 Super -----> 13.2 knps
My 2070 -----> 11.5 knps

Pretty disappointing, less than 30% faster than 2070 Super. In games 3070 is on par with 2080 Ti.

Maybe the 3070 will improve with overclocking (I would think that at 21.3 knps the 2080 ti was overclocked - but I could be wrong).

For your information, I have a 3080 which I slightly overclocked by 100 MHz. According to "backendbench --clippy", I should use minibatch-size=204 with this card. When I benchmark with mbs=204 (10s, all positions, J92-190) I get 24.9 knps. With mbs=544 (twice the number of tensor cores on the card), however, I get 29.8 knps, a 20% increase, though it is not clear that Lc0 will play better with the larger batch size (see: https://github.com/LeelaChessZero/lc0/pull/1458 ). Maybe it will at long time controls.
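The arithmetic behind those two runs can be sketched as follows. The tensor core count of 272 is an assumption inferred from "544 is twice the number of tensor cores", and the names `TENSOR_CORES`, `RUNS` and `gain_percent` are hypothetical; the knps values are the ones quoted above.

```python
# Assumption: 272 tensor cores, inferred from "mbs=544 is twice the
# number of tensor cores of the card" in the post above.
TENSOR_CORES = 272

# Minibatch size -> measured knps, as reported in the post above.
RUNS = {
    204: 24.9,               # value suggested by "backendbench --clippy"
    2 * TENSOR_CORES: 29.8,  # 544, i.e. twice the tensor core count
}


def gain_percent(runs, baseline_mbs, candidate_mbs):
    """Percent change in knps when moving from baseline_mbs to candidate_mbs."""
    return (runs[candidate_mbs] / runs[baseline_mbs] - 1.0) * 100.0


if __name__ == "__main__":
    print(f"mbs=544 vs mbs=204: {gain_percent(RUNS, 204, 544):+.1f}%")
```

This only compares raw node throughput; it says nothing about whether the larger batch actually improves playing strength.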

I saw that Nvidia released new drivers today, with improvements to the auto-tuning feature. I'll report back once I have tried them.

Laskos · Post by **Laskos** » Tue Nov 10, 2020 1:49 pm

MMarco wrote: ↑Mon Nov 09, 2020 8:29 pm
Laskos wrote: 3070 -----> 16.8 knps
2080 Ti -----> 21.3 knps
3080 -----> 25.2 knps
3090 -----> 30.6 knps

2080 Super -----> 15.5 knps
2070 Super -----> 13.2 knps
My 2070 -----> 11.5 knps

Pretty disappointing, less than 30% faster than 2070 Super. In games 3070 is on par with 2080 Ti.
Maybe the 3070 will improve with overclocking (I would think that at 21.3 knps the 2080 ti was overclocked - but I could be wrong).

For your information, I have a 3080 which I slightly overclocked by 100 MHz. According to "backendbench --clippy", I should use minibatch-size=204 with this card. When I benchmark with mbs=204 (10s, all positions, J92-190) I get 24.9 knps. With mbs=544 (twice the number of tensor cores on the card), however, I get 29.8 knps, a 20% increase, though it is not clear that Lc0 will play better with the larger batch size (see: https://github.com/LeelaChessZero/lc0/pull/1458 ). Maybe it will at long time controls.

I saw that Nvidia released new drivers today, with improvements to the auto-tuning feature. I'll report back once I have tried them.

Thanks, please keep us updated here. That is quite a difference in NPS with the larger minibatch-size, though the effect on strength is not clear. On my 2070 GPU I don't get such a speed boost; I am using 240. I too suspected that the benched 2080 Ti was OC-ed.
