Next-Gen GPUs for LC0
Moderators: hgm, Rebel, chrisw

Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am)
Re: Next-Gen GPUs for LC0

The 3070 hasn't yet appeared on the market.

Laskos wrote: ↑Mon Sep 28, 2020 9:47 am
That's good news. It means that the 3080 is about 190% of the 2080 as Lc0 speed goes, so it represents a very good deal. Do you know the 3070 estimate? To me the 3080 is a bit too power-hungry for my PC.

Werewolf wrote: ↑Sun Sep 27, 2020 8:24 pm
Seems like there is actual data appearing now.
Very early results by people quoting ankan suggest, taking the 2080 Ti as a reference point:
3080 is about 1.4x faster
3090 is about 1.6x faster
A100 is about 2.7x faster
The batch size seems to make a big difference.
That last result contradicts what I saw earlier. Anyway, if this forum had a means to insert an image, I'd post the screenshot from the Discord...
Laskos (Posts: 10948, Joined: Wed Jul 26, 2006 10:21 pm, Full name: Kai Laskos)
Re: Next-Gen GPUs for LC0
Do you have an estimate of how the 3070 compares to the 3080 in the Lc0 case and similar? Nominally, the 3080 should be about 45% faster, but in practice this varies with the application, from 20% to 50% faster.

Milos wrote: ↑Mon Sep 28, 2020 10:25 am
The 3070 hasn't yet appeared on the market.
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am)
Re: Next-Gen GPUs for LC0
Comparing just CUDA cores IMO gives a very reliable estimate for Lc0, since it mainly uses CUDA cores. So 8704/5888 = 1.48, i.e. the 3080 would be 45-50% faster than the 3070.

Laskos wrote: ↑Mon Sep 28, 2020 11:29 am
Do you have an estimate of how the 3070 compares to the 3080 in the Lc0 case and similar?
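Milos's core-count estimate can be written as a quick back-of-the-envelope calculation. This is a naive model that assumes Lc0 speed scales linearly with CUDA core count; as the thread notes, real speedups also depend on batch size, clocks, and memory bandwidth:

```python
# Naive estimate of relative Lc0 speed from CUDA core counts alone
# (assumes throughput scales linearly with core count).
cuda_cores = {
    "RTX 2080 Ti": 4352,
    "RTX 3070": 5888,
    "RTX 3080": 8704,
}

def relative_speed(gpu_a: str, gpu_b: str) -> float:
    """Estimated speed of gpu_a relative to gpu_b under the linear model."""
    return cuda_cores[gpu_a] / cuda_cores[gpu_b]

print(f"3080 vs 3070: {relative_speed('RTX 3080', 'RTX 3070'):.2f}x")  # ~1.48x
```

The 8704/5888 = 1.48 ratio is exactly the 45-50% figure quoted in the post.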
Alayan (Posts: 550, Joined: Tue Nov 19, 2019 8:48 pm, Full name: Alayan Feh)
Re: Next-Gen GPUs for LC0
Nvidia changed the definition of CUDA cores. You need a workload that fully saturates the FP32 units to get close (but not quite) to the effect the same number of CUDA cores would have had in Turing.

1 CUDA core in Turing: 1x FP32 unit + 1x INT32 unit, able to execute concurrently
2 CUDA cores in Ampere: 1x FP32 unit + 1x (INT32 or FP32) unit, able to execute concurrently

But more importantly, isn't Leela supposed to use FP16 operations, with most of the relevant FP16 compute on RTX cards coming from tensor cores and not from the 2xFP16 mode of the FP32 units?

Last edited by Alayan on Mon Sep 28, 2020 4:43 pm, edited 1 time in total.
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am)
Re: Next-Gen GPUs for LC0
Leela mainly uses the FP16 multipliers of the CUDA cores. I am really not aware that this definition changed. Tensor cores are only used for the 3x3 convolutions in the input layer (rather inefficiently). You can't use tensor cores for 1x1 convolutions, which are the great majority of operations in Lc0 DNN inference; i.e. you can, but it is grossly inefficient.

Alayan wrote: ↑Mon Sep 28, 2020 4:17 pm
Nvidia changed the definition of CUDA cores. You need a workload that fully saturates the FP32 units to get close (but not quite) to the effect the same number of CUDA cores would have had in Turing.
Laskos (Posts: 10948, Joined: Wed Jul 26, 2006 10:21 pm, Full name: Kai Laskos)
Re: Next-Gen GPUs for LC0
In this case the 3070 is not that great a deal. It will be weaker than the 2080 Ti at chess and Go, and I already see the 2080 Ti second-hand as cheap as $600. It might go even lower, because the 3080 rocks.

Milos wrote: ↑Mon Sep 28, 2020 3:29 pm
Comparing just CUDA cores IMO gives a very reliable estimate for Lc0, since it mainly uses CUDA cores. So 8704/5888 = 1.48, i.e. the 3080 would be 45-50% faster.
mehmet123 (Posts: 671, Joined: Sun Jan 26, 2020 10:38 pm, Location: Turkey, Full name: Mehmet Karaman)
Re: Next-Gen GPUs for LC0
Why would the RTX 3070 be weaker than the RTX 2080 Ti?
The RTX 3070 has 5888 CUDA cores, while the RTX 2080 Ti has 4352 CUDA cores.
Laskos (Posts: 10948, Joined: Wed Jul 26, 2006 10:21 pm, Full name: Kai Laskos)
Re: Next-Gen GPUs for LC0
It follows from the data Werewolf and Milos provided: the 3080 is 1.4x as strong as the 2080 Ti but 1.48x as strong as the 3070, so the 3070 comes out at about 1.4/1.48 ≈ 0.95x the 2080 Ti, i.e. weaker with Lc0.
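Chaining the two ratios quoted in the thread gives the same conclusion. This is a rough sketch using only the thread's own figures, not a benchmark:

```python
# 3080 is ~1.40x a 2080 Ti (Werewolf's early data) and ~1.48x a 3070
# (Milos's CUDA-core ratio). Dividing the two chains the ratios to
# give the 3070's speed relative to the 2080 Ti.
speed_3080_vs_2080ti = 1.40
speed_3080_vs_3070 = 1.48

speed_3070_vs_2080ti = speed_3080_vs_2080ti / speed_3080_vs_3070
print(f"3070 vs 2080 Ti: {speed_3070_vs_2080ti:.2f}x")  # ~0.95x, slightly slower
```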
mehmet123 (Posts: 671, Joined: Sun Jan 26, 2020 10:38 pm, Location: Turkey, Full name: Mehmet Karaman)
Re: Next-Gen GPUs for LC0
Lc0 benchmarks with the SV-3010 network (384x30):

Default settings (minibatch-size=256)
---------------------------------------------
GPU         baseline   optimized   perf gain (%)
---------------------------------------------
Titan RTX      17443       20084        15.1
RTX 3090       26820       29767        11.0
A100           41785       48815        16.8
---------------------------------------------

Minibatch-size=1024, all other settings default:
---------------------------------------------
GPU         baseline   optimized   perf gain (%)
---------------------------------------------
Titan RTX      20211       23003        13.8
RTX 3090       33032       36924        11.8
A100           52732       59134        12.1
---------------------------------------------

(From the Lc0 Discord)
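As a sanity check, the perf gain column of the first table can be reproduced from the quoted baseline/optimized nps pairs (numbers taken verbatim from the post above):

```python
# Recompute the perf gain column from the baseline/optimized nps pairs
# quoted above (default settings, minibatch-size=256).
results = {
    "Titan RTX": (17443, 20084),
    "RTX 3090": (26820, 29767),
    "A100": (41785, 48815),
}

for gpu, (baseline, optimized) in results.items():
    gain = 100.0 * (optimized - baseline) / baseline
    print(f"{gpu}: {gain:.1f}%")  # matches the 15.1 / 11.0 / 16.8 column
```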
Laskos (Posts: 10948, Joined: Wed Jul 26, 2006 10:21 pm, Full name: Kai Laskos)
Re: Next-Gen GPUs for LC0
What are "baseline" and "optimized"?

mehmet123 wrote: ↑Mon Sep 28, 2020 5:58 pm
Lc0 benchmarks with the SV-3010 network (384x30)