Next-Gen GPUs for LC0

Milos · Post by **Milos** » Tue Sep 29, 2020 2:08 pm

smatovic wrote: ↑Tue Sep 29, 2020 12:21 pm
Yes, Big Navi with RDNA 2 is supposed to catch up with Nvidia's high-end line, at least for gaming. We already saw, with the RTX 20xx Super series, a second launch of the same architecture with better performance/price; it remains open whether such a thing will happen after AMD launches its high-end series. Sad that Intel is not launching its Xe-HPG this year; that would have been fun.

Milos wrote: ↑Tue Sep 29, 2020 12:39 pm
Both OpenCL and ROCm are crap compared to CUDA and cuDNN, so I see little point in mentioning Big Navi in the context of DL.
One needs an RDNA2 card with 2x the TFLOPS to match the performance of an RTX card.

smatovic wrote: ↑Tue Sep 29, 2020 12:48 pm
- some people prefer AMD over Nvidia
- the DX12 backend of Lc0 runs on AMD too?
- you miss the point: competition is good for us end users; if we have three gaming vendors competing, we profit from the performance/price competition

--
Srdja

Milos wrote: ↑Tue Sep 29, 2020 1:01 pm
Gamers profit for sure, ML scientists not at all. Who wants to buy an AMD card that costs $1200 and has worse ML performance than an NVIDIA card that costs $500?

smatovic wrote: ↑Tue Sep 29, 2020 1:55 pm
Hmm, why did the DOE choose Intel (Aurora), AMD (Frontier), and AMD (El Capitan) for its upcoming exaFLOP systems, and not IBM/Nvidia?

--
Srdja

Because they don't care about wasting taxpayers' money.

Laskos · Post by **Laskos** » Tue Sep 29, 2020 8:13 pm

mehmet123 wrote: ↑Mon Sep 28, 2020 5:58 pm
Lc0 benchmarks with SV-3010 network (384x30)

Default settings (minibatch-size=256)
---------------------------------------------
GPU          baseline   optimized   perf gain (%)
---------------------------------------------
Titan RTX    17443      20084       15.1
RTX 3090     26820      29767       11.0
A100         41785      48815       16.8

Minibatch-size=1024, all other settings default:
---------------------------------------------
GPU          baseline   optimized   perf gain (%)
---------------------------------------------
Titan RTX    20211      23003       13.8
RTX 3090     33032      36924       11.8
A100         52732      59134       12.1

(From Lc0 Discord)
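
The "perf gain (%)" column above can be reproduced from the other two columns. The following is a minimal Python sketch, assuming the gain is simply the relative increase of the optimized figure over the baseline; the names `results_256` and `perf_gain_percent` are made up for illustration and are not part of Lc0.

```python
# Figures copied from the quoted table for minibatch-size=256:
# GPU -> (baseline, optimized)
results_256 = {
    "Titan RTX": (17443, 20084),
    "RTX 3090": (26820, 29767),
    "A100": (41785, 48815),
}


def perf_gain_percent(baseline, optimized):
    """Relative gain of the optimized figure over the baseline, in percent."""
    return (optimized - baseline) / baseline * 100.0


# Round to one decimal place, matching the precision used in the table.
gains_256 = {
    gpu: round(perf_gain_percent(baseline, optimized), 1)
    for gpu, (baseline, optimized) in results_256.items()
}

for gpu, gain in gains_256.items():
    print(f"{gpu}: {gain}%")
```

Under that assumption the computed values agree with the table (15.1, 11.0 and 16.8).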

Are there reasons 3xxx cards need larger batches than 2xxx cards? The goal is playing strength, not NPS, and with 2xxx cards the best batch size was 256.
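
As background to this question, the baseline columns of the quoted tables show how much raw throughput each card gains when the minibatch size goes from 256 to 1024. The Python sketch below only does that arithmetic on the posted figures; the variable names are hypothetical, and the result says nothing about playing strength, which is the point being raised.

```python
# Baseline figures copied from the quoted tables, per minibatch size.
baseline_256 = {"Titan RTX": 17443, "RTX 3090": 26820, "A100": 41785}
baseline_1024 = {"Titan RTX": 20211, "RTX 3090": 33032, "A100": 52732}

# Relative throughput gain (percent) from minibatch 256 to 1024, per GPU.
batch_gain = {
    gpu: round((baseline_1024[gpu] - baseline_256[gpu]) / baseline_256[gpu] * 100.0, 1)
    for gpu in baseline_256
}

for gpu, gain in batch_gain.items():
    print(f"{gpu}: +{gain}% from minibatch 256 to 1024")
```

On these numbers the Titan RTX gains about 15.9%, the RTX 3090 about 23.2% and the A100 about 26.2%, so the newer cards do gain more throughput from the larger batch. Whether that extra throughput translates into strength is not something the table can answer.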

mmt · Post by **mmt** » Fri Oct 02, 2020 6:58 pm

TensorFlow FP16 benchmarks for the 3080 and 3090: https://www.pugetsystems.com/labs/hpc/R ... nary-1902/. The FP16 improvement with CUDA 11 is disappointing. But they say:

"The current CUDA 11.0 does not have full support for the GA102 chips used in the RTX 3090 and RTX3080 (sm_86)."

"The surprising results were how much better the RTX20 GPUs performed with CUDA 11 and TensorFlow 1.15."
