Re: I stumbled upon this article on the new Nvidia RTX GPUs
Posted: Fri Sep 11, 2020 4:33 am
Werewolf wrote: ↑Fri Sep 11, 2020 6:58 am
Another question I’d like to know: if (if!) Lc0 uses Tensor cores, why does the $10,000 A100 barely outperform the Titan? I saw benchmarks specifically for Lc0 yesterday confirming this.
I understand there’s some kind of efficiency issue with the tensor cores (Milos’ argument?), but the A100 is crammed full of them and should have won easily.
Then I suggest to reread Milos' argument. I am not aware of your benchmark; whether it used the CUDA, CUDNN, OpenCL or DX12 backend, the boost frequency, the cooling - all of this makes a difference.
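Something like the rough sketch below is how I would at least control for the backend: run lc0's built-in benchmark once per backend and compare the reported nps. Untested; the binary path, the network file name and the exact wording of the benchmark output are my assumptions, so adjust them to your setup.
Code: Select all
# Rough sketch, untested: run lc0's built-in benchmark once per backend and
# grab the reported nodes-per-second figure. The binary path, the network
# file and the exact output wording are assumptions - adjust to your setup.
import re
import subprocess

LC0 = "./lc0"                 # path to your lc0 binary (assumption)
WEIGHTS = "network.pb.gz"     # whatever net you normally run (assumption)
BACKENDS = ["cudnn", "cudnn-fp16", "opencl", "dx12"]  # availability depends on the build

for backend in BACKENDS:
    try:
        out = subprocess.run(
            [LC0, "benchmark", f"--backend={backend}", f"--weights={WEIGHTS}"],
            capture_output=True, text=True, timeout=600,
        ).stdout
    except (FileNotFoundError, subprocess.TimeoutExpired) as err:
        print(f"{backend:>10}: failed ({err})")
        continue
    # lc0 prints a nodes-per-second summary; the wording may differ per version.
    match = re.search(r"([\d.]+)\s*nodes per second", out)
    print(f"{backend:>10}: {match.group(1) if match else 'nps not found'} nps")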
Sorry, I am not into the concrete implementation of Lc0's CNNs, and I have no ...
Werewolf wrote: ↑Fri Sep 11, 2020 2:45 pm
I only got about 30 seconds to look at the benchmark, but the difference between the two cards seemed to be within 20%, so that does fit. Unfortunately I don’t know backend details.
Which cores are you claiming Lc0 is running on with each card?
Tensor for both or just the Titan?
On re-reading the Milos post above, I see he does point out a small gain from the CUDA cores. I’m wondering whether that becomes relevant again, since the 3090 has more CUDA cores than previously expected - more than the A100, even. Would it be faster to use them?
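Back-of-the-envelope, counting the CUDA cores alone, the gap would look roughly like the sketch below. The core counts and boost clocks are the published spec figures as I remember them, so treat the numbers as approximate.
Code: Select all
# Back-of-the-envelope peak FP32 from CUDA cores alone:
# peak FLOPS = cores x 2 (an FMA counts as 2 FLOPs) x boost clock.
# Core counts and boost clocks are spec-sheet figures from memory - double-check them.
cards = {
    "Titan RTX": (4608, 1.770e9),   # (CUDA cores, boost clock in Hz)
    "A100":      (6912, 1.410e9),
    "RTX 3090":  (10496, 1.695e9),
}

for name, (cores, clock) in cards.items():
    tflops = cores * 2 * clock / 1e12
    print(f"{name:>9}: ~{tflops:.1f} TFLOPS FP32 (CUDA cores only)")

# On paper the tensor cores' FP16 throughput is several times higher than this
# on all three cards, so whether the backend actually hits them matters more
# than the raw CUDA-core count.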
smatovic wrote: ↑Fri Sep 11, 2020 9:02 am
Then I suggest to reread Milos' argument:
viewtopic.php?f=2&t=72320&hilit=gpu+rum ... 20#p846617
Lc0, like AlphaGo and Leela, does use 3x3 convolutions in its CNN design; I guess this is a descendant from the game of Go.
Others have pointed out that 4x4 convolutions would make more sense for the game of Chess.
Milos already pointed out in an older thread that Lc0 with 3x3 CNNs uses only ~30% of the 4x4 TensorCores present in Volta and Turing (RTX 20xx):
http://www.talkchess.com/forum3/viewtop ... 1&start=40
Now we seem to have 8x8 TensorCores in the Ampere A100 (not sure about the RTX 30xx series), and Lc0 cannot utilize these unless they change their CNN design?
3x3 convolutions are kind of a relic of image processing. Since the state-of-the-art DNNs at the time of AlphaGo's development were mainly ResNets used for image processing, they naturally had 3x3 convolutions in the input layer taking care of the RGB pixels in images.
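To make that concrete, here is a small PyTorch-style sketch - my own illustration, not Lc0's actual code - of the kind of 3x3 residual block these nets stack over the 8x8 board, plus the matrix shape one such convolution becomes once it is lowered to a GEMM, which is the form the tensor cores actually consume.
Code: Select all
# Illustration only - a residual block in the AlphaZero/Lc0 style, not the
# project's actual code. Two 3x3 convolutions over the 8x8 board plus a skip.
import torch
import torch.nn as nn

C = 256  # number of filters; real Lc0 nets come in several widths

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)     # skip connection keeps the 8x8 shape

x = torch.randn(1, C, 8, 8)          # one position: C input planes over the board
print(ResidualBlock(C)(x).shape)     # torch.Size([1, 256, 8, 8])

# Lowered to a GEMM (im2col), one such 3x3 convolution is roughly
#   (C filters) x (C*3*3)  times  (C*3*3) x (8*8 squares):
M, K, N = C, C * 3 * 3, 8 * 8
print(f"GEMM: ({M} x {K}) @ ({K} x {N})")   # the tiles the tensor cores chew on
Whether the backend really lowers it that way is another matter - cuDNN picks between implicit GEMM, Winograd and other algorithms on its own - so any fixed utilization figure is at best a rough indication.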