Lc0 v0.24 dev DX backend for AMD Radeon GPU

Damir
Posts: 2338
Joined: Mon Feb 11, 2008 2:53 pm
Location: Denmark
Full name: Damir Desevac


Post by Damir » Mon Feb 10, 2020 11:25 am

Hi, the official Lc0 release with the DX backend and LogitQ is not available yet, but you can test it with this version:
https://gofile.io/?c=Ntb3Xv
http://www.filedropper.com/lc0-test-windows-gpu-dx22

https://www.chess2u.com/t13501-lc0-v0-2 ... -gpu#92392

For those who cannot download from the two links above, here is an alternative link:

https://www.dropbox.com/s/66fojbhpagc5j ... 2.zip?dl=0

requirements:
OS: updated Windows 10
AMD drivers: latest >= 20.1

DX is 3-4x faster than OpenCL but slower than cudnn-fp16, so it's recommended only for AMD Radeon GPUs. You can still test it on Nvidia GPUs anyway.
Here are some benchmarks:
Code:
Nvidia RTX 2060
===============
Network          Net-Size  OpenCL  cudnn-fp32  dx-fp32  cudnn-fp16  dx-fp16
---------------------------------------------------------------------------
T59 59611        128x10     11624       35470    31088       88009    51326
T30 32390        256x20      2225        4972     6459       15137    17391
T40 42850        256x20      1371        4571     318*       12780    14198
T60 61996        320x24      1555        3421     3317        8418     7669
SV-big-t40-1705  384x30       856        1854     104*        4955     4799
SV-huge-50       512x40       361         802      29*        2241     121*

* - poor performance due to a driver bug (hopefully will be fixed soon).


Nvidia RTX Titan
================
Network          Net-Size  cudnn-fp32  dx-fp32  cudnn-fp16  dx-fp16
-------------------------------------------------------------------
T59 59611        128x10         70266    50523      123414    73675
T30 32390        256x20         14438    14173       40452    42276
T40 42850        256x20         12932    13113       36753    38388
T60 61996        320x24          8137     7505       17472    18578
SV-big-t40-1705  384x30          4143     3933       11054    13086
SV-huge-50       512x40          1942     1918        4844     7015
Code:
AMD RX 5700XT (Navi)
====================
Network          Net-Size  OpenCL  dx-fp32  dx-fp16
---------------------------------------------------
T59 59611        128x10     12095    37845    55888
T30 32390        256x20      1505     5814    10198
T40 42850        256x20       900     4666     8041
T60 61996        320x24      1774     2874     5183
SV-big-t40-1705  384x30         .     1479     2801
SV-huge-50       512x40         .        *     1080


AMD RX Vega VII
===============
Network          Net-Size  OpenCL  dx-fp32  dx-fp16
---------------------------------------------------
T59 59611        128x10     14889    46099    59721
T30 32390        256x20      1525     8055    12821
T40 42850        256x20      2754     6168     9156
T60 61996        320x24      2490     3722     6078
SV-big-t40-1705  384x30      1426     2129     3254
SV-huge-50       512x40         .      852     1323
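
As a rough way to check the "3-4x faster than OpenCL" figure against the AMD tables above, here is a small Python sketch that divides the DX numbers by the OpenCL numbers per network. Only the numbers come from the tables; the dictionary layout and the `speedups` helper are made up for this illustration, and entries shown as `.` or `*` in the tables are stored as `None` and skipped.

```python
# Numbers copied from the AMD tables above, as (OpenCL, dx-fp32, dx-fp16).
# None stands for entries the tables show as '.' or '*'.
BENCH = {
    "RX 5700XT": {
        "T59 59611": (12095, 37845, 55888),
        "T30 32390": (1505, 5814, 10198),
        "T40 42850": (900, 4666, 8041),
        "T60 61996": (1774, 2874, 5183),
        "SV-big-t40-1705": (None, 1479, 2801),
        "SV-huge-50": (None, None, 1080),
    },
    "RX Vega VII": {
        "T59 59611": (14889, 46099, 59721),
        "T30 32390": (1525, 8055, 12821),
        "T40 42850": (2754, 6168, 9156),
        "T60 61996": (2490, 3722, 6078),
        "SV-big-t40-1705": (1426, 2129, 3254),
        "SV-huge-50": (None, 852, 1323),
    },
}


def speedups(rows):
    """Yield (network, dx-fp32/OpenCL, dx-fp16/OpenCL) for rows that have an OpenCL result."""
    for net, (opencl, fp32, fp16) in rows.items():
        if opencl is None:
            continue  # no baseline to compare against
        r32 = fp32 / opencl if fp32 is not None else None
        r16 = fp16 / opencl if fp16 is not None else None
        yield net, r32, r16


if __name__ == "__main__":
    for gpu, rows in BENCH.items():
        print(gpu)
        for net, r32, r16 in speedups(rows):
            fmt = lambda r: "n/a" if r is None else f"{r:.2f}x"
            print(f"  {net:<16} dx-fp32 {fmt(r32):>7}  dx-fp16 {fmt(r16):>7}")
```

The ratio depends a lot on the network, so the 3-4x figure is best read as a ballpark rather than a fixed factor.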

Jhoravi
Posts: 255
Joined: Wed May 08, 2013 4:49 am

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Jhoravi » Tue Feb 11, 2020 1:35 am

Thanks because my GPU happens to be AMD Radeon. BTW what does DX mean?

crem
Posts: 162
Joined: Wed May 23, 2018 7:29 pm

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by crem » Tue Feb 11, 2020 11:02 am

Jhoravi wrote:
Tue Feb 11, 2020 1:35 am
Thanks because my GPU happens to be AMD Radeon. BTW what does DX mean?
It's DirectX12.
It's good that you noticed, it was somehow missed. I guess we'll rename it to directx or maybe even directx12 before the release.

Laskos
Posts: 10181
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos » Wed Feb 12, 2020 10:25 pm

crem wrote:
Tue Feb 11, 2020 11:02 am
It's DirectX12.
It's good that you noticed, it was somehow missed. I guess we'll rename it to directx or maybe even directx12 before the release.
Very interesting. On my RTX 2070 this backend is only 10% slower with T40 nets than using cudnn-fp16 backend, only 2% slower with T60 nets, and about 15% faster than cudnn-fp16 with the huge SV net 512x40b.

Geonerd
Posts: 69
Joined: Fri Mar 10, 2017 12:44 am

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Geonerd » Wed Feb 12, 2020 10:29 pm

Fantastic news!

Collingwood
Posts: 14
Joined: Sat Nov 09, 2019 2:24 pm
Full name: Colin Wood

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Collingwood » Wed Feb 12, 2020 11:02 pm

Does this mean there's an Lc0 version for any GPU you have?

M ANSARI
Posts: 3426
Joined: Thu Mar 16, 2006 6:10 pm

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by M ANSARI » Thu Feb 13, 2020 6:12 am

Laskos wrote:
Wed Feb 12, 2020 10:25 pm
Very interesting. On my RTX 2070 this backend is only 10% slower with T40 nets than using cudnn-fp16 backend, only 2% slower with T60 nets, and about 15% faster than cudnn-fp16 with the huge SV net 512x40b.
Hmmm ... if that is true maybe that means that Lc0 cudnn-fp16 backend needs to be updated for the RTX cards!

Laskos
Posts: 10181
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos » Thu Feb 13, 2020 9:04 am

M ANSARI wrote:
Thu Feb 13, 2020 6:12 am
Hmmm ... if that is true maybe that means that Lc0 cudnn-fp16 backend needs to be updated for the RTX cards!
It seems so. As a sanity check, I tested the playing strength at 30s + 0.3s with the 384x30b SV net 1538, one of the strongest nets at LTC on a strong GPU.

Score of LargeNet_1538_dx vs LargeNet_1538_cudnn: 115 - 81 - 204 [0.542]
Elo difference: 29.6 +/- 23.8, LOS: 99.2 %, DrawRatio: 51.0 %

400 of 400 games finished.

Normalized Elo (pentanomial): 0.170 +/- 0.050 (1 SD)

DX performs better strength-wise too on these larger nets than cuDNN on an RTX 2070 GPU.
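
For anyone who wants to redo the arithmetic behind the match summary above, here is a minimal Python sketch. It assumes the usual logistic Elo model, a normal approximation for the 95% error margin, and the common erf-based LOS formula; whether the match tool used above computes things exactly this way is an assumption, and the function names are invented for this example. The pentanomial normalized Elo cannot be recomputed from the win/loss/draw totals alone, so it is left out.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws, z=1.959964):
    """Summarise a match from win/loss/draw counts (z = 95% normal quantile)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0 or 0.5) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    lo, hi = score - z * stderr, score + z * stderr
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return {
        "score": score,
        "elo": elo_from_score(score),
        "margin": margin,
        "los": los,
        "draw_ratio": draws / n,
    }


if __name__ == "__main__":
    s = match_stats(wins=115, losses=81, draws=204)
    print(f"score {s['score']:.3f}, Elo {s['elo']:.1f} +/- {s['margin']:.1f}, "
          f"LOS {100 * s['los']:.1f} %, draws {100 * s['draw_ratio']:.1f} %")
```

With 115 - 81 - 204 this should land on the Elo difference, LOS and draw ratio quoted above, with an error margin in the same range.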

ankan
Posts: 77
Joined: Sun Apr 21, 2013 1:29 pm
Full name: Ankan Banerjee

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by ankan » Thu Feb 13, 2020 11:48 am

Thanks Laskos for testing. It's good to know that the speed increase translates to improvement in playing strength.
The dx backend (that defaults to fp16 precision) uses a different algorithm for convolution (winograd) that scales better with bigger networks compared to what cudnn-fp16 uses (implicit_gemm). We will be adding that path to cudnn-fp16 backend too.
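
For readers unfamiliar with the term, here is a tiny 1-D Python sketch of the Winograd F(2,3) minimal-filtering idea: two outputs of a 3-tap filter are computed with 4 multiplications instead of 6, at the cost of a few extra additions. This is only an illustration of the general trick, not the code of the dx or cudnn backends (which work on 2-D tiles on the GPU); the function names are made up for this example.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g slid over 4 inputs d: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform; depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform; additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # The four element-wise multiplications.
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 2.0]
    print(direct_f23(d, g), winograd_f23(d, g))
```

The saving in multiplications grows when the same idea is applied to 2-D 3x3 convolutions over many channels, which fits the observation that it helps more on the bigger nets.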

Laskos
Posts: 10181
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos » Thu Feb 13, 2020 1:32 pm

ankan wrote:
Thu Feb 13, 2020 11:48 am
Thanks Laskos for testing. It's good to know that the speed increase translates to improvement in playing strength.
The dx backend (that defaults to fp16 precision) uses a different algorithm for convolution (winograd) that scales better with bigger networks compared to what cudnn-fp16 uses (implicit_gemm). We will be adding that path to cudnn-fp16 backend too.
Thanks for the info. I thought the cuDNN backend also used fast Winograd convolutions; at least that was the talk more than a year ago, and in fact I used them 10-12 years ago in image processing.
