Using LC0 with one or two GPUs - a guide

smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Using LC0 with one or two GPUs - a guide

Post by smatovic » Sat Mar 30, 2019 9:48 am

Since it has come up repeatedly, here is a short guide on what to consider when
using one or two GPUs with LC0.

Hardware:

- CPU or GPU?
LC0 uses neural networks to evaluate chess positions. These are compute and
memory intensive, which makes them ideal for acceleration by a GPU.
To add a discrete GPU to your PC you will need a free PCI Express slot
and a power supply unit that can handle the additional power consumption.
Note that GPUs commonly need two free slots in your PC case.

- AMD or Nvidia?
LC0 can run on CPUs, on GPUs via OpenCL, and on Nvidia GPUs via CUDA and
cuDNN. Currently the Nvidia CUDA and cuDNN backend outperforms the AMD OpenCL
backend by a wide margin. Of course this may change in the future.
See these benchmarks for some numbers:

https://www.phoronix.com/scan.php?page= ... Benchmarks
https://www.phoronix.com/scan.php?page= ... inux&num=9

- Nvidia RTX or GTX?
The Nvidia RTX series has Tensor Cores on board, which accelerate LC0's neural
network significantly, though at a higher price.

- One or two GPUs?
An additional GPU gives an estimated +50 Elo. You can mix different GPUs with LC0.
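
To put the +50 Elo estimate into perspective, here is a minimal sketch that converts an Elo difference into an expected score. It assumes the standard logistic Elo model; the function name is only illustrative and is not part of LC0.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) of the stronger side,
    assuming the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # +50 Elo is the rough estimate from the guide for a second GPU.
    print(f"{expected_score(50):.3f}")
```

Under that model, +50 Elo corresponds to scoring roughly 57% against the same engine running on one GPU.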

- Thermal issues
A high-end GPU produces about 300 watts of heat under load, so you may
have to add some fans to your PC case for cooling. An alternative
is a water cooling solution. See also:

viewtopic.php?f=2&t=70097

Software:

- FP16 (half precision) or FP32 (single precision)?
Neural network inference is currently done via floating point computation.
Some GPUs offer higher instruction throughput at lower precision, so on
these devices FP16 (half precision) can pay off. The Nvidia RTX series, for
example, supports FP16-optimized computation in LC0.

- Which parameters to choose?
LC0 has some tunable parameters for getting more nps (nodes per second), for
example backend type, number of threads, nncache or batch size. See this sheet
for different parameters and the resulting nps:

https://docs.google.com/spreadsheets/d/ ... CjBILe6uA/
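
For experimenting with these parameters, a small helper that assembles an lc0 command line can be handy. This is only a sketch: the flag names are given as I understand the lc0 command line and should be checked against `lc0 --help` for your version, the weights file name is hypothetical, and the numeric values are placeholders, not tuning recommendations.

```python
import shlex


def build_lc0_command(weights, backend="cudnn-fp16", threads=2,
                      nncache=200000, minibatch_size=256, binary="lc0"):
    """Return the lc0 invocation as an argument list.

    All values are placeholders; verify the flag names with `lc0 --help`.
    """
    return [
        binary,
        f"--weights={weights}",
        f"--backend={backend}",
        f"--threads={threads}",
        f"--nncache={nncache}",
        f"--minibatch-size={minibatch_size}",
    ]


if __name__ == "__main__":
    # "network.pb.gz" is a hypothetical weights file name.
    print(shlex.join(build_lc0_command("network.pb.gz")))
```

Change one parameter at a time and compare the nps you get, in the same way the sheet does.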

- Which network to choose?
LC0 is still under development and the network design may change, so there are
a number of different networks which give different nps and Elo. Here is an
overview of the networks LC0 offers:

http://www.lczero.org/networks/

--
Srdja

Krzysztof Grzelak
Posts: 816
Joined: Tue Jul 15, 2014 10:47 am

Re: Using LC0 with one or two GPUs - a guide

Post by Krzysztof Grzelak » Sat Mar 30, 2019 10:16 am

You described a very interesting thing, but unfortunately one thing was missing: you write a lot about the GPU, but nothing at all about the CPU.

smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Using LC0 with one or two GPUs - a guide

Post by smatovic » Sat Mar 30, 2019 10:37 am

Krzysztof Grzelak wrote:
Sat Mar 30, 2019 10:16 am
You described a very interesting thing, but unfortunately one thing was missing: you write a lot about the GPU, but nothing at all about the CPU.

As for running LC0 on a CPU, I have too little experience to write on this topic, but maybe someone else can help out.

Considering the CPU when running LC0 on a GPU:

- threads per GPU
You may want to run two threads per GPU to be able to utilize it fully, so plan for two CPU cores per GPU.

- the higher the clock rate the better
I have no numbers for comparison, but the higher the CPU clock rate, the faster the kernel calls should be.
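
The two-threads-per-GPU rule of thumb can be written down as a tiny helper. This is only a sketch: the function name is made up, and capping the result at the number of logical CPUs reported by the OS is my own assumption, not something LC0 does.

```python
import os


def suggested_threads(num_gpus: int, threads_per_gpu: int = 2) -> int:
    """Rule of thumb from above: two search threads per GPU.

    The cap at the number of logical CPUs is an assumption added here.
    """
    wanted = num_gpus * threads_per_gpu
    available = os.cpu_count() or wanted
    return max(1, min(wanted, available))


if __name__ == "__main__":
    print(suggested_threads(2))
```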

--
Srdja

crem
Posts: 135
Joined: Wed May 23, 2018 7:29 pm

Re: Using LC0 with one or two GPUs - a guide

Post by crem » Sat Mar 30, 2019 11:29 am

I'd also add the following points which are often brought up (sorted from most surprising to least surprising):

- No SLI bridge is needed (or useful at all) when using multiple GPUs.
- For RTX cards, the default Leela configuration is much slower because the cudnn backend is the default instead of cudnn-fp16.
- Multiple GPUs also don't work automatically; one has to pass parameters to Lc0 to enable that.
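
As a sketch of what such parameters can look like, the helper below builds backend arguments for several GPUs using the roundrobin backend that also comes up later in this thread. The `--backend-opts` syntax is given as I understand it from the Lc0 documentation, so treat it as an assumption and verify it against `lc0 --help` for your version; the function name is made up.

```python
def multi_gpu_args(gpu_ids, per_gpu_backend="cudnn-fp16",
                   multiplexer="roundrobin"):
    """Build backend arguments that spread evaluation over several GPUs.

    The option syntax is an assumption; check it for your Lc0 version.
    """
    opts = ",".join(
        f"(backend={per_gpu_backend},gpu={gpu_id})" for gpu_id in gpu_ids
    )
    return [f"--backend={multiplexer}", f"--backend-opts={opts}"]


if __name__ == "__main__":
    # GPU ids 0 and 1 stand for the first and second card in the system.
    print(" ".join(multi_gpu_args([0, 1])))
```

For cards without fast FP16, per_gpu_backend would be cudnn instead of cudnn-fp16.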

Laskos
Posts: 9483
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Using LC0 with one or two GPUs - a guide

Post by Laskos » Sat Mar 30, 2019 12:24 pm

crem wrote:
Sat Mar 30, 2019 11:29 am
I'd also add the following points which are often brought up (sorted from most surprising to least surprising):

- No SLI bridge is needed (or useful at all) when using multiple GPUs.
- For RTX cards, the default Leela configuration is much slower because the cudnn backend is the default instead of cudnn-fp16.
- Multiple GPUs also don't work automatically; one has to pass parameters to Lc0 to enable that.

How is the scaling with 2 GPUs? First, NPS scaling. Second, effective speed-up scaling. I saw bad scaling even NPS-wise in TCEC, and the effective speed-up must be even worse. How many CPU threads are best for using 2 RTX GPUs? I plan at some point to have a new system with an 8-16 core CPU, and am not sure whether a second GPU is worth having. I have a well-tuned and well-cooled RTX 2070 which runs fast and flawlessly under very heavy and long loads, and if the scaling is good, I would go for an identical second GPU.
Also, is there any prospect of the Lc0 engine handling speeds above 80-100k NPS? It is a serious bottleneck for me with smaller nets.

Hugo
Posts: 775
Joined: Tue Dec 01, 2009 10:10 am

Re: Using LC0 with one or two GPUs - a guide

Post by Hugo » Sat Mar 30, 2019 2:59 pm

Hi all

A few days ago I installed an RTX 2060 in addition to my RTX 2070.
I benchmarked the system with one of the latest 40 networks.
My benchmark uses go nodes 5000000 in console mode.
A single RTX 2070 reached about 35,000 nps.
A single RTX 2060 reached about 30,000 nps.

Both together (backend=roundrobin) reached about 65,000 nps and were still increasing; when used in a GUI, the figure was well over 70,000 nps after two minutes.
In a real game, the load on both GPUs is not always full; it is somewhere between 80% and 98%.
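
The NPS scaling implied by these numbers can be checked with a few lines. This is only a sketch using the approximate figures quoted above; the function name is made up.

```python
def scaling_efficiency(combined_nps, single_nps):
    """Combined nps divided by the sum of the single-GPU nps values."""
    return combined_nps / sum(single_nps)


if __name__ == "__main__":
    singles = [35_000, 30_000]  # RTX 2070 and RTX 2060 alone
    print(f"{scaling_efficiency(65_000, singles):.2f}")  # console benchmark
    print(f"{scaling_efficiency(70_000, singles):.2f}")  # GUI, after two minutes
```

A ratio of about 1.0 means near-linear NPS scaling for this pair. The GUI figure exceeds 1.0 because it was measured over a longer search, so it is not directly comparable with the single-GPU runs.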

regards, C.K.

Laskos
Posts: 9483
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Using LC0 with one or two GPUs - a guide

Post by Laskos » Sat Mar 30, 2019 5:18 pm

Hugo wrote:
Sat Mar 30, 2019 2:59 pm
Hi all

A few days ago I installed an RTX 2060 in addition to my RTX 2070.
I benchmarked the system with one of the latest 40 networks.
My benchmark uses go nodes 5000000 in console mode.
A single RTX 2070 reached about 35,000 nps.
A single RTX 2060 reached about 30,000 nps.

Both together (backend=roundrobin) reached about 65,000 nps and were still increasing; when used in a GUI, the figure was well over 70,000 nps after two minutes.
In a real game, the load on both GPUs is not always full; it is somewhere between 80% and 98%.

regards, C.K.

So what the heck were they doing at TCEC? They had 55-65k NPS with a 2080 Ti + 2080 in the openings and middlegames. Their temperatures were pretty high, and I guess the airflow arrangement was not optimal; maybe they didn't even have a case fan.

Krzysztof Grzelak
Posts: 816
Joined: Tue Jul 15, 2014 10:47 am

Re: Using LC0 with one or two GPUs - a guide

Post by Krzysztof Grzelak » Mon Apr 01, 2019 3:01 pm

Thank you very much for the information, smatovic. A request to Laskos, Hugo and crem: please focus a little on the CPU, not only on the GPU. I understand that you run the engine on a GPU, but most people will not go straight to the store to buy a graphics card for a few hundred dollars.
