Which GPU(s) Lc0 needs to draw SF 8 cores?


Jouni
Posts: 3857
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by Jouni »

Any idea?
Jouni
smatovic
Posts: 3642
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by smatovic »

On CCRL:
1 Stockfish 15 64-bit 8CPU 3742
5 Lc0 0.29.0 64-bit w753723 RTX2080 3679

Let's say the RTX3080 and RTX4080 each double the NPS (or the net size) relative to the previous generation, and each doubling is worth roughly +50 Elo; so my bet is on an RTX4080 vs. 8 cores.
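The arithmetic behind this bet can be made explicit. Here is a minimal Python sketch using the two CCRL ratings above and the post's own rough assumption of about +50 Elo per doubling. That figure is a rule of thumb, not a measured constant.

```python
# CCRL ratings quoted in the post.
SF15_8CPU = 3742
LC0_RTX2080 = 3679

# Rule of thumb from the post (an assumption, not a measured constant).
ELO_PER_DOUBLING = 50


def doublings_needed(gap_elo: float, elo_per_doubling: float = ELO_PER_DOUBLING) -> float:
    """How many doublings of NPS (or net size) would close an Elo gap."""
    return gap_elo / elo_per_doubling


gap = SF15_8CPU - LC0_RTX2080
d = doublings_needed(gap)
print(f"gap: {gap} Elo")
print(f"doublings needed: {d:.2f}")
print(f"speedup over RTX2080 needed: {2 ** d:.2f}x")
print(f"two doublings give about +{2 * ELO_PER_DOUBLING} Elo")
```

The gap is 63 Elo. Under this assumption, one doubling (about +50) falls slightly short, and two doublings (about +100) clear it. That is why the bet lands two generations up.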

***edit***
Ah, according to Wikipedia:

RTX2080: 20 TFLOP FP16
RTX3080: 22 TFLOP FP16
RTX4080: 43 TFLOP FP16
RTX4090: 73 TFLOP FP16

So, better bet on RTX4090.
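The edit can be checked the same way. This rough Python sketch treats the Wikipedia FP16 figures as a proxy for Lc0 NPS. That is a strong assumption, because real NPS also depends on memory bandwidth, the backend and the net. The sketch converts each card's ratio to the RTX2080 into doublings, then into Elo at the assumed +50 per doubling:

```python
import math

RTX2080_TFLOPS = 20.0  # baseline card in the CCRL entry
FP16_TFLOPS = {"RTX3080": 22.0, "RTX4080": 43.0, "RTX4090": 73.0}
ELO_PER_DOUBLING = 50  # the thread's rule of thumb (assumption)
GAP = 3742 - 3679      # CCRL gap: SF15 8CPU vs Lc0 on RTX2080


def estimated_gain(tflops: float, base: float = RTX2080_TFLOPS,
                   elo_per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Elo gain if NPS scaled linearly with FP16 TFLOPs (a big assumption)."""
    return math.log2(tflops / base) * elo_per_doubling


for card, tf in FP16_TFLOPS.items():
    gain = estimated_gain(tf)
    verdict = "closes the gap" if gain >= GAP else "falls short"
    print(f"{card}: {tf / RTX2080_TFLOPS:.2f}x -> ~{gain:+.0f} Elo ({verdict})")
```

Under these assumptions, the RTX4080 (about 2.15x, roughly 1.1 doublings) lands near +55 Elo, just short of the 63 Elo gap. The RTX4090 (about 3.65x, roughly 1.87 doublings) lands near +93.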

Or, alternatively, just buy an Apple ;)

--
Srdja
Vinvin
Posts: 5320
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by Vinvin »

smatovic wrote: Tue May 09, 2023 1:39 pm RTX2080: 20 TFLOP FP16
RTX3080: 22 TFLOP FP16
RTX4080: 43 TFLOP FP16
RTX4090: 73 TFLOP FP16

So, better bet on RTX4090.

Or, alternatively, just buy an Apple ;)
What's the speed of Apple's chips?
chrisw
Posts: 4843
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by chrisw »

smatovic wrote: Tue May 09, 2023 1:39 pm On CCRL:
1 Stockfish 15 64-bit 8CPU 3742
5 Lc0 0.29.0 64-bit w753723 RTX2080 3679

Let's say the RTX3080 and RTX4080 each double the NPS (or the net size) relative to the previous generation, and each doubling is worth roughly +50 Elo; so my bet is on an RTX4080 vs. 8 cores.

***edit***
Ah, according to Wikipedia:

RTX2080: 20 TFLOP FP16
RTX3080: 22 TFLOP FP16
RTX4080: 43 TFLOP FP16
RTX4090: 73 TFLOP FP16

So, better bet on RTX4090.

Or, alternatively, just buy an Apple ;)

--
Srdja
I’ve two otherwise identical systems, one with a 4080 and one with a 4090. NNUE training code runs about 9% faster on the 4090.
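For comparison with the FP16 figures quoted earlier in the thread, here is a quick Python check of how much of the 4090's on-paper advantage showed up in this measurement. The 9% comes from the post. Treating FP16 TFLOPs as the relevant metric for NNUE training is an assumption.

```python
# FP16 figures quoted earlier in the thread (Wikipedia numbers).
RTX4080_TFLOPS = 43.0
RTX4090_TFLOPS = 73.0
OBSERVED_SPEEDUP = 1.09  # "about 9% faster" for NNUE training, per the post

paper_speedup = RTX4090_TFLOPS / RTX4080_TFLOPS
# Fraction of the on-paper advantage that actually appeared in training.
realised = (OBSERVED_SPEEDUP - 1) / (paper_speedup - 1)
print(f"paper: {paper_speedup:.2f}x, observed: {OBSERVED_SPEEDUP:.2f}x, "
      f"realised: {realised:.0%}")
```

On paper the 4090 is about 1.70x the 4080, yet only about 13% of that margin appeared here. That suggests, though the post doesn't say why, that this training code is limited by something other than peak FP16 throughput.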
smatovic
Posts: 3642
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by smatovic »

Vinvin wrote: Tue May 09, 2023 4:59 pm
smatovic wrote: Tue May 09, 2023 1:39 pm RTX2080: 20 TFLOP FP16
RTX3080: 22 TFLOP FP16
RTX4080: 43 TFLOP FP16
RTX4090: 73 TFLOP FP16

So, better bet on RTX4090.

Or, alternatively, just buy an Apple ;)
What's the speed of Apple's chips?
I was just kidding, a lil pun at user Magnum... NPS depends on the net:
https://github.com/LeelaChessZero/lc0/pull/1693

According to Wikipedia:

M1: 2.6 TFLOP FP32
M1 Max: 10.4 TFLOP FP32
M1 Ultra: 21 TFLOP FP32

https://en.wikipedia.org/wiki/Apple_M1#GPU

Dunno about FP16 on the M1 with Lc0's Metal backend. Also, Nvidia RTX cards have Tensor Cores that accelerate CNNs, worth roughly 2x in Lc0.

--
Srdja
smatovic
Posts: 3642
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by smatovic »

chrisw wrote: Tue May 09, 2023 5:04 pm I’ve two otherwise identical systems, one with a 4080 and one with a 4090. NNUE training code runs about 9% faster on the 4090.
Curious: how many labeled positions do you use to train one NNUE network, and how long does training take on the 4090?

--
Srdja
chrisw
Posts: 4843
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by chrisw »

smatovic wrote: Wed May 10, 2023 7:41 am
chrisw wrote: Tue May 09, 2023 5:04 pm I’ve two otherwise identical systems, one with a 4080 and one with a 4090. NNUE training code runs about 9% faster on the 4090.
Curious: how many labeled positions do you use to train one NNUE network, and how long does training take on the 4090?

--
Srdja
It depends; typically tens of billions, although you can often cycle through a smaller dataset a few times.
Measured in standard epochs (one epoch = 100,000,000 positions), training on the 4090 takes about 8 minutes per epoch, depending on the size of the net. That’s my system; I suspect the SF and OpenBench people are faster, using faster training code.
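Those two figures are enough for a back-of-the-envelope throughput estimate. Here is a minimal Python sketch that assumes a flat 8 minutes per 100M-position epoch. The post notes that the time actually varies with net size.

```python
POSITIONS_PER_EPOCH = 100_000_000  # "standard epoch", per the post
MINUTES_PER_EPOCH = 8.0            # rough 4090 figure; varies with net size


def training_hours(total_positions: float) -> float:
    """Wall-clock hours to stream `total_positions` through the trainer once."""
    epochs = total_positions / POSITIONS_PER_EPOCH
    return epochs * MINUTES_PER_EPOCH / 60


throughput = POSITIONS_PER_EPOCH / (MINUTES_PER_EPOCH * 60)
print(f"~{throughput:,.0f} positions/s")
for billions in (10, 30, 50):
    print(f"{billions}B positions: ~{training_hours(billions * 1e9):.1f} h")
```

That is roughly 208k positions per second, so 10 billion positions (100 epochs) take around 13 hours.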

To put it in context, a 16-core AMD, a 4090 and a few TB of SSD come in at around 4000 euros, and that system will produce an NNUE in maybe three days (importantly, it takes overnight to tell you whether the net is likely to be a good one). Faster would be better, and two or more systems running at once better still. That brings dev times down to manageable proportions. Even so, dev time to make a net, including all the failures, is on the order of several weeks, even months.
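The 8-minute epoch figure quoted in this post lets you translate these time budgets into epochs. Here is a rough Python sketch. The 12-hour "overnight" window is an assumed value. The position counts are positions streamed, not distinct positions, because a smaller dataset may be cycled several times.

```python
MINUTES_PER_EPOCH = 8.0            # 4090 figure from the post
POSITIONS_PER_EPOCH = 100_000_000  # one standard epoch
OVERNIGHT_HOURS = 12               # hypothetical: "overnight" is not quantified


def epochs_in(hours: float) -> float:
    """Number of standard epochs that fit in a wall-clock budget."""
    return hours * 60 / MINUTES_PER_EPOCH


for label, hours in (("overnight (assumed)", OVERNIGHT_HOURS), ("three days", 72)):
    e = epochs_in(hours)
    print(f"{label}: ~{e:.0f} epochs, ~{e * POSITIONS_PER_EPOCH / 1e9:.0f}B positions")
```

Three days works out to about 540 epochs, or roughly 54 billion positions streamed. That fits the "tens of billions" mentioned earlier in the post.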
Magnum
Posts: 195
Joined: Thu Feb 04, 2021 10:24 pm
Full name: Arnold Magnum

Re: Which GPU(s) Lc0 needs to draw SF 8 cores?

Post by Magnum »

Vinvin wrote: Tue May 09, 2023 4:59 pm
smatovic wrote: Tue May 09, 2023 1:39 pm RTX2080: 20 TFLOP FP16
RTX3080: 22 TFLOP FP16
RTX4080: 43 TFLOP FP16
RTX4090: 73 TFLOP FP16

So, better bet on RTX4090.

Or, alternatively, just buy an Apple ;)
What's the speed of Apple's chips?
https://github.com/LeelaChessZero/lc0/issues/1562

https://github.com/LeelaChessZero/lc0/issues/1795

https://github.com/LeelaChessZero/lc0/issues/1800