My non-OC RTX 2070 is very fast with Lc0

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Werewolf wrote: Mon Dec 03, 2018 2:48 pm
Laskos wrote: Mon Nov 19, 2018 3:00 pm Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:

UCI commands:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go

info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292

Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.

My power supply is not that strong (500W), hope it stays well.

I'm not doubting your results Laskos, but I'm struggling to understand them.
Your 1060 card produced about 4.4 TFLOPS FP32. Your 2070 card is around 7.5 TFLOPS FP32 which with the new ability to use FP16 means about 15 TFLOPS.

That should make your 2070 just under 4x faster than your 1060. Instead you report 5-6x improvement.

Happy for you...but confused.
Joshua explained, and it is also explained here:
http://talkchess.com/forum3/viewtopic.p ... 8&start=44

The speed-up is at least 5 compared to GTX 1060 in almost any condition, and larger than 6 with both in "ideal" conditions.
Test net is ID11261
With the GTX 1060 6GB, in ideal settings, I never got more than 5100 NPS from the starting position. With my RTX 2070, setting these values just now:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 5000000
setoption name WeightsFile value .\weights_11261.txt.gz

I am getting from initial position:

info depth 19 seldepth 55 time 243657 nodes 8144964 score cp 29 hashfull 599 nps 33427

That is about 6.5 times the maximum speed I got from the initial position with the GTX 1060 6GB using correct settings.

But I rarely go to 4 min/move in gameplay, only in analysis. Anyway, with the correct parameters set on both, my RTX 2070 (non-OCed) is about 6 times faster than my GTX 1060 6GB in almost all time and net ID conditions (at least with these 20x256 nets).

I also checked for possible throttling: in 12 hours at full load, the temperature peaked at 68C, with no throttling shown in GPU-Z and no problems with the power supply (a 500W one, but it does not seem to complain).

As I said, I myself didn't expect these speeds from the RTX 2070; I would have been happy with 20,000 NPS in correct conditions. So I felt compelled to open a thread here.
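For anyone who wants to redo the arithmetic, here is a minimal sketch that parses the `info` lines quoted in this post, recomputes NPS from the node count and the time (UCI reports time in milliseconds), and compares the measured speed-up with the TFLOPS-based estimate from Werewolf's question. The helper names are made up for illustration; the numbers are the ones from the posts.

```python
# Info lines copied from the post (RTX 2070, start position).
INFO_LINES = [
    "info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621",
    "info depth 19 seldepth 55 time 243657 nodes 8144964 score cp 29 hashfull 599 nps 33427",
]

INT_FIELDS = ("depth", "seldepth", "time", "nodes", "hashfull", "nps")


def parse_info(line):
    """Return the integer fields of a UCI 'info' line as a dict."""
    tokens = line.split()
    fields = {}
    # Walk over (keyword, value) pairs of neighbouring tokens.
    for key, value in zip(tokens, tokens[1:]):
        if key in INT_FIELDS:
            fields[key] = int(value)
    return fields


def nps_from_counts(fields):
    """Recompute nodes per second; 'time' is in milliseconds."""
    return fields["nodes"] * 1000 // fields["time"]


GTX1060_BEST_NPS = 5100          # best GTX 1060 figure quoted in the post
rtx2070 = parse_info(INFO_LINES[1])

measured_speedup = rtx2070["nps"] / GTX1060_BEST_NPS
tflops_estimate = 15 / 4.4       # FP16 vs FP32 TFLOPS figures from Werewolf's post

print(f"measured speed-up: {measured_speedup:.2f}x")
print(f"TFLOPS-based estimate: {tflops_estimate:.2f}x")
```

The gap between the two printed ratios is exactly what the question above is about; the script only restates it, it does not explain it.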
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Werewolf »

Thanks both - I forgot we've aired this before.

Is there a list of nps for each of the newer cards? If the speed of Lc0 is determined by the number of tensor cores (multiplied by the speed they run at), the 2080 Ti should be a bit below 2x the speed of a 2070, but I read somewhere it's not that high and not worth the money.


Similarly, if Laskos is reporting nps not far behind the 2080...odd.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

At anything longer than 1s/move, MinibatchSize=1024 surpasses 512 NPS-wise, by 1-3%. Does anybody know whether this translates directly to strength?

With these settings:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 1024
setoption name NNCacheSize value 5000000
setoption name WeightsFile value .\weights_11261.txt.gz

I get as NPS:

info depth 19 seldepth 56 time 254408 nodes 8711351 score cp 29 hashfull 631 nps 34241

On the other hand, at 0.1s/move MinibatchSize=256 seems to beat 512 NPS-wise by some 6%. I will try to see in ultra-fast games whether there is any Elo value in it.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Laskos wrote: Tue Dec 04, 2018 1:09 pm At anything longer than 1s/move, MinibatchSize=1024 surpasses 512 NPS-wise, by 1-3%. Does anybody know whether this translates directly to strength?
I doubt you'd gain anything. You do 2x more speculation for only 1-3% extra NPS. To gain strength you'd need an efficiency drop of less than 3% when going from batch size 512 to batch size 1024, i.e. if a 512 batch had N useful nodes on average, a 1024 batch would need at least 1.97N useful nodes, which I strongly doubt.
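The break-even argument can be written out as arithmetic. This is only a sketch, under the assumption that strength tracks useful nodes per second and that "useful nodes" is counted per batch; the function name is made up for illustration. With an NPS gain g, the number of batches per second changes by a factor (1 + g) / 2 when the batch size doubles, so the larger batch needs at least 2N / (1 + g) useful nodes to keep up.

```python
def break_even_useful_nodes(nps_gain, batch_ratio=2.0):
    """Useful nodes the larger batch needs, in units of N, to match the smaller one.

    nps_gain    -- relative NPS gain of the larger batch (0.03 means 3%)
    batch_ratio -- larger batch size divided by smaller batch size (1024/512 = 2)
    """
    return batch_ratio / (1.0 + nps_gain)


# NPS figures from the two long start-position runs quoted in this thread.
NPS_512 = 33427
NPS_1024 = 34241
measured_gain = NPS_1024 / NPS_512 - 1.0

for gain in (0.01, measured_gain, 0.03):
    need = break_even_useful_nodes(gain)
    print(f"NPS gain {gain * 100:.1f}% -> at least {need:.2f}N useful nodes per 1024 batch")
```

For gains of 1% to 3% the threshold runs from about 1.98N down to about 1.94N, which brackets the 1.97N figure in the post.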
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by chrisw »

Milos wrote: Tue Dec 04, 2018 7:56 pm
Laskos wrote: Tue Dec 04, 2018 1:09 pm At anything longer than 1s/move, MinibatchSize=1024 surpasses 512 NPS-wise, by 1-3%. Does anybody know whether this translates directly to strength?
I doubt you'd gain anything. You do 2x more speculation for only 1-3% extra NPS. To gain strength you'd need an efficiency drop of less than 3% when going from batch size 512 to batch size 1024, i.e. if a 512 batch had N useful nodes on average, a 1024 batch would need at least 1.97N useful nodes, which I strongly doubt.
To make every node a useful node, set Batchsize=1. Then the search becomes serial and accords with “theory” of how policy guided NN-MCTS should work.

If Batchsize=2, then one node will be useful, the other node has three possibilities
a) it turns out to be useful.
b) it’s useless and evaluated entirely unnecessarily.
c) random node. It gets used but wouldn’t have been if the search were serial. This might turn out to be positive, or it may be entirely wasteful.

On the other hand, if Batchsize=1, then nps is 300 or something pathetically slow (I’m guessing a bit).
As you increase Batchsize, nps increases, but non-linearly. A graph would be useful.

As you increase Batchsize, useless nodes (type b) increase in number and proportion, random nodes (type c) increase in number and proportion, and useful nodes (type a) increase in number but decrease in proportion to total nodes.
Unless and until somebody can make a graph of this, i.e. come up with some data, you can only talk in general terms.

Still my question of several weeks ago now (LC0 proportion of useful to total nodes, efficiency of search) sits on the board, unanswered. I guess nobody has measured it. Without graphing the value, you’ll remain shooting in the dark.
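The NPS half of that graph could be collected with a small script. Below is a rough sketch, not a tested benchmark tool: it assumes an Lc0 binary that speaks plain UCI on stdin/stdout, uses only the option names already shown in this thread (Backend, MinibatchSize, NNCacheSize, WeightsFile) plus standard UCI commands, and all function names are hypothetical. It measures raw NPS only; it says nothing about which nodes are useful.

```python
import re
import subprocess


def uci_commands(minibatch_size, movetime_ms, weights_file=None):
    """Build the UCI command list for one fixed-time search from the start position."""
    cmds = [
        "uci",
        "setoption name Backend value cudnn-fp16",
        f"setoption name MinibatchSize value {minibatch_size}",
        "setoption name NNCacheSize value 5000000",
    ]
    if weights_file:
        cmds.append(f"setoption name WeightsFile value {weights_file}")
    cmds += ["isready", "ucinewgame", "position startpos", f"go movetime {movetime_ms}"]
    return cmds


def last_nps(output_lines):
    """Return the nps value of the last 'info' line that reports one, else None."""
    nps = None
    for line in output_lines:
        match = re.search(r"\bnps (\d+)", line)
        if line.startswith("info") and match:
            nps = int(match.group(1))
    return nps


def measure_nps(engine_path, minibatch_size, movetime_ms, weights_file=None):
    """Run one search and return the final reported NPS (needs a real engine binary)."""
    proc = subprocess.Popen(
        [engine_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for cmd in uci_commands(minibatch_size, movetime_ms, weights_file):
        proc.stdin.write(cmd + "\n")
    proc.stdin.flush()
    lines = []
    for line in proc.stdout:
        lines.append(line.strip())
        if line.startswith("bestmove"):  # search finished
            break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return last_nps(lines)


# Example sweep (the engine path "lc0" is an assumption, adjust to your install):
# for size in (1, 2, 16, 64, 256, 512, 1024):
#     print(size, measure_nps("lc0", size, 10000))
```

Plotting the printed pairs gives NPS against batch size; the proportion of useful nodes would still have to be measured inside the search itself.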
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Milos wrote: Tue Dec 04, 2018 7:56 pm
Laskos wrote: Tue Dec 04, 2018 1:09 pm At anything longer than 1s/move, MinibatchSize=1024 surpasses 512 NPS-wise, by 1-3%. Does anybody know whether this translates directly to strength?
I doubt you'd gain anything. You do 2x more speculation for only 1-3% extra NPS. To gain strength you'd need an efficiency drop of less than 3% when going from batch size 512 to batch size 1024, i.e. if a 512 batch had N useful nodes on average, a 1024 batch would need at least 1.97N useful nodes, which I strongly doubt.
At ultra-fast time controls I can check the effect over many games (NPS-wise, MinibatchSize=256 was better than 512 at 0.1s/move), and it came out positive at 1.5s + 0.025s (Move Overhead = 0ms, timemargin = 100ms in Cutechess-cli, no time losses): a gain of 15 +/- 10 Elo points in 2,000 games. However, I cannot play many games at LTC to check MinibatchSize=1024 against 512. But two test suites stand out as suitable for checking the _strength_ of Lc0 (unlike Arasan and similar tactical suites, where Lc0 underperforms heavily compared to regular engines and which do not show _strength_ in general). These test suites are ERET, with 111 positions, and my custom Openings200, with 200 positions. On these two, the results with Lc0 and a good net are comparable to those of the top 3 engines, which also matches its Elo-wise performance.

Here are the results at 60s/position:

ERET test suite:

MinibatchSize=512
score=72/111 [averages on correct positions: depth=7.7 time=4.29 nodes=103125]
MinibatchSize=1024
score=74/111 [averages on correct positions: depth=7.8 time=5.42 nodes=137317]


Openings200 test suite:

MinibatchSize=512
score=153/200 [averages on correct positions: depth=5.3 time=4.02 nodes=101923]
MinibatchSize=1024
score=156/200 [averages on correct positions: depth=5.5 time=4.05 nodes=100214]

Neither result is very statistically significant on its own, but combined they are fairly significant. My impression is that in my conditions, MinibatchSize=256 is best below ~0.2s/move, MinibatchSize=512 is best from ~0.2s/move to ~10s/move (usual time controls), and MinibatchSize=1024 is best Elo-wise above ~10s/move. Sure, all that is pretty speculative.
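One rough way to put a number on the suite differences is a two-proportion z-test on the solved counts. This is a crude sketch only: it treats the two runs as independent samples, which they are not (the same positions are used for both settings), so a paired test on per-position results, which are not given here, would be the better tool. The function name is made up for illustration.

```python
import math


def two_proportion_z(solved_a, total_a, solved_b, total_b):
    """z statistic for the difference between two solve rates (pooled variance)."""
    rate_a = solved_a / total_a
    rate_b = solved_b / total_b
    pooled = (solved_a + solved_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    return (rate_b - rate_a) / std_err


# Solved counts from the post: MinibatchSize=512 first, then 1024.
eret = two_proportion_z(72, 111, 74, 111)
openings200 = two_proportion_z(153, 200, 156, 200)
combined = two_proportion_z(72 + 153, 111 + 200, 74 + 156, 111 + 200)

print(f"ERET:        z = {eret:.2f}")
print(f"Openings200: z = {openings200:.2f}")
print(f"combined:    z = {combined:.2f}")
```

As a reference point, |z| of about 1.96 corresponds to the usual 95% two-sided threshold under this independence assumption.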
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Albert Silver »

Laskos wrote: Mon Nov 19, 2018 3:00 pm Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:

UCI commands:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go

info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292

Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.

My power supply is not that strong (500W), hope it stays well.
I ran the above on a 2080, and am sharing as a means of reference.

info depth 17 seldepth 43 time 93269 nodes 3386903 score cp 26 hashfull 754 nps 36313
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by brianr »

OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets were used. But the speed should be fairly uniform across the later test30 nets, so the nps figures are probably fair to compare.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Albert Silver »

Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets were used. But the speed should be fairly uniform across the later test30 nets, so the nps figures are probably fair to compare.
I used 11250, which I had on hand.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."