My non-OC RTX 2070 is very fast with Lc0

crem
Posts: 177
Joined: Wed May 23, 2018 9:29 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by crem »

Laskos wrote: Thu Nov 22, 2018 8:44 am I checked max node collisions at values of 32, 48, and 64. On the tactical WAC200.epd suite (corrected by Albert Silver), over 6 runs, I also found that 48 or even 64 is best. But on suites that are more positional and better reflect real strength, STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 perhaps the best value. I think I will leave it as it is, at 32, as the improvement at 48 or 64 seems to occur only in very tactical suites.
I don't know whether anyone has tested changing max collisions in v0.19, but in theory, unlike in v0.18, raising it above the default value shouldn't really improve strength or nps.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

I wanted to see NPS not just from the starting position but from quiet early-midgame positions, averaged over 30 of them. My RTX 2070 shows pretty amazing performance toward longer TC. This is with one of the late test30 nets, ID 31692.

Same UCI values:
setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000

tpm is time per move in milliseconds
d is depth achieved
nps is Nodes per Second

Average on 30 midgame positions:

tpm=61.5 d=4.33 nps=4617
tpm=110.1 d=5.78 nps=9497
tpm=376.8 d=7.78 nps=15729
tpm=1114.1 d=9.11 nps=24883
tpm=3612.3 d=12.56 nps=39197
tpm=14015.8 d=15.44 nps=45543
tpm=34465.7 d=17.33 nps=48813
tpm=67636.4 d=20.56 nps=51953
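
As a rough cross-check of the table above, the node count behind each row can be estimated as nps × tpm / 1000. This is only a sketch: the figures are averages over 30 positions, so the product approximates the average nodes per move rather than reproducing it, and the names `rows` and `approx_nodes` are illustrative.

```python
# (tpm in ms, average depth, average nps) copied from the table above
rows = [
    (61.5, 4.33, 4617),
    (110.1, 5.78, 9497),
    (376.8, 7.78, 15729),
    (1114.1, 9.11, 24883),
    (3612.3, 12.56, 39197),
    (14015.8, 15.44, 45543),
    (34465.7, 17.33, 48813),
    (67636.4, 20.56, 51953),
]


def approx_nodes(tpm_ms, nps):
    """Estimate nodes searched per move from time (ms) and nodes per second."""
    return nps * tpm_ms / 1000.0


for tpm, depth, nps in rows:
    nodes = approx_nodes(tpm, nps)
    print(f"tpm={tpm:>8.1f} ms  d={depth:>5.2f}  nps={nps:>6}  ~nodes={nodes:>10.0f}")
```

The shortest time control works out to only a few hundred nodes per move, which is consistent with the low nps there: the GPU has little time to get up to speed.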

As the numbers show, with the RTX 2070 Leela can be tested at ultra-fast time controls too; it's not a slouch like it was at ultra-fast on the GTX 1060. Lc0 at a 6s+0.1s time control with a good net is at the level of SF8 on 1 thread. On the GTX 1060, it was at about the 2200 CCRL Elo level. At LTC it surpasses 50 kNPS in midgame positions. Pretty remarkable: similar to the speeds I have seen in CCCC2, and about 6-7 times faster than the GTX 1060. Nvidia's RTX series is a godsend for these machine learning NN applications.
Javier Ros
Posts: 200
Joined: Fri Oct 12, 2012 12:48 pm
Location: Seville (SPAIN)
Full name: Javier Ros

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Javier Ros »

Thanks for sharing!
The speed of lc0 looks impressive. Could you post the specifications of the card, such as

8 GB of RAM, 1620 MHz core clock, etc.?
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by jjoshua2 »

Laskos wrote: Thu Nov 22, 2018 8:44 am I checked max node collisions at values of 32, 48, and 64. On the tactical WAC200.epd suite (corrected by Albert Silver), over 6 runs, I also found that 48 or even 64 is best. But on suites that are more positional and better reflect real strength, STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 perhaps the best value. I think I will leave it as it is, at 32, as the improvement at 48 or 64 seems to occur only in very tactical suites.
I discovered that playing with max-prefetch can also help. Setting it to 256 was a bit weaker on Arasan, scoring only 109, but it is stronger with 93% LOS in a test to optimize for TCEC div 3.

ResultSet-EloRating>ratings
Rank Name                                    Elo    +    - games score oppo. draws
   1 lc0tradepenalty bs256 mp256             107   63   63   194   92%  -272   13%
   2 lc0tradepenalty defaults                 45   49   49   246   89%  -272   20%
   3 lc0tradepenalty bs512 nc48               43   54   54   194   89%  -272   21%
   4 lc0tradepenalty 3.8 bs256 nc256 mp256    31   77   77    92   88%  -272   20%
   5 lc0tradepenalty 3.8 bs256 256mp 32nc     28   58   58   160   88%  -272   21%
   6 lc0tradepenalty 3.8 bs512 nc48           17   70   70   106   87%  -272   22%
   7 lc0tradepenalty 256 bs256 nc256           9   73   73    94   87%  -272   22%
   8 lc0tradepenalty 3.8 bs256 nc256          -8   66   66   108   85%  -272   24%
   9 Vajolet2_2.5 30CPU                     -272   22   22  1194   11%    41   20%
ResultSet-EloRating>los
                                    lc lc lc lc lc lc lc lc Va
lc0tradepenalty 256 256                93 93 93 96 96 97 99100
lc0tradepenalty defaults             6    52 62 67 74 79 89100
lc0tradepenalty 512 48               6 47    60 64 71 76 87100
lc0tradepenalty 3.8 256 256 256      6 37 39    51 60 65 77 99
lc0tradepenalty 3.8 256 256pf 32nc   3 32 35 48    59 65 78100
lc0tradepenalty 3.8 512 48           3 25 28 39 40    56 69 99
lc0tradepenalty 256 256 256          2 20 23 34 34 43    63 99
lc0tradepenalty 3.8 256 256          0 10 12 22 21 30 36    99
Vajolet2_2.5 30CPU                   0  0  0  0  0  0  0  0
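
For readers who want to sanity-check an LOS entry against the ratings table, here is a minimal sketch. It assumes that the "+" and "-" columns are 95% margins and that the two ratings are independent normal estimates; neither assumption is confirmed in the thread, the rating tool computes its matrix differently, and `approx_los` is a hypothetical helper rather than part of any tool.

```python
import math


def approx_los(elo_a, margin_a, elo_b, margin_b, z95=1.96):
    """Approximate P(A is stronger than B) from Elo estimates and 95% margins.

    Assumption: each rating is an independent normal variable whose 95%
    interval is elo +/- margin. The rating tool's own matrix is computed
    differently, so this is only a back-of-the-envelope check.
    """
    sigma_a = margin_a / z95
    sigma_b = margin_b / z95
    z = (elo_a - elo_b) / math.hypot(sigma_a, sigma_b)
    # Standard normal CDF expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# "bs256 mp256" (107 +/- 63) versus "defaults" (45 +/- 49) from the table
print(f"{approx_los(107, 63, 45, 49):.2f}")
```

Under these assumptions the first two rows give roughly 0.94, close to the 93 in the matrix. With margins this wide, a 62 Elo gap is suggestive rather than conclusive.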
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

jjoshua2 wrote: Sat Dec 01, 2018 4:48 pm
Laskos wrote: Thu Nov 22, 2018 8:44 am I checked max node collisions at values of 32, 48, and 64. On the tactical WAC200.epd suite (corrected by Albert Silver), over 6 runs, I also found that 48 or even 64 is best. But on suites that are more positional and better reflect real strength, STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 perhaps the best value. I think I will leave it as it is, at 32, as the improvement at 48 or 64 seems to occur only in very tactical suites.
I discovered that playing with max-prefetch can also help. Setting it to 256 was a bit weaker on Arasan, scoring only 109, but it is stronger with 93% LOS in a test to optimize for TCEC div 3.

At ultra-fast (6s + 0.1s), with max-prefetch set at 256, it is significantly weaker, without much doubt.

My card is Gainward GeForce RTX 2070 8GB GDDR6 256-bit (426018336-4269)

Dual Fan
GeForce RTX 2000
Processor clock 1410 MHz
GPU Boost clock 1620 MHz
GDDR6 8 GB
Memory BUS 256 bit
Maximum Effective Frequency 14000 MHz
Bandwidth 448 GB/s
RAMDAC 400 MHz
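
The bandwidth figure in the list above follows from the effective memory frequency and the bus width. A minimal sketch of that arithmetic, with `memory_bandwidth_gb_s` as an illustrative helper name:

```python
def memory_bandwidth_gb_s(effective_mhz, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


# 14000 MHz effective on a 256-bit bus, as listed for this card
print(memory_bandwidth_gb_s(14000, 256))
```

With the listed values this reproduces the 448 GB/s from the spec sheet.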

It sells here for a stable $650, but I bought it on "Black Friday" for $550 (literally hunting for the offer).

I will play a bit with max-prefetch, as it does seem to have a serious impact. Do you have reasons to believe that at longer TC its optimum is different compared to ultra-fast TC?
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by jjoshua2 »

Laskos wrote: Sat Dec 01, 2018 6:00 pm I will play a bit with max-prefetch, as it does seem to have a serious impact. Do you have reasons to believe that at longer TC its optimum is different compared to ultra-fast TC?
This test was with a 2080 Ti at 60+1 with a 2-move book and 6-man Syzygy, so with a medium amount of time against a much weaker engine it is highly likely to be stronger. I haven't tested it at any other TC or against other opponents yet, but I'm running a gauntlet now at 120+2. It was a bit weaker in the tactics test, which maybe hurts Leela a lot at ultra-bullet? You could maybe try STS with a bit more time, since you found different results than Arasan there before?

I think what it effectively does is fill up each batch with speculative nodes that it stores in the cache, and since the out-of-order eval UCI option is checked by default, it will use them whenever it can down the road, even if a given node wasn't the one it would have wanted to expand first. So it seems like the sort of option that helps more with more time: the more future batches there are, the more likely a prefetched node eventually gets used, and the more it pays to broaden the search at the expense of slowing down the most important part just a bit. Also, for some reason the speculative nodes are fetched without applying FPU reduction, which also tends to broaden the search...
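
A toy model of the behaviour described above may make it more concrete. This is not lc0's code: `evaluate_batch` and `run_batch` are hypothetical names, positions are plain strings, and the "network" is a stand-in. It only illustrates padding spare batch slots with speculative positions and getting cache hits later.

```python
def evaluate_batch(positions):
    """Stand-in for one neural-net call; returns a fake value per position."""
    return {p: float(len(p)) for p in positions}


def run_batch(wanted, speculative, batch_size, cache):
    """Evaluate the wanted nodes, padding spare batch slots with speculative ones.

    Returns (number of wanted nodes sent, number of speculative nodes sent).
    """
    batch = [p for p in wanted if p not in cache][:batch_size]
    spare = batch_size - len(batch)
    extras = [p for p in speculative if p not in cache and p not in batch][:spare]
    cache.update(evaluate_batch(batch + extras))
    return len(batch), len(extras)


cache = {}
# First batch: two nodes the search wants now, plus guesses at future nodes.
print(run_batch(["e4", "d4"], ["e4 e5", "d4 d5", "c4"], batch_size=4, cache=cache))
# Later batch: one wanted node was already prefetched, so it takes no slot.
print(run_batch(["e4 e5", "Nf3"], [], batch_size=4, cache=cache))
```

In this toy, the more batches that follow, the more chances a prefetched entry has of being hit, which matches the intuition that the option should matter more at longer time controls.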
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Jesse Gersenson »

Kai, how did you come up with the settings?

Did you use a script to loop through settings to find optimal values?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Jesse Gersenson wrote: Sun Dec 02, 2018 11:39 am Kai, how did you come up with the settings?

Did you use a script to loop through settings to find optimal values?
What settings?

I use
setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000

and only NNCacheSize is non-standard. A value 10x higher than the default is better both NPS-wise and strength-wise at all time controls I checked. Even higher might be better at very long time controls.

I also checked MaxCollisionEvents, which is 32 by default. Elo-wise, in fast games with 1000 games per match, 32, 40, and 48 come out within error margins of each other, so maybe 40 is better, as it is a tiny bit better tactically. But don't expect an improvement beyond 5-10 Elo points, if any.

Now I am checking MaxPrefetch, which indeed widens the search at high values (higher time-to-depth). On the STS 1500 positions at 5s/position, the results came out as follows (default is 32):

16:
score=1205/1500 [averages on correct positions: depth=3.8 time=0.25 nodes=3516]

24:
score=1208/1500 [averages on correct positions: depth=3.9 time=0.24 nodes=3305]

Def=32:
score=1209/1500 [averages on correct positions: depth=3.9 time=0.26 nodes=3596]

40:
score=1213/1500 [averages on correct positions: depth=3.9 time=0.25 nodes=3476]

48:
score=1212/1500 [averages on correct positions: depth=3.9 time=0.26 nodes=3625]

64:
score=1210/1500 [averages on correct positions: depth=3.9 time=0.25 nodes=3302]

128:
score=1204/1500 [averages on correct positions: depth=3.8 time=0.26 nodes=3273]
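
To put the spread of these scores in perspective, the sketch below converts them to percentages and compares the differences with a simple binomial standard deviation. The independent-trials model is an assumption: every setting is run on the same 1500 positions, so the runs are correlated, and the figure is only a rough scale for the noise.

```python
import math

TOTAL = 1500
# MaxPrefetch value -> positions solved, copied from the runs above
scores = {16: 1205, 24: 1208, 32: 1209, 40: 1213, 48: 1212, 64: 1210, 128: 1204}


def binomial_sd(solved, total):
    """Standard deviation of a solved count under an independent-trials model."""
    p = solved / total
    return math.sqrt(total * p * (1.0 - p))


baseline = scores[32]
for value, solved in scores.items():
    print(f"MaxPrefetch={value:>3}: {solved / TOTAL:6.2%}  "
          f"diff vs 32 = {solved - baseline:+d}  sd ~ {binomial_sd(solved, TOTAL):.1f}")
```

The whole range from worst to best is 9 positions, while the model's standard deviation is around 15, so a single STS run cannot separate these settings.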


Preliminary results in fast games are promising for a value of 40 instead of 32, but don't expect anything above a 10-15 Elo point improvement, if any.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Werewolf »

Laskos wrote: Mon Nov 19, 2018 3:00 pm Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:

UCI commands:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go

info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292

Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.

My power supply is not that strong (500W), hope it stays well.

I'm not doubting your results Laskos, but I'm struggling to understand them.
Your 1060 card produced about 4.4 TFLOPS FP32. Your 2070 card is around 7.5 TFLOPS FP32 which with the new ability to use FP16 means about 15 TFLOPS.

That should make your 2070 roughly 3.4x faster than your 1060 (15 / 4.4). Instead you report a 5-6x improvement.

Happy for you...but confused.
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by jjoshua2 »

You don't double fp32 to get fp16: the tensor cores run fp16, so they have their own unique jump, to around 100 TFLOPS. But because they are hardcoded to one specific task, the code is not quite as optimized (it doesn't support Winograd currently), so you end up getting 2-3x instead of something like 8-10x.
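
The arithmetic behind this exchange can be sketched as follows. It assumes that the 2-3x tensor-core gain applies on top of the FP32 ratio between the two cards. That reading is an interpretation of the post, not something stated outright, and the variable names are illustrative.

```python
gtx1060_fp32 = 4.4   # TFLOPS, figure quoted in the thread
rtx2070_fp32 = 7.5   # TFLOPS, figure quoted in the thread

fp32_ratio = rtx2070_fp32 / gtx1060_fp32
# Assumption: the quoted 2-3x tensor-core gain multiplies the FP32 ratio.
low, high = 2 * fp32_ratio, 3 * fp32_ratio

print(f"FP32 ratio: {fp32_ratio:.2f}x")
print(f"with a 2-3x FP16 gain: {low:.1f}x to {high:.1f}x")
```

Under that assumption the upper end of the range comes close to the reported 5-6x, whereas simply doubling the FP32 figure only gives the lower end.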