My non-OC RTX 2070 is very fast with Lc0

MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: My non-OC RTX 2070 is very fast with Lc0

Post by MikeB »

Laskos wrote: Mon Nov 19, 2018 3:00 pm Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:

UCI commands:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go

info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292
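
As a sanity check, the nps value in these info lines appears to be just the cumulative node count divided by the elapsed time (UCI reports time in milliseconds). A minimal Python sketch; `parse_info` is a hypothetical helper written for this illustration, not part of Lc0:

```python
# The three info lines quoted above, copied verbatim.
INFO_LINES = [
    "info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621",
    "info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035",
    "info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292",
]

def parse_info(line, keys=("time", "nodes", "nps")):
    """Pull selected integer fields out of a UCI info line (hypothetical helper)."""
    tokens = line.split()
    return {key: int(tokens[tokens.index(key) + 1]) for key in keys}

rows = [parse_info(line) for line in INFO_LINES]
for row in rows:
    # time is in milliseconds, so nodes per second = nodes * 1000 / time
    recomputed = row["nodes"] * 1000 // row["time"]
    print(row["time"], row["nodes"], row["nps"], recomputed)
```

Since the figure is a running average over the whole search, it keeps rising as long as the later part of the search is faster than the start.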

Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.

My power supply is not that strong (500W), hope it stays well.
nice! 👍 - congrats on your new setup!
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by jjoshua2 »

Those are good settings for speed. Glad you figured that much out. I think you will find raising max node collisions to 48 helps even more. I ran the Arasan tactics suite at 5 s/move with batch size 512, first with the default 32 node collisions and then with 48, 64, and 96. 48 gave the highest average score and the most consistent results (even though NPS keeps increasing as the value goes up). 64 was close but less consistent: sometimes the extra speed helped and sometimes it hurt.

Matching moves   Rated time   Batch size   Max node collisions
111 of 200       07:51        512          32

112 of 200       07:46        512          48
114 of 200       07:43        512          48
114 of 200       07:39        512          48
113 of 200       07:50        512          48

113 of 200       07:47        512          64
116 of 200       07:29        512          64
111 of 200       07:52        512          64
111 of 200       07:58        512          64

112 of 200       07:48        512          96
111 of 200       07:49        512          96
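
The "highest average and most consistent" claim can be checked directly from the runs listed above. A small Python sketch, with the scores copied from the table; the variable names are only for illustration, and the min-to-max spread is used here as a rough stand-in for consistency:

```python
# Matching moves (out of 200) per run, keyed by max node collisions.
runs = {
    32: [111],
    48: [112, 114, 114, 113],
    64: [113, 116, 111, 111],
    96: [112, 111],
}

summary = {}
for collisions, scores in runs.items():
    summary[collisions] = {
        "runs": len(scores),
        "mean": sum(scores) / len(scores),
        "spread": max(scores) - min(scores),  # crude consistency measure
    }

for collisions, stats in summary.items():
    print(collisions, stats)
```

With only one to four runs per setting, gaps of a position or two between the averages may well be within run-to-run noise.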


more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Gian-Carlo Pascutto »

Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Werewolf »

Gian-Carlo Pascutto wrote: Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Werewolf wrote: Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
No one can tell you that. It's like SMP with A/B search: 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one has to test it thoroughly.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Werewolf »

Milos wrote: Wed Nov 21, 2018 2:21 pm
Werewolf wrote: Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
No one can tell you that. It's like SMP with A/B search: 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one has to test it thoroughly.
But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Gian-Carlo Pascutto »

Werewolf wrote: Wed Nov 21, 2018 3:26 pm With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?
It's essentially the same: the core algorithm is sequential, and all the parallelism comes from speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by chrisw »

Gian-Carlo Pascutto wrote: Wed Nov 21, 2018 5:05 pm
Werewolf wrote: Wed Nov 21, 2018 3:26 pm With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?
It's essentially the same: the core algorithm is sequential, and all the parallelism comes from speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.
That's why I asked a couple of days ago if anybody knew the ratio of [MCTS actually used nodes] to [total evaluated nodes] for LC0. I got the impression from your previous reply to that post that it was 1:1 and figured I must have been being random.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Werewolf wrote: Wed Nov 21, 2018 3:26 pm
Milos wrote: Wed Nov 21, 2018 2:21 pm
Werewolf wrote: Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
No one can tell you that. It's like SMP with A/B search: 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one has to test it thoroughly.
But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?
Exactly what Gian-Carlo says. You are executing in parallel things that should essentially have been executed serially, and speculatively choosing what to execute in parallel. You don't need multiple GPUs for that; you are already doing it on a single GPU when executing in batches.
The difference with A/B is that the level of speculation there is much higher, since you prune much more aggressively than in MCTS. Parallelising might therefore have a bigger impact on the algorithm's strength, i.e. it is much more probable that you will search in vain nodes that should have been cut. However, as we've seen with Lazy SMP, if the non-SMP version of an A/B algorithm is too speculative in the first place and removes too much from the search tree, a parallel algorithm that broadens the tree might actually help a bit.
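
To make the batching point concrete, here is a toy Python sketch. It is not Lc0's search: it is a one-level tree with a PUCT-style selection rule, made-up move values and priors, and an in-flight visit count as a simple stand-in for virtual loss. It compares picking one node at a time (each pick sees all earlier results) with picking a whole batch before any result comes back.

```python
import math

def pick_child(visits, value_sum, priors, c_puct=1.5):
    """Return the index maximising a PUCT-style score Q + U (toy version)."""
    total = sum(visits)
    best, best_score = 0, -math.inf
    for i, (n, w, p) in enumerate(zip(visits, value_sum, priors)):
        q = w / n if n else 0.0
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best, best_score = i, q + u
    return best

def search(true_values, priors, n_evals, batch):
    """Spend n_evals evaluations, choosing `batch` children before results return."""
    visits = [0] * len(true_values)
    value_sum = [0.0] * len(true_values)
    done = 0
    while done < n_evals:
        size = min(batch, n_evals - done)
        picked = []
        for _ in range(size):
            i = pick_child(visits, value_sum, priors)
            visits[i] += 1  # counted as in flight; its value is not known yet
            picked.append(i)
        for i in picked:  # the whole batch is "evaluated" at once
            value_sum[i] += true_values[i]
        done += size
    return visits

# Hypothetical position: one clearly best move, three weak ones, uniform priors.
true_values = [0.9, 0.1, 0.1, 0.1]
priors = [0.25] * 4

sequential = search(true_values, priors, n_evals=64, batch=1)
batched = search(true_values, priors, n_evals=64, batch=64)
print("sequential:", sequential)
print("batched:   ", batched)
```

Both runs spend the same 64 evaluations, so the "nps" is identical, but the sequential run concentrates on the strong move while the single big batch, chosen blind, spreads its visits evenly. Real settings sit somewhere between these two extremes.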
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

jjoshua2 wrote: Wed Nov 21, 2018 2:39 am Those are good settings for speed. Glad you figured that much out. I think you will find raising max node collisions to 48 helps even more. I ran the Arasan tactics suite at 5 s/move with batch size 512, first with the default 32 node collisions and then with 48, 64, and 96. 48 gave the highest average score and the most consistent results (even though NPS keeps increasing as the value goes up). 64 was close but less consistent: sometimes the extra speed helped and sometimes it hurt.

Matching moves   Rated time   Batch size   Max node collisions
111 of 200       07:51        512          32

112 of 200       07:46        512          48
114 of 200       07:43        512          48
114 of 200       07:39        512          48
113 of 200       07:50        512          48

113 of 200       07:47        512          64
116 of 200       07:29        512          64
111 of 200       07:52        512          64
111 of 200       07:58        512          64

112 of 200       07:48        512          96
111 of 200       07:49        512          96


more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514
I checked max node collisions at 32, 48, and 64. On the tactical WAC200.epd suite corrected by Albert Silver, over 6 runs, I also found that 48 or even 64 is best. But on suites that are more positional and better reflect real strength, namely STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 possibly the best. I think I will leave it at 32, as the improvement at 48 or 64 seems to occur only in very tactical suites.