My non-OC RTX 2070 is very fast with Lc0

MikeB · Post by **MikeB** » Wed Nov 21, 2018 12:02 am

Laskos wrote: ↑Mon Nov 19, 2018 3:00 pm Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:

UCI commands:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go

info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292

Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.

My power supply is not that strong (500W), hope it stays well.

nice!

- congrats on your new setup!
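
As a sanity check on the quoted output, the reported nps appears to be the cumulative average (total nodes divided by elapsed time in milliseconds), which would explain why it keeps climbing as the search warms up. A minimal sketch, assuming that interpretation; `parse_info` is a hypothetical helper, not part of Lc0 or of any UCI library:

```python
# The three "info" lines quoted above, copied verbatim.
LINES = [
    "info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621",
    "info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035",
    "info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292",
]


def parse_info(line):
    """Parse selected integer fields out of a UCI 'info' line (hypothetical helper)."""
    tokens = line.split()
    fields = {}
    for key in ("depth", "seldepth", "time", "nodes", "hashfull", "nps"):
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def recomputed_nps(info):
    """Cumulative nodes per second; 'time' is in milliseconds."""
    return info["nodes"] * 1000 // info["time"]


for line in LINES:
    info = parse_info(line)
    print(info["depth"], info["nps"], recomputed_nps(info))
```

The recomputed values match the reported ones, so the last line (about 30k nps) is an average over the whole 94 seconds rather than an instantaneous rate.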

jjoshua2 · Post by **jjoshua2** » Wed Nov 21, 2018 2:39 am

Those are good settings for speed. Glad you figured that much out. I think you will find raising max node collisions to 48 helps even more. I did a 5s / move arasan tactics suite at 512 batchsize with default 32 node collisions, and raising it to 48, 64, and 96, and 48 scored the highest average and most consistent (despite NPS increasing as it goes up.) 64 was close, but less consistent as sometimes the extra speed hurt and sometimes it helped.

Matching moves   Rated time   Batch size   Node collisions
111 of 200       07:51        512          32

112 of 200       07:46        512          48
114 of 200       07:43        512          48
114 of 200       07:39        512          48
113 of 200       07:50        512          48

113 of 200       07:47        512          64
116 of 200       07:29        512          64
111 of 200       07:52        512          64
111 of 200       07:58        512          64

112 of 200       07:48        512          96
111 of 200       07:49        512          96


more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514
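
The "highest average and most consistent" reading of these runs can be checked with a few lines of Python. This is only a sketch over the run counts posted above (not the linked spreadsheet), and `summarize` is a hypothetical helper name:

```python
from statistics import mean, stdev

# Matching moves (out of 200) per run, keyed by node-collision setting,
# taken from the results listed in this post.
RUNS = {
    32: [111],
    48: [112, 114, 114, 113],
    64: [113, 116, 111, 111],
    96: [112, 111],
}


def summarize(results):
    """Return {setting: (mean, sample stdev)}; stdev is None for a single run."""
    return {
        setting: (mean(scores), stdev(scores) if len(scores) > 1 else None)
        for setting, scores in results.items()
    }


summary = summarize(RUNS)
for setting, (avg, spread) in summary.items():
    spread_text = "n/a" if spread is None else f"{spread:.2f}"
    print(f"{setting:>3} collisions: mean {avg:.2f}, stdev {spread_text}")
```

With only one to four runs per setting, the gaps between the means are well within the run-to-run spread, so this supports the ranking only weakly.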

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Wed Nov 21, 2018 9:23 am

Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.

Werewolf · Post by **Werewolf** » Wed Nov 21, 2018 10:44 am

Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.

Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)

Milos · Post by **Milos** » Wed Nov 21, 2018 2:21 pm

Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)

No one can tell you that. It's like with SMP and A/B search, having 2x nps doesn't give you 2x strength (equivalent of 2x time) improvement and to know actual strength improvement one needs to test it thoroughly.
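
One common way to express the outcome of such a test is the Elo difference implied by a match score, using the standard logistic formula. A minimal sketch; `elo_diff` is a hypothetical helper and the win/draw/loss counts in the usage line are made-up placeholders, not results from any real match:

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference implied by a match score (standard logistic model)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Placeholder numbers for illustration only.
print(round(elo_diff(wins=40, draws=40, losses=20), 1))
```

Running the same match once with two cards against one, and once with one card at double time against one, would give two such numbers to compare; that is the "true speedup" question in Elo terms.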

Werewolf · Post by **Werewolf** » Wed Nov 21, 2018 3:26 pm

Milos wrote: ↑Wed Nov 21, 2018 2:21 pm
Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
No one can tell you that. It's like with SMP and A/B search, having 2x nps doesn't give you 2x strength (equivalent of 2x time) improvement and to know actual strength improvement one needs to test it thoroughly.

But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Wed Nov 21, 2018 5:05 pm

Werewolf wrote: ↑Wed Nov 21, 2018 3:26 pm With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?

It's essentially the same: the core algorithm is sequential, all the parallelism is caused by speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.

chrisw · Post by **chrisw** » Wed Nov 21, 2018 5:54 pm

Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 5:05 pm
Werewolf wrote: ↑Wed Nov 21, 2018 3:26 pm With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?
It's essentially the same: the core algorithm is sequential, all the parallelism is caused by speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.

That's why I asked a couple of days ago if anybody knew the ratio of [MCTS actually used nodes] to [total evaluated nodes] for LC0. I got the impression from your previous reply to that post that it was 1:1 and figured I must have been being random.

Milos · Post by **Milos** » Wed Nov 21, 2018 8:23 pm

Werewolf wrote: ↑Wed Nov 21, 2018 3:26 pm
Milos wrote: ↑Wed Nov 21, 2018 2:21 pm
Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.

Evaluating the same position over and over is really fast but also not very useful.
Do you know the true search speedup with Lc0 going from one card to two?

Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
No one can tell you that. It's like with SMP and A/B search, having 2x nps doesn't give you 2x strength (equivalent of 2x time) improvement and to know actual strength improvement one needs to test it thoroughly.
But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha beta the reasons for search inefficiency are well known, is there a similar principle for Lc0?

Exactly what Gian-Carlo says. You are executing things in parallel that should have essentially been executed serially and speculatively choosing what to execute in parallel. You don't need multiple GPU cards to do that, you are already doing it on a single GPU when executing in batches.
The difference with A/B is that the level of speculation there is much higher, since you prune much more aggressively than in MCTS, so parallelising might have a bigger impact on strength, i.e. it is much more likely that you will needlessly search nodes that should have been cut. However, as we've seen with Lazy SMP, if the non-SMP version of the A/B algorithm is too speculative in the first place and removes too much from the search tree, a parallel algorithm that broadens the tree might actually help a bit.
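
The point about batches can be illustrated with a toy model. The sketch below is not Lc0's search: it is a one-level PUCT-style selection over four hypothetical root moves with made-up priors and values, an assumed exploration constant, and a simple virtual loss for picks that are still awaiting evaluation. It counts how many visits a batched search places on moves beyond what the purely sequential search (batch size 1) would have placed there.

```python
import math

# Hypothetical toy position: prior and fixed "true" value for each root move.
PRIORS = [0.4, 0.3, 0.2, 0.1]
VALUES = [0.5, 0.1, -0.2, -0.6]
C_PUCT = 1.5  # assumed exploration constant


def search(total_visits, batch_size):
    """Return per-move visit counts after a batched PUCT-style search."""
    visits = [0] * len(PRIORS)         # completed evaluations per move
    value_sum = [0.0] * len(PRIORS)    # sum of backed-up values per move
    done = 0
    while done < total_visits:
        batch = min(batch_size, total_visits - done)
        in_flight = [0] * len(PRIORS)  # picks awaiting evaluation
        for _ in range(batch):
            n_total = sum(visits) + sum(in_flight)
            best, best_score = 0, -math.inf
            for i, prior in enumerate(PRIORS):
                n = visits[i] + in_flight[i]
                # Virtual loss: each in-flight pick counts as a loss (-1).
                q = (value_sum[i] - in_flight[i]) / n if n else 0.0
                score = q + C_PUCT * prior * math.sqrt(n_total + 1) / (1 + n)
                if score > best_score:
                    best, best_score = i, score
            in_flight[best] += 1
        # The batch returns: virtual losses are replaced by real values.
        for i, k in enumerate(in_flight):
            visits[i] += k
            value_sum[i] += k * VALUES[i]
        done += batch
    return visits


def speculative_visits(total_visits, batch_size):
    """Visits the batched search makes that the sequential search would not."""
    sequential = search(total_visits, 1)
    batched = search(total_visits, batch_size)
    return sum(max(0, b - s) for b, s in zip(batched, sequential))


for size in (1, 8, 64):
    print(size, search(256, size), speculative_visits(256, size))
```

In this toy, a large batch is forced to spread its picks over moves the sequential search would mostly have ignored, because no real evaluation comes back until the batch is done. How much that costs in playing strength is exactly what the model cannot say; that still needs testing.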

Laskos · Post by **Laskos** » Thu Nov 22, 2018 8:44 am

jjoshua2 wrote: ↑Wed Nov 21, 2018 2:39 am Those are good settings for speed. Glad you figured that much out. I think you will find raising max node collisions to 48 helps even more. I did a 5s / move arasan tactics suite at 512 batchsize with default 32 node collisions, and raising it to 48, 64, and 96, and 48 scored the highest average and most consistent (despite NPS increasing as it goes up.) 64 was close, but less consistent as sometimes the extra speed hurt and sometimes it helped.
Matching moves   Rated time   Batch size   Node collisions
111 of 200       07:51        512          32

112 of 200       07:46        512          48
114 of 200       07:43        512          48
114 of 200       07:39        512          48
113 of 200       07:50        512          48

113 of 200       07:47        512          64
116 of 200       07:29        512          64
111 of 200       07:52        512          64
111 of 200       07:58        512          64

112 of 200       07:48        512          96
111 of 200       07:49        512          96


more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514

I checked max node collisions at 32, 48, and 64. On the tactical WAC200.epd suite corrected by Albert Silver, over 6 runs, I also found 48 or even 64 to be best. But on suites that are more positional and better reflect real strength, STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 perhaps the best. I think I will leave it as it is, at 32, since the improvement at 48 or 64 seems to occur only in very tactical suites.
