Nice! Congrats on your new setup!
Laskos wrote: ↑Mon Nov 19, 2018 3:00 pm
Just got and installed it. With one of the latest nets, Lc0 v19 rc5 engine:
UCI commands:
setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 512
setoption name NNCacheSize value 2000000
go
info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621
info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035
info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292
Didn't quite expect such speeds, would have been happy even with 18,000-20,000.
Some 5-6 fold improvement over GTX 1060.
My power supply is not that strong (500W); I hope it holds up.
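The reported nps figures can be cross-checked against the other fields of the same `info` lines. The following is a minimal sketch, assuming only that `time` is given in milliseconds (as the UCI protocol specifies) and that the engine rounds nps down; the helper names `parse_info` and `recomputed_nps` are made up for illustration and are not part of Lc0.

```python
# The three "info" lines quoted above, copied verbatim.
INFO_LINES = [
    "info depth 19 seldepth 52 time 41681 nodes 984582 score cp 27 hashfull 274 nps 23621",
    "info depth 21 seldepth 53 time 69999 nodes 2032430 score cp 26 hashfull 431 nps 29035",
    "info depth 22 seldepth 54 time 93937 nodes 2845554 score cp 26 hashfull 570 nps 30292",
]


def parse_info(line):
    """Collect every 'name <integer>' pair of a UCI info line into a dict."""
    tokens = line.split()
    fields = {}
    for key, value in zip(tokens, tokens[1:]):
        key_is_number = key.lstrip("-").isdigit()
        value_is_number = value.lstrip("-").isdigit()
        if value_is_number and not key_is_number:
            fields[key] = int(value)
    return fields


def recomputed_nps(fields):
    """Nodes per second from 'nodes' and 'time' (milliseconds), rounded down."""
    return fields["nodes"] * 1000 // fields["time"]


if __name__ == "__main__":
    for line in INFO_LINES:
        fields = parse_info(line)
        print(fields["depth"], fields["nps"], recomputed_nps(fields))
```

The recomputed value is the average over the whole search so far, which is why it keeps rising from line to line as the early, slower part of the search is amortised.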
My non-OC RTX 2070 is very fast with Lc0
-
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: My non-OC RTX 2070 is very fast with Lc0
-
- Posts: 99
- Joined: Sat Mar 10, 2018 6:16 am
Re: My non-OC RTX 2070 is very fast with Lc0
Those are good settings for speed. Glad you figured that much out. I think you will find that raising max node collisions to 48 helps even more. I ran the Arasan tactics suite at 5 s/move with a batch size of 512, first with the default 32 node collisions and then with 48, 64, and 96. 48 gave the highest average score and the most consistent results, even though NPS keeps increasing as the value goes up. 64 was close but less consistent: sometimes the extra speed hurt and sometimes it helped.
111 of 200 matching moves Rated time: 07:51 512 batchsize 32 node collisions
112 of 200 matching moves Rated time: 07:46 512 batchsize 48 node collisions
114 of 200 matching moves Rated time: 07:43
114 of 200 matching moves Rated time: 07:39
113 of 200 matching moves Rated time: 07:50
113 of 200 matching moves Rated time: 07:47 512 batchsize 64 node collisions
116 of 200 matching moves Rated time: 07:29
111 of 200 matching moves Rated time: 07:52
111 of 200 matching moves Rated time: 07:58
112 of 200 matching moves Rated time: 07:48 512 batchsize 96 node collisions
111 of 200 matching moves Rated time: 07:49
more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514
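The averages behind that claim can be recomputed from the listing. This is a small sketch that assumes each unlabeled row belongs to the most recent labeled row above it (one run at 32, four at 48, four at 64, two at 96); that grouping is a reading of the listing, not something the post states explicitly. "Spread" here simply means best run minus worst run.

```python
from statistics import mean

# Matching moves out of 200 per run, grouped by max node collisions.
# The grouping of unlabeled rows is an assumption (see the text above).
RESULTS = {
    32: [111],
    48: [112, 114, 114, 113],
    64: [113, 116, 111, 111],
    96: [112, 111],
}


def summarize(results):
    """Return run count, mean score and best-minus-worst spread per setting."""
    summary = {}
    for collisions, scores in results.items():
        summary[collisions] = {
            "runs": len(scores),
            "mean": mean(scores),
            "spread": max(scores) - min(scores),
        }
    return summary


if __name__ == "__main__":
    for collisions, stats in summarize(RESULTS).items():
        print(collisions, stats)
```

Under that grouping, 48 averages 113.25 with a spread of 2 and 64 averages 112.75 with a spread of 5, which matches the description of 64 as close but less consistent. With so few runs per setting, the differences are small relative to the run-to-run variation.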
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: My non-OC RTX 2070 is very fast with Lc0
Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.
Evaluating the same position over and over is really fast but also not very useful.
-
- Posts: 1797
- Joined: Thu Sep 18, 2008 10:24 pm
Re: My non-OC RTX 2070 is very fast with Lc0
Do you know the true search speedup with Lc0 going from one card to two? Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am
Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.
Evaluating the same position over and over is really fast but also not very useful.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: My non-OC RTX 2070 is very fast with Lc0
No one can tell you that. It's like SMP with A/B search: having 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one needs to test it thoroughly.
Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Do you know the true search speedup with Lc0 going from one card to two? Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am
Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.
Evaluating the same position over and over is really fast but also not very useful.
-
- Posts: 1797
- Joined: Thu Sep 18, 2008 10:24 pm
Re: My non-OC RTX 2070 is very fast with Lc0
But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha-beta the reasons for search inefficiency are well known; is there a similar principle for Lc0?
Milos wrote: ↑Wed Nov 21, 2018 2:21 pm
No one can tell you that. It's like SMP with A/B search: having 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one needs to test it thoroughly.
Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Do you know the true search speedup with Lc0 going from one card to two? Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am
Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.
Evaluating the same position over and over is really fast but also not very useful.
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: My non-OC RTX 2070 is very fast with Lc0
It's essentially the same: the core algorithm is sequential, all the parallelism is caused by speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.
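One crude way to picture this is to discount the raw nps gain by the share of evaluations that actually end up being useful to the search. The sketch below is a toy model only: `effective_speedup` and `useful_fraction` are hypothetical names, the 0.9 and 0.6 values are made-up illustrations rather than measurements, and real strength does not have to scale linearly with useful evaluations.

```python
def effective_speedup(nps_ratio, useful_fraction):
    """Toy model: raw nps gain scaled by the fraction of non-wasted evaluations.

    nps_ratio       -- nps with more parallelism divided by baseline nps
    useful_fraction -- hypothetical share (0..1) of evaluations that were needed
    """
    if nps_ratio <= 0:
        raise ValueError("nps_ratio must be positive")
    if not 0.0 <= useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be between 0 and 1")
    return nps_ratio * useful_fraction


if __name__ == "__main__":
    # Illustrative values only: doubling nps with 10% or 40% wasted evaluations.
    for fraction in (0.9, 0.6):
        print(fraction, effective_speedup(2.0, fraction))
```

In this model a doubling of nps only turns into a true 2x speedup if nothing is evaluated needlessly. The useful fraction itself is not something nps reports, so it would have to be estimated by testing.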
-
- Posts: 4321
- Joined: Tue Apr 03, 2012 4:28 pm
Re: My non-OC RTX 2070 is very fast with Lc0
That's why I asked a couple of days ago if anybody knew the ratio of [MCTS actually used nodes] to [total evaluated nodes] for LC0. I got the impression from your previous reply to that post that it was 1:1 and figured I must have been being random.
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 5:05 pm
It's essentially the same: the core algorithm is sequential, all the parallelism is caused by speculatively evaluating nodes. The more parallelism you try to get, the more positions you will end up evaluating needlessly.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: My non-OC RTX 2070 is very fast with Lc0
Exactly what Gian-Carlo says. You are executing things in parallel that should essentially have been executed serially, and speculatively choosing what to execute in parallel. You don't need multiple GPU cards to do that; you are already doing it on a single GPU when executing in batches.
The difference with A/B is that the level of speculation there is much higher, since you prune much more aggressively than in MCTS, so parallelising might have a bigger impact on the algorithm's strength, i.e. it is much more probable that you will search in vain nodes that should have been cut. However, as we've seen with Lazy SMP, if the non-SMP version of an A/B algorithm is too speculative in the first place and removes too much from the search tree, using a parallel algorithm that broadens the tree might actually help a bit.
Werewolf wrote: ↑Wed Nov 21, 2018 3:26 pm
But is there some principle you are drawing on to deduce that a doubling of nps (going from 1 graphics card to 2) doesn't give the equivalent of a doubling of time? With alpha-beta the reasons for search inefficiency are well known; is there a similar principle for Lc0?
Milos wrote: ↑Wed Nov 21, 2018 2:21 pm
No one can tell you that. It's like SMP with A/B search: having 2x nps doesn't give you a 2x strength improvement (the equivalent of 2x time), and to know the actual strength improvement one needs to test it thoroughly.
Werewolf wrote: ↑Wed Nov 21, 2018 10:44 am
Do you know the true search speedup with Lc0 going from one card to two? Would 2 x 2080 Ti be about 1.8x faster than one card? (not simply nps, but true speedup)
Gian-Carlo Pascutto wrote: ↑Wed Nov 21, 2018 9:23 am
Just looking at NPS doesn't say anything about engine strength if you tweak the search settings.
Evaluating the same position over and over is really fast but also not very useful.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: My non-OC RTX 2070 is very fast with Lc0
I checked max node collisions at values of 32, 48, and 64. In the tactical WAC200.epd suite corrected by Albert Silver, over 6 runs, I also found that 48 or even 64 is best. But in the more positional suites that better reflect real strength, STS 1500 (1 run), Openings200 (6 runs), and ERET (6 runs), the results were inconclusive, with 32 perhaps the best value. I think I will leave it as it is, at 32, as the improvement at 48 or 64 seems to occur only in very tactical suites.
jjoshua2 wrote: ↑Wed Nov 21, 2018 2:39 am
Those are good settings for speed. Glad you figured that much out. I think you will find that raising max node collisions to 48 helps even more. I ran the Arasan tactics suite at 5 s/move with a batch size of 512, first with the default 32 node collisions and then with 48, 64, and 96. 48 gave the highest average score and the most consistent results, even though NPS keeps increasing as the value goes up. 64 was close but less consistent: sometimes the extra speed hurt and sometimes it helped.
111 of 200 matching moves Rated time: 07:51 512 batchsize 32 node collisions
112 of 200 matching moves Rated time: 07:46 512 batchsize 48 node collisions
114 of 200 matching moves Rated time: 07:43
114 of 200 matching moves Rated time: 07:39
113 of 200 matching moves Rated time: 07:50
113 of 200 matching moves Rated time: 07:47 512 batchsize 64 node collisions
116 of 200 matching moves Rated time: 07:29
111 of 200 matching moves Rated time: 07:52
111 of 200 matching moves Rated time: 07:58
112 of 200 matching moves Rated time: 07:48 512 batchsize 96 node collisions
111 of 200 matching moves Rated time: 07:49
more data here https://docs.google.com/spreadsheets/d/1yxri9LRpVH2TMWjgUDuw-V2jfpNs0pkNqNNJ3sHuttA/edit#gid=475598514