My non-OC RTX 2070 is very fast with Lc0

Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Albert Silver »

Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the latter test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same in the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Although still paired with a very old i5 (I will compare with a Threadripper later), the 2080 Ti yields:

info depth 17 seldepth 43 time 69846 nodes 3024184 score cp 26 hashfull 681 nps 43297
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Albert Silver wrote: Fri Dec 14, 2018 11:11 pm
Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the latter test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same in the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Although still paired with a very old i5 (I will compare with a Threadripper later), the 2080 Ti yields:

info depth 17 seldepth 43 time 69846 nodes 3024184 score cp 26 hashfull 681 nps 43297
Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
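For anyone who wants to check the ratios, here is a minimal sketch using only the figures quoted in this thread. The function names are made up for illustration, and the "expected" figure rests on the naive assumption that throughput scales with core count times clock, which the measurements above suggest is too optimistic.

```python
# NPS figures quoted in this thread, both measured with net 11250.
NPS_RTX_2070 = 29357
NPS_RTX_2080_TI = 43297


def speedup(nps_fast: float, nps_slow: float) -> float:
    """Ratio of two nodes-per-second figures."""
    return nps_fast / nps_slow


def expected_speedup(core_ratio: float, clock_ratio: float) -> float:
    """Naive estimate (assumption): throughput ~ core count * clock."""
    return core_ratio * clock_ratio


# Measured 2080 Ti vs 2070 ratio from the two "info" lines.
measured = speedup(NPS_RTX_2080_TI, NPS_RTX_2070)

# 2080 vs 2070: 28% more CUDA cores at 7% higher frequency, as quoted.
naive_2080 = expected_speedup(1.28, 1.07)

print(f"2080 Ti vs 2070, measured:       {measured:.2f}x")    # 1.47x
print(f"2080 vs 2070, naive expectation: {naive_2080:.2f}x")  # 1.37x
```

The measured 1.47x is the "a little below 1.5" figure, and 1.37x is the 37% expected for the 2080, against the roughly 23% actually observed.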
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Laskos wrote: Sat Dec 15, 2018 12:56 am Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
I know it's not the same, but there is an interesting test you can perform. Run Lc0 with, for example, batch size 32 vs. batch size 512 at a fixed number of nodes per move and check the Elo difference.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Albert Silver »

Laskos wrote: Sat Dec 15, 2018 12:56 am
Albert Silver wrote: Fri Dec 14, 2018 11:11 pm
Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the latter test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same in the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Although still paired with a very old i5 (I will compare with a Threadripper later), the 2080 Ti yields:

info depth 17 seldepth 43 time 69846 nodes 3024184 score cp 26 hashfull 681 nps 43297
Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
It is possible the CPU is also a limiting factor. This is a Sandy Bridge generation i5-2500K. In a week I will test with a more powerful one and then share the results.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Milos »

Albert Silver wrote: Sat Dec 15, 2018 5:47 am It is possible the CPU is also a limiting factor. This is a Sandy Bridge generation i5-2500K. In a week I will test with a more powerful one and then share the results.
Despite being Sandy Bridge, these are still 4 relatively fast cores; there is no way 45k nps would be a bottleneck on 4 cores. You can check your GPU usage, but unless there is some thermal throttling, it is almost certainly very close to 100%.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Milos wrote: Sat Dec 15, 2018 1:56 am
Laskos wrote: Sat Dec 15, 2018 12:56 am Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
I know it's not the same, but there is an interesting test you can perform. Run Lc0 with, for example, batch size 32 vs. batch size 512 at a fixed number of nodes per move and check the Elo difference.
Yes, that's an interesting experiment. I will do that.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Albert Silver wrote: Sat Dec 15, 2018 5:47 am
Laskos wrote: Sat Dec 15, 2018 12:56 am
Albert Silver wrote: Fri Dec 14, 2018 11:11 pm
Laskos wrote: Thu Dec 06, 2018 3:43 pm
Albert Silver wrote: Thu Dec 06, 2018 2:25 pm
Laskos wrote: Thu Dec 06, 2018 3:59 am
brianr wrote: Thu Dec 06, 2018 3:33 am OK, something seems off.

Why are the 2080 depth 17 nodes so many more than the depth 19 with the 2070?

Maybe I am missing something.
Thanks.
Probably different nets used. But the speed should be fairly uniform with the latter test30 nets, so nps are probably fair to compare.
I used 11250, which I had on hand.
Ok, with this net I am getting:

info depth 16 seldepth 43 time 95727 nodes 2810327 score cp 25 hashfull 643 nps 29357,

so yours is about 23% higher, with about 28% more CUDA cores at a 7% higher frequency. In total, a 37% speed-up would be expected. It seems memory speed and bandwidth also matter, as those are the same in the 2070 and the 2080. Also, the price is 40% higher. I think the least cost-effective would be the RTX 2080 Ti, and the most cost-effective a dual RTX 2070.
Although still paired with a very old i5 (I will compare with a Threadripper later), the 2080 Ti yields:

info depth 17 seldepth 43 time 69846 nodes 3024184 score cp 26 hashfull 681 nps 43297
Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
It is possible the CPU is also a limiting factor. This is a Sandy Bridge generation i5-2500K. In a week I will test with a more powerful one and then share the results.
No, that CPU is completely sufficient. There is no HT, so 2 or 3 threads are 2 or 3 full cores. The speed of the CPU is good too. With a GTX 1060, I even set up the 2 threads to run on a single i7 core, with no loss in NPS.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

Milos wrote: Sat Dec 15, 2018 2:08 pm
Milos wrote: Sat Dec 15, 2018 1:56 am
Laskos wrote: Sat Dec 15, 2018 12:56 am Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
I know it's not the same, but there is an interesting test you can perform. Run Lc0 with, for example, batch size 32 vs. batch size 512 at a fixed number of nodes per move and check the Elo difference.
Yes, that's an interesting experiment. I will do that.
Here are some results:

5000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 51 - 79 - 270 [0.465] 400
Elo difference: -24.36 +/- 19.35

20000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 7 - 14 - 79 [0.465] 100
Elo difference: -24.36 +/- 31.03


It seems not to lose that much going from batch size 32 to 512. That would seem to indicate a good effective speed-up from 1 GPU to 2. Do you have any estimate based on that? I would guess as high as 1.8 or even better.
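As a cross-check on the numbers above, here is a minimal sketch that recomputes the Elo difference from the win/loss/draw counts with the usual logistic formula. The error margin is an assumption on my part: a normal approximation on the per-game score, which lands close to the reported margins but is not necessarily the exact formula the match tool uses. Function names are illustrative only.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Elo difference and an approximate 95% margin (normal approximation)."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0  # mean score per game
    # Variance of a single game's score (1, 0.5 or 0) around the mean.
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    se = math.sqrt(var / n)
    lo, hi = mu - z * se, mu + z * se
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    return elo_from_score(mu), margin


# W - L - D from the point of view of the batch-512 engine.
for label, (w, l, d) in {"5000 nodes/move": (51, 79, 270),
                         "20000 nodes/move": (7, 14, 79)}.items():
    elo, margin = elo_with_margin(w, l, d)
    print(f"{label}: {elo:+.2f} +/- {margin:.1f}")
```

Both matches have the same 0.465 score, hence the identical -24.36. With only 100 games the interval at 20000 nodes/move still includes zero, so that result is not significant on its own.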
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: My non-OC RTX 2070 is very fast with Lc0

Post by chrisw »

Laskos wrote: Sun Dec 16, 2018 12:26 pm
Milos wrote: Sat Dec 15, 2018 2:08 pm
Milos wrote: Sat Dec 15, 2018 1:56 am
Laskos wrote: Sat Dec 15, 2018 12:56 am Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
I know it's not the same, but there is an interesting test you can perform. Run Lc0 with, for example, batch size 32 vs. batch size 512 at a fixed number of nodes per move and check the Elo difference.
Yes, that's an interesting experiment. I will do that.
Here are some results:

5000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 51 - 79 - 270 [0.465] 400
Elo difference: -24.36 +/- 19.35

20000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 7 - 14 - 79 [0.465] 100
Elo difference: -24.36 +/- 31.03


It seems not to lose that much going from batch size 32 to 512. That would seem to indicate a good effective speed-up from 1 GPU to 2. Do you have any estimate based on that? I would guess as high as 1.8 or even better.
It certainly does show that the batch-32 nodes are used far more effectively than the batch-512 ones.

Not sure what would happen if you set batch=1; you might crash it. But batch=1 (or batch=2, if 1 won't work) versus batch=512 will show how many of those 512 nodes end up not being used at all. Well, not how many exactly, but it will give an idea.

A graph of Elo difference vs. batch size?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My non-OC RTX 2070 is very fast with Lc0

Post by Laskos »

chrisw wrote: Sun Dec 16, 2018 12:51 pm
Laskos wrote: Sun Dec 16, 2018 12:26 pm
Milos wrote: Sat Dec 15, 2018 2:08 pm
Milos wrote: Sat Dec 15, 2018 1:56 am
Laskos wrote: Sat Dec 15, 2018 12:56 am Thanks, I was expecting 1.5-1.6 times faster, and it comes in a little below 1.5 times faster. Clearly not the best value. I hope that in some 6-9 months I will have a Ryzen 8- or 16-core machine with 2 x RTX 2070. I am not sure about the effective speed-up from 2 GPUs; it could be anything from 1.4 to 1.95. Nobody seems to have performed strength tests on that, just NPS, which scales almost perfectly to 2 GPUs with Lc0.
I know it's not the same, but there is an interesting test you can perform. Run Lc0 with, for example, batch size 32 vs. batch size 512 at a fixed number of nodes per move and check the Elo difference.
Yes, that's an interesting experiment. I will do that.
Here are some results:

5000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 51 - 79 - 270 [0.465] 400
Elo difference: -24.36 +/- 19.35

20000 nodes/move
Score of lc0_v191_32013_512 vs lc0_v191_32013_32: 7 - 14 - 79 [0.465] 100
Elo difference: -24.36 +/- 31.03


It seems not to lose that much going from batch size 32 to 512. That would seem to indicate a good effective speed-up from 1 GPU to 2. Do you have any estimate based on that? I would guess as high as 1.8 or even better.
It certainly does show that the batch-32 nodes are used far more effectively than the batch-512 ones.

Not sure what would happen if you set batch=1; you might crash it. But batch=1 (or batch=2, if 1 won't work) versus batch=512 will show how many of those 512 nodes end up not being used at all. Well, not how many exactly, but it will give an idea.

A graph of Elo difference vs. batch size?
I will do that, but first I have to determine how much Elo the doubling from 2500 to 5000 nodes/move gives.
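To make the reasoning explicit, here is a rough sketch of how the Elo-per-doubling figure would be used once it is measured. It assumes Elo grows linearly in log2(nodes) around this node count, which is only an approximation, and the value of ELO_PER_DOUBLING below is a purely hypothetical placeholder, not a measurement from this thread.

```python
def equivalent_node_factor(elo_diff: float, elo_per_doubling: float) -> float:
    """Node-count multiplier worth `elo_diff` Elo.

    Assumes (simplification) a constant Elo gain per doubling of nodes.
    """
    return 2.0 ** (elo_diff / elo_per_doubling)


# HYPOTHETICAL placeholder: replace with the measured Elo gain
# from 2500 to 5000 nodes/move once that match has been run.
ELO_PER_DOUBLING = 100.0

# Measured above: batch 512 scores -24.36 Elo against batch 32 at equal nodes.
factor = equivalent_node_factor(-24.36, ELO_PER_DOUBLING)
print(f"batch-512 nodes are worth about {factor:.2f} batch-32 nodes each")
```

With the real doubling figure plugged in, the same conversion turns each point on an Elo vs. batch size graph into an effective fraction of useful nodes.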