Leela ver.0.25.0

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Leela ver.0.25.0

Post by zullil »

corres wrote: Fri May 01, 2020 12:19 am
zullil wrote: Thu Apr 30, 2020 9:26 pm
corres wrote: Thu Apr 30, 2020 5:23 pm
Out of interest I ran a test with my RTX 2080 Ti OC TURBO (Threads=2), net T40B.4-160, using Leela ver. 0.25.0:
max. speed = 50.0 kn/sec (Depth=18, Time=100 sec, Nodes=4970 kn)

Note
Version 0.25.0 is a good Leela; it works well and its compilation was flawless.
Did you mean JHorthos T40B.4-106?
No.
I used the T40B.4-160 net (TCEC16).
The Leela bench is a new feature, so I cannot compare the bench result to my earlier ones.
If you want, use it. I do not need it.
Yes, my mistake. I've now found the T40B.4-160 network.

Since I also have an RTX 2080 Ti, I was simply curious what benchmark you would obtain using the standard built-in Lc0 benchmark, which uses default settings for all options.

Here's mine:

Code: Select all

$ ./lc0 benchmark
       _
|   _ | |
|_ |_ |_| v0.26.0-dev+git.ad4b5f2 built Apr 30 2020
Found pb network file: ./T40B.4-160
Creating backend [cudnn-auto]...
Switching to [cudnn-fp16]...
CUDA Runtime version: 10.2.0
Cudnn version: 7.6.5
Latest version of CUDA supported by the driver: 10.2.0
GPU: GeForce RTX 2080 Ti
GPU memory: 10.7534 Gb
GPU clock frequency: 1635 MHz
GPU compute capability: 7.5

===========================
Total time (ms) : 340977
Nodes searched  : 16651961
Nodes/second    : 48836
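
For reference, the benchmark can also be pointed at an explicit weights file and backend, which makes results easier to compare across machines. A minimal sketch, assuming the --weights and --backend flags as listed in lc0's --help (the file path is illustrative):

Code: Select all

# Run the built-in benchmark with an explicit weights file and backend,
# leaving every other option at its default:
./lc0 benchmark --weights=./T40B.4-160 --backend=cudnn-fp16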
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela ver.0.25.0

Post by corres »

corres wrote: Thu Apr 30, 2020 5:23 pm To check the enhancement, I tested two versions of Leela, ver. 0.23.3 and ver. 0.25.0, on my DUAL setup (= 2 x RTX 2060, Backend=Multiplexing, Threads=4, NNCacheSize=20000000, other options at the old defaults).
The net was JHorthos T40B.4-160 (20x256).
Results (after go nodes 10000000):
DUAL with Leela ver. 0.25.0
max. speed = 55.1 kn/sec (Depth=21, Time=183 sec, Nodes=9705 kn)
DUAL with Leela ver. 0.23.3
max. speed = 49.8 kn/sec (Depth=19, Time=123 sec, Nodes=5980 kn)
The enhancement is ~10%.
Out of interest I ran a test with my RTX 2080 Ti OC TURBO (Threads=2), net T40B.4-160, using Leela ver. 0.25.0:
max. speed = 50.0 kn/sec (Depth=18, Time=100 sec, Nodes=4970 kn)

Note
Version 0.25.0 is a good Leela; it works well and its compilation was flawless.
Version 0.25.1 is good for huge nets (512x30) too.
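
For anyone wanting to reproduce the DUAL setup above, here is a minimal command-line sketch. The multiplexing backend collects evaluation requests from the search threads and distributes them across the child backends, one per GPU; the --backend-opts syntax is assumed from lc0's multi-GPU examples, and the values simply mirror the settings quoted above:

Code: Select all

# Hypothetical dual-GPU invocation: one cudnn-fp16 child backend per GPU,
# multiplexed, with the thread count and NN cache size from the post above.
./lc0 --backend=multiplexing \
      --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)" \
      --threads=4 --nncache=20000000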
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela ver.0.25.0

Post by corres »

corres wrote: Fri May 01, 2020 8:46 pm [...] Version 0.25.1 is good for huge nets (512x30) too.
Version 0.25.1 is good for huge nets (512x40) too. (Sorry)
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Leela ver.0.25.0

Post by Laskos »

corres wrote: Fri May 01, 2020 8:46 pm [...] Version 0.25.1 is good for huge nets (512x30) too.
On the benchmark I am getting a 28.6 kn/s mark from my RTX 2070 with T40B.4-160. What do you mean by 0.25.1 being good with huge nets? With the 512x40 SV net I get 15% better speed with the DX backend than with the cudnn-fp16 backend, and 9% better speed with the 384x30 SV net. With the T40 256x20 nets, DX is slower than cudnn-fp16 by about 15%. So, in my case, the DX backend is better with huge nets, be it v0.24, v0.25 or v0.25.1.
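
One way to reproduce such a backend comparison is to run the built-in benchmark twice with the same weights, changing only the backend. A sketch with placeholder weights filenames; the dx12 backend name is an assumption based on lc0's DirectX 12 backend:

Code: Select all

# Same net, two backends; compare the reported Nodes/second figures.
./lc0 benchmark --weights=./SV-384x30 --backend=cudnn-fp16
./lc0 benchmark --weights=./SV-384x30 --backend=dx12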
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela ver.0.25.0

Post by corres »

Laskos wrote: Sun May 03, 2020 7:30 am [...] What do you mean by 0.25.1 being good with huge nets? [...] So, in my case, the DX backend is better with huge nets, be it v0.24, v0.25 or v0.25.1.
Developers (and some users) reported that ver. 0.25.0 had a fault with huge nets on GTX 1660 GPUs. They fixed that issue in ver. 0.25.1.
I tried ver. 0.25.0 with the SV 512x40 net on RTX GPUs with the cudnn-fp16 backend without any issue.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Leela ver.0.25.0

Post by Laskos »

corres wrote: Sun May 03, 2020 8:40 am Developers (and some users) reported that ver. 0.25.0 had a fault with huge nets on GTX 1660 GPUs. They fixed that issue in ver. 0.25.1.
I tried ver. 0.25.0 with the SV 512x40 net on RTX GPUs with the cudnn-fp16 backend without any issue.
Ah, ok. So they simply work with huge nets. I thought that in v0.25.1 they had improved the speed with huge nets, for example by using the Winograd convolutions that the DX backend uses. With large and huge nets I seem to get better results, Elo-wise too, using the DX backend. I guess the cudnn-fp16 backend has room for improvement.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela ver.0.25.0

Post by corres »

Laskos wrote: Sun May 03, 2020 8:53 am [...] I guess the cudnn-fp16 backend has room for improvement.
The cuDNN backend can be improved only by NVIDIA.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Leela ver.0.25.0

Post by Laskos »

corres wrote: Sun May 03, 2020 10:49 am The cuDNN backend can be improved only by NVIDIA.

I see you are getting nearly perfect NPS scaling with 2x RTX 2060 GPUs. Could you check whether the scaling is also nearly perfect strength-wise? It's fairly easy to check: play a match of fast games, 2x GPUs using 3 threads against 1x GPU on 2 threads at twice the time control. You can use 15'' + 0.25'' TC for the 2x GPUs against 30'' + 0.5'' TC for the 1x GPU, 200 games. It will take you several hours. If the result is almost equal (close to 0 Elo points difference), then the scaling is indeed nearly perfect. If the 2x GPU setup comes out significantly worse (say -50 Elo points), then the strength-wise scaling to 2 GPUs is far from perfect. I don't remember anyone doing such a scaling test; it's easy to do and important.

PS: You will probably need some unbalanced openings to avoid an 80%+ draw rate. I can post a link to such an EPD file of openings here.
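
One possible way to run the proposed match is with cutechess-cli. This is only a sketch: the engine paths and the opening file are hypothetical, and the lc0 UCI option names (Threads, Backend, BackendOptions) are assumed to match the settings discussed above:

Code: Select all

# Time-odds match: 2x GPUs at 15''+0.25'' vs 1x GPU at 30''+0.5'', 200 games.
cutechess-cli \
  -engine cmd=./lc0 name=lc0-2gpu tc=15+0.25 option.Threads=3 \
    option.Backend=multiplexing \
    "option.BackendOptions=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)" \
  -engine cmd=./lc0 name=lc0-1gpu tc=30+0.5 option.Threads=2 \
    option.Backend=cudnn-fp16 \
  -each proto=uci \
  -games 200 -repeat \
  -openings file=unbalanced.epd format=epd order=random \
  -pgnout scaling_test.pgn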
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela ver.0.25.0

Post by corres »

Laskos wrote: Sun May 03, 2020 6:33 pm I see you are getting nearly perfect NPS scaling with 2x RTX 2060 GPUs. Could you check whether the scaling is also nearly perfect strength-wise? [...]
Maybe I will run a scaling test in the future, but right now my machine is busy playing in some matches.