How good is a GeForce GTX 1060 6GB for Leela?

JJJ
Posts: 1287
Joined: Sat Apr 19, 2014 11:47 am

Post by JJJ » Mon Apr 30, 2018 3:29 am

It's all in the title: I'd like to know how Leela will perform on average with it.

From the starting position I get an average of 2K nodes per second with it. Is that good?

Albert Silver
Posts: 2867
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Post by Albert Silver » Mon Apr 30, 2018 4:36 am

JJJ wrote: It's all in the title: I'd like to know how Leela will perform on average with it.

From the starting position I get an average of 2K nodes per second with it. Is that good?
Be sure to run the full tuner on it. Then run it from the start position until ply 26 and see your average ply depth.

Even with an old i5-2500K and a GTX 1060 I get about 2250 NPS in the benchmark I described. This is good for ~2900 CCRL.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

JJJ
Posts: 1287
Joined: Sat Apr 19, 2014 11:47 am

Post by JJJ » Mon Apr 30, 2018 6:33 am

What do you mean by running the full tune on it?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
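
As a rough cross-check of these figures, the time to depth 26 can be combined with the reported speed to estimate how many nodes were searched. This is only a sketch: it assumes the roughly 2K nodes-per-second average held for the whole run, and the function name is made up for illustration.

```python
def estimated_nodes(nps: float, minutes: int, seconds: int) -> int:
    """Estimate nodes searched, assuming a constant nodes-per-second rate."""
    elapsed = minutes * 60 + seconds  # total run time in seconds
    return round(nps * elapsed)


# 1 min 26 s at roughly 2000 nps (both figures taken from the posts above)
print(estimated_nodes(2000, 1, 26))  # 172000
```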

AdminX
Posts: 5182
Joined: Mon Mar 13, 2006 1:34 pm
Location: Acworth, GA

Post by AdminX » Mon Apr 30, 2018 6:48 am

JJJ wrote: What do you mean by running the full tune on it?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
PS: Make sure you pass the correct --gpu number if you have more than one GPU.

lczero --gpu 0 --tune-only --full-tuner 
Last edited by AdminX on Mon Apr 30, 2018 6:50 am, edited 1 time in total.
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers

Guenther
Posts: 3107
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Post by Guenther » Mon Apr 30, 2018 6:50 am

JJJ wrote: What do you mean by running the full tune on it?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
On first start, LCZero automatically tunes the settings to use with your GPU. This is, of course, only a standard tuning.
By doing a full tuning you can get a speed increase of up to perhaps 150-200% in some cases.
Delete the automatically created file named leelaz_opencl_tuning and start the process as described below.

Run something like this (adapt names/files) from the command line, or add it to a batch file on Windows.
This may run for some time, and you can see how each listed setting gets more GFLOPS out of your card.

lczero07.exe --tune-only --full-tuner -w ID222
Example for my very weak GPU, which must now be retuned for the new NN size (NN ID222 is now at 15x192, i.e. 15 blocks of 192 channels):

C:\Engines\UCIPG\LCZero_07ID222>lczero07.exe --tune-only --full-tuner -w ID222
Using 2 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.75
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GT 710
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 388.13
Device speed:  954 MHz
Device cores:  1 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GT 710
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0xec7a02 (thread: 701728073)
Will try 5279 valid configurations.
(1/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.8444 ms (22.4 GFLOPS)
(18/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.8431 ms (22.4 GFLOPS)
(93/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.6607 ms (28.6 GFLOPS)
(96/5279) KWG=32 KWI=2 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5157 ms (36.6 GFLOPS)
(119/5279) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5133 ms (36.8 GFLOPS)
(133/5279) KWG=32 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5128 ms (36.8 GFLOPS)
(145/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.4856 ms (38.9 GFLOPS)
(171/5279) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=8 VWN=1 0.4844 ms (39.0 GFLOPS)
(481/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=4 VWN=1 0.4836 ms (39.0 GFLOPS)
(610/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.4068 ms (46.4 GFLOPS)
(1555/5279) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=2 0.3585 ms (52.6 GFLOPS)
(1577/5279) KWG=16 KWI=2 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=2 0.3254 ms (58.0 GFLOPS)
(2547/5279) KWG=16 KWI=8 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=1 VWM=2 VWN=2 0.2867 ms (65.8 GFLOPS)
...
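
Each line the tuner prints has the form "(tried/total) parameters, time in ms, (N GFLOPS)", so the progress can be pulled out of a saved log with a few lines of Python. This is a minimal sketch, assuming the output was copied into a string; the function name is hypothetical, and the pattern is inferred only from the log above, not from LCZero's source. It tolerates entries that the console wrapped onto a second line.

```python
import re

# "(index/total) ... <time> ms (<rate> GFLOPS)"; re.S lets an entry span a wrapped line.
ENTRY = re.compile(r"\((\d+)/(\d+)\).*?([\d.]+) ms \(([\d.]+) GFLOPS\)", re.S)


def parse_tuner_log(text):
    """Return (index, total, milliseconds, gflops) for every tuner entry in text."""
    return [
        (int(i), int(n), float(ms), float(gf))
        for i, n, ms, gf in ENTRY.findall(text)
    ]


# First and last entries of the GT 710 log; the first one is wrapped as in the console.
sample = """\
(1/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 ST
RM=0 STRN=0 VWM=1 VWN=1 0.8444 ms (22.4 GFLOPS)
(2547/5279) KWG=16 KWI=8 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=1 VWM=2 VWN=2 0.2867 ms (65.8 GFLOPS)
"""

entries = parse_tuner_log(sample)
best = max(entries, key=lambda e: e[3])  # entry with the highest GFLOPS
print(best)  # (2547, 5279, 0.2867, 65.8)
```

Only the index, total, time and GFLOPS are extracted; the kernel parameters themselves are skipped, which is why a parameter name split across a wrapped line does not matter here.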
Current foe list count : [101]
http://rwbc-chess.de/chronology.htm

shrapnel
Posts: 1245
Joined: Fri Nov 02, 2012 8:43 am
Location: New Delhi, India

Post by shrapnel » Mon Apr 30, 2018 6:55 am

JJJ wrote: Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
Good to know. Glad that at least some chess engines have started to use the power of GPUs.
If I get a dual 1080 Ti system, can it beat the latest Stockfish/Komodo?
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Post by nabildanial » Mon Apr 30, 2018 7:26 am

shrapnel wrote:
JJJ wrote: Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
Good to know. Glad that at least some chess engines have started to use the power of GPUs.
If I get a dual 1080 Ti system, can it beat the latest Stockfish/Komodo?
It doesn't support multiple GPUs, at least not yet.

Nay Lin Tun
Posts: 529
Joined: Mon Jan 16, 2012 5:34 am

Post by Nay Lin Tun » Mon Apr 30, 2018 9:33 am

I get around 2200 nps with my 1060. For comparison, see the Ipman benchmark.

shrapnel
Posts: 1245
Joined: Fri Nov 02, 2012 8:43 am
Location: New Delhi, India

Post by shrapnel » Mon Apr 30, 2018 1:05 pm

nabildanial wrote: It doesn't support multiple GPUs, at least not yet.
OK, thanks.
Which would be better for chess, an NVIDIA Titan Xp or the GeForce GTX 1080 Ti?

Albert Silver
Posts: 2867
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Post by Albert Silver » Mon Apr 30, 2018 4:15 pm

Guenther wrote:
By doing a full tuning you can get a speed increase of up to perhaps 150-200% in some cases.
Yes, here is what I got on my laptop:

C:\Users\Albert\Chess\Leela Zero\GPU>lczero.exe -t3 -w weights.txt --full-tuner
Using 3 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Device name:   Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device type:   CPU
Device vendor: Intel(R) Corporation
Device driver: 7.6.0.611
Device speed:  2600 MHz
Device cores:  8 CU
Device score:  521
Platform version: OpenCL 1.2 CUDA 9.1.84
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     2
Device name:   GeForce GTX 980M
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 391.35
Device speed:  1126 MHz
Device cores:  12 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 980M
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0x65d47141 (thread: 2783254248)
Will try 5117 valid configurations.
(1/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1067 ms (177.0 GFLOPS)
(6/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=16 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0946 ms (199.5 GFLOPS)
(9/5117) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0894 ms (211.1 GFLOPS)
(79/5117) KWG=16 KWI=8 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0651 ms (289.9 GFLOPS)
(566/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=2 VWN=2 0.0594 ms (317.6 GFLOPS)
(853/5117) KWG=16 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=1 VWM=2 VWN=2 0.0571 ms (330.4 GFLOPS)
(1276/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0551 ms (342.8 GFLOPS)
(1278/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0530 ms (356.2 GFLOPS)
(1306/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0501 ms (377.0 GFLOPS)
(1348/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0484 ms (390.2 GFLOPS)
(1404/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0444 ms (424.8 GFLOPS)
(1504/5117) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 0.0424 ms (444.7 GFLOPS)
(1837/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.0421 ms (447.9 GFLOPS)
(3906/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0399 ms (473.1 GFLOPS)
(3921/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0374 ms (504.1 GFLOPS)
(3942/5117) KWG=32 KWI=8 MDIMA=16 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0348 ms (542.6 GFLOPS)
(4400/5117) KWG=32 KWI=8 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=0 VWM=2 VWN=2 0.0332 ms (568.9 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
BLAS Core: Haswell
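
To put such a run in perspective, the gain between the first configuration tried and the best one found can be expressed as a percentage. Below is a small sketch using the GFLOPS figures from the two logs in this thread; the helper name is made up. One caveat: the first configuration tried is not necessarily what the standard (non-full) tuning would have picked, so this is not a measurement of full versus standard tuning.

```python
def percent_gain(first_gflops: float, best_gflops: float) -> float:
    """Relative improvement of best over first, in percent."""
    return (best_gflops / first_gflops - 1.0) * 100.0


# GTX 980M log: 177.0 GFLOPS at (1/5117), 568.9 GFLOPS at (4400/5117)
print(f"GTX 980M: {percent_gain(177.0, 568.9):.0f}%")  # about 221%
# GT 710 log (truncated in the post): 22.4 GFLOPS at (1/5279), 65.8 at (2547/5279)
print(f"GT 710:   {percent_gain(22.4, 65.8):.0f}%")  # about 194%
```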
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
