how good is a GeForce GTX 1060 6GB for Leela ?

Robert Flesher
Posts: 1280
Joined: Tue Aug 18, 2009 3:06 am

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Robert Flesher »

Albert Silver wrote:
Guenther wrote:
JJJ wrote: What do you mean by the full tune on it?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is OK for my card or too slow.

Anyway, my Leela is winning against Hakkapeliitta, so that's nice already.
On first start, LCZero automatically tunes the settings to use with your GPU. This is, of course, only a standard tuning.
With a full tuning you can get a speed increase of up to perhaps 150-200% in some cases.
Delete the automatically created file named leelaz_opencl_tuning and start the process as described below.

Run something like this (adapt names/files) from the command line, or add it to a batch file on Windows.
This may run for some time, and you can watch each tried setting get more GFLOPS out of your card.

Code: Select all

lczero07.exe --tune-only --full-tuner -w ID222
Example for my very weak GPU, which must now be retuned for the new NN size (NN ID222 is now at 15*192):

Code: Select all

C:\Engines\UCIPG\LCZero_07ID222>lczero07.exe --tune-only --full-tuner -w ID222
Using 2 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.75
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GT 710
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 388.13
Device speed:  954 MHz
Device cores:  1 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GT 710
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0xec7a02 (thread: 701728073)
Will try 5279 valid configurations.
(1/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 ST
...
Yes, here is what I got on my laptop:

Code: Select all

C:\Users\Albert\Chess\Leela Zero\GPU>lczero.exe -t3 -w weights.txt --full-tuner
Using 3 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Device name:   Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device type:   CPU
Device vendor: Intel(R) Corporation
Device driver: 7.6.0.611
Device speed:  2600 MHz
Device cores:  8 CU
Device score:  521
Platform version: OpenCL 1.2 CUDA 9.1.84
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     2
Device name:   GeForce GTX 980M
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 391.35
Device speed:  1126 MHz
Device cores:  12 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 980M
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0x65d47141 (thread: 2783254248)
Will try 5117 valid configurations.
(1/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1067 ms (177.0 GFLOPS)
(6/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=16 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0946 ms (199.5 GFLOPS)
(9/5117) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0894 ms (211.1 GFLOPS)
(79/5117) KWG=16 KWI=8 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0651 ms (289.9 GFLOPS)
(566/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=2 VWN=2 0.0594 ms (317.6 GFLOPS)
(853/5117) KWG=16 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=1 VWM=2 VWN=2 0.0571 ms (330.4 GFLOPS)
(1276/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0551 ms (342.8 GFLOPS)
(1278/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0530 ms (356.2 GFLOPS)
(1306/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0501 ms (377.0 GFLOPS)
(1348/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0484 ms (390.2 GFLOPS)
(1404/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0444 ms (424.8 GFLOPS)
(1504/5117) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 0.0424 ms (444.7 GFLOPS)
(1837/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.0421 ms (447.9 GFLOPS)
(3906/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0399 ms (473.1 GFLOPS)
(3921/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0374 ms (504.1 GFLOPS)
(3942/5117) KWG=32 KWI=8 MDIMA=16 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0348 ms (542.6 GFLOPS)
(4400/5117) KWG=32 KWI=8 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=0 VWM=2 VWN=2 0.0332 ms (568.9 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
BLAS Core: Haswell

Curious: when I try to run the full tune I get the message "could not open weights file: network".
Any ideas? I just renamed the id226 file to network.
AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by AdminX »

Robert Flesher wrote:
Curious: when I try to run the full tune I get the message "could not open weights file: network".
Any ideas? I just renamed the id226 file to network.
I got that message once. It was because I forgot to extract the weights file from the weights_###.txt.gz package.
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
Guenther
Posts: 4606
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Guenther »

Robert Flesher wrote:
Curious: when I try to run the full tune I get the message "could not open weights file: network".
Any ideas? I just renamed the id226 file to network.
If you renamed it to network, you must of course add -w network to the command line.
Otherwise, check whether your system has silently added a .txt extension to the file name, and make sure file extensions are displayed so that you can see it.
(They should always be set to visible anyway if you want to work properly with your computer.)
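If the file manager still hides extensions, a small script can show the real file names. This is only a sketch in Python: the helper names matching_names and find_weight_candidates and the stem "network" are chosen here for illustration and are not part of LCZero.

```python
import os


def matching_names(names, stem="network"):
    """Return, sorted, the names equal to `stem` or starting with `stem` + '.'.

    The comparison ignores case, since Windows file names are case-insensitive.
    """
    stem = stem.lower()
    return sorted(
        n for n in names
        if n.lower() == stem or n.lower().startswith(stem + ".")
    )


def find_weight_candidates(folder, stem="network"):
    """List the files in `folder` that could be the renamed weights file."""
    return matching_names(os.listdir(folder), stem)
```

Calling find_weight_candidates on the folder that holds lczero.exe returns the names exactly as they are stored, so a hidden extension shows up as network.txt or network.txt.gz instead of network.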
https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Guenther
Posts: 4606
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Guenther »

AdminX wrote: I got that message once. It was because I forgot to extract the weights file from the weights_###.txt.gz package.
Well, that has not been necessary for quite a while.
LCZero now reads the compressed file directly too.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Vinvin »

Here are the numbers from my GTX 750 Ti:

Code: Select all

>lczero.exe --tune-only --full-tuner -w weights.txt
Using 2 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.75
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GTX 750 Ti
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 388.13
Device speed:  1110 MHz
Device cores:  5 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 750 Ti
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0x41912f66 (thread: 886403780)
Will try 5128 valid configurations.
(1/5128) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1962 ms (96.2 GFLOPS)
(15/5128) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1960 ms (96.3 GFLOPS)
(20/5128) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1816 ms (104.0 GFLOPS)
(26/5128) KWG=16 KWI=2 MDIMA=32 MDIMC=8 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1613 ms (117.0 GFLOPS)
(54/5128) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1612 ms (117.1 GFLOPS)
(55/5128) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1466 ms (128.8 GFLOPS)
(71/5128) KWG=16 KWI=8 MDIMA=32 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1436 ms (131.5 GFLOPS)
(95/5128) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.1121 ms (168.4 GFLOPS)
(136/5128) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.1038 ms (181.8 GFLOPS)
(257/5128) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=8 VWN=2 0.1007 ms (187.5 GFLOPS)
(1304/5128) KWG=32 KWI=2 MDIMA=32 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0841 ms (224.4 GFLOPS)
(1339/5128) KWG=32 KWI=8 MDIMA=32 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0768 ms (245.7 GFLOPS)
(1376/5128) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0733 ms (257.5 GFLOPS)
(1441/5128) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0716 ms (263.4 GFLOPS)
(1742/5128) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=1 0.0574 ms (328.7 GFLOPS)
(1755/5128) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=1 0.0566 ms (333.5 GFLOPS)
(2501/5128) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=1 VWM=2 VWN=2 0.0562 ms (335.9 GFLOPS)
(4276/5128) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=0 VWM=4 VWN=1 0.0542 ms (348.5 GFLOPS)
(5068/5128) KWG=16 KWI=8 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=1 VWM=2 VWN=2 0.0538 ms (350.6 GFLOPS)
Guenther
Posts: 4606
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Guenther »

Vinvin wrote:

Code: Select all

...

(5068/5128) KWG=16 KWI=8 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=1 VWM=2 VWN=2 0.0538 ms (350.6 GFLOPS)
We still don't know how the GFLOPS correlate with nps. GFLOPS alone don't determine the speed; memory and clock speed are relevant too.
Maybe I will try to derive a formula from the data, if you also run the by now 'self-established' benchmark of 'go infinite' and report the time to depth 26.
(Note that I already asked for a way to establish benchmark stats five weeks ago on the LCZero GitHub site. The result was a bit disappointing, and so were the ways of measurement.)
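To collect the GFLOPS side of such a comparison, the figures can be pulled out of a saved tuner log. The sketch below assumes the progress-line format seen in the logs in this thread; the names parse_tuner_log and summarize are made up for illustration, and the sample lines are copied from the GTX 750 Ti log. Note that the first reported configuration is just the tuner's starting point, not the standard tuning, so the ratio only shows how far the search moved.

```python
import re

# Matches progress lines such as:
# (1/5128) KWG=32 ... VWN=1 0.1962 ms (96.2 GFLOPS)
LINE_RE = re.compile(
    r"^\((\d+)/(\d+)\)\s.*?([\d.]+) ms \(([\d.]+) GFLOPS\)\s*$"
)


def parse_tuner_log(text):
    """Return a list of (config_index, milliseconds, gflops) tuples.

    Lines that do not match the pattern (headers, truncated lines) are skipped.
    """
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append(
                (int(match.group(1)), float(match.group(3)), float(match.group(4)))
            )
    return results


def summarize(results):
    """Return (first_gflops, best_gflops, best / first) for parsed results."""
    first = results[0][2]
    best = max(r[2] for r in results)
    return first, best, best / first


# First and last progress lines from the GTX 750 Ti log in this thread.
SAMPLE = """\
Will try 5128 valid configurations.
(1/5128) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1962 ms (96.2 GFLOPS)
(5068/5128) KWG=16 KWI=8 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=1 VWM=2 VWN=2 0.0538 ms (350.6 GFLOPS)
"""

first, best, ratio = summarize(parse_tuner_log(SAMPLE))
print(f"first {first} GFLOPS, best {best} GFLOPS, ratio {ratio:.2f}")
```

The best GFLOPS value per card could then be listed next to the reported time to depth 26 for the same card, which is the kind of table a formula would have to be fitted to.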
Robert Flesher
Posts: 1280
Joined: Tue Aug 18, 2009 3:06 am

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Robert Flesher »

Guenther wrote:
AdminX wrote: I got that message once. It was because I forgot to extract the weights file from the weights_###.txt.gz package.
Well, that has not been necessary for quite a while.
LCZero now reads the compressed file directly too.


I have no idea what I am doing wrong but I cannot get it to run. I get the same message over and over! :evil:
Guenther
Posts: 4606
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Guenther »

Robert Flesher wrote:
I have no idea what I am doing wrong but I cannot get it to run. I get the same message over and over! :evil:
Can you describe exactly what you are doing and what files are there?
Robert Flesher
Posts: 1280
Joined: Tue Aug 18, 2009 3:06 am

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Robert Flesher »

Guenther wrote:
Can you describe exactly what you are doing and what files are there?
C:\users\robert\desktop\lczero\lczero.exe --tune-only --full-tuner -w network


The ID file is in the LCZero folder and is named network.
Guenther
Posts: 4606
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Post by Guenther »

Robert Flesher wrote:
C:\users\robert\desktop\lczero\lczero.exe --tune-only --full-tuner -w network


The ID file is in the LCZero folder and is named network.
Did you check that it is really named network without any extension, as I wrote earlier? (In the file manager, enable the display of extensions for known file types.)
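One more thing that may be worth checking, offered as an assumption rather than a diagnosis: a relative name such as network is normally looked up in the current working directory of the command prompt, not in the folder that contains lczero.exe. The Python sketch below only illustrates that rule. The function name resolve_weights and the prompt location C:\Users\robert are hypothetical, since the thread does not say where the command was typed.

```python
from pathlib import PureWindowsPath


def resolve_weights(weights_arg, working_dir):
    """Return the full path that a -w argument would normally refer to.

    An absolute argument is used as given; a relative one is joined
    to the working directory of the prompt.
    """
    path = PureWindowsPath(weights_arg)
    if path.is_absolute():
        return path
    return PureWindowsPath(working_dir) / path


# Hypothetical prompt location (not stated in the thread).
print(resolve_weights("network", r"C:\Users\robert"))
# An absolute path is independent of where the prompt is.
print(resolve_weights(r"C:\users\robert\desktop\lczero\network", r"C:\Users\robert"))
```

If that is the cause, changing into the LCZero folder before running the command, or passing the full path after -w, would avoid the mismatch.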