Lc0 OpenCL benchmark with 128x10 network

Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Lc0 OpenCL benchmark with 128x10 network

Post by Max »

We all know that Nvidia RTX GPUs are king of the Lc0 hill. Nevertheless, let's collect some OpenCL benchmark values from our GPUs. It only takes a few minutes.

Use Lc0 OpenCL 0.22.0 with network 56215 and run ./lc0 benchmark

NPS	GPU (OpenCL)		System					OS
==================================================================================
 595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
 353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
 155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10 1903

Detailed how-to (Windows):
- download the OpenCL version of lc0 from https://github.com/LeelaChessZero/lc0/r ... ag/v0.22.0
- unpack it to a folder of your choice (e.g. C:\Chess\Lc0)
- download weights file 56215 from https://lczero.org/networks/2
- copy the weights file into your lc0 folder
- open a CMD prompt and navigate to the lc0 folder
- run lc0 by typing lc0.exe

C:\Chess\Lc0>lc0.exe
_
| _ | |
|_ |_ |_| v0.22.0 built Aug 5 2019

- now type

go nodes 10

- on the first run of lc0 the OpenCL SGEMM tuner starts. Depending on your GPU this can take a few minutes. After lc0 has finished, type

quit

- and start lc0 again, but now with the option benchmark


C:\Chess\Lc0>lc0.exe benchmark
_
| _ | |
|_ |_ |_| v0.22.0 built Aug 5 2019
Found pb network file: ./weights_run2_56215.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2
Platform profile: FULL_PROFILE
Platform name: Intel(R) OpenCL
Platform vendor: Intel(R) Corporation
Device ID: 0
Device name: Intel(R) HD Graphics 505
Device type: GPU
Device vendor: Intel(R) Corporation
Device driver: 21.20.16.4526
Device speed: 750 MHZ
Device cores: 18 CU
Device score: 612
Device ID: 1
Device name: Intel(R) Pentium(R) CPU N4200 @ 1.10GHz
Device type: CPU
Device vendor: Intel(R) Corporation
Device driver: 6.6.0.336
Device speed: 1100 MHZ
Device cores: 4 CU
Device score: 512
Selected platform: Intel(R) OpenCL
Selected device: Intel(R) HD Graphics 505

with OpenCL 1.2 capability.
Loaded existing SGEMM tuning for batch size 16.
1 warning generated.
Wavefront/Warp size: 8

Max workgroup size: 256
Max workgroup dimensions: 256 256 256
Benchmark time 236ms, 2 nodes, 8 nps, move e2e4
Benchmark time 351ms, 5 nodes, 14 nps, move e2e4
Benchmark time 658ms, 7 nodes, 10 nps, move e2e4
Benchmark time 670ms, 9 nodes, 13 nps, move e2e4
Benchmark time 968ms, 14 nodes, 14 nps, move e2e4
Benchmark time 1259ms, 32 nodes, 25 nps, move e2e4
Benchmark time 1645ms, 47 nodes, 28 nps, move e2e4
Benchmark time 1878ms, 68 nodes, 36 nps, move e2e4
Benchmark time 2175ms, 90 nodes, 41 nps, move e2e4
Benchmark time 2489ms, 122 nodes, 49 nps, move e2e4
Benchmark time 2770ms, 151 nodes, 54 nps, move e2e4
Benchmark time 3109ms, 190 nodes, 61 nps, move e2e4
Benchmark time 3389ms, 240 nodes, 70 nps, move e2e4
Benchmark time 3667ms, 295 nodes, 80 nps, move e2e4
Benchmark time 4050ms, 349 nodes, 86 nps, move e2e4
Benchmark time 4430ms, 402 nodes, 90 nps, move e2e4
Benchmark time 4946ms, 526 nodes, 106 nps, move e2e4
Benchmark time 5385ms, 595 nodes, 110 nps, move e2e4
Benchmark time 5546ms, 658 nodes, 118 nps, move e2e4
Benchmark time 5569ms, 681 nodes, 122 nps, move e2e4
Benchmark time 5966ms, 759 nodes, 127 nps, move e2e4
Benchmark time 6644ms, 933 nodes, 140 nps, move e2e4
Benchmark time 7048ms, 1033 nodes, 146 nps, move e2e4
Benchmark time 7099ms, 1093 nodes, 153 nps, move e2e4
bestmove e2e4
Benchmark final time 7.45788s calculating 155.808 nodes per second.

- add your values to the above list
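
The manual steps above could also be scripted. This is only a rough Python sketch, not something shipped with lc0: the function names are made up for illustration, and merging stderr into stdout is an assumption, since I have not checked which stream lc0 writes its log lines to. It assumes you start it from the lc0 folder, as in the manual steps. The only things taken from the steps above are the `go nodes 10` / `quit` warm-up, the `benchmark` option and the wording of the final summary line.

```python
import re
import subprocess

# Matches the summary line printed at the end of `lc0 benchmark`, e.g.
# "Benchmark final time 7.45788s calculating 155.808 nodes per second."
FINAL_RE = re.compile(
    r"Benchmark final time ([\d.]+)s calculating ([\d.]+) nodes per second"
)


def parse_final_nps(output):
    """Return (seconds, nps) from benchmark output, or None if no summary line."""
    match = FINAL_RE.search(output)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def warm_up(lc0_path):
    """First run: send `go nodes 10`, wait for `bestmove`, then send `quit`."""
    proc = subprocess.Popen(
        [lc0_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: log lines may go to stderr
        text=True,
    )
    proc.stdin.write("go nodes 10\n")
    proc.stdin.flush()
    # The SGEMM tuner runs during this search; wait until the search is done.
    for line in proc.stdout:
        if line.startswith("bestmove"):
            break
    # Sends `quit`, drains remaining output and waits for lc0 to exit.
    proc.communicate("quit\n")


def run_benchmark(lc0_path):
    """Second run: `lc0 benchmark`; returns (seconds, nps) or None."""
    result = subprocess.run(
        [lc0_path, "benchmark"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return parse_final_nps(result.stdout)
```

Calling `warm_up("lc0.exe")` once and then `run_benchmark("lc0.exe")` would mirror the two runs described above. The parsing part can be tried without a GPU by passing a pasted log to `parse_final_nps`.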
Hope we're not just the biological boot loader for digital super intelligence. Unfortunately, that is increasingly probable - Elon Musk
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Leo »

Good idea.
Advanced Micro Devices fan.
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Leo wrote: Thu Aug 08, 2019 5:54 pm Good idea.
Leo, where is your competitor? 8-)

We have a new reverse leader :wink: Who offers less?

NPS	GPU (OpenCL)		System					OS
==================================================================================
 595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
 353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
 155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10 1903
  11	Intel HD		Medion E1232T, Celeron N2807		Win10 1903
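
For anyone keeping a local copy of the list, here is a small Python sketch that re-sorts it whenever a new result comes in. `format_table` and the column widths are my own choices for illustration, and it pads with spaces instead of the tabs used above. The rows are the four results posted so far.

```python
def format_table(results):
    """Format (nps, gpu, system, os) tuples as a list sorted by NPS, fastest first.

    Column widths are arbitrary choices for this sketch.
    """
    header = f"{'NPS':>5}  {'GPU (OpenCL)':<20}{'System':<36}OS"
    lines = [header, "=" * 78]
    for nps, gpu, system, os_name in sorted(results, key=lambda r: r[0], reverse=True):
        # The list in this thread shows whole numbers (155.808 appears as 155),
        # so the fractional part is dropped rather than rounded.
        lines.append(f"{int(nps):>5}  {gpu:<20}{system:<36}{os_name}")
    return "\n".join(lines)


# Results posted so far, deliberately in arbitrary order.
results = [
    (155.808, "Intel HD 505", "Acer Spin 1, Pentium N4200", "Win10 1903"),
    (595, "Intel 6100", 'MacBook Air 13" 2015, i5-5250U', "macOS 13.6"),
    (11, "Intel HD", "Medion E1232T, Celeron N2807", "Win10 1903"),
    (353, "Nvidia GT 650M", 'MacBook Pro 15" 2012, i7-3615QM', "macOS 12.6"),
]
print(format_table(results))
```

Sorting on every call means a new entry can simply be appended to `results` without worrying about where it belongs.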
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Somebody sent me this result without any further information, except that it runs Windows 10.

Do you have any idea what this system / GPU could be?
lc0.exe
_
| _ | |
|_ |_ |_| v0.22.0 built Aug 5 2019
go nodes 10
Found pb network file: ./weights_run2_56265.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 2.0 AMD-APP (1800.11)
Platform profile: FULL_PROFILE
Platform name: AMD Accelerated Parallel Processing
Platform vendor: Advanced Micro Devices, Inc.
Device ID: 0
Device name: Cedar
Device type: GPU
Device vendor: Advanced Micro Devices, Inc.
Device driver: 1800.11 (VM)
Device speed: 500 MHZ
Device cores: 2 CU
Device score: 1120
Device ID: 1
Device name: Intel(R) Atom(TM) CPU D525 @ 1.80GHz
Device type: CPU
Device vendor: GenuineIntel
Device driver: 1800.11 (sse2)
Device speed: 1796 MHZ
Device cores: 4 CU
Device score: 520
Selected platform: AMD Accelerated Parallel Processing
Selected device: Cedar
with OpenCL 2.0 capability.
Started OpenCL SGEMM tuner with batch size 16.
Will try 578 valid configurations.
(1/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 17585.2 us (7.6 GFLOPS)
(2/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 16402.0 us (8.2 GFLOPS)
(4/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 15095.9 us (8.9 GFLOPS)
(66/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 9641.3 us (13.9 GFLOPS)
(67/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 8514.6 us (15.8 GFLOPS)
(114/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 5334.1 us (25.2 GFLOPS)
(222/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 4277.0 us (31.4 GFLOPS)
(224/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 3832.1 us (35.0 GFLOPS)
(281/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=4 3397.2 us (39.5 GFLOPS)
(282/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=4 3286.1 us (40.8 GFLOPS)
Wavefront/Warp size: 32

Max workgroup size: 128
Max workgroup dimensions: 128 128 128
info depth 1 seldepth 2 time 720 nodes 2 score cp 30 hashfull 0 nps 2 tbhits 0 pv e2e4 e7e6
info depth 2 seldepth 3 time 958 nodes 3 score cp 40 hashfull 0 nps 3 tbhits 0 pv e2e4 e7e6 d2d4
info depth 2 seldepth 4 time 1505 nodes 4 score cp 35 hashfull 0 nps 2 tbhits 0 pv e2e4 e7e6 d2d4 d7d5
info depth 3 seldepth 5 time 2017 nodes 6 score cp 31 hashfull 0 nps 2 tbhits 0 pv e2e4 e7e6 d2d4 d7d5 b1c3
info depth 3 seldepth 6 time 2621 nodes 11 score cp 34 hashfull 0 nps 4 tbhits 0 pv e2e4 e7e6 b1c3 d7d5 d2d4 g8f6
bestmove e2e4 ponder e7e6
quit
lc0.exe benchmark
_
| _ | |
|_ |_ |_| v0.22.0 built Aug 5 2019
Found pb network file: ./weights_run2_56265.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 2.0 AMD-APP (1800.11)
Platform profile: FULL_PROFILE
Platform name: AMD Accelerated Parallel Processing
Platform vendor: Advanced Micro Devices, Inc.
Device ID: 0
Device name: Cedar
Device type: GPU
Device vendor: Advanced Micro Devices, Inc.
Device driver: 1800.11 (VM)
Device speed: 500 MHZ
Device cores: 2 CU
Device score: 1120
Device ID: 1
Device name: Intel(R) Atom(TM) CPU D525 @ 1.80GHz
Device type: CPU
Device vendor: GenuineIntel
Device driver: 1800.11 (sse2)
Device speed: 1796 MHZ
Device cores: 4 CU
Device score: 520
Selected platform: AMD Accelerated Parallel Processing
Selected device: Cedar
with OpenCL 2.0 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 32

Max workgroup size: 128
Max workgroup dimensions: 128 128 128
Benchmark time 772ms, 2 nodes, 2 nps, move e2e4
Benchmark time 1010ms, 3 nodes, 2 nps, move e2e4
Benchmark time 1321ms, 5 nodes, 3 nps, move e2e4
Benchmark time 1627ms, 8 nodes, 4 nps, move e2e4
Benchmark time 1893ms, 14 nodes, 7 nps, move e2e4
Benchmark time 1903ms, 16 nodes, 8 nps, move e2e4
Benchmark time 2207ms, 24 nodes, 10 nps, move e2e4
Benchmark time 2443ms, 39 nodes, 15 nps, move e2e4
Benchmark time 2935ms, 60 nodes, 20 nps, move e2e4
Benchmark time 3596ms, 91 nodes, 25 nps, move e2e4
Benchmark time 4116ms, 110 nodes, 26 nps, move e2e4
Benchmark time 4123ms, 131 nodes, 31 nps, move e2e4
Benchmark time 4701ms, 179 nodes, 38 nps, move e2e4
Benchmark time 5279ms, 223 nodes, 42 nps, move e2e4
Benchmark time 5835ms, 279 nodes, 47 nps, move e2e4
Benchmark time 6371ms, 329 nodes, 51 nps, move e2e4
Benchmark time 6664ms, 349 nodes, 52 nps, move e2e4
Benchmark time 7046ms, 384 nodes, 54 nps, move e2e4
Benchmark time 7722ms, 469 nodes, 60 nps, move e2e4
Benchmark time 8298ms, 569 nodes, 68 nps, move e2e4
Benchmark time 8527ms, 624 nodes, 73 nps, move e2e4
bestmove e2e4
Benchmark final time 9.00009s calculating 74.777 nodes per second.
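
The SGEMM tuner lines in the first run above can be read mechanically. The following Python sketch is my own and is not part of lc0: it parses lines of the form `(i/N) KEY=VALUE ... <time> us (<rate> GFLOPS)` and picks the fastest configuration. The three sample lines are copied from the log; the function names are hypothetical.

```python
import re

# "(282/578) KWG=32 ... VWN=4 3286.1 us (40.8 GFLOPS)"
TUNER_RE = re.compile(
    r"\((\d+)/(\d+)\)\s+(.*?)\s+([\d.]+) us \(([\d.]+) GFLOPS\)"
)


def parse_tuner_line(line):
    """Return a dict for one tuner line, or None if the line is something else."""
    match = TUNER_RE.match(line.strip())
    if match is None:
        return None
    params = dict(item.split("=") for item in match.group(3).split())
    return {
        "index": int(match.group(1)),
        "total": int(match.group(2)),
        "params": {key: int(value) for key, value in params.items()},
        "us": float(match.group(4)),
        "gflops": float(match.group(5)),
    }


def best_config(lines):
    """Return the parsed entry with the highest GFLOPS, or None if none parse."""
    parsed = [p for p in map(parse_tuner_line, lines) if p is not None]
    return max(parsed, key=lambda p: p["gflops"]) if parsed else None


SAMPLE = [
    "(1/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 17585.2 us (7.6 GFLOPS)",
    "(114/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 5334.1 us (25.2 GFLOPS)",
    "(282/578) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=4 3286.1 us (40.8 GFLOPS)",
]

best = best_config(SAMPLE)
# time [us] * rate [GFLOPS] / 1000 gives the work per timed run in MFLOP.
work_mflop = best["us"] * best["gflops"] / 1000.0
print(best["index"], best["params"], best["gflops"], round(work_mflop, 1))
```

Time multiplied by rate comes out at roughly 134 MFLOP for every line in the log, which suggests (this is only my reading of the numbers) that the tuner times the same workload for each configuration and only the kernel parameters change.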

NPS	GPU (OpenCL)		System					OS
==================================================================================
 595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
 353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
 155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10 1903
  74	?			?, Atom D525				Win10
  11	Intel HD		Medion E1232T, Celeron N2807		Win10 1903
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Lc0 OpenCL benchmark with 128x10 network

Post by smatovic »

Max wrote: Tue Aug 13, 2019 12:28 pm Somebody sent me this result without any further information, except that it runs Windows 10.

Do you have any idea what this system / GPU could be?
...
https://www.techpowerup.com/gpu-specs/amd-cedar.g113

AMD Cedar, with outdated TeraScale architecture from 2011.

--
Srdja
De Noose Daniel
Posts: 29
Joined: Tue Dec 13, 2016 10:36 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by De Noose Daniel »

Sony Vaio Ultrabook 13"
CPU:
i5-4200U 1.6 GHz
4 GB RAM
4 CU
OpenCL 1.2 capability

GPU:
Intel HD Graphics 4400
20 CU

412 nodes / second
De Noose Daniel
Posts: 29
Joined: Tue Dec 13, 2016 10:36 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by De Noose Daniel »

Dell Latitude E5570
CPU:
i5-6200U 2.3 GHz
8 GB RAM
4 CU
OpenCL 2.0 capability

GPU:
Intel HD Graphics 520
24 CU

437 nodes / second
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lc0 OpenCL benchmark with 128x10 network

Post by mar »

GTX 1070, win 10:

Benchmark final time 5.34555s calculating 8754.75 nodes per second.
Martin Sedlak
De Noose Daniel
Posts: 29
Joined: Tue Dec 13, 2016 10:36 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by De Noose Daniel »

HP EliteBook 8570w
CPU:
i7-3740QM 2.7 GHz
8 GB RAM
8 CU
OpenCL 2.0 capability

GPU:
AMD FirePro M4000
8 CU

705 nodes / second
De Noose Daniel
Posts: 29
Joined: Tue Dec 13, 2016 10:36 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by De Noose Daniel »

Lenovo ThinkPad T430
CPU:
i7-3520M 2.9 GHz
8 GB RAM
4 CU
OpenCL 1.2 capability

GPU:
Intel HD Graphics 4000
16 CU

260 nodes / second