Lc0 OpenCL benchmark with 128x10 network

De Noose Daniel
Posts: 29
Joined: Tue Dec 13, 2016 10:36 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by De Noose Daniel »

Asus ROG G73S

CPU :
I7 2630QM 2.0GHz
16 GB RAM
8CU
OpenCL 1.2 capability

GPU :
Nvidia Geforce GTX 460M
4CU

505 nodes / second
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Thanks for the benchmarks guys .. gimme more :mrgreen:

And thanks for the link to the AMD Cedar. In the meantime I was informed that this is a media center.
Interesting combo: Leela Chess Zero on a slow Atom D525 paired with AMD Radeon. But hey .. it runs Netflix. 8-)

Code: Select all

NPS	GPU (OpenCL)		System					OS
==================================================================================
8754	Nvidia GTX 1070							Win10
 705	AMD Firepro M4000	HP EliteBook 8570w, i7-3740QM		Win10
 595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
 545	Nvidia Geforce GTX 460M	Asus ROG G73S, i7-2630QM		Win10
 437	Intel HD 520		Dell Latitude E5570, i5-6200U		Win10
 412	Intel HD 4400		Sony Vaio Ultrabook 13", i5-4200U	Win10
 353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
 260	Intel HD 4000		Lenovo Thinkpad T430, i7-3520M		Win10
 155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10 1903
  74	ATI Radeon HD 5430M	Arctic MediaCenter MC001, Atom D525	Win10
  11	Intel HD		Medion E1232T, Celeron N2807		Win10 1903
Value for Asus ROG G73S was updated to 545 nps via PM.
Hope we're not just the biological boot loader for digital super intelligence. Unfortunately, that is increasingly probable - Elon Musk
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lc0 OpenCL benchmark with 128x10 network

Post by mar »

GTX 1080, Win10:

Code: Select all

Benchmark final time 5.18833s calculating 10703.1 nodes per second.
GTX 1050 Ti, Win10 laptop:

Code: Select all

Benchmark final time 5.48895s calculating 3986.38 nodes per second.
Intel HD 630, Win10 laptop:

Code: Select all

Benchmark final time 7.49431s calculating 573.902 nodes per second.
Martin Sedlak
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: Lc0 OpenCL benchmark with 128x10 network

Post by pedrox »

GTX 750 Ti, Win10, AMD FX-8300:

Code: Select all

Benchmark final time 5.66103s calculating 2493.19 nodes per second.
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Can't new Intel and AMD GPUs compete under OpenCL?

Code: Select all

NPS	GPU (OpenCL)		System					OS
==================================================================================
10703	Nvidia GTX 1080		Desktop					Win10
 8754	Nvidia GTX 1070		Desktop					Win10
 3986	Nvidia GTX 1050 Ti	Laptop					Win10
 2493	Nvidia GTX 750 Ti	Desktop, AMD FX-8300			Win10
  705	AMD Firepro M4000	HP EliteBook 8570w, i7-3740QM		Win10
  595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
  573	Intel HD 630		Laptop					Win10
  545	Nvidia GTX 460M		Asus ROG G73S, i7-2630QM		Win10
  437	Intel HD 520		Dell Latitude E5570, i5-6200U		Win10
  412	Intel HD 4400		Sony Vaio Ultrabook 13", i5-4200U	Win10
  353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
  260	Intel HD 4000		Lenovo Thinkpad T430, i7-3520M		Win10
  155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10
   74	ATI Radeon HD 5430M	Arctic MediaCenter MC001, Atom D525	Win10
   11	Intel HD		Medion E1232T, Celeron N2807		Win10
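Merging new submissions into a ranking like the one above is mostly a sort by NPS. A minimal Python sketch, using three rows copied from the table; `rank` and `format_table` are hypothetical helper names chosen for illustration, and the column widths are arbitrary:

```python
def rank(results):
    """Sort (nps, gpu, system, os) tuples from fastest to slowest."""
    return sorted(results, key=lambda row: row[0], reverse=True)

def format_table(results):
    """Render ranked rows as fixed-width text with NPS right-aligned."""
    lines = [f"{'NPS':>5}  {'GPU (OpenCL)':<20}  {'System':<32}  OS"]
    for nps, gpu, system, os_name in rank(results):
        lines.append(f"{nps:>5}  {gpu:<20}  {system:<32}  {os_name}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Rows taken from the table above, deliberately out of order.
    rows = [
        (573, "Intel HD 630", "Laptop", "Win10"),
        (10703, "Nvidia GTX 1080", "Desktop", "Win10"),
        (2493, "Nvidia GTX 750 Ti", "Desktop, AMD FX-8300", "Win10"),
    ]
    print(format_table(rows))
```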
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Modern Times »

GTX 1050
AMD FX-8350
Windows 10
NPS 3,579
C:\temp4>lc0.exe benchmark
       _
|   _ | |
|_ |_ |_| v0.22.0 built Aug  5 2019
Found pb network file: ./weights_run2_56215.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.2.120
Platform profile: FULL_PROFILE
Platform name: NVIDIA CUDA
Platform vendor: NVIDIA Corporation
Device ID: 0
Device name: GeForce GTX 1050
Device type: GPU
Device vendor: NVIDIA Corporation
Device driver: 430.86
Device speed: 1518 MHZ
Device cores: 5 CU
Device score: 1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 1050
with OpenCL 1.2 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 32

Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
Benchmark time 101ms, 2 nodes, 19 nps, move e2e4
Benchmark time 130ms, 4 nodes, 30 nps, move e2e4
Benchmark time 146ms, 9 nodes, 61 nps, move e2e4
Benchmark time 161ms, 18 nodes, 111 nps, move e2e4
Benchmark time 181ms, 29 nodes, 160 nps, move e2e4
Benchmark time 189ms, 33 nodes, 174 nps, move e2e4
Benchmark time 205ms, 45 nodes, 219 nps, move e2e4
Benchmark time 229ms, 68 nodes, 296 nps, move e2e4
Benchmark time 255ms, 78 nodes, 305 nps, move e2e4
Benchmark time 277ms, 102 nodes, 368 nps, move e2e4
Benchmark time 299ms, 138 nodes, 461 nps, move e2e4
Benchmark time 322ms, 173 nodes, 537 nps, move e2e4
Benchmark time 344ms, 223 nodes, 648 nps, move e2e4
Benchmark time 379ms, 275 nodes, 725 nps, move e2e4
Benchmark time 406ms, 313 nodes, 770 nps, move e2e4
Benchmark time 409ms, 355 nodes, 867 nps, move e2e4
Benchmark time 427ms, 378 nodes, 885 nps, move e2e4
Benchmark time 450ms, 470 nodes, 1044 nps, move e2e4
Benchmark time 483ms, 569 nodes, 1178 nps, move e2e4
Benchmark time 534ms, 730 nodes, 1367 nps, move e2e4
Benchmark time 559ms, 825 nodes, 1475 nps, move e2e4
Benchmark time 580ms, 941 nodes, 1622 nps, move e2e4
Benchmark time 670ms, 1212 nodes, 1808 nps, move e2e4
Benchmark time 701ms, 1357 nodes, 1935 nps, move e2e4
Benchmark time 805ms, 1698 nodes, 2109 nps, move e2e4
Benchmark time 1052ms, 2783 nodes, 2645 nps, move e2e4
Benchmark time 1080ms, 2882 nodes, 2668 nps, move e2e4
Benchmark time 1101ms, 2922 nodes, 2653 nps, move e2e4
Benchmark time 1124ms, 2959 nodes, 2632 nps, move e2e4
Benchmark time 1144ms, 2982 nodes, 2606 nps, move e2e4
Benchmark time 1194ms, 3243 nodes, 2716 nps, move e2e4
Benchmark time 1239ms, 3531 nodes, 2849 nps, move e2e4
Benchmark time 1411ms, 4178 nodes, 2961 nps, move e2e4
Benchmark time 1536ms, 4661 nodes, 3034 nps, move e2e4
Benchmark time 1851ms, 5684 nodes, 3070 nps, move e2e4
Benchmark time 1852ms, 5782 nodes, 3122 nps, move e2e4
Benchmark time 1916ms, 6024 nodes, 3144 nps, move e2e4
Benchmark time 3175ms, 10822 nodes, 3408 nps, move e2e4
Benchmark time 3586ms, 12274 nodes, 3422 nps, move e2e4
Benchmark time 3820ms, 13107 nodes, 3431 nps, move e2e4
Benchmark time 5509ms, 19776 nodes, 3589 nps, move e2e4
bestmove e2e4
Benchmark final time 5.55779s calculating 3579.48 nodes per second.
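For anyone collecting these results, the summary line is easy to pull out of a pasted log. A minimal Python sketch, assuming the line keeps the exact wording seen in this thread's logs; `parse_final_line` is a hypothetical helper name, not part of lc0:

```python
import re

# Matches the summary line printed at the end of the benchmark logs in this thread.
FINAL_RE = re.compile(
    r"Benchmark final time (?P<secs>[\d.]+)s "
    r"calculating (?P<nps>[\d.]+) nodes per second\."
)

def parse_final_line(line):
    """Return (seconds, nps) from a summary line, or None if it does not match."""
    m = FINAL_RE.search(line)
    if m is None:
        return None
    return float(m.group("secs")), float(m.group("nps"))

if __name__ == "__main__":
    line = "Benchmark final time 5.55779s calculating 3579.48 nodes per second."
    secs, nps = parse_final_line(line)
    # The log does not print the final node count; time * rate gives an estimate.
    print(secs, nps, round(secs * nps))
```

The node count printed here is only derived from the two reported figures, so treat it as approximate.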
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Code: Select all

NPS	GPU (OpenCL)		System					OS
===================================================================================
10703	Nvidia GTX 1080		Desktop					Win10
 9061	Nvidia Tesla T4		Google Colab (*)			Linux <-new
 8754	Nvidia GTX 1070		Desktop					Win10
 3986	Nvidia GTX 1050 Ti	Laptop					Win10
 3579	Nvidia GTX 1050		Desktop, AMD FX-8350			Win10
 2493	Nvidia GTX 750 Ti	Desktop, AMD FX-8300			Win10
  705	AMD Firepro M4000	HP EliteBook 8570w, i7-3740QM		Win10
  595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
  573	Intel HD 630		Laptop					Win10
  545	Nvidia GTX 460M		Asus ROG G73S, i7-2630QM		Win10
  437	Intel HD 520		Dell Latitude E5570, i5-6200U		Win10
  412	Intel HD 4400		Sony Vaio Ultrabook 13", i5-4200U	Win10
  353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
  260	Intel HD 4000		Lenovo Thinkpad T430, i7-3520M		Win10
  155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10
   74	ATI Radeon HD 5430M	Arctic MediaCenter MC001, Atom D525	Win10
   11	Intel HD		Medion E1232T, Celeron N2807		Win10
(*) Benchmark of an Nvidia Tesla T4 @ Google Colab with 3 backends for 3 different weight sizes:

backend = opencl

Code: Select all

size	net	nps
----------------------
128x10	56215	9061.2
256x20	42850	1043.7
320x24	60260	1013.9
backend = cudnn

Code: Select all

size	net	nps
----------------------
128x10	56215	22502.7
256x20	42850	 4026.5
320x24	60260	 2189.9
backend = cudnn-fp16

Code: Select all

size	net	nps
----------------------
128x10	56215	73823.8
256x20	42850	11956.2
320x24	60260	 5956.6
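To put the three T4 tables side by side, the backend speedups over opencl can be computed directly from the nps figures. A short Python sketch; the numbers are copied from the tables above, while `T4_NPS` and `speedup` are names made up for this illustration:

```python
# nps figures copied from the Tesla T4 tables above (Google Colab).
T4_NPS = {
    "opencl":     {"128x10": 9061.2,  "256x20": 1043.7,  "320x24": 1013.9},
    "cudnn":      {"128x10": 22502.7, "256x20": 4026.5,  "320x24": 2189.9},
    "cudnn-fp16": {"128x10": 73823.8, "256x20": 11956.2, "320x24": 5956.6},
}

def speedup(results, backend, baseline="opencl"):
    """Ratio of nps for `backend` over `baseline`, per network size."""
    return {
        size: results[backend][size] / results[baseline][size]
        for size in results[baseline]
    }

if __name__ == "__main__":
    for backend in ("cudnn", "cudnn-fp16"):
        for size, ratio in speedup(T4_NPS, backend).items():
            print(f"{backend:11s} {size}: {ratio:.1f}x vs opencl")
```

These ratios only describe this one Colab run; other drivers, batch sizes, or networks may shift them.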
./lc0/build/lc0 benchmark -w 56215 -b opencl

Code: Select all

       _
|   _ | |
|_ |_ |_| v0.22.0 built Aug 16 2019
Loading weights file from: 56215
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.0.211
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:      0
Device name:    Tesla T4
Device type:    GPU
Device vendor:  NVIDIA Corporation
Device driver:  410.79
Device speed:   1590 MHZ
Device cores:   40 CU
Device score:   1112
Selected platform: NVIDIA CUDA
Selected device: Tesla T4
with OpenCL 1.2 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 32

Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
Benchmark time 88ms, 2 nodes, 22 nps, move e2e4
Benchmark time 100ms, 5 nodes, 50 nps, move e2e4
Benchmark time 112ms, 9 nodes, 80 nps, move e2e4
Benchmark time 122ms, 18 nodes, 147 nps, move e2e4
Benchmark time 134ms, 29 nodes, 216 nps, move e2e4
Benchmark time 145ms, 33 nodes, 227 nps, move e2e4
Benchmark time 150ms, 45 nodes, 300 nps, move e2e4
Benchmark time 164ms, 59 nodes, 359 nps, move e2e4
Benchmark time 178ms, 73 nodes, 410 nps, move e2e4
Benchmark time 194ms, 102 nodes, 525 nps, move e2e4
Benchmark time 208ms, 135 nodes, 649 nps, move e2e4
Benchmark time 229ms, 173 nodes, 755 nps, move e2e4
Benchmark time 245ms, 218 nodes, 889 nps, move e2e4
Benchmark time 261ms, 267 nodes, 1022 nps, move e2e4
Benchmark time 291ms, 301 nodes, 1034 nps, move e2e4
Benchmark time 291ms, 339 nodes, 1164 nps, move e2e4
Benchmark time 312ms, 399 nodes, 1278 nps, move e2e4
Benchmark time 323ms, 477 nodes, 1476 nps, move e2e4
Benchmark time 331ms, 552 nodes, 1667 nps, move e2e4
Benchmark time 344ms, 665 nodes, 1933 nps, move e2e4
Benchmark time 358ms, 781 nodes, 2181 nps, move e2e4
Benchmark time 369ms, 889 nodes, 2409 nps, move e2e4
Benchmark time 405ms, 1155 nodes, 2851 nps, move e2e4
Benchmark time 430ms, 1318 nodes, 3065 nps, move e2e4
Benchmark time 478ms, 1762 nodes, 3686 nps, move e2e4
Benchmark time 499ms, 1890 nodes, 3787 nps, move e2e4
Benchmark time 564ms, 2502 nodes, 4436 nps, move e2e4
Benchmark time 581ms, 2701 nodes, 4648 nps, move e2e4
Benchmark time 614ms, 2910 nodes, 4739 nps, move e2e4
Benchmark time 636ms, 3091 nodes, 4860 nps, move e2e4
Benchmark time 663ms, 3382 nodes, 5101 nps, move e2e4
Benchmark time 684ms, 3536 nodes, 5169 nps, move e2e4
Benchmark time 707ms, 3784 nodes, 5352 nps, move e2e4
Benchmark time 737ms, 4053 nodes, 5499 nps, move e2e4
Benchmark time 896ms, 5448 nodes, 6080 nps, move e2e4
Benchmark time 951ms, 5904 nodes, 6208 nps, move e2e4
Benchmark time 1675ms, 12273 nodes, 7327 nps, move e2e4
Benchmark time 1770ms, 13118 nodes, 7411 nps, move e2e4
Benchmark time 1823ms, 13714 nodes, 7522 nps, move e2e4
Benchmark time 4173ms, 36470 nodes, 8739 nps, move e2e4
Benchmark time 5104ms, 45864 nodes, 8985 nps, move e2e4
Benchmark time 5302ms, 47989 nodes, 9051 nps, move e2e4
bestmove e2e4
Benchmark final time 5.31629s calculating 9061.2 nodes per second.
Giorgio Medeot
Posts: 52
Joined: Fri Jan 29, 2010 2:01 pm
Location: Ivrea, Italy

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Giorgio Medeot »

Intel HD Graphics 620
Win10
HP EliteBook 850 G4
NPS 487.4

Code: Select all

C:\Chess\lc0> .\lc0.exe benchmark
       _
|   _ | |
|_ |_ |_| v0.22.0 built Aug  5 2019
Found pb network file: C:\Chess\lc0/56215.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 2.1
Platform profile: FULL_PROFILE
Platform name:    Intel(R) OpenCL
Platform vendor:  Intel(R) Corporation
Device ID:      0
Device name:    Intel(R) HD Graphics 620
Device type:    GPU
Device vendor:  Intel(R) Corporation
Device driver:  25.20.100.6472
Device speed:   1050 MHZ
Device cores:   24 CU
Device score:   621
Device ID:      1
Device name:    Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
Device type:    CPU
Device vendor:  Intel(R) Corporation
Device driver:  7.6.0.0
Device speed:   2700 MHZ
Device cores:   4 CU
Device score:   521
Selected platform: Intel(R) OpenCL
Selected device: Intel(R) HD Graphics 620
with OpenCL 2.1 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 8

Max workgroup size: 256
Max workgroup dimensions: 256 256 256
Benchmark time 90ms, 17 nodes, 188 nps, move b1c3
Benchmark time 222ms, 20 nodes, 90 nps, move f2f3
Benchmark time 274ms, 75 nodes, 273 nps, move b1c3
Benchmark time 377ms, 102 nodes, 270 nps, move b1c3
Benchmark time 644ms, 147 nodes, 228 nps, move g2g3
Benchmark time 659ms, 152 nodes, 230 nps, move g2g3
Benchmark time 781ms, 183 nodes, 234 nps, move e2e4
Benchmark time 880ms, 313 nodes, 355 nps, move a2a3
Benchmark time 1197ms, 494 nodes, 412 nps, move d2d4
Benchmark time 1298ms, 502 nodes, 386 nps, move d2d4
Benchmark time 1538ms, 582 nodes, 378 nps, move d2d4
Benchmark time 6565ms, 3032 nodes, 461 nps, move d2d4
Benchmark time 6626ms, 3280 nodes, 495 nps, move d2d4
Benchmark time 7474ms, 3627 nodes, 485 nps, move c2c3
Benchmark time 7862ms, 3721 nodes, 473 nps, move c2c3
Benchmark time 8862ms, 4266 nodes, 481 nps, move c2c3
bestmove c2c3
Benchmark final time 9.16905s calculating 487.401 nodes per second.
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max »

Nvidia Tesla K80, the slower companion of Nvidia Tesla T4 from Google Colab.

Code: Select all

NPS	GPU (OpenCL)		System					OS
===================================================================================
10703	Nvidia GTX 1080		Desktop					Win10
 9150	Nvidia Tesla T4		Google Colab (*)			Linux
 8754	Nvidia GTX 1070		Desktop					Win10
 4829	Nvidia Tesla K80	Google Colab (*)			Linux <-new
 3986	Nvidia GTX 1050 Ti	Laptop					Win10
 3579	Nvidia GTX 1050		Desktop, AMD FX-8350			Win10
 2493	Nvidia GTX 750 Ti	Desktop, AMD FX-8300			Win10
  705	AMD Firepro M4000	HP EliteBook 8570w, i7-3740QM		Win10
  595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
  573	Intel HD 630		Laptop					Win10
  545	Nvidia GTX 460M		Asus ROG G73S, i7-2630QM		Win10
  487	Intel HD 620		HP EliteBook 850 G4			Win10
  437	Intel HD 520		Dell Latitude E5570, i5-6200U		Win10
  412	Intel HD 4400		Sony Vaio Ultrabook 13", i5-4200U	Win10
  353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
  260	Intel HD 4000		Lenovo Thinkpad T430, i7-3520M		Win10
  155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10
   74	ATI Radeon HD 5430M	Arctic MediaCenter MC001, Atom D525	Win10
   11	Intel HD		Medion E1232T, Celeron N2807		Win10
(*) Nvidia Tesla T4 & K80 @ Google Colab

Benchmark of Nvidia Tesla K80 with 2 backends (the K80 doesn't support the faster cudnn-fp16) for 3 different weight sizes:

backend = opencl

Code: Select all

size	net	nps
----------------------
128x10	56215	4829.4
256x20	42850	 584.0
320x24	60260	 493.9
backend = cudnn

Code: Select all

size	net	nps
----------------------
128x10	56215	6888.3
256x20	42850	1296.3
320x24	60260	 787.8
./lc0/build/lc0 benchmark -w 56215 -b opencl

Code: Select all

       _
|   _ | |
|_ |_ |_| v0.22.0 built Aug 19 2019
Loading weights file from: 56215
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.0.211
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:      0
Device name:    Tesla K80
Device type:    GPU
Device vendor:  NVIDIA Corporation
Device driver:  410.79
Device speed:   823 MHZ
Device cores:   13 CU
Device score:   1112
Selected platform: NVIDIA CUDA
Selected device: Tesla K80
with OpenCL 1.2 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 32

Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
Benchmark time 67ms, 2 nodes, 29 nps, move e2e4
Benchmark time 96ms, 3 nodes, 31 nps, move e2e4
Benchmark time 122ms, 6 nodes, 49 nps, move e2e4
Benchmark time 122ms, 9 nodes, 73 nps, move e2e4
Benchmark time 136ms, 18 nodes, 132 nps, move e2e4
Benchmark time 153ms, 29 nodes, 189 nps, move e2e4
Benchmark time 166ms, 33 nodes, 198 nps, move e2e4
Benchmark time 174ms, 45 nodes, 258 nps, move e2e4
Benchmark time 191ms, 59 nodes, 308 nps, move e2e4
Benchmark time 209ms, 73 nodes, 349 nps, move e2e4
Benchmark time 228ms, 102 nodes, 447 nps, move e2e4
Benchmark time 248ms, 135 nodes, 544 nps, move e2e4
Benchmark time 269ms, 171 nodes, 635 nps, move e2e4
Benchmark time 287ms, 199 nodes, 693 nps, move e2e4
Benchmark time 310ms, 246 nodes, 793 nps, move e2e4
Benchmark time 354ms, 306 nodes, 864 nps, move e2e4
Benchmark time 354ms, 336 nodes, 949 nps, move e2e4
Benchmark time 381ms, 389 nodes, 1020 nps, move e2e4
Benchmark time 397ms, 478 nodes, 1204 nps, move e2e4
Benchmark time 419ms, 545 nodes, 1300 nps, move e2e4
Benchmark time 455ms, 663 nodes, 1457 nps, move e2e4
Benchmark time 484ms, 780 nodes, 1611 nps, move e2e4
Benchmark time 550ms, 1065 nodes, 1936 nps, move e2e4
Benchmark time 615ms, 1316 nodes, 2139 nps, move e2e4
Benchmark time 695ms, 1737 nodes, 2499 nps, move e2e4
Benchmark time 734ms, 1890 nodes, 2574 nps, move e2e4
Benchmark time 775ms, 2076 nodes, 2678 nps, move e2e4
Benchmark time 800ms, 2249 nodes, 2811 nps, move e2e4
Benchmark time 830ms, 2439 nodes, 2938 nps, move e2e4
Benchmark time 896ms, 2773 nodes, 3094 nps, move e2e4
Benchmark time 927ms, 2927 nodes, 3157 nps, move e2e4
Benchmark time 969ms, 3123 nodes, 3222 nps, move e2e4
Benchmark time 1004ms, 3261 nodes, 3248 nps, move e2e4
Benchmark time 1045ms, 3486 nodes, 3335 nps, move e2e4
Benchmark time 1321ms, 4965 nodes, 3758 nps, move e2e4
Benchmark time 1484ms, 5804 nodes, 3911 nps, move e2e4
Benchmark time 2703ms, 11878 nodes, 4394 nps, move e2e4
Benchmark time 2976ms, 13207 nodes, 4437 nps, move e2e4
Benchmark time 5299ms, 25396 nodes, 4792 nps, move e2e4
Benchmark time 5411ms, 25751 nodes, 4759 nps, move e2e4
bestmove e2e4
Benchmark final time 5.42483s calculating 4829.46 nodes per second.
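The running nps column in the v0.22.0 OpenCL logs above appears consistent with nodes divided by elapsed time, truncated to an integer. A small Python sketch to check a pasted log line; the truncation rule is an inference from these logs rather than something taken from lc0's source, and `check_progress_line` is a hypothetical helper name:

```python
import re

PROGRESS_RE = re.compile(r"Benchmark time (\d+)ms, (\d+) nodes, (\d+) nps")

def check_progress_line(line):
    """True if the printed nps equals nodes * 1000 // ms; None if not a progress line."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    ms, nodes, nps = (int(g) for g in m.groups())
    if ms == 0:
        return None  # avoid dividing by zero on a degenerate line
    return nps == nodes * 1000 // ms

if __name__ == "__main__":
    # Three lines copied from the Tesla K80 log above.
    sample = [
        "Benchmark time 67ms, 2 nodes, 29 nps, move e2e4",
        "Benchmark time 2976ms, 13207 nodes, 4437 nps, move e2e4",
        "Benchmark time 5411ms, 25751 nodes, 4759 nps, move e2e4",
    ]
    for line in sample:
        print(check_progress_line(line), line)
```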
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Lc0 OpenCL benchmark with 128x10 network

Post by MikeB »

I know this was for OpenCL benchmarks, but just for kicks I ran it with an RTX 2060 Super (cudnn-fp16)

Code: Select all

michaelb7@Threadripper-32:~/cluster.mfb$ lc0 benchmark
       _
|   _ | |
|_ |_ |_| v0.23.2+git.c8d9095 built Jan  9 2020
Found pb network file: ./128x10-se-distill-ccrl-11248.pb.gz
Creating backend [cudnn-auto]...
Switching to [cudnn-fp16]...
CUDA Runtime version: 10.2.0
Cudnn version: 7.6.5
Latest version of CUDA supported by the driver: 10.2.0
GPU: GeForce RTX 2060 SUPER
GPU memory: 7.7923 Gb
GPU clock frequency: 1695 MHz
GPU compute capability: 7.5
Benchmark time 25ms, 7 nodes, 1000 nps, move d2d4
Benchmark time 26ms, 19 nodes, 2375 nps, move g2g3
Benchmark time 27ms, 35 nodes, 3888 nps, move g2g3
Benchmark time 29ms, 59 nodes, 5363 nps, move e2e3
Benchmark time 30ms, 73 nodes, 5615 nps, move g2g3
Benchmark time 32ms, 100 nodes, 7142 nps, move g2g3
Benchmark time 33ms, 111 nodes, 7400 nps, move e2e4
Benchmark time 34ms, 120 nodes, 7500 nps, move e2e4
Benchmark time 35ms, 148 nodes, 8705 nps, move g1f3
Benchmark time 36ms, 169 nodes, 9388 nps, move g1f3
Benchmark time 40ms, 259 nodes, 11772 nps, move c2c4
Benchmark time 41ms, 300 nodes, 13043 nps, move g1f3
Benchmark time 42ms, 325 nodes, 13541 nps, move g1f3
Benchmark time 44ms, 411 nodes, 15807 nps, move c2c4
Benchmark time 47ms, 515 nodes, 17758 nps, move g1f3
Benchmark time 56ms, 771 nodes, 20289 nps, move g1f3
Benchmark time 60ms, 1023 nodes, 23790 nps, move g1f3
Benchmark time 84ms, 1525 nodes, 23106 nps, move g1f3
Benchmark time 88ms, 1744 nodes, 24914 nps, move g1f3
Benchmark time 93ms, 2051 nodes, 27346 nps, move d2d4
Benchmark time 95ms, 2142 nodes, 27818 nps, move d2d4
Benchmark time 106ms, 2602 nodes, 29568 nps, move d2d4
Benchmark time 123ms, 3359 nodes, 31688 nps, move d2d4
Benchmark time 216ms, 9451 nodes, 47732 nps, move d2d4
Benchmark time 223ms, 10023 nodes, 48892 nps, move d2d4
Benchmark time 243ms, 11017 nodes, 48747 nps, move d2d4
Benchmark time 249ms, 11604 nodes, 50233 nps, move d2d4
Benchmark time 337ms, 18663 nodes, 58504 nps, move d2d4
Benchmark time 501ms, 34510 nodes, 71449 nps, move d2d4
Benchmark time 663ms, 48414 nodes, 75060 nps, move d2d4
Benchmark time 742ms, 55440 nodes, 76574 nps, move d2d4
Benchmark time 1224ms, 112779 nodes, 93437 nps, move d2d4
Benchmark time 1246ms, 115673 nodes, 94196 nps, move d2d4
Benchmark time 1518ms, 149134 nodes, 99422 nps, move d2d4
Benchmark time 2005ms, 211803 nodes, 106594 nps, move d2d4
Benchmark time 2274ms, 246094 nodes, 109084 nps, move d2d4
Benchmark time 3475ms, 394188 nodes, 114026 nps, move d2d4
Benchmark time 7000ms, 795262 nodes, 113901 nps, move d2d4
Benchmark time 10000ms, 1106784 nodes, 110877 nps, move d2d4
bestmove d2d4
Benchmark final time 10.0043s calculating 110656 nodes per second.