Stockfish Great Speed Improvements but which Version?

Hai
Posts: 693
Joined: Sun Aug 04, 2013 1:19 pm

Re: Stockfish Great Speed Improvements but which Version?

Post by Hai »

smatovic wrote: Tue Aug 08, 2023 6:54 pm Homebrew for macOS?

Code: Select all

brew install stockfish

Code: Select all

brew install stockfish --HEAD
https://formulae.brew.sh/formula/stockfish#default

AFAIK, all of the ARM chips you mentioned have NEON SIMD units, and as far as I can see, the SF NNUE code base has a dedicated section for NEON.

See:
https://github.com/official-stockfish/S ... efile#L320
and:
https://github.com/official-stockfish/S ... mmon.h#L44

So, short answer: compile it yourself for your native architecture, for example via Homebrew.

--
Srdja
I have read on MacRumors that Stockfish's use of SIMD is still far from optimized.
Hai
Posts: 693
Joined: Sun Aug 04, 2013 1:19 pm

Post by Hai »

smatovic wrote: Wed Aug 09, 2023 5:47 pm From the SF Makefile:

Code: Select all

ifeq ($(ARCH),apple-silicon)
	arch = arm64
	prefetch = yes
	popcnt = yes
	neon = yes
	dotprod = yes
	arm_version = 8
endif
"dotprod = yes" -> I assume that if you compile from source (e.g. via Homebrew) for Apple M-series, the build will enable the dot-product optimization, but I am not familiar with the details.

--
Srdja
dotprod is new, so we will see a lot of improvements here.
The Stockfish team can probably add more things like neon, dotprod… and optimize Stockfish a lot.
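For anyone who wants to try the `ARCH=apple-silicon` setting quoted above, here is a minimal shell sketch of a source build. Only `apple-silicon` comes from the quoted Makefile. The function names, the `Stockfish/src` checkout path and the fallback mappings (`armv8`, `x86-64`) are my assumptions for illustration. The script only prints the `make` command; it does not run it.

```shell
#!/bin/sh
# Map an OS/CPU pair to a Stockfish Makefile ARCH value.
# Only "apple-silicon" is taken from the quoted Makefile; the other
# branches are illustrative assumptions.
stockfish_arch() {
    os="$1"     # e.g. output of `uname -s`
    cpu="$2"    # e.g. output of `uname -m`
    case "$os/$cpu" in
        Darwin/arm64)      echo "apple-silicon" ;;
        */aarch64|*/arm64) echo "armv8" ;;
        */x86_64)          echo "x86-64" ;;
        *)                 return 1 ;;
    esac
}

# Print (do not run) the build command for a source directory.
# The directory is a hypothetical local checkout path.
stockfish_build_cmd() {
    src_dir="$1"
    target_arch="$2"
    echo "make -C $src_dir -j profile-build ARCH=$target_arch"
}

# Show the command that would be used on the current machine.
if detected=$(stockfish_arch "$(uname -s)" "$(uname -m)"); then
    stockfish_build_cmd "Stockfish/src" "$detected"
else
    echo "no ARCH mapping for this platform" >&2
fi
```

On an M-series Mac this should print `make -C Stockfish/src -j profile-build ARCH=apple-silicon`, which you can then run from the directory that contains your checkout.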
Hai
Posts: 693
Joined: Sun Aug 04, 2013 1:19 pm

Post by Hai »

smatovic wrote: Thu Aug 10, 2023 8:18 am
My take, you will need a new approach to make use of the whole compute power available in Apple M-series silicon, CPU+SIMD+GPU+TPU via unified memory. Currently SF uses CPU+SIMD, and Lc0 uses CPU+GPU.

--
Srdja
Don't forget the Neural Engine, Metal 3, …
Modern Times
Posts: 3742
Joined: Thu Jun 07, 2012 11:02 pm

Post by Modern Times »

smatovic wrote: Thu Aug 10, 2023 8:18 am
Magnum wrote: Thu Aug 10, 2023 7:53 am ....
What could be improved?
My take, you will need a new approach to make use of the whole compute power available in Apple M-series silicon, CPU+SIMD+GPU+TPU via unified memory. Currently SF uses CPU+SIMD, and Lc0 uses CPU+GPU.

--
Srdja
How about Lc0? Apart from the fact that it needs CUDA for best performance, does this Apple unified architecture help it? I'm comparing it to, say, an Intel CPU with integrated graphics and no discrete graphics card (like an Intel 12700F, say).
Hai
Posts: 693
Joined: Sun Aug 04, 2013 1:19 pm

Post by Hai »

Ras wrote: Thu Aug 10, 2023 8:55 am M1 and M2 have already been improved software-wise. They are simply mediocre devices for chess, that's the ugly truth. Even an ROG Ally handheld console is better.

M3 might become a different story if Apple decides to use ARMv9 along with its scalable vector extensions instead of the ARMv8 of M1/M2. However, it remains to be seen what the exact implementation would be, and how useful it would be because "scalable" means exactly that, a range of possible implementations.
It should be obvious that Apple wouldn’t sell a device which can be easily beaten by a handheld.
If you add the +75.63% speed-up to the +5% speed-up from 2022, you get +80.63%.
Now, for a fair comparison:
Implement them in the old Stockfish 14.1 and you will end up exactly between these two:
22.787.162 Intel Core i9-13900 ddr5 4800 CL36 32threads sse3-clang Maxim Masiutin L
22.492.977 Intel Core i9-13900 ddr5 4800 CL36 32threads icx Maxim Masiutin L

= A MacBook with an M1 Max is as fast as the i9-13900.
= You will reach 45.000.000 nodes/second if you have an M1 Ultra!!

https://ipmanchess.yolasite.com/amd--in ... ckfish.php

Apple M3 will use ARMv9. That's unavoidable for many reasons.
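The arithmetic in the post above can be checked with a short shell sketch. Two things here are assumptions and are not stated in the thread: that the two speed-ups are independent, in which case they would compound rather than add, and that an M1 Ultra scales perfectly to twice the midpoint of the two i9-13900 results. The function names are hypothetical.

```shell
#!/bin/sh
# Sum two percentage speed-ups (the calculation used in the post).
additive_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}

# Compound two percentage speed-ups multiplicatively.
# Assumption: the two improvements are independent of each other.
compound_pct() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.2f\n", ((1 + a / 100) * (1 + b / 100) - 1) * 100 }'
}

# Integer midpoint of two nodes/second figures (rounded down).
midpoint_nps() {
    echo $(( ($1 + $2) / 2 ))
}

additive_pct 75.63 5            # prints 80.63
compound_pct 75.63 5            # prints 84.41
mid=$(midpoint_nps 22787162 22492977)
echo "$mid"                     # prints 22640069
echo $(( 2 * mid ))             # prints 45280138 (assumes perfect 2x scaling)
```

If the two speed-ups really do stack, the combined gain would be closer to +84.41% than +80.63%. Doubling the midpoint lands near the 45.000.000 nodes/second quoted for the M1 Ultra, but only under the perfect-scaling assumption.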
smatovic
Posts: 3312
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Post by smatovic »

Modern Times wrote: Thu Aug 10, 2023 7:59 pm
smatovic wrote: Thu Aug 10, 2023 8:18 am
Magnum wrote: Thu Aug 10, 2023 7:53 am ....
What could be improved?
My take, you will need a new approach to make use of the whole compute power available in Apple M-series silicon, CPU+SIMD+GPU+TPU via unified memory. Currently SF uses CPU+SIMD, and Lc0 uses CPU+GPU.

--
Srdja
How about Lc0? Apart from the fact that it needs CUDA for best performance, does this Apple unified architecture help it? I'm comparing it to, say, an Intel CPU with integrated graphics and no discrete graphics card (like an Intel 12700F, say).
Lc0 already has a dedicated Metal backend for Apple M-series GPUs, and Ankan did some microbenchmarking of the Neural Engine. Apple also has notable memory bandwidth. To make use of all this you will need something new; I don't know what that would have to look like.

--
Srdja
Magnum
Posts: 195
Joined: Thu Feb 04, 2021 10:24 pm
Full name: Arnold Magnum

Post by Magnum »

smatovic wrote: Thu Aug 10, 2023 8:31 pm
Modern Times wrote: Thu Aug 10, 2023 7:59 pm
smatovic wrote: Thu Aug 10, 2023 8:18 am
Magnum wrote: Thu Aug 10, 2023 7:53 am ....
What could be improved?
My take, you will need a new approach to make use of the whole compute power available in Apple M-series silicon, CPU+SIMD+GPU+TPU via unified memory. Currently SF uses CPU+SIMD, and Lc0 uses CPU+GPU.

--
Srdja
How about Lc0? Apart from the fact that it needs CUDA for best performance, does this Apple unified architecture help it? I'm comparing it to, say, an Intel CPU with integrated graphics and no discrete graphics card (like an Intel 12700F, say).
Lc0 already has a dedicated Metal backend for Apple M-series GPUs, and Ankan did some microbenchmarking of the Neural Engine. Apple also has notable memory bandwidth. To make use of all this you will need something new; I don't know what that would have to look like.

--
Srdja
We have enough Apple users here. Stockfish, Lc0 and other developers only need to ask here whether these people can run some tests.
Magnum
Posts: 195
Joined: Thu Feb 04, 2021 10:24 pm
Full name: Arnold Magnum

Post by Magnum »

Ras wrote: Thu Aug 10, 2023 8:55 am
Magnum wrote: Thu Aug 10, 2023 7:53 am Actually it looks like developers can improve the speed of Stockfish on Apple M1, M2, M3 devices a lot.
M1 and M2 have already been improved software-wise. They are simply mediocre devices for chess, that's the ugly truth. Even an ROG Ally handheld console is better.

M3 might become a different story if Apple decides to use ARMv9 along with its scalable vector extensions instead of the ARMv8 of M1/M2. However, it remains to be seen what the exact implementation would be, and how useful it would be because "scalable" means exactly that, a range of possible implementations.
The new Stockfish dev build, which got the improvement for ARM CPUs, now runs about 75% faster on my Apple device. The kN/s figures are insanely high.
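A figure like "about 75% faster" can be reproduced by comparing the nodes/second line that `stockfish bench` prints at the end of its summary. Below is a minimal sketch. The function names and the two binary names in the usage note are hypothetical, and the numbers in the comments are made-up sample values, not measurements from this thread.

```shell
#!/bin/sh
# Extract the nodes/second value from `stockfish bench` output on stdin.
# Looks for the "Nodes/second" line and keeps only the digits after the colon.
parse_nps() {
    awk -F: '/Nodes\/second/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Percentage speed-up of a new nodes/second figure over an old one.
speedup_pct() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

# Demo with made-up sample values (not real measurements).
printf 'Nodes/second    : 1750000\n' | parse_nps    # prints 1750000
speedup_pct 1000000 1750000                         # prints 75.0
```

With two real builds you would run something like `old=$(./stockfish-old bench 2>&1 | parse_nps)` and the same for the new binary, then call `speedup_pct "$old" "$new"`. The `2>&1` is there because the bench summary is written to stderr. Single bench runs are noisy, so averaging several runs is safer.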
smatovic
Posts: 3312
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Post by smatovic »

Magnum wrote: Fri Aug 11, 2023 9:37 am ...
We have enough Apple users here. Stockfish, Lc0 and other developers only need to ask here whether these people can run some tests.
If you intend to optimize for a given hardware architecture, you really need hands-on access to the device. You can read papers about instructions, latency and throughput, but in reality you need to do microbenchmarking for this stuff.

Nvidia was generous in the past with hardware donations to researchers, which makes sense as a way to establish their own ecosystem. Apple, meanwhile, sponsors the 3D project Blender to implement a Blender renderer with a Metal backend on M-series... but I doubt Apple has any serious interest in computer chess projects. That might change if enough people send them nudging email requests, though ;)

--
Srdja