M1 Apple Silicon for Chess?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

acepoint_de
Posts: 86
Joined: Tue Jun 11, 2013 1:14 am

Re: M1 Apple Silicon for Chess?

Post by acepoint_de »

Ckappe wrote: Fri Feb 26, 2021 11:54 am Thanks for the benchmark page https://acepoint.de/engine-benchmarks-on-apples-m1/

But why do you test only a single thread for CPU engines and only two threads for GPU engines?
Could you post the actual all-core benchmarks as well?
I will. Maybe I was too stupid, or just not involved enough in engine testing, but only after digging into the source code did I figure out how to run the benchmark with more than one thread:

benchmark.cpp (Stockfish 13):

Code:

/// setup_bench() builds a list of UCI commands to be run by bench. There
/// are five parameters: TT size in MB, number of search threads that
/// should be used, the limit value spent for each position, a file name
/// where to look for positions in FEN format, the type of the limit:
/// depth, perft, nodes and movetime (in millisecs), and evaluation type
/// mixed (default), classical, NNUE.
///
/// bench -> search default positions up to depth 13
/// bench 64 1 15 -> search default positions up to depth 15 (TT = 64MB)
/// bench 64 4 5000 current movetime -> search current position with 4 threads for 5 sec
/// bench 64 1 100000 default nodes -> search default positions for 100K nodes each
/// bench 16 1 5 default perft -> run a perft 5 on default positions
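To make the positional order concrete, here is a minimal C++ sketch of how those arguments appear to be consumed: each field takes the next token if one is left and otherwise keeps its default. Depth 13 and "mixed" come from the comment above; the 16 MB TT and single thread are what I believe the Stockfish 13 source uses, so check your own copy. `BenchArgs` and `parse_bench_args` are hypothetical names, not Stockfish's API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical container for the six positional bench arguments.
struct BenchArgs {
    std::string ttSize, threads, limit, fenFile, limitType, evalType;
};

// Sketch of positional parsing: each field takes the next whitespace-separated
// token if one is left, otherwise it keeps its (assumed) default.
BenchArgs parse_bench_args(const std::string& line) {
    std::istringstream is(line);
    std::string token;
    auto nextOr = [&](const char* fallback) {
        return (is >> token) ? token : std::string(fallback);
    };
    BenchArgs a;
    a.ttSize    = nextOr("16");       // transposition table size in MB
    a.threads   = nextOr("1");        // number of search threads
    a.limit     = nextOr("13");       // depth, nodes, perft depth or ms
    a.fenFile   = nextOr("default");  // "default", "current" or a FEN file
    a.limitType = nextOr("depth");    // depth, perft, nodes, movetime
    a.evalType  = nextOr("mixed");    // mixed, classical, NNUE
    return a;
}
```

With this order, `bench 64 8` only sets the TT size and the thread count; the depth, position set, limit type and evaluation type all stay at their defaults.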
Ciao

acepoint
acepoint_de
Posts: 86
Joined: Tue Jun 11, 2013 1:14 am

Re: M1 Apple Silicon for Chess?

Post by acepoint_de »

Ckappe wrote: Fri Feb 26, 2021 11:54 am Could you post the actual all-core benchmarks as well?
Here you go: https://acepoint.de/benchmarks-with-mor ... -apple-m1/

Ciao

acepoint
Ckappe
Posts: 81
Joined: Sun Feb 14, 2021 11:50 am
Full name: Rütger Andersen

Re: M1 Apple Silicon for Chess?

Post by Ckappe »

acepoint_de wrote: Fri Feb 26, 2021 3:59 pm
Ckappe wrote: Fri Feb 26, 2021 11:54 am Could you post the actual all-core benchmarks as well?
Here you go: https://acepoint.de/benchmarks-with-mor ... -apple-m1/

Ciao

acepoint
Thanks, much appreciated. So Cfish with NEON optimization performs worse than C++ clones like Maguro and BlackDiamond? Or did you test them without NNUE?

Also very surprising that the per-core performance of some engines increases when you use two cores... :-)
George Sobala
Posts: 44
Joined: Sat Feb 03, 2018 2:42 pm
Location: Yorkshire, England

Re: M1 Apple Silicon for Chess?

Post by George Sobala »

I appreciate the amount of work you have put into these benchmarks, but here are some comments.

I consistently get benches 5-10% faster than yours with SF and Cfish. And that is on a fanless, potentially throttling MacBook Air.

Some things to check:

Have you really shut down all possible background apps? Including things in your taskbar?

Are these profile-optimised builds?

Have you redirected output to /dev/null (e.g. "cfish bench 64 8 > /dev/null")? Scrolling output in the terminal window uses some CPU, and how much varies from one OS to another.

There can be a lot of variability from run to run with the short depth in the default bench. Have you considered averaging several runs, or searching to a greater depth? I acknowledge that this takes longer.
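A minimal sketch of the averaging suggested here, assuming the Nodes/second figure from each run has been collected by hand; `summarise_runs` is a hypothetical helper. The sample standard deviation shows how much of a gap between two machines could simply be run-to-run noise.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct RunStats {
    double mean;    // average nodes/second over all runs
    double stddev;  // sample standard deviation (n - 1 denominator)
};

// Precondition: at least one run. With a single run the spread is reported as 0.
RunStats summarise_runs(const std::vector<double>& nps) {
    const double n = static_cast<double>(nps.size());
    const double mean = std::accumulate(nps.begin(), nps.end(), 0.0) / n;
    double sumSq = 0.0;
    for (double x : nps) sumSq += (x - mean) * (x - mean);
    const double stddev = nps.size() > 1 ? std::sqrt(sumSq / (n - 1.0)) : 0.0;
    return {mean, stddev};
}
```

For three toy runs of 100, 200 and 300 nps it returns a mean of 200 and a standard deviation of 100; with ten real runs per engine, a difference smaller than about one standard deviation is hard to distinguish from noise.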
AlexChess
Posts: 1561
Joined: Sat Feb 06, 2021 8:06 am
Full name: Alex Morales

Re: M1 Apple Silicon for Chess?

Post by AlexChess »

acepoint_de wrote: Fri Feb 26, 2021 2:12 pm
Ckappe wrote: Fri Feb 26, 2021 11:54 am Thanks for the benchmark page https://acepoint.de/engine-benchmarks-on-apples-m1/

But why do you test only a single thread for CPU engines and only two threads for GPU engines?
Could you post the actual all-core benchmarks as well?
I will. Maybe I was too stupid, or just not involved enough in engine testing, but only after digging into the source code did I figure out how to run the benchmark with more than one thread:

benchmark.cpp (Stockfish 13):

Code:

/// setup_bench() builds a list of UCI commands to be run by bench. There
/// are five parameters: TT size in MB, number of search threads that
/// should be used, the limit value spent for each position, a file name
/// where to look for positions in FEN format, the type of the limit:
/// depth, perft, nodes and movetime (in millisecs), and evaluation type
/// mixed (default), classical, NNUE.
///
/// bench -> search default positions up to depth 13
/// bench 64 1 15 -> search default positions up to depth 15 (TT = 64MB)
/// bench 64 4 5000 current movetime -> search current position with 4 threads for 5 sec
/// bench 64 1 100000 default nodes -> search default positions for 100K nodes each
/// bench 16 1 5 default perft -> run a perft 5 on default positions
Ciao

acepoint
Hi!
Someone has compiled the latest Stockfish 13 with the FF NNUE net embedded! Maybe now you can compile it for the Mac M1.

https://www.chess2u.com/t16869-stockfish_ff2
Source code: https://pixeldrain.com/u/xqskCd3S
Chess engines and dedicated chess computers fan since 1981 :D macOS Sequoia 16GB-512GB, Windows 11 & Ubuntu ARM64.
ProteusSF Dev Forum
acepoint_de
Posts: 86
Joined: Tue Jun 11, 2013 1:14 am

Re: M1 Apple Silicon for Chess?

Post by acepoint_de »

Ckappe wrote: Fri Feb 26, 2021 4:29 pm Thanks, much appreciated. So Cfish with NEON optimization performs worse than C++ clones like Maguro and BlackDiamond? Or did you test them without NNUE?
benchmark.c of Cfish:

Code:

// - Evaluation: classical, nnue (hybrid), pure (NNUE only), mixed (default).
I suggest you ask the developers such specific questions. I don't have time to check, for each Stockfish-based engine, whether its «default» bench differs from the others.
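Why «default» is not a fixed setting can be illustrated with a small sketch: the four modes from the Cfish comment map to an enum, and an omitted token falls back to whatever default each engine chose. The enum and `parse_eval_mode` are hypothetical and do not reflect Cfish's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the modes listed in Cfish's benchmark.c comment.
enum class EvalMode { Classical, Nnue, Pure, Mixed, Unknown };

// Maps a bench token to a mode. An empty token means "not specified", so the
// engine's own default is used, and different forks need not agree on it.
EvalMode parse_eval_mode(const std::string& token, EvalMode engineDefault) {
    if (token.empty())        return engineDefault;
    if (token == "classical") return EvalMode::Classical;
    if (token == "nnue")      return EvalMode::Nnue;   // hybrid
    if (token == "pure")      return EvalMode::Pure;   // NNUE only
    if (token == "mixed")     return EvalMode::Mixed;
    return EvalMode::Unknown;                          // unrecognised token
}
```

Two engines benched without an explicit mode can therefore be measuring different evaluators, which is why naming the mode on the command line makes the numbers easier to compare.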
Also very surprising that the per-core performance of some engines increases when you use two cores... :-)
These benchmarks will rarely give exactly the same number; they vary by up to ±50,000 nps, and I'm surprised that you don't know this. If you want exact numbers, you have to reboot the system after each run, do each run ten times and take the mean. Even longer test suites don't give exactly the same results.

Ciao

acepoint
acepoint_de
Posts: 86
Joined: Tue Jun 11, 2013 1:14 am

Re: M1 Apple Silicon for Chess?

Post by acepoint_de »

Hi George,
George Sobala wrote: Fri Feb 26, 2021 4:46 pm I consistently get benches 5-10% faster than yours with SF and Cfish. And that is on a fanless, potentially throttling MacBook Air.
For these short benchmarks, the difference between the Air and the Pro should only show up in

Code:

bench 64 8
But 5-10% is a lot.
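To put the 5-10% in perspective, here is a sketch of the relative difference between two nps figures; `percent_faster` is a hypothetical helper. Taking the ±50,000 nps run-to-run spread mentioned in this thread and a single-thread result of roughly 2,000,000 nps, noise alone amounts to about 2.5%, so a consistent 5-10% gap would exceed that spread.

```cpp
#include <cassert>
#include <cmath>

// How much faster run A is than run B, in percent (negative if A is slower).
double percent_faster(double npsA, double npsB) {
    return (npsA / npsB - 1.0) * 100.0;
}
```

For example, `percent_faster(2050000.0, 2000000.0)` gives about 2.5, the size of the noise band, while `percent_faster(110.0, 100.0)` gives about 10.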
Some things to check:

Have you really shut down all possible background apps? Including things in your taskbar?
Nope, but there were no CPU-intensive apps running. You are right, though: to get precise, comparable results I should shut down everything else.
Are these profile-optimised builds?
Stockfish was compiled with

Code:

make build ARCH=apple-silicon
with no modifications to the Makefile. For Cfish I had to use

Code:

make build ARCH=apple-silicon NUMA=no
otherwise I get compile errors. I may be corrected, but as far as I understand the M1 architecture, this should have no effect.
Have you redirected output to /dev/null (e.g. "cfish bench 64 8 > /dev/null" ) ? The scrolling output in the terminal window uses up a degree of CPU which varies from one OS to another.
Nope, good idea.
There can be a lot of variability from run to run with the short depth in the default bench. Have you considered averaging several runs, or searching to a greater depth? I acknowledge that this takes longer.
Yes, that's what I already wrote to Ckappe. I know this from benchmarks on Windows systems. Perhaps I'll do a longer run (10 times each) at the weekend.

Ciao

acepoint

PS: This is the output of the Homebrew arm64 build of Stockfish, about 10% slower than mine:

Code:

Total time (ms) : 1768
Nodes searched  : 3557925
Nodes/second    : 2012401
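The Nodes/second line appears to be derived from the two lines above it as 1000 × nodes / elapsed milliseconds, using integer division. This sketch (the function name is hypothetical) reproduces both the 2012401 shown here and the Cfish figure of 32357210 from 233263127 nodes in 7209 ms posted later in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Nodes per second from bench's "Nodes searched" and "Total time (ms)" lines.
// Precondition: elapsedMs > 0.
std::uint64_t nodes_per_second(std::uint64_t nodes, std::uint64_t elapsedMs) {
    return 1000 * nodes / elapsedMs;  // integer division, like the printed value
}
```

Because the total time is in whole milliseconds, a bench that finishes in under two seconds already carries some rounding in the nps figure, one more reason to prefer longer or repeated runs.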
Ckappe
Posts: 81
Joined: Sun Feb 14, 2021 11:50 am
Full name: Rütger Andersen

Re: M1 Apple Silicon for Chess?

Post by Ckappe »

acepoint_de wrote: Fri Feb 26, 2021 5:38 pm
Ckappe wrote: Fri Feb 26, 2021 4:29 pm Thanks, much appreciated. So Cfish with NEON optimization performs worse than C++ clones like Maguro and BlackDiamond? Or did you test them without NNUE?
benchmark.c of Cfish:

Code:

// - Evaluation: classical, nnue (hybrid), pure (NNUE only), mixed (default).
I suggest you ask the developers such specific questions. I don't have time to check, for each Stockfish-based engine, whether its «default» bench differs from the others.
Also very surprising that the per-core performance of some engines increases when you use two cores... :-)
These benchmarks will rarely give exactly the same number; they vary by up to ±50,000 nps, and I'm surprised that you don't know this. If you want exact numbers, you have to reboot the system after each run, do each run ten times and take the mean. Even longer test suites don't give exactly the same results.

Ciao

acepoint

I asked you because I can't really ask Michael which options and which bench code you used, etc.

For BlackDiamond etc. it seems NNUE=false if you don't specify it for bench... Using mixed for some engines, NNUE for some and classical for others doesn't make much sense. But I fully understand your point that doing it more seriously takes a lot more time :-) And I appreciate what you have done :-)

I have run many SF benches and, with a good test setup, have never seen two-thread performance be more than twice the single-thread figure. Maybe you had very different conditions between runs? Or, as stated before, the sample behind the average nps in bench is extremely small.
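This point can be phrased as a scaling ratio: the speedup is the n-thread nps divided by the single-thread nps, and the efficiency is that speedup divided by n. A sketch with hypothetical names; an efficiency above 1.0 would mean the two-thread run did more than twice the single-thread work per second, which, as argued above, more plausibly reflects noise or differing conditions than a real effect.

```cpp
#include <cassert>
#include <cmath>

struct Scaling {
    double speedup;     // nps with n threads / nps with 1 thread
    double efficiency;  // speedup / n; above 1.0 is "superlinear"
};

Scaling thread_scaling(double nps1, double npsN, int threads) {
    const double speedup = npsN / nps1;
    return {speedup, speedup / threads};
}
```

For example, 1,000,000 nps on one thread and 1,900,000 nps on two gives a speedup of 1.9 and an efficiency of 0.95, which is the kind of result Ckappe describes as normal.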

I am not very impressed by what Apple has accomplished with the M1. A TDP of 39 W at full load, according to Apple, despite using 5 nm and an 8-core design; my hopes for their mid-range/high-end chips coming sometime are pretty low right now. I hope they will do better!
Ckappe
Posts: 81
Joined: Sun Feb 14, 2021 11:50 am
Full name: Rütger Andersen

Re: M1 Apple Silicon for Chess?

Post by Ckappe »

Great update to the bench table, thanks.

For comparison, I did a quick bench run on the 4900H CPU (ASUS Duo laptop; default bench = mixed):

"Cfish-18022021 AVX2.exe bench 10 16"

==========================
Total time (ms) : 7209
Nodes searched : 233263127
Nodes/second : 32357210

Is this more than twice the performance of the M1 Cfish build, with the same number of cores and a 45 W TDP CPU compared to Apple's 39 W TDP CPU? I guess I will have to wait for the M3, just like my old BMW lol :-)
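One rough way to frame this comparison is nodes per second per watt of TDP. The sketch below uses a hypothetical helper, and TDP is a design rating rather than measured power draw, so the result is indicative only. If the 4900H really is exactly twice as fast, the ratio is 2 × 39 / 45 ≈ 1.73, i.e. roughly 73% more nps per TDP watt.

```cpp
#include <cassert>
#include <cmath>

// (nps per watt of machine A) / (nps per watt of machine B).
// A result above 1.0 means A does more search work per rated watt.
double perf_per_watt_ratio(double npsA, double wattsA, double npsB, double wattsB) {
    return (npsA / wattsA) / (npsB / wattsB);
}
```

Plugging in measured wall power instead of TDP figures would make the comparison much fairer, since neither chip necessarily runs at its rated TDP during bench.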

On the other hand, I don't want Apple silicon to become a "need" for me as a chess enthusiast. I think Apple's lack of openness is pretty bad behavior, and their constant effort to lock users in and overuse DRM on most hardware and software these days is something anyone who loves openness should shy away from, IMO. I kind of look at Apple as the ChessBase of mobile devices today :-(...
AlexChess
Posts: 1561
Joined: Sat Feb 06, 2021 8:06 am
Full name: Alex Morales

Re: M1 Apple Silicon for Chess?

Post by AlexChess »

Double speed, double cost:

1499€ versus 999€ for the MacBook Air M1. But I bought a Mac mini M1 (paid 700€) because during lockdown my smartphone is enough for me on the move, and I like a big LCD.
And I'm more interested in chess engine rating lists than abstract speed benchmarks :D

https://1drv.ms/u/s!AkW3Hj0Gl_ewzx2qKkI ... 3?e=7ILfQu Super Blitz TOP 32, 3 min + 3 s (on Windows 10 ARM in Parallels 16.3 on the M1; TO BE CONTINUED)

https://www.ultrabookreview.com/35985-a ... s-laptops/
Last edited by AlexChess on Sat Feb 27, 2021 9:05 am, edited 1 time in total.