Ipmanchess Apple M3, M4 Benchmarks

Discussion of anything and everything relating to chess playing software and machines.

Moderators: chrisw, Rebel, Ras

Dariusz
Posts: 379
Joined: Sat Jun 13, 2015 10:08 am
Location: Poland
Full name: Dariusz Domagała

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by Dariusz »

Nodes/second : 4663
Regards, Darius
https://chessengeria.eu
Werewolf
Posts: 1943
Joined: Thu Sep 18, 2008 10:24 pm

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by Werewolf »

Dariusz wrote: Mon Jan 06, 2025 10:08 pm Nodes/second : 4663
That's very good. The 4090 gets around 10,000 nps.
Hai
Posts: 637
Joined: Sun Aug 04, 2013 1:19 pm

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by Hai »

Werewolf wrote: Tue Jan 07, 2025 10:21 pm
Dariusz wrote: Mon Jan 06, 2025 10:08 pm Nodes/second : 4663
That's very good. The 4090 gets around 10,000 nps.
LC0 SPEED
1. Dariusz uses only an M4 Pro = 20 GPU cores = 4663 nps.
2. That's very good. The 4090 gets around 10,000 nps.
3. MacBook Pro 16-inch M4 Max = 40 GPU cores ≈ 9326 nps (estimated as 2 × 4663, assuming speed scales linearly with the number of GPU cores).
4. @Dariusz, did you run “lc0 backendbench”? You could probably gain about 10% more speed with some LC0 tuning.
Have you tested CPU threads = 1 instead of 2? The developers found long ago that this setting works better on Apple hardware, for reasons specific to that hardware, and LC0 also runs much faster with it.
5. After the release of the Apple M1, the LC0 developers mentioned that they could probably make LC0 up to 5 times faster on Apple GPUs, but only if they had Apple M-series hardware and enough people were interested in it or owned Apple hardware. I think now is the right time to make it 5 times faster.
6. The developers are also currently working on using the Apple Neural Engine at the same time as the GPU, which would mean a speed increase of about +33.33%.

7. See also KataGo:

https://github.com/ChinChangYang/KataGo ... Backend.md
https://github.com/lightvector/KataGo

Discord:
“This is absolutely crazy, but also incredibly exciting! I just have to share it with you all. The GPU and Neural Engine (NE) can run in parallel with a modified demux backend, and the performance is impressive. From my benchmarking, the combined GPU+NE setup reaches a throughput of 587.922 nps, which is a huge step up from the GPU-only performance of 434.691 nps.
That said, there’s still room for improvement in the source code to handle the imbalance in processing rates between the GPU and NE. While the NE achieves a throughput of 210.861 nps, the GPU ends up waiting for the NE to finish if the workload is evenly distributed. To solve this, I’ve modified the source code so the GPU processes two-thirds of the data, while the NE handles one-third. This way, both GPU and NE finish their batches almost simultaneously, minimizing the time either one spends waiting for the other.
It’s an exciting optimization that brings everything closer to peak performance!”
```
       _
|   _ | |
|_ |_ |_| v0.32.0-dev+git.dirty built Jan 6 2025
Found pb network file: /Users/chinchangyang/Code/lc0-ccy/xcuserdata/lc0/DerivedData/lc0/Build/Products/Release/BT4-1024x15x32h-swa-6147500.pb.gz
Weights file has multihead format, updating format flag
Creating backend [demux]...
Creating backend [metal]...
Initialized metal backend on device Apple M3 Max
Creating backend [coreml]...
Compiling model: lc0.mlpackage/ -- file:///Users/chinchangyang/Code/lc0-ccy/xcuserdata/lc0/DerivedData/lc0/Build/Products/Release/
Compiled model URL: file:///var/folders/dv/kdr9x4yn4s106_94ydk5jnjc0000gn/T/lc0_43964B54-0554-4B72-B151-911B2941BDC8.mlmodelc
Initializing model with the compiled model URL...
Model successfully initialized
Benchmark batch size 30 with inference average time 53.6801ms - throughput 558.867 nps.
Benchmark batch size 33 with inference average time 56.8639ms - throughput 580.333 nps.
Benchmark batch size 36 with inference average time 61.2326ms - throughput 587.922 nps.
```
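Why the two-thirds/one-third split helps can be seen with a toy model. The sketch below is not lc0 code: the function names are hypothetical, and it assumes each device handles positions at a constant rate (its standalone nps from the post), which is a simplification since GPUs usually get more efficient at larger batches. Both devices run in parallel, so a batch is finished when the slower one is done.

```python
"""Toy model: share one batch between the GPU and the Neural Engine (NE).

Assumption (not from lc0): each device processes positions at a constant
rate, so a share of n positions takes n / rate seconds.
"""

GPU_NPS = 434.691  # GPU-only throughput quoted in the Discord post
NE_NPS = 210.861   # NE-only throughput quoted in the Discord post


def batch_nps(batch: int, gpu_share: int, gpu_rate: float, ne_rate: float) -> float:
    """Modelled throughput when gpu_share positions go to the GPU, the rest to the NE."""
    ne_share = batch - gpu_share
    # Devices work in parallel: the batch ends when the slower device finishes.
    seconds = max(gpu_share / gpu_rate, ne_share / ne_rate)
    return batch / seconds


def proportional_split(batch: int, gpu_rate: float, ne_rate: float) -> int:
    """GPU share proportional to the GPU's fraction of the combined rate."""
    return round(batch * gpu_rate / (gpu_rate + ne_rate))


def best_split(batch: int, gpu_rate: float, ne_rate: float) -> int:
    """Brute-force check: the GPU share that maximises modelled throughput."""
    return max(range(batch + 1), key=lambda g: batch_nps(batch, g, gpu_rate, ne_rate))


if __name__ == "__main__":
    batch = 36
    even = batch_nps(batch, batch // 2, GPU_NPS, NE_NPS)
    g = proportional_split(batch, GPU_NPS, NE_NPS)
    print(f"even split 18/18: {even:.1f} nps")
    print(f"proportional split {g}/{batch - g}: {batch_nps(batch, g, GPU_NPS, NE_NPS):.1f} nps")
    print(f"brute-force best GPU share: {best_split(batch, GPU_NPS, NE_NPS)}")
```

With the quoted rates and a batch of 36, an even 18/18 split models at about 421.7 nps, which is below the GPU alone, while the 24/12 split (two-thirds/one-third, as in the post) models at about 632.6 nps. The measured 587.922 nps is lower than that, which is expected for a model that ignores overheads such as splitting and merging batches.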
Hai
Posts: 637
Joined: Sun Aug 04, 2013 1:19 pm

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by Hai »

Can someone provide a benchmark with Apple MacBook Pro 16-inch M4 MAX :D to ipmanchess, then we can better compare Apple's most powerful chip with other cpus and devices.
towforce
Posts: 12143
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by towforce »

Hai wrote: Fri Jan 10, 2025 10:20 pm Can someone provide a benchmark with Apple MacBook Pro 16-inch M4 MAX :D to ipmanchess, then we can better compare Apple's most powerful chip with other cpus and devices.

Per the second post in this thread, there's a good correlation between Geekbench scores and chess benchmarks: the Apple M4 Max processor has a Geekbench single-core score of 4,060 and a multi-core score of 26,675. These scores are for the 16-core version of the M4 Max processor.
Want to attract exceptional people? Be exceptional.
Hai
Posts: 637
Joined: Sun Aug 04, 2013 1:19 pm

Re: Ipmanchess Apple M3, M4 Benchmarks

Post by Hai »

towforce wrote: Fri Jan 10, 2025 10:39 pm
Hai wrote: Fri Jan 10, 2025 10:20 pm Can someone provide a benchmark with Apple MacBook Pro 16-inch M4 MAX :D to ipmanchess, then we can better compare Apple's most powerful chip with other cpus and devices.

Per the second post in this thread, there's a good correlation between Geekbench scores and chess benchmarks: the Apple M4 Max processor has a Geekbench single-core score of 4,060 and a multi-core score of 26,675. These scores are for the 16-core version of the M4 Max processor.
It looks like it's the same with new Intel CPUs. viewtopic.php?p=972364#p972364
21.475.064 Intel Core Ultra 7 155H @4.8GHz LPDDR5 6400 CL36 22threads https://ipmanchess.yolasite.com/amd--in ... ckfish.php