The nps numbers I provided came from the bench command with its default values, so yes, they are single-threaded. I confirmed this by running "bench" and "bench 16 1" on my Intel machine and getting roughly the same nps results.
I use the bench command mostly to confirm the "nodes searched" value as reported on the abrok site, to ensure I am using the correct commit.
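A minimal sketch of that check in Python, for anyone who wants to script it. It assumes the bench run ends with a summary containing "Nodes searched" and "Nodes/second" lines and that the output has already been captured into a string. The function names and the sample numbers are made up for illustration.

```python
import re

def parse_bench_summary(text):
    """Return (nodes_searched, nodes_per_second) from captured bench output.

    Assumes summary lines of the form 'Nodes searched  : N' and
    'Nodes/second    : N'.
    """
    nodes = re.search(r"Nodes searched\s*:\s*(\d+)", text)
    nps = re.search(r"Nodes/second\s*:\s*(\d+)", text)
    if nodes is None or nps is None:
        raise ValueError("bench summary not found in output")
    return int(nodes.group(1)), int(nps.group(1))

def matches_signature(text, expected_nodes):
    """True if the node count equals the value published for the commit."""
    nodes, _ = parse_bench_summary(text)
    return nodes == expected_nodes

# Illustrative output only; these numbers are not from a real run.
SAMPLE = """
===========================
Total time (ms) : 2000
Nodes searched  : 1500000
Nodes/second    : 750000
"""

if __name__ == "__main__":
    print(parse_bench_summary(SAMPLE))
    print(matches_signature(SAMPLE, 1500000))
```

The node count is the part that identifies the commit; the nps figure varies by machine and is only useful for the speed comparison.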
Until someone starts producing an ARM build for the Snapdragon, what we are looking at is how much performance of the x64 build is lost via Prism, and the popcnt numbers are good enough as a guide. The other useful thing we learned is that the AVX2 and BMI builds do not even start, so that is another big chunk of performance lost.
Interestingly, the macOS Game Porting Toolkit (GPTK) 2 now supports AVX2, so I wonder whether that is relevant to chess engines and whether that support by itself gives performance equivalent to native x64 CPUs.
wickedpotus wrote: ↑Tue Jul 23, 2024 12:52 am
What point is there to run a multithreaded engine like Stockfish with only one thread on a multicore CPU?
It's like trying to use a powerful sports car with only one gear.
As Charles Wong mentioned, by using a single thread you can compare the performance of x86-64 SSE native vs. x86-64 SSE via MS Prism on ARM vs. x86-64 AVX2 native. You lose ~20% for SSE over AVX2, and you lose ~30% via MS Prism over native. These numbers are good to know to roughly interpolate and extrapolate Stockfish NPS for different hardware.
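As a rough worked example of that arithmetic, here is a small Python sketch. It assumes the two losses are each relative to the faster configuration and that they compound multiplicatively, which is my reading of the numbers and not something measured. The function name and the 1,000,000 nps starting figure are hypothetical.

```python
def estimate_nps(avx2_native_nps, sse_loss=0.20, prism_loss=0.30):
    """Estimate single-thread nps for the SSE build, native and under Prism.

    Assumptions: sse_loss is the fraction lost going from the AVX2 build to
    the SSE build on the same CPU; prism_loss is the fraction lost running
    the SSE build under emulation instead of natively; the two compound.
    """
    sse_native = avx2_native_nps * (1.0 - sse_loss)
    sse_prism = sse_native * (1.0 - prism_loss)
    return sse_native, sse_prism

def combined_loss(sse_loss=0.20, prism_loss=0.30):
    """Total fraction lost relative to AVX2 native under the same assumptions."""
    return 1.0 - (1.0 - sse_loss) * (1.0 - prism_loss)

if __name__ == "__main__":
    native, emulated = estimate_nps(1_000_000)  # hypothetical baseline
    print(f"SSE native:    ~{native:,.0f} nps")
    print(f"SSE via Prism: ~{emulated:,.0f} nps")
    print(f"Combined loss: ~{combined_loss():.0%}")
```

Under these assumptions the emulated SSE build ends up at a bit more than half of the AVX2 native figure, which is only a ballpark for a single thread.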
--
Srdja
Quite silly, as we all know that you just cannot extrapolate single-core performance across different CPU architectures, hyperthreading, etc.
If you want to benchmark a 24-core CPU, of course you use all 24 cores (if the software permits), to the max. If you want to benchmark a 24-thread CPU, of course you want all threads 100% utilized. Everything else is just plain silly and most likely leads to flawed or even completely wrong conclusions. A single-threaded program running on one core will also not perform the same as copies of that program pinned by affinity to every core at once.
wickedpotus wrote: ↑Mon Aug 05, 2024 10:57 pm
Quite silly, as we all know that you just cannot extrapolate single-core performance across different CPU architectures, hyperthreading, etc.
[...]
You are right in the sense that you cannot extrapolate a given NPS from one ISA implementation to another.