So, you say that formula doesn't apply anywhere near 16 or 32 threads. Couldn't you just remember 3 numbers for doublings? "Approximation" a la Hyatt.
bob wrote:
Where do I "claim 1.9x speedup from 16 cores to 32 cores"? Never have, never will. Unless you talk about my simple approximation, which has NEVER been a claim of anything, just a simple estimate. You misuse the word "claim" badly here. I claim that it is just a rough estimate, and I've made the claim much weaker for 16-32 and beyond as well.
Now, you said that you got a x32 effective speed-up (time to strength) on a 64-core Itanium box. Then you smirked when I said it IS a very high effective speed-up, citing some papers from the '80s on DTS on a completely different architecture. Do you know what x32 _effective_ speed-up means in terms of the _average_ effective speed-up per doubling, up to 64 threads?
1.78 per doubling, over 6 doublings.
That is a VERY HIGH effective speed-up for 6 doublings. Either the hardware is very different or Crafty is an exceptionally, almost linearly scaling engine. What V. Rajlich reported years ago for the effective speed-up of Rybka was a series of doublings to 16 cores looking like this:
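The 1.78 figure is just the sixth root of 32, since going from 1 to 64 threads is 6 doublings. A minimal sketch of that arithmetic (the function name `per_doubling_factor` is made up for illustration):

```python
import math

def per_doubling_factor(total_speedup, threads):
    """Average (geometric mean) speed-up per doubling of thread count."""
    doublings = math.log2(threads)  # 64 threads -> 6 doublings
    return total_speedup ** (1.0 / doublings)

factor = per_doubling_factor(32, 64)
print(round(factor, 2))  # 1.78
```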
1.8*1.7*1.6*1.5
The total is about 7.3, already below half of the 16 cores used. That's why I consider x32 for Crafty on a 64-core box very high.
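For anyone who wants to check the product, here is a quick sketch using the four per-doubling factors quoted above (the variable names are my own):

```python
import math

# Per-doubling effective speed-ups quoted above for Rybka: 1->2, 2->4, 4->8, 8->16
rybka_doublings = [1.8, 1.7, 1.6, 1.5]

total = math.prod(rybka_doublings)   # cumulative effective speed-up
cores = 2 ** len(rybka_doublings)    # 4 doublings -> 16 cores

print(round(total, 1))     # 7.3
print(total < cores / 2)   # True: below half of the 16 cores
```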
Would you take a bet:
If Andreas bothers to run a 32-core test on his Dual Opteron, I claim Komodo will gain 1.34 in effective speed-up from 16 to 32 threads, or 38 Elo points, with the log model. Your claim is that it will gain 1.78 in effective speed-up from 16 to 32 threads (from your Itanium 64-core tests with the claimed x32 effective speed-up), or 75 Elo points, on that hardware at Andreas' 60''+0.05'' time control. Whoever is closer in numbers wins the bet; whoever loses has to post here "I AM AN IDIOT".
I may open a plea thread to ask Andreas to perform such a test, either Komodo 32 threads against Komodo 1 thread or, even better, Komodo 32 threads against Komodo 16 threads.
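The two Elo figures follow from the speed-up factors if you assume Elo gain is proportional to log2 of the effective speed-up. The constant of roughly 90 Elo per doubling of time is not stated anywhere; it is only what the pairs 1.34 -> 38 and 1.78 -> 75 imply at this time control, so treat it as an assumption in this sketch:

```python
import math

# Assumption: ~90 Elo per doubling of time, inferred from the pairs
# (1.34 -> 38 Elo) and (1.78 -> 75 Elo); not a measured value.
ELO_PER_DOUBLING = 90.0

def elo_gain(speedup, elo_per_doubling=ELO_PER_DOUBLING):
    """Elo gain for a given effective speed-up, assuming Elo ~ log2(speed-up)."""
    return elo_per_doubling * math.log2(speedup)

print(round(elo_gain(1.34)))  # 38
print(round(elo_gain(1.78)))  # 75
```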
Agreed on such a bet?