Thanks Miguel, that makes a lot of sense. My only concern is that your SU will inevitably decline with more cores at some point. I am not sure this is a general asymptotic behavior, but it could be.
michiguel wrote: That was a good observation, Kai. We were brainstorming about how to introduce draws and deal with the fact that draw rates change with increasing strength. If you look at the physical chemical "model" of Ordo, it becomes obvious that TRUE strength is related to the ratio W/L (which is related to the difference in "energetic" levels between W and L).
Laskos wrote: Very interesting. Can you hint how you derived f?
michiguel wrote: An image in a decent size (I could not edit it after 15 min, arghh).
The equation is
speed up = n / (f * (n^2 - n) + 1)
log2 (W/L) = k * log2 (Speedup)
k is a proportionality constant and f is a factor that represents the "apparent" amount of extra (non-parallelizable) work that each extra CPU introduces.
For
K8, f = 0.0011 (0.11%) k = 1.82
K7, f = 0.0016 (0.16%) k = 1.62
SF, f = 0.0087 (0.87%) k = 2.09 (starts great)
Za, f = 0.0100 (1.00%) k = 1.84
Cr, f = 0.0160 (1.60%) k = 1.82
Miguel
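A minimal sketch of the fitted curve above, for anyone who wants to plug in numbers. The names `speed_up`, `log2_win_loss` and `FITS` are illustrative choices only (nothing here comes from Ordo); the f and k values are the ones listed in the post.

```python
import math

# Fitted (f, k) pairs exactly as listed in the post.
FITS = {
    "K8": (0.0011, 1.82),
    "K7": (0.0016, 1.62),
    "SF": (0.0087, 2.09),
    "Za": (0.0100, 1.84),
    "Cr": (0.0160, 1.82),
}

def speed_up(n, f):
    """speed up = n / (f * (n^2 - n) + 1)."""
    return n / (f * (n * n - n) + 1.0)

def log2_win_loss(n, f, k):
    """log2(W/L) = k * log2(speed up)."""
    return k * math.log2(speed_up(n, f))

if __name__ == "__main__":
    # Print the modelled speed up at 1, 2, 4, 8 and 16 threads per engine.
    for name, (f, k) in FITS.items():
        row = "  ".join(f"{speed_up(n, f):6.2f}" for n in (1, 2, 4, 8, 16))
        print(f"{name}: {row}")
```

With one thread the model gives a speed up of exactly 1 and log2(W/L) of 0 for any f and k, which is a handy sanity check on a fit.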
You assumed an Amdahl-like behavior:
S = (R + 1) * n / (R + n) like in this model?
http://www.talkchess.com/forum/viewtopi ... 29&t=48780
About Wilos, I speculated that Win/Loss ratio remains pretty constant beyond ultra-fast controls several years ago on another forum. More than one year ago, I speculated that here too:
http://www.talkchess.com/forum/viewtopi ... 27&t=48733
With some data. Inverting from draw rate and Elo gain, I even saw a very slow increase of the Win/Loss ratio with time control, but it was almost flat. Wilos will get rid of all the strength, time control, hardware, etc. issues present in Elos. From your graphs and interpolation, it seems that the apparently good Elo gain of Zappa and Crafty up to 8 cores occurs just because they are weaker and have fewer draws. In Wilos, their gain is worse than that of the top engines, and up to 16 cores Komodo 8 stands out.
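To make the Elo versus Win/Loss point concrete, here is a sketch of the inversion mentioned above: recovering W/L from an Elo difference and a draw rate. It assumes the standard logistic Elo expectation, and the function names are hypothetical. No measured data from this thread is used.

```python
def expected_score(elo_diff):
    """Standard logistic Elo expectation for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def win_loss_ratio(elo_diff, draw_rate):
    """Solve W + D/2 = score and W + D + L = 1 for W and L, return W/L."""
    score = expected_score(elo_diff)
    wins = score - draw_rate / 2.0
    losses = 1.0 - score - draw_rate / 2.0
    if wins <= 0.0 or losses <= 0.0:
        raise ValueError("draw rate is too high for this Elo difference")
    return wins / losses

if __name__ == "__main__":
    # Same Elo difference, two illustrative (made-up) draw rates.
    for draw_rate in (0.3, 0.6):
        print(draw_rate, win_loss_ratio(50.0, draw_rate))
```

For a fixed Elo difference the W/L ratio grows as the draw rate grows, so equal Elo gains do not mean equal Wilo gains when the draw rates differ.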
As you point out, Z and C seem to have a better scaling at the beginning because they are weaker than SF and K. Exactly.
One thing that Andreas could do is to test Engine_X vs Engine_X (with twice the time, not twice the cores). This will be a theoretical 100% speed up for two cores. If we get the ratio we will get an "effective" speed up.
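One possible reading of this proposal, sketched under assumptions: if doubling the time is treated as an exact speed up of 2, then log2(W/L) = k * log2(speed up) gives k directly from the W/L of the double-time self-play match, and a W/L measured with n cores can be converted into an "effective" speed up. The function names are hypothetical and the game counts in the example are made up.

```python
import math

def k_from_time_doubling(wins, losses):
    """k = log2(W/L) when the speed up is exactly 2, since log2(2) = 1."""
    return math.log2(wins / losses)

def effective_speed_up(wins, losses, k):
    """Invert log2(W/L) = k * log2(SU) to get SU = (W/L)^(1/k)."""
    return 2.0 ** (math.log2(wins / losses) / k)

if __name__ == "__main__":
    # Made-up counts, purely to show the calling pattern.
    k = k_from_time_doubling(wins=400, losses=100)
    print(k, effective_speed_up(wins=300, losses=100, k=k))
```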
Sorry, I was going to give the derivation but I got trapped with work.
Typical Amdahl can be shown as
SU = (S + P) / (S + P/n)
Numerator is the work with one thread, denominator is the work with "n" threads.
Where
SU = speed up
S = serial fraction of the work that cannot be parallelized
P = work that can be parallelized
n = threads
Of course, S + P = 1
But, Amdahl cannot explain that the speed up may go down at a certain number of threads.
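A small sketch of why plain Amdahl cannot produce a drop: the speed up rises with every added thread and only levels off toward 1/S. The serial fraction used in the example is arbitrary, and `amdahl` is just an illustrative name.

```python
def amdahl(n, serial):
    """SU = (S + P) / (S + P/n), with P = 1 - S."""
    parallel = 1.0 - serial
    return (serial + parallel) / (serial + parallel / n)

if __name__ == "__main__":
    serial = 0.05  # arbitrary illustrative value, not a fitted one
    values = [amdahl(n, serial) for n in range(1, 65)]
    # True if the speed up strictly increases from 1 to 64 threads.
    print(all(b > a for a, b in zip(values, values[1:])))
    # The last value stays below the ceiling 1/S.
    print(values[-1], 1.0 / serial)
```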
Then, I assumed that the work with n threads starts to have spurious work added for each thread added, and that this is work that cannot be parallelized. That is, for each thread added, every thread slows down a bit in areas doing nothing. So, I add a term f*(n-1) in the denominator and place 1 in the numerator since S+P=1:
SU = 1 / (S + P/n + f*(n-1))
I could stop here, but I assumed that S is near zero because the fitting hinted that this was reasonable (that is, the Amdahl term S is negligible compared to the new term f*(n-1)). Then P = 1 and the term S disappears:
SU = 1 / (1/n + f*(n-1))
Multiplying the numerator and denominator by n:
SU = n / (1 + f*(n^2-n))
Miguel
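The derivation can be checked numerically with a short sketch. It implements the general form SU = 1 / (S + P/n + f*(n-1)) and the simplified form SU = n / (1 + f*(n^2-n)), and locates the thread count where the model peaks. The function names are hypothetical; the peak formula sqrt(P/f) comes from setting the derivative of the denominator to zero and is my addition, not part of the post.

```python
import math

def speed_up_general(n, f, serial=0.0):
    """SU = 1 / (S + P/n + f*(n - 1)), with P = 1 - S."""
    parallel = 1.0 - serial
    return 1.0 / (serial + parallel / n + f * (n - 1))

def speed_up_simplified(n, f):
    """SU = n / (1 + f*(n^2 - n)), the S = 0 special case."""
    return n / (1.0 + f * (n * n - n))

def peak_threads(f, serial=0.0):
    """Real-valued n that minimizes the denominator: sqrt(P / f)."""
    return math.sqrt((1.0 - serial) / f)

if __name__ == "__main__":
    for f in (0.0011, 0.0160):  # the K8 and Cr fits quoted in the thread
        print(f, peak_threads(f), speed_up_simplified(round(peak_threads(f)), f))
```

With S = 0 the peak sits at 1/sqrt(f), which is roughly 30 threads for f = 0.0011 and roughly 8 threads for f = 0.0160, so in this model the decline sets in sooner for larger f.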
EDIT: maybe adding a term P*f*(n-1) instead of f*(n-1) is more reasonable? So each core wastes something proportional to the parallelizable stuff? Maybe it won't fit data so well, but will look asymptotically more reasonable?
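A sketch for comparing the two candidate overhead terms side by side, f*(n-1) versus P*f*(n-1). The function names and the sample S and f values are hypothetical, and nothing here is fitted to data.

```python
def speed_up_original(n, f, serial):
    """SU = 1 / (S + P/n + f*(n - 1)), with P = 1 - S."""
    parallel = 1.0 - serial
    return 1.0 / (serial + parallel / n + f * (n - 1))

def speed_up_variant(n, f, serial):
    """SU = 1 / (S + P/n + P*f*(n - 1)): overhead scaled by P."""
    parallel = 1.0 - serial
    return 1.0 / (serial + parallel / n + parallel * f * (n - 1))

if __name__ == "__main__":
    serial, f = 0.1, 0.01  # arbitrary illustrative values
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, speed_up_original(n, f, serial), speed_up_variant(n, f, serial))
```

In this sketch the two forms coincide when S = 0, and for S > 0 the variant is never below the original. Both overhead terms grow linearly in n, so as far as I can tell both curves still fall off eventually; the variant mainly rescales the penalty.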