michiguel wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote: And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.
In other words, you have no data. Time to depth is the _only_ way to measure SMP performance, and I do mean the _ONLY_ way.
A little silly from you. You must know that the Elo gain and time to solution are MUCH more reliable. A search to a given depth on 8 cores is not the same as a search to that depth on 1 core (maybe in Crafty it's the same, but that would be pretty unusual).
Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_?
No, I don't quite realize.
Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence the need for a lot of runs to wash out the sampling error.
Anyway, my tests do not assume anything, and it is reasonable to say that they are more trustworthy than using time to depth.
Kai
Funny idea. SMP speedup is a direct measure of how much faster something runs. You are using something that is only indirectly coupled to speed, without an accurate way of translating from Elo to speedup at all, to measure something that can only be measured as:
speedup = time(1 CPU) / time(N CPUs)
No other computation works for this.
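To make the arithmetic concrete, here is a minimal sketch of that computation in Python; the position and the timings are made up for illustration, not measured data:
[code]
def speedup(one_cpu_seconds, n_cpu_seconds):
    """speedup = time to reach a fixed depth on 1 CPU / time on N CPUs."""
    return one_cpu_seconds / n_cpu_seconds

# Hypothetical example: a position searched to the same depth takes
# 120 s on 1 CPU and 35 s on 8 CPUs.
print(speedup(120.0, 35.0))   # -> roughly 3.4x
[/code]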
That would work if the speedup were consistent, i.e., if you always had, say, a 2x speedup for every single position (like doubling the clock of your computer), but you don't. And you know this very well. Some positions could be 1.5x, some 2.5x, and so on.
If the speedup dispersion is different, the effect on Elo would be different even if the average speedup is the same. That is just one of the many factors that make this more complicated than it looks at first inspection.
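A small sketch of that dispersion point, assuming the usual rule of thumb that each doubling of speed is worth a roughly fixed number of Elo (the 70 used here is an assumption, and the per-position speedups are invented):
[code]
import math

ELO_PER_DOUBLING = 70.0   # assumed rule of thumb, engine-dependent

def expected_elo_gain(speedups):
    """Average the Elo equivalent of each per-position speedup.

    Because log2 is concave, two sets of speedups with the same mean but
    different dispersion do not give the same expected Elo gain.
    """
    return ELO_PER_DOUBLING * sum(math.log2(s) for s in speedups) / len(speedups)

uniform   = [2.0, 2.0, 2.0, 2.0]   # mean speedup 2.0, no spread
dispersed = [1.0, 1.5, 2.5, 3.0]   # mean speedup 2.0, wide spread

print(expected_elo_gain(uniform))    # 70.0
print(expected_elo_gain(dispersed))  # lower, despite the same average speedup
[/code]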
I agree 100% with Kai, time to depth does not tell the whole story.
Miguel
I don't typically see a huge change in speedup over a lot of positions. But in any case, the positions I use are consecutive positions from a real game (from the CB DTS paper, circa 1995 or so). I run 'em all several times and compute the average speedup from those so that no one position swamps the weighting.
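For what it's worth, that averaging procedure might look something like the sketch below; the timings are placeholders, not data from the DTS positions:
[code]
# Sketch: average speedups over a set of positions, each run several times,
# so that no single position dominates the result. Times are hypothetical
# seconds to reach the same fixed depth.
runs_1cpu = {                      # position -> 1-CPU times over repeated runs
    "pos1": [118.0, 122.0, 120.0],
    "pos2": [64.0, 61.0, 62.0],
}
runs_ncpu = {                      # position -> N-CPU times over repeated runs
    "pos1": [36.0, 33.0, 35.0],
    "pos2": [21.0, 20.0, 22.0],
}

def mean(xs):
    return sum(xs) / len(xs)

# Per-position speedup from averaged timings, then an unweighted average
# across positions so every position counts equally.
per_position = [mean(runs_1cpu[p]) / mean(runs_ncpu[p]) for p in runs_1cpu]
print(per_position)
print(mean(per_position))
[/code]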
Time to depth tells the only story that is meaningful, however. If you play games (and I have done a _ton_ of this on our cluster, playing 1-, 2-, 4-, and 8-CPU versions of Crafty against the usual gauntlet using just one CPU), you get a rough idea of the Elo gain per CPU. But that has nothing to do with the actual parallel search performance when trying to discuss the concept of "speedup", which is the measure _everyone_ uses to evaluate parallel algorithm effectiveness. Once you have that Elo gain, there is no formula that translates it to a speedup. To create that formula, you need the speedup data as well, and then you fit a curve to both so that one can predict the other and vice versa. But by then you already have the actual speedup data everyone wants.
The Elo per CPU is not very accurate, doesn't compare program to program, and a program might actually gain more Elo in one part of the game than in another. Too many variables. Too many averages of averages. Everyone has a good idea of what happens when a specific program runs 2x faster. Some get +50, some +70, some even more, some even less. But running 2x faster is a known quantity, which for any given program has a resulting implied Elo gain. Given good speedup data, I can predict the Elo gain more accurately than I could use the Elo gain to predict the speedup with no speedup data for verification.
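A sketch of what "fit a curve to both" could look like, assuming the common elo = k * log2(speedup) model; the model choice and the paired (speedup, Elo) measurements below are invented for illustration:
[code]
import math

# Hypothetical paired measurements for one program:
# (measured speedup, measured Elo gain from games against a fixed gauntlet)
samples = [(1.7, 52.0), (3.1, 110.0), (5.2, 160.0)]

# Fit k in elo = k * log2(speedup) by least squares (closed form, since
# there is a single coefficient and no intercept).
xs = [math.log2(s) for s, _ in samples]
ys = [e for _, e in samples]
k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def elo_from_speedup(speedup):
    return k * math.log2(speedup)

def speedup_from_elo(elo):
    return 2.0 ** (elo / k)

print(k)                       # fitted Elo per doubling of speed
print(elo_from_speedup(2.0))   # implied gain for a clean 2x speedup
print(speedup_from_elo(70.0))  # speedup implied by a 70 Elo gain
[/code]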
Yes, I like to use games for all testing and decision-making. But for parallel search, speedup is by far the best measurement to base programming decisions on. There is a direct coupling. The Elo measurement is an indirect coupling.