The point is that that 2x is not the upper bound. You'll have 2x more time to dedicate to searching the best move and maybe 20x more time to dedicate to searching alternative moves.bob wrote:Sorry, but no you won't. The math is simple and comes directly from Amdahl's law and the original alpha/beta analysis by knuth and moore, followed by the parallel analysis I did in the Journal of parallel computing. The first move takes 50% of the time. And that is going to be done on one processor. if you can somehow solve the "window" problem, so that you can search the remainder of the ply-1 moves in parallel with a correct alpha/beta window, then you can shrink those to the time required to search one of them. But you have no alpha/beta window, so all moves searched in parallel (ignoring the first move) are going to take about as long as the first move (each one will take that long) because you do _not_ yet have the alpha bound from the first move to let you get the quick cutoffs on the rest of the move.Vasik Rajlich wrote:Sure. If you're not changing your mind, it doesn't matter what kind of speedup you have.bob wrote:Here's what you need to make that happen.Vasik Rajlich wrote:By effective speedup I mean the time handicap you could give to the original entity and score 50%. So, even if you do nothing other than split at the root and if the first move typically takes 50% of your search time, you could still get an effective speedup of >2. Not that that's what Rybka is doingbob wrote:Can we stay in the real world? Splitting at the root can not produce a 2.5x speedup, when the best move at the root takes _way_ over 50% of the total search time. There is theory. There is practice. And there is nonsense. For the event I am talking about, this claim is "nonsense". You might get the uninformed to buy this stuff, but not someone that has been doing it for 30+ years now (my first parallel search played its first ACM event in 1978....)Vasik Rajlich wrote:The effective speedup is probably somewhere between 2.5:1 and 3:1 for 5 nodes, which is what Lukas had when he tested all of this.Uri Blass wrote:I read this post and I can say 2 things.Dirt wrote:There is something of an explanation here.Vasik Rajlich wrote:Where did that come from ??bob wrote:I don't buy the "this hurts Rybka" idea, because the cluster rybka is a joke. And a poor joke at that. There have been some decent cluster-based programs. But Rybka is simply not one of them.
Vas
1)I think that it is impossible to know the algorithm rybka is using based on output from a single position.
It is possible that something similiar that is not exactly the same is used
when some illogical moves that lose the queen are analyzed but this is not all the story and the algorithm is based partly on "split only at the root" and partly on another idea.
2)I remember that Vas said 100 elo based on testing at fast time control and I suspect that at fast time control you get clearly more than 50 elo per doubling so practically 5 nodes do not give 4:1 speed improvement but clearly less than it(maybe 2.5:1).
Uri
Now he's up to 9 nodes BTW
Vas
Vas
(1) you need to change your mind at least once at the root during the last couple of iterations. More changes is better.
That's just life without shared memory. Any cluster implementation is going to have problems in a position like that.
(2) you have to hope that the hash information from the first move does not affect any other move. Fine 70 is a good example of where this can be a problem.
With infinite # of processors and splitting only at the root, you will get a lot more than 1.5x.
I think you'd be very lucky to get a speedup of 1.5x with any number of processors, which is not zero of course, but it is not something that will make me quake in my boots either.
Vas
Best case is 2x faster, since you could assume that any root move takes as long as any other when you do not have a good alpha bound. And what you can get "peak" is not going to be what you get "on average". >2x is just not going to happen except for rare cases. I have a position somewhere where you can get a 20x speedup like that. First move takes forever to find a very deep mate. Second move is a shorter mate in 1/20th the total nodes searched. Searching both at the same time finds the second mate before the first has even been "sniffed". But that is simply an exception. For the general case, 2x is the very best you can hope for, and it is not going to happen often...
I can give you a citation for the paper I wrote that uses the math from Knuth / Moore and extends it to cover alpha/beta in parallel. It is easy enough to understand and explains exactly why >2x is a rare case, not the norm...
It's not hard to do a simulation. Crafty A always plays its move after 10 seconds of search using his normal algorithm, while Crafty B spends 10 seconds on every single root move and then plays the one with the best score.
I would be curious to know the result here.
Vas