Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear. In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).
But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?
I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.
--
James
superlinear speedups
Moderator: Ras
related question
Last edited by jswaff on Sat Nov 01, 2008 7:11 pm, edited 1 time in total.
-
- Posts: 2851
- Joined: Wed Mar 08, 2006 10:01 pm
- Location: Irvine, CA, USA
Re: superlinear speedups
You're suggesting that the performance could be superlinear with regard to memory? Maybe in some very odd cases, but I can't "easily envsion" that in general.michiguel wrote:First it has to be defined what "CPU" means in this context. For instance, if you double the number of processing units with the cache memory that comes with it, and the algorithm takes advantage of more fast memory, you can easily envision a possible superlinear speedup. Some memory sorting techniques could fall in this category (speed dependent on fast temp. storage too).
Miguel
-
- Posts: 313
- Joined: Wed Mar 08, 2006 8:18 pm
Re: related question
With perfect move ordering there is no opportunity for superlinear.jswaff wrote:Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear.
So it's irrelevant.
Interesting question.In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).
But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?
Actually, it might be worth your time to actually find a parallel chess program and run some tests.
Do a fake eval that gives a programmable score to the nodes so you can control how the moves are searched.
0% would be progressively worse, with the best move at the very last and the worst at the beginning.
50% would be mid way, with an even distribution on either side.
100% would be progessively better, with the best always at the beginning and the worst at the end.
With a bit more thought, you might be able to come up with a pretty good experiment worth writing up and posting.
You'll need a realistic way to simulate the parallelism of N cpu's, though.
I understand. But it makes the results invald for real world situations.I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.
--
James
If you ignore the real world, then it just wont match. Reminds me of the old days when people used to do large scale FFT's. Once you reached the point where it had to be done on tape (I did say old days...), the old methods were technically valid but no long realistic and the performance was no longer what it should. Different methods were required that took into account the real world latencies and parallelisms.
Re: related question
Hmm... that's an interesting idea that might give some insight. I'll think about that. Thanks for the suggestion.Carey wrote: Do a fake eval that gives a programmable score to the nodes so you can control how the moves are searched.
0% would be progressively worse, with the best move at the very last and the worst at the beginning.
50% would be mid way, with an even distribution on either side.
100% would be progessively better, with the best always at the beginning and the worst at the end.
I understand, but my question was about the theory.I understand. But it makes the results invald for real world situations.
If you ignore the real world, then it just wont match. Reminds me of the old days when people used to do large scale FFT's. Once you reached the point where it had to be done on tape (I did say old days...), the old methods were technically valid but no long realistic and the performance was no longer what it should. Different methods were required that took into account the real world latencies and parallelisms.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: superlinear speedups
It is a simple idea.jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?
Actually, I would even settle for a good informal argument. Anybody?
--
James
Super-linear speedups come from bad move ordering. The idea is this. You have a position where the best move is not obvious until depth N. When you start iteration N, you have to walk thru moves 1, 2, ... m until you discover move m. In a parallel search, you might split at this ply and search moves 1, 2, ... m in parallel, and you find that M is better much sooner. But as I said, it is an artifact of bad move ordering. A better search algorithm will eliminate that.
In the "improved search" at each ply in the tree, you "slice" your effort up so that you search move 1 for a short time, then move 2, then 3, then ..., then M, and then keep repeating this. Eventually you discover that M is best before you have completed the others.
The "improved search" is worse in most cases, but in those positions where you see a super-linear speedup, it will reduce the parallel speedup back down to N where it should be...
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: superlinear speedups
jswaff wrote:Oh sure, it's possible in some positions, due to imperfect move ordering. So let me clarify: I'm talking about the general case, with well ordered trees. I'd like to hear even an informal argument as to why you won't see superlinear speedups (in general) with good move ordering.
The math is a bit messy. I wrote a paper for the Journal of Parallel Computing back around 1987 or so that covers this math in detail. Super-linear speedup happens when you have poor move ordering. With best-first ordering, the "improved search algorithm" I explained in another post in this thread will not work and will actually be slower.
Edit: Back to the proof: I think it should be possible to prove using perfectly ordered trees (in which CUT nodes examine just one child).
--
James
Carey wrote:First, you can't prove that something like that is impossible. All you can prove is that given certain constraints and certain knowledge and certain techniques, it's not possible within those bounds. Even that would probably be pretty difficult.jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?
Actually, I would even settle for a good informal argument. Anybody?
--
James
Second, Hyatt and others have long encountered the occasional position that got superlinear speed ups.
True, those positions are rare. Just random positions that happen to work well for that particular search method etc.
But the fact that even one occured would suggest that you can not prove that superlinear speedups are impossible.
You might be able to prove they are rare.... But since they exist it would be difficult to prove they are impossible.
Of course, I'm not an expert in mathematical proofs or parallel search, so others may have a different opinion.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: superlinear speedups
The randomness is really not the issue.. HGM hit it perfectly with his time-slicing search, which has been the refutation for consistent super-linear speedups since parallel search started. It is not the best way to search in general, but it directly solves the super-linear problem and brings the speedup back down to whatever is reasonable for the number of processors used...Carey wrote:Only if you know about it ahead of time.hgm wrote:In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.
Once you start adding all sorts of extra conditions and situations, the proof has to go up greatly in complexity.
And then if you change the hardware that behaves differently, then you have to change your simulation & proof, etc.
But it's entirely possible for your algorithm to suck.Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:
"If you have super-linear speedup, your algorithm sucks for any number of processors."
That it's a poor algorithm or a poor (or odd) conversion going from single to parallel says nothing about the validity of the results.
I'm not saying superlinear is possible or practical.
Just that proving it would be very difficult and would likely require a significant number of caveats and could end up being very specfic rather than general.
-
- Posts: 313
- Joined: Wed Mar 08, 2006 8:18 pm
Re: superlinear speedups
Basically yes.bob wrote:The randomness is really not the issue.. HGM hit it perfectly with his time-slicing search, which has been the refutation for consistent super-linear speedups since parallel search started. It is not the best way to search in general, but it directly solves the super-linear problem and brings the speedup back down to whatever is reasonable for the number of processors used...Carey wrote:Only if you know about it ahead of time.hgm wrote:In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.
Once you start adding all sorts of extra conditions and situations, the proof has to go up greatly in complexity.
And then if you change the hardware that behaves differently, then you have to change your simulation & proof, etc.
But it's entirely possible for your algorithm to suck.Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:
"If you have super-linear speedup, your algorithm sucks for any number of processors."
That it's a poor algorithm or a poor (or odd) conversion going from single to parallel says nothing about the validity of the results.
I'm not saying superlinear is possible or practical.
Just that proving it would be very difficult and would likely require a significant number of caveats and could end up being very specfic rather than general.
But if you are going to prove something, then the conditions of the proof have to match the real world.
Otherwise all you did was prove something that does not, can not, will not exist and your proof either becomes worthless or merely theoretical for idealic conditions.
It'd be like saying you can turn lead into gold, but only at absolute zero degrees Kelvin. The catch is, you can never reach absolute zero.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: superlinear speedups
It only lead to "nowhere" in Vincent's mind. The description of an algorithm to improve the original program so that a super-linear speedup is impossible is a topic covered in most parallel programming books. If you get a super-linear speedup on some algorithm, you just simulate the interleaved execution serially on the original algorithm using just one processor, and the super-linear speedup goes away. As it must...michiguel wrote:There was a huge thread some years ago about this that lead to nowhere.jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?
Actually, I would even settle for a good informal argument. Anybody?
--
James
Miguel
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: related question
For worst-case move ordering, it is trivial to get a perfect speedup. If you can find my Ph.D. dissertation, I ran exactly this test as part of the performance analysis. with perfect ordering, a perfect speedup is also easy(this was also done in my dissertation). It is the "non-perfect-ordering" circumstances that make decent speedup difficult to obtain.jswaff wrote:Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear. In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).
But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?
I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.
--
James