superlinear speedups

jswaff · Post by **jswaff** » Sat Nov 01, 2008 7:07 pm

Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear. In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).

But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?

I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.

--
James

Dirt · Post by **Dirt** » Sat Nov 01, 2008 7:08 pm

michiguel wrote:First it has to be defined what "CPU" means in this context. For instance, if you double the number of processing units with the cache memory that comes with it, and the algorithm takes advantage of more fast memory, you can easily envision a possible superlinear speedup. Some memory sorting techniques could fall in this category (speed dependent on fast temp. storage too).

Miguel

You're suggesting that the performance could be superlinear with regard to memory? Maybe in some very odd cases, but I can't "easily envsion" that in general.

Carey · Post by **Carey** » Sat Nov 01, 2008 8:16 pm

jswaff wrote:Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear.

With perfect move ordering there is no opportunity for superlinear.

So it's irrelevant.

In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).

But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?

Interesting question.

Actually, it might be worth your time to actually find a parallel chess program and run some tests.

Do a fake eval that gives a programmable score to the nodes so you can control how the moves are searched.

0% would be progressively worse, with the best move at the very last and the worst at the beginning.

50% would be mid way, with an even distribution on either side.

100% would be progessively better, with the best always at the beginning and the worst at the end.

With a bit more thought, you might be able to come up with a pretty good experiment worth writing up and posting.

You'll need a realistic way to simulate the parallelism of N cpu's, though.

I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.

--
James

I understand. But it makes the results invald for real world situations.

If you ignore the real world, then it just wont match. Reminds me of the old days when people used to do large scale FFT's. Once you reached the point where it had to be done on tape (I did say old days...), the old methods were technically valid but no long realistic and the performance was no longer what it should. Different methods were required that took into account the real world latencies and parallelisms.

jswaff · Post by **jswaff** » Sat Nov 01, 2008 9:02 pm

Carey wrote: Do a fake eval that gives a programmable score to the nodes so you can control how the moves are searched.

0% would be progressively worse, with the best move at the very last and the worst at the beginning.

50% would be mid way, with an even distribution on either side.

100% would be progessively better, with the best always at the beginning and the worst at the end.

Hmm... that's an interesting idea that might give some insight. I'll think about that. Thanks for the suggestion.

I understand. But it makes the results invald for real world situations.

If you ignore the real world, then it just wont match. Reminds me of the old days when people used to do large scale FFT's. Once you reached the point where it had to be done on tape (I did say old days...), the old methods were technically valid but no long realistic and the performance was no longer what it should. Different methods were required that took into account the real world latencies and parallelisms.

I understand, but my question was about the theory.

bob · Post by **bob** » Sat Nov 01, 2008 11:13 pm

jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James

It is a simple idea.

Super-linear speedups come from bad move ordering. The idea is this. You have a position where the best move is not obvious until depth N. When you start iteration N, you have to walk thru moves 1, 2, ... m until you discover move m. In a parallel search, you might split at this ply and search moves 1, 2, ... m in parallel, and you find that M is better much sooner. But as I said, it is an artifact of bad move ordering. A better search algorithm will eliminate that.

In the "improved search" at each ply in the tree, you "slice" your effort up so that you search move 1 for a short time, then move 2, then 3, then ..., then M, and then keep repeating this. Eventually you discover that M is best before you have completed the others.

The "improved search" is worse in most cases, but in those positions where you see a super-linear speedup, it will reduce the parallel speedup back down to N where it should be...

bob · Post by **bob** » Sat Nov 01, 2008 11:16 pm

jswaff wrote:Oh sure, it's possible in some positions, due to imperfect move ordering. So let me clarify: I'm talking about the general case, with well ordered trees. I'd like to hear even an informal argument as to why you won't see superlinear speedups (in general) with good move ordering.

The math is a bit messy. I wrote a paper for the Journal of Parallel Computing back around 1987 or so that covers this math in detail. Super-linear speedup happens when you have poor move ordering. With best-first ordering, the "improved search algorithm" I explained in another post in this thread will not work and will actually be slower.

Edit: Back to the proof: I think it should be possible to prove using perfectly ordered trees (in which CUT nodes examine just one child).

--
James

Carey wrote:
jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James
First, you can't prove that something like that is impossible. All you can prove is that given certain constraints and certain knowledge and certain techniques, it's not possible within those bounds. Even that would probably be pretty difficult.

Second, Hyatt and others have long encountered the occasional position that got superlinear speed ups.

True, those positions are rare. Just random positions that happen to work well for that particular search method etc.

But the fact that even one occured would suggest that you can not prove that superlinear speedups are impossible.

You might be able to prove they are rare.... But since they exist it would be difficult to prove they are impossible.

Of course, I'm not an expert in mathematical proofs or parallel search, so others may have a different opinion.

bob · Post by **bob** » Mon Nov 03, 2008 12:57 am

Carey wrote:
hgm wrote:In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.
Only if you know about it ahead of time.

Once you start adding all sorts of extra conditions and situations, the proof has to go up greatly in complexity.

And then if you change the hardware that behaves differently, then you have to change your simulation & proof, etc.

Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:

"If you have super-linear speedup, your algorithm sucks for any number of processors."
But it's entirely possible for your algorithm to suck.

That it's a poor algorithm or a poor (or odd) conversion going from single to parallel says nothing about the validity of the results.

I'm not saying superlinear is possible or practical.

Just that proving it would be very difficult and would likely require a significant number of caveats and could end up being very specfic rather than general.

The randomness is really not the issue.. HGM hit it perfectly with his time-slicing search, which has been the refutation for consistent super-linear speedups since parallel search started. It is not the best way to search in general, but it directly solves the super-linear problem and brings the speedup back down to whatever is reasonable for the number of processors used...

Carey · Post by **Carey** » Mon Nov 03, 2008 3:28 am

bob wrote:
Carey wrote:
hgm wrote:In principle you can simate all that randomness as well. It doesn't matter how large a hit you take in making that detailed a simulation. With super-linear speedup you will earn back any factor, for large-enough N.
Only if you know about it ahead of time.

Once you start adding all sorts of extra conditions and situations, the proof has to go up greatly in complexity.

And then if you change the hardware that behaves differently, then you have to change your simulation & proof, etc.

Obviously you cannot prove what is not true. But I think the proof I gave above does prove this:

"If you have super-linear speedup, your algorithm sucks for any number of processors."
But it's entirely possible for your algorithm to suck.

That it's a poor algorithm or a poor (or odd) conversion going from single to parallel says nothing about the validity of the results.

I'm not saying superlinear is possible or practical.

Just that proving it would be very difficult and would likely require a significant number of caveats and could end up being very specfic rather than general.
The randomness is really not the issue.. HGM hit it perfectly with his time-slicing search, which has been the refutation for consistent super-linear speedups since parallel search started. It is not the best way to search in general, but it directly solves the super-linear problem and brings the speedup back down to whatever is reasonable for the number of processors used...

Basically yes.

But if you are going to prove something, then the conditions of the proof have to match the real world.

Otherwise all you did was prove something that does not, can not, will not exist and your proof either becomes worthless or merely theoretical for idealic conditions.

It'd be like saying you can turn lead into gold, but only at absolute zero degrees Kelvin. The catch is, you can never reach absolute zero.

bob · Post by **bob** » Mon Nov 03, 2008 7:03 am

michiguel wrote:
jswaff wrote:I'm looking for a formal proof refuting the possibility of superlinear speedups using parallel search. Can anybody point me to one?

Actually, I would even settle for a good informal argument. Anybody?

--
James
There was a huge thread some years ago about this that lead to nowhere.

Miguel

It only lead to "nowhere" in Vincent's mind. The description of an algorithm to improve the original program so that a super-linear speedup is impossible is a topic covered in most parallel programming books. If you get a super-linear speedup on some algorithm, you just simulate the interleaved execution serially on the original algorithm using just one processor, and the super-linear speedup goes away. As it must...

bob · Post by **bob** » Mon Nov 03, 2008 7:25 pm

jswaff wrote:Let me ask a related question. I think it's pretty obvious that with a perfectly ordered tree the speedup will not be superlinear. In CUT nodes there will be only one child anyway, so no opportunity there, and in ALL nodes every child has to be searched anyway, with the same bounds, so at each node you can get at most a speedup of N (N=processors).

But what about worst case move ordering? So, ALL nodes are still ALL nodes, but in (would be) CUT nodes the cutoff occurs as late as possible. What makes superlinear speedup happen in these types of positions?

I would like the discussion to focus on the algorithm itself. I'm not interested in hardware limitations, clocks, etc.

--
James

For worst-case move ordering, it is trivial to get a perfect speedup. If you can find my Ph.D. dissertation, I ran exactly this test as part of the performance analysis. with perfect ordering, a perfect speedup is also easy(this was also done in my dissertation). It is the "non-perfect-ordering" circumstances that make decent speedup difficult to obtain.

superlinear speedups

related question

Re: superlinear speedups

Re: related question

Re: related question

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: superlinear speedups

Re: related question