Questions on SMP search

Discussion of chess software programming and technical issues.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Questions on SMP search

Post by Laskos »

bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_?
No, I don't quite realize.

Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
Anyway, my tests do not assume anything, and it is reasonable to say they are more trustworthy than using time to depth.

Kai
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions on SMP search

Post by Milos »

marcelk wrote:And of course a big hurray for Ben-Hur Carlos for putting up with all of this digression in his nice SMP thread while making his engine the way we old people do: 100% from the ground up, without secretly copying other people's work. Our game in Leiden last year was very enjoyable and his progress is steady, fast and intimidating... :)
Since you are not illiterate in coding, you obviously never looked at Redqueen's code, otherwise you would not have written the BS you wrote...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

Laskos wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_?
No, I don't quite realize.

Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
Anyway, my tests do not assume anything, and it is reasonable to say they are more trustworthy than using time to depth.

Kai
Funny idea. SMP speedup is a direct measure of how much faster something runs. You are using something that is only indirectly coupled to speed, without an accurate way of translating from Elo to speedup at all, to measure something that can only be measured as:

speedup = 1-cpu-time / N-cpu-time

no other computation works for this.
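In practice, this measurement can be scripted. The sketch below is one possible harness, not Crafty's own tooling: it assumes a generic UCI engine that exposes a Threads option, and uses the python-chess library; the engine path, depth, and core count are placeholders.

import time

import chess
import chess.engine

ENGINE_PATH = "./engine"   # placeholder: any UCI engine binary
DEPTH = 18                 # fixed iteration depth to time
CORES = 8                  # N in the speedup formula

def time_to_depth(fen, threads):
    """Wall-clock seconds for one fixed-depth search at a given thread count."""
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    engine.configure({"Threads": threads})
    board = chess.Board(fen)
    start = time.perf_counter()
    engine.analyse(board, chess.engine.Limit(depth=DEPTH))
    elapsed = time.perf_counter() - start
    engine.quit()
    return elapsed

def average_speedup(fens, runs=4):
    """Per-position speedup = 1-cpu-time / N-cpu-time, averaged over the
    position set, with several runs each to wash out non-determinism."""
    speedups = []
    for fen in fens:
        t1 = sum(time_to_depth(fen, 1) for _ in range(runs)) / runs
        tn = sum(time_to_depth(fen, CORES) for _ in range(runs)) / runs
        speedups.append(t1 / tn)
    return sum(speedups) / len(speedups)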
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Questions on SMP search

Post by marcelk »

bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
What I started doing recently, and I'm not claiming it is sound yet but it looks promising, is measuring the score difference against an oracle: an insanely long search of, say, 60 minutes.

I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.

I then compare that with what 30-second searches on a 6-core machine give. This would be a larger number, because a single-thread search is more effective. But at some longer time, say 60 seconds, the indicators are equal. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.

The idea is that I get more than 1 bit of information out of every position search, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers calculate those while I was traveling for a few weeks.
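A minimal sketch of this indicator, as I read the description (the oracle table and the score() search hook are hypothetical stand-ins; no actual code was posted):

import math

def indicator(fens, oracle, score, movetime, threads):
    """RMS difference between engine scores and oracle scores.
    oracle[fen] holds the score of the very long reference search;
    score(fen, movetime, threads) runs one timed search and returns
    its score. Closer to 0 is better."""
    errors = [score(fen, movetime, threads) - oracle[fen] for fen in fens]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def speedup_equivalent(fens, oracle, score, base_time, threads):
    """Lengthen the multi-core searches until their indicator matches the
    single-core baseline; the ratio of search times is the 'speedup
    equivalent' (e.g. 180 s on 1 core matched by 60 s on 6 cores -> 3.0)."""
    target = indicator(fens, oracle, score, base_time, 1)
    t = base_time / threads          # start from the ideal linear speedup
    while indicator(fens, oracle, score, t, threads) > target:
        t *= 1.25                    # coarse step; refine as needed
    return base_time / t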
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

marcelk wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
What I started doing recently, and I'm not claiming it is sound yet but it looks promising, is measuring the score difference against an oracle: an insanely long search of, say, 60 minutes.

I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.

I then compare that with what 30-second searches on a 6-core machine give. This would be a larger number, because a single-thread search is more effective. But at some longer time, say 60 seconds, the indicators are equal. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.

The idea is that I get more than 1 bit of information out of every position search, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers calculate those while I was traveling for a few weeks.
What about the positions where there is really a "right" score and everything else is wrong? Mate-in-N is one example, but the same goes for any position where you can win material, win a positional concession, or whatever. I am not so interested in the variance of the score, because one can produce enough of that by just playing the same game N times with ponder=on and taking a different amount of time before entering each move.

The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as

speedup = 1-cpu-time / N-cpu-time.

Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions on SMP search

Post by michiguel »

bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_?
No, I don't quite realize.

Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
Anyway, my tests do not assume anything, and it is reasonable to say they are more trustworthy than using time to depth.

Kai
Funny idea. SMP speedup is a direct measure of how much faster something runs. You are using something that is only indirectly coupled to speed, without an accurate way of translating from Elo to speedup at all, to measure something that can only be measured as:

speedup = 1-cpu-time / N-cpu-time

no other computation works for this.
That would work if the speedup were consistent, i.e. if you always had, say, a 2x speedup for every single position (like doubling the clock of your computer), but you don't. And you know this very well. Some positions could be 1.5x, some 2.5x, etc.

If the speedup dispersion is different, the effect on ELO will be different even if the average speedup is the same. That is just one of the many factors that make this more complicated than it looks by simple inspection.

I agree 100% with Kai, time to depth does not tell the whole story.

Miguel
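The dispersion point can be made concrete with a toy calculation. Assume, purely for illustration, that a doubling of speed is worth about 70 Elo (a commonly quoted ballpark, not a measured constant). Because log2 is concave, two programs with the same 2.0x average speedup differ in Elo once the spread differs:

import math

def elo_gain(speedup, elo_per_doubling=70.0):
    # illustrative model only: Elo gain ~ 70 * log2(speedup)
    return elo_per_doubling * math.log2(speedup)

uniform   = [2.0, 2.0]   # every position sped up by exactly 2x
dispersed = [1.5, 2.5]   # same 2.0x average, larger spread

print(sum(map(elo_gain, uniform)) / 2)     # 70.0 Elo
print(sum(map(elo_gain, dispersed)) / 2)   # ~66.7 Elo: the spread costs Elo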
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Questions on SMP search

Post by marcelk »

bob wrote:
marcelk wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
What I started doing recently, and I'm not claiming it is sound yet but it looks promising, is measuring the score difference against an oracle: an insanely long search of, say, 60 minutes.

I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.

I then compare that with what 30-second searches on a 6-core machine give. This would be a larger number, because a single-thread search is more effective. But at some longer time, say 60 seconds, the indicators are equal. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.

The idea is that I get more than 1 bit of information out of every position search, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers calculate those while I was traveling for a few weeks.
What about the positions where there is really a "right" score and everything else is wrong? Mate-in-N is one example, but the same goes for any position where you can win material, win a positional concession, or whatever. I am not so interested in the variance of the score, because one can produce enough of that by just playing the same game N times with ponder=on and taking a different amount of time before entering each move.

The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as

speedup = 1-cpu-time / N-cpu-time.

Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
I have selected the test positions so that none has abs(score) > 2.5; I don't care too much what the engine does outside that range.

It is not just for SMP testing; that is a special case. I judge any search change with it at the moment.

A full validation of the method will take me some more time, but I'm setting it up; hopefully I'll have some data by this summer. A similar method works for me for judging evaluation changes, but that is a simpler problem.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

michiguel wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_?
No, I don't quite realize.

Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
Anyway, my tests do not assume anything, and it is reasonable to say they are more trustworthy than using time to depth.

Kai
Funny idea. SMP speedup is a direct measure of how much faster something runs. You are using something that is only indirectly coupled to speed, without an accurate way of translating from Elo to speedup at all, to measure something that can only be measured as:

speedup = 1-cpu-time / N-cpu-time

no other computation works for this.
That would work if the speedup were consistent, i.e. if you always had, say, a 2x speedup for every single position (like doubling the clock of your computer), but you don't. And you know this very well. Some positions could be 1.5x, some 2.5x, etc.

If the speedup dispersion is different, the effect on ELO will be different even if the average speedup is the same. That is just one of the many factors that make this more complicated than it looks by simple inspection.

I agree 100% with Kai, time to depth does not tell the whole story.

Miguel
I don't typically see a huge change in speedup over a lot of positions. But in any case, the positions I use are consecutive positions from a real game (from the CB DTS paper, circa 1995). I run 'em all several times and compute the average speedup from those, so that no one position swamps the weighting.

Time to depth tells the only story that is meaningful, however. If you play games (and I have done a _ton_ of this on our cluster, playing 1, 2, 4 and 8 CPU versions of Crafty against the usual gauntlet using just one CPU) you get a rough idea of the Elo gain per CPU. But that has nothing to do with the actual parallel search performance when discussing the concept of "speedup", which is the metric _everyone_ uses to judge parallel algorithm effectiveness. Once you have that Elo gain, there is no formula that translates it to a speedup. To create that formula, you need the speedup data as well, and then fit a curve to both so that one can predict the other and vice-versa. But by then you already have the actual speedup data everyone wants.

The Elo per CPU is not very accurate, doesn't compare program to program, and a program might actually gain more Elo in one part of the game than in another. Too many variables. Too many averages of averages. Everyone has a good idea of what happens when a specific program runs 2x faster. Some get +50, some +70, some even more, some even less. But running 2x faster is a known quantity, which for any given program has a resulting implied Elo gain. Given good speedup data, I can predict the Elo gain more accurately than I can use the Elo gain to predict the speedup with no speedup data for verification.

Yes, I like to use games for all testing and decision-making. But for parallel search, speedup is by far the best measurement to base programming decisions on. There is a direct coupling. The Elo measurement is an indirect coupling.
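The curve fit alluded to above might look like the following sketch: given paired measurements of speedup and Elo gain for one program, fit elo = k * log2(speedup) by least squares, after which either quantity predicts the other. The data points here are invented for illustration.

import math

# (measured speedup, measured Elo gain) pairs for one program -- invented numbers
pairs = [(1.9, 62.0), (3.4, 118.0), (6.1, 170.0)]

xs = [math.log2(s) for s, _ in pairs]
ys = [e for _, e in pairs]
k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # zero-intercept least squares

def elo_from_speedup(s):
    return k * math.log2(s)

def speedup_from_elo(e):
    return 2.0 ** (e / k)

Of course, by the time one has enough paired data to fit k, the speedup numbers alone already answer the question, which is the point above.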
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions on SMP search

Post by Milos »

bob wrote:The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as

speedup = 1-cpu-time / N-cpu-time.

Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
Wrong. Again you use your favorite logical fallacy, ignoratio elenchi. Searching faster is irrelevant. The only relevant goal of SMP (and of any other chess programming technique) is to increase the strength of a chess program.
As you claim yourself, the relation between "speed of search" and ELO is not linear. Ergo "speed of search" is not a relevant metric. Ergo your insistence on it is nothing but ignoratio elenchi. Q.E.D.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

marcelk wrote:
bob wrote:
marcelk wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
Albert Silver wrote:
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
Don't have right now, but I did the following:
NPS and time to depth are useless.

In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.

A little silly from you. You must know that the ELO gain and time to solution are MUCH more reliable. Depth "8 core" is not the same as Depth "1 core" (maybe in Crafty it's the same, that would be pretty unusual).

Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
What I started doing recently, and I'm not claiming it is sound yet but it looks promising, is measuring the score difference against an oracle: an insanely long search of, say, 60 minutes.

I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.

I then compare that with what 30-second searches on a 6-core machine give. This would be a larger number, because a single-thread search is more effective. But at some longer time, say 60 seconds, the indicators are equal. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.

The idea is that I get more than 1 bit of information out of every position search, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers calculate those while I was traveling for a few weeks.
What about the positions where there is really a "right" score and everything else is wrong? Mate-in-N is one example, but the same goes for any position where you can win material, win a positional concession, or whatever. I am not so interested in the variance of the score, because one can produce enough of that by just playing the same game N times with ponder=on and taking a different amount of time before entering each move.

The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as

speedup = 1-cpu-time / N-cpu-time.

Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
I have selected the test positions so that none has abs(score) > 2.5; I don't care too much what the engine does outside that range.

It is not just for SMP testing; that is a special case. I judge any search change with it at the moment.

A full validation of the method will take me some more time, but I'm setting it up; hopefully I'll have some data by this summer. A similar method works for me for judging evaluation changes, but that is a simpler problem.
I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.

Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.

Which one has the better SMP algorithm?

You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo gain. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.

That's the problem with using Elo, until you have a direct Elo-to-speed mapping for each program. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.