Measure of SMP scalability

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Measure of SMP scalability

Post by bob »

syzygy wrote:
bob wrote:As I said, you ONLY want to argue. Go for it, if that makes you happy. The author said this "widening" was DIRECTLY caused by a bug he had already fixed. So using this data to show "widening" seems wrong. If not to you, fine. To me, it is clearly a bogus data point.
How is it a bogus data point if the program plays real chess?

Can you explain this?

If the measurements were performed incorrectly, then the data point is certainly bogus. But there is no indication that anything was wrong with the measurement.

This is really not difficult, I would think. Just be intellectually honest.

You could simply agree it's a data point. It's not a very interesting one for all I care, but to say it's bogus is just, well... like the Black Knight saying it's just a flesh wound...
How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.

Why is this so hard to follow? I have a few remarkable SMP speedup samples tucked away. Caused by incorrect tree searching, where moves were pruned that should not have been. Due to a bug. Is a speedup > 6 on 4 cores, CONSISTENTLY, a good data point?

Your argument is silly.
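
For context, the time-to-depth speedup under discussion is simply the single-core time divided by the N-core time to reach the same reported depth. Here is a minimal sketch of that measure with made-up numbers (my illustration, not from the thread), showing why a speedup consistently above the core count is a red flag rather than a genuinely good result:

Code:

#include <stdio.h>

int main(void)
{
    /* Time-to-depth speedup: single-core time divided by N-core time to
       reach the same reported depth. The numbers here are made up. */
    double t1 = 120.0;  /* seconds to reach depth d on 1 core */
    double t4 = 18.0;   /* seconds to reach the same depth on 4 cores */
    int cores = 4;

    double speedup = t1 / t4;
    printf("speedup = %.2f on %d cores%s\n", speedup, cores,
           speedup > cores ? "  <- super-linear, suspect missing work" : "");
    return 0;
}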
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Measure of SMP scalability

Post by syzygy »

bob wrote:How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.
Be serious. Of course it will be reproduced if you use the same program. Since when is, say, Houdini 3's speedup only "real" if it is still there in Houdini 4? Does that make sense?
Why is this so hard to follow? I have a few remarkable SMP speedup samples tucked away. Caused by incorrect tree searching, where moves were pruned that should not have been. Due to a bug. Is a speedup > 6 on 4 cores, CONSISTENTLY, a good data point?
Oh come on, this is really apples and oranges. Kai measured engine strength. The strength may be due to a bug, but it is still strength. What your argument does prove is that engine strength is the only valid measure of SMP efficiency. Time-to-reported-depth is trivial to fake; engine strength is not.

Let's take your argument one step further: yesterday I measured my engine's Elo and it was 3000. Today I discovered a bug, I fixed it, and now its Elo is 2500. I guess the 3000 was not real? (Yes, it did not really happen.)
JuLieN
Posts: 2949
Joined: Mon May 05, 2008 12:16 pm
Location: Bordeaux (France)
Full name: Julien Marcel

Re: Measure of SMP scalability

Post by JuLieN »

[moderation]
A post containing an insult (and, automatically, its reply) was removed. To the original poster: please re-post without the insult.
To everybody: please, we're talking chess and computers here. No need to jump at each other's throats. :)
"The only good bug is a dead bug." (Don Dailey)
[Blog: http://tinyurl.com/predateur ] [Facebook: http://tinyurl.com/fbpredateur ] [MacEngines: http://tinyurl.com/macengines ]
Henk
Posts: 7210
Joined: Mon May 27, 2013 10:31 am

Re: Measure of SMP scalability

Post by Henk »

syzygy wrote:
bob wrote:How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.
Be serious. Of course it will be reproduced if you use the same program. Since when is, say, Houdini 3's speedup only "real" if it is still there in Houdini 4? Does that make sense?
Why is this so hard to follow? I have a few remarkable SMP speedup samples tucked away. Caused by incorrect tree searching, where moves were pruned that should not have been. Due to a bug. Is a speedup > 6 on 4 cores, CONSISTENTLY, a good data point?
Oh come on, this is really apples and oranges. Kai measured engine strength. The strength may be due to a bug, but it is still strength. What your argument does prove is that engine strength is the only valid measure of SMP efficiency. Time-to-reported-depth is trivial to fake; engine strength is not.

Let's take your argument one step further: yesterday I measured my engine's Elo and it was 3000. Today I discovered a bug, I fixed it, and now its Elo is 2500. I guess the 3000 was not real? (Yes, it did not really happen.)
Programs with severe bugs cannot be trusted. And if they don't have them, I don't trust them either.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Measure of SMP scalability

Post by Adam Hair »

bob wrote:
syzygy wrote:
bob wrote:As I said, you ONLY want to argue. Go for it, if that makes you happy. The author said this "widening" was DIRECTLY caused by a bug he had already fixed. So using this data to show "widening" seems wrong. If not to you, fine. To me, it is clearly a bogus data point.
How is it a bogus data point if the program plays real chess?

Can you explain this?

If the measurements were performed incorrectly, then the data point is certainly bogus. But there is no indication that anything was wrong with the measurement.

This is really not difficult, I would think. Just be intellectually honest.

You could simply agree it's a data point. It's not a very interesting one for all I care, but to say it's bogus is just, well... like the Black Knight saying it's just a flesh wound...
How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.
Actually, Edsel stated that he expected Kai's result for Hannibal (the time-to-depth speedup was 1.33). You are the one who has claimed it to be invalid. Whether or not it is due to an unintentional bug, it is still a valid data point in the context of this discussion. And to reiterate, the general discussion has been about what is the most relevant measure of the effectiveness of the SMP implementation in a chess engine.

You have claimed that time to depth is a reasonable alternative to measuring Elo gain when determining that effectiveness. What Kai has pointed out with his data is that time to depth is not a good indicator of SMP effectiveness for some engines: they gain more Elo with additional cores than their time-to-depth data would seem to indicate. And the explanation for the Elo increase is that the search, while lacking in depth, is wider.

Let's look at Hannibal 1.3 for a moment. It has a bug that limits its SMP speedup for 4 cores to ~1.3. Given the typical gain of 50 to 100 Elo per doubling (the actual gain is related to the average depth being reached), Hannibal should gain something on the order of 19 to 39 Elo when using 4 cores. However, according to the CEGT 40/20 list, the gain is 58 Elo.

For Komodo, the expected increase would be 38 to 75 Elo. According to the CEGT 40/20 list, Komodo 5.1 4cpu is 3112. Now, the 1cpu version is not on this list, but it is known to be between Komodo CCT and Komodo 5.0 in strength. So, the rating for Komodo 5.1 1cpu would likely lie between 2994 and 3014. Thus, the increase is ~ 98 to 118 Elo.

All of these numbers are hazy for various reasons. But I do know from my tests and from others that the expected increase, when using those speed up numbers, would be towards the lower end of the given ranges when using the 40/20 (or equivalent) time control. So, these two engines are most likely gaining Elo over and above that predicted by the time to depth data. Thus, their SMP implementation is more effective than predicted by the usual measurement.
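
For readers checking those ranges, here is a minimal sketch of the arithmetic (my own illustration, not from the post): a speedup is converted into a fraction of a doubling and scaled by the assumed 50 to 100 Elo per doubling. The 1.7 speedup in the second line is only an assumption chosen because it reproduces the 38 to 75 Elo range quoted above; it is not a figure given in the thread.

Code:

#include <math.h>
#include <stdio.h>

/* Convert an SMP speedup into an expected Elo gain: the speedup counts
   as a fraction of a doubling (log2), scaled by an assumed 50-100 Elo
   per doubling as in the post above. */
static void expected_gain(const char *name, double speedup)
{
    double doublings = log2(speedup);
    printf("%-24s speedup %.2f -> roughly %.0f to %.0f Elo\n",
           name, speedup, 50.0 * doublings, 100.0 * doublings);
}

int main(void)
{
    expected_gain("Hannibal 1.3, 4 cores:", 1.30);  /* about 19 to 38 Elo */
    expected_gain("assumed 1.7 speedup:",   1.70);  /* about 38 to 77 Elo */
    return 0;
}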
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Measure of SMP scalability

Post by bob »

Adam Hair wrote:
bob wrote:
syzygy wrote:
bob wrote:As I said, you ONLY want to argue. Go for it, if that makes you happy. The author said this "widening" was DIRECTLY caused by a bug he had already fixed. So using this data to show "widening" seems wrong. If not to you, fine. To me, it is clearly a bogus data point.
How is it a bogus data point if the program plays real chess?

Can you explain this?

If the measurements were performed incorrectly, then the data point is certainly bogus. But there is no indication that anything was wrong with the measurement.

This is really not difficult, I would think. Just be intellectually honest.

You could simply agree it's a data point. It's not a very interesting one for all I care, but to say it's bogus is just, well... like the Black Knight saying it's just a flesh wound...
How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.
Actually, Edsel stated that he expected Kai's result for Hannibal (the time-to-depth speedup was 1.33). You are the one who has claimed it to be invalid. Whether or not it is due to an unintentional bug, it is still a valid data point in the context of this discussion. And to reiterate, the general discussion has been about what is the most relevant measure of the effectiveness of the SMP implementation in a chess engine.

You have claimed that time to depth is a reasonable alternative to measuring Elo gain when determining that effectiveness. What Kai has pointed out with his data is that time to depth is not a good indicator of SMP effectiveness for some engines: they gain more Elo with additional cores than their time-to-depth data would seem to indicate. And the explanation for the Elo increase is that the search, while lacking in depth, is wider.

Let's look at Hannibal 1.3 for a moment. It has a bug that limits its SMP speedup for 4 cores to ~1.3. Given the typical gain of 50 to 100 Elo per doubling (the actual gain is related to the average depth being reached), Hannibal should gain something on the order of 19 to 39 Elo when using 4 cores. However, according to the CEGT 40/20 list, the gain is 58 Elo.

For Komodo, the expected increase would be 38 to 75 Elo. According to the CEGT 40/20 list, Komodo 5.1 4cpu is 3112. Now, the 1cpu version is not on this list, but it is known to be between Komodo CCT and Komodo 5.0 in strength. So, the rating for Komodo 5.1 1cpu would likely lie between 2994 and 3014. Thus, the increase is ~ 98 to 118 Elo.

All of these numbers are hazy for various reasons. But I do know from my tests and from others that the expected increase, when using those speed up numbers, would be towards the lower end of the given ranges when using the 40/20 (or equivalent) time control. So, these two engines are most likely gaining Elo over and above that predicted by the time to depth data. Thus, their SMP implementation is more effective than predicted by the usual measurement.
What are you talking about? He SPECIFICALLY said "this was a serious bug..." And that he had fixed it. How can you use data from a program with a KNOWN serious bug to support something that the bug caused?

I just fixed a new version of Crafty where I wrongly handled reductions and pruning at split points, and blew the tree up over 2x its normal size. Hell of a "widening example" wouldn't you say? And one that was caused by my not keeping up with "moves searched" at the split ply correctly. Should we use THAT data as well? Of course not.
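
A minimal sketch of that mechanism (purely illustrative, not Crafty's actual code): if late-move reductions key off the number of moves already searched at a node, and each helper thread at a split point restarts that count at zero instead of continuing from the shared count, the late moves it picks up are searched at full depth, and the tree comes out wider and larger than the serial one.

Code:

#include <stdio.h>

/* Toy late-move-reduction rule: reduce by one ply once enough moves
   have already been searched at this node. Purely illustrative. */
static int reduction(int moves_searched, int depth)
{
    return (moves_searched >= 4 && depth >= 3) ? 1 : 0;
}

int main(void)
{
    int depth = 12;

    /* Serial search: one running count at the node, so moves 4..7 get reduced. */
    for (int m = 0; m < 8; m++)
        printf("serial   move %d: reduce by %d\n", m, reduction(m, depth));

    /* Buggy split point: two helper threads each restart the count at 0,
       so the same late moves are searched at full depth and the tree widens. */
    for (int t = 0; t < 2; t++)
        for (int m = 0; m < 4; m++)
            printf("thread %d move %d: reduce by %d\n",
                   t, m, reduction(m, depth));

    return 0;
}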

When someone says "that version has a serious bug" no serious scientist on the planet would use that version to produce data and then report on it to support their argument.

This is ridiculousness at its worst. If you want to use that data, feel free. I don't think anyone "serious" will consider it valid however. It's bad enough that our data has unknown bugs lurking inside, we certainly don't want it to have a known significant bug included... At least I don't...

Not worth arguing further however... This seems to be an argument held just for the sake of arguing...

The Komodo data is interesting. The Hannibal data is useless.
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Measure of SMP scalability

Post by syzygy »

bob wrote:What are you talking about? He SPECIFICALLY said "this was a serious bug..." And that he had fixed it. How can you use data from a program with a KNOWN serious bug to support something that the bug caused?
The measurement is real.
I just fixed a new version of Crafty where I wrongly handled reductions and pruning at split points, and blew the tree up over 2x its normal size. Hell of a "widening example" wouldn't you say? And one that was caused by my not keeping up with "moves searched" at the split ply correctly. Should we use THAT data as well? Of course not.
Did the doubled tree size help in terms of Elo? That is the question and that is what was measured by Kai.
When someone says "that version has a serious bug" no serious scientist on the planet would use that version to produce data and then report on it to support their argument.
Penicillin never existed?
This is ridiculousness at its worst.
You seem to be somewhat alone in this.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Measure of SMP scalability

Post by michiguel »

bob wrote:
Adam Hair wrote:
bob wrote:
syzygy wrote:
bob wrote:As I said, you ONLY want to argue. Go for it, if that makes you happy. The author said this "widening" was DIRECTLY caused by a bug he had already fixed. So using this data to show "widening" seems wrong. If not to you, fine. To me, it is clearly a bogus data point.
How is it a bogus data point if the program plays real chess?

Can you explain this?

If the measurements were performed incorrectly, then the data point is certainly bogus. But there is no indication that anything was wrong with the measurement.

This is really not difficult, I would think. Just be intellectually honest.

You could simply agree it's a data point. It's not a very interesting one for all I care, but to say it's bogus is just, well... like the Black Knight saying it's just a flesh wound...
How simple is it to follow this logic:

create a data point that supports some argument.

Author of program says "that data point is invalid, it is the direct result of a bug that I have fixed", which says the data point won't be reproduced if the program is tested again.
Actually, Edsel stated that he expected Kai's result for Hannibal (the time-to-depth speedup was 1.33). You are the one who has claimed it to be invalid. Whether or not it is due to an unintentional bug, it is still a valid data point in the context of this discussion. And to reiterate, the general discussion has been about what is the most relevant measure of the effectiveness of the SMP implementation in a chess engine.

You have claimed that time to depth is a reasonable alternative to measuring Elo gain when determining that effectiveness. What Kai has pointed out with his data is that time to depth is not a good indicator of SMP effectiveness for some engines: they gain more Elo with additional cores than their time-to-depth data would seem to indicate. And the explanation for the Elo increase is that the search, while lacking in depth, is wider.

Let's look at Hannibal 1.3 for a moment. It has a bug that limits its SMP speedup for 4 cores to ~1.3. Given the typical gain of 50 to 100 Elo per doubling (the actual gain is related to the average depth being reached), Hannibal should gain something on the order of 19 to 39 Elo when using 4 cores. However, according to the CEGT 40/20 list, the gain is 58 Elo.

For Komodo, the expected increase would be 38 to 75 Elo. According to the CEGT 40/20 list, Komodo 5.1 4cpu is 3112. Now, the 1cpu version is not on this list, but it is known to be between Komodo CCT and Komodo 5.0 in strength. So, the rating for Komodo 5.1 1cpu would likely lie between 2994 and 3014. Thus, the increase is ~ 98 to 118 Elo.

All of these numbers are hazy for various reasons. But I do know from my tests and from others that the expected increase, when using those speed up numbers, would be towards the lower end of the given ranges when using the 40/20 (or equivalent) time control. So, these two engines are most likely gaining Elo over and above that predicted by the time to depth data. Thus, their SMP implementation is more effective than predicted by the usual measurement.
What are you talking about? He SPECIFICALLY said "this was a serious bug..." And that he had fixed it. How can you use data from a program with a KNOWN serious bug to support something that the bug caused?

I just fixed a new version of Crafty where I wrongly handled reductions and pruning at split points, and blew the tree up over 2x its normal size. Hell of a "widening example" wouldn't you say? And one that was caused by my not keeping up with "moves searched" at the split ply correctly. Should we use THAT data as well? Of course not.

When someone says "that version has a serious bug" no serious scientist on the planet would use that version to produce data and then report on it to support their argument.
That is what experimental scientists do all the time. They pick a protein, gene, organism, etc., and make a "mutant" (== a bug) to study an altered behavior and correlate structure with function (or study its cause). Many times scientists do not make the mutants; they observe what nature offers. Same thing. For instance, we would not have advanced so much in biochemistry without the ability to study "nature's defects", which are the equivalent of "bugs".

So, this is a very interesting data point that should attract the attention of the curious.

Miguel

This is ridiculousness at its worst. If you want to use that data, feel free. I don't think anyone "serious" will consider it valid however. It's bad enough that our data has unknown bugs lurking inside, we certainly don't want it to have a known significant bug included... At least I don't...

Not worth arguing further however... This seems to be an argument held just for the sake of arguing...

The Komodo data is interesting. The Hannibal data is useless.
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Measure of SMP scalability

Post by syzygy »

michiguel wrote:So, this is a very interesting data point that should attract the attention of the curious.
The curious, also known as the scientifically inclined.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Measure of SMP scalability

Post by bob »

syzygy wrote:
bob wrote:What are you talking about? He SPECIFICALLY said "this was a serious bug..." And that he had fixed it. How can you use data from a program with a KNOWN serious bug to support something that the bug caused?
The measurement is real.
The data is bogus. So what is your point? Incorrect data can say anything, yet it means little.

I just fixed a new version of Crafty where I wrongly handled reductions and pruning at split points, and blew the tree up over 2x its normal size. Hell of a "widening example" wouldn't you say? And one that was caused by my not keeping up with "moves searched" at the split ply correctly. Should we use THAT data as well? Of course not.
Did the doubled tree size help in terms of Elo? That is the question and that is what was measured by Kai.
Doubling the tree ALWAYS increases the Elo, unless time is a factor.

When someone says "that version has a serious bug" no serious scientist on the planet would use that version to produce data and then report on it to support their argument.
Penicillin never existed?
Hyperbole and distortion certainly do.
This is ridiculousness at its worst.
You seem to be somewhat alone in this.
I'm not so sure about that. Most rational people would not think about analyzing data from a program the author says is broken...