Peculiarity of Komodo 5.1MP

Laskos · Post by **Laskos** » Thu Jun 27, 2013 5:22 pm

Houdini wrote:
Laskos wrote: I tested at 5''+0.1'' the effectiveness of Komodo's MP implementation against that of Houdini, from 1 to 4 threads
    Program                            Score     %     Elo

  1 Houdini 3  1 thread           : 640.5/1000  64.0   3050
  2 Komodo 5.1 1 thread           : 359.5/1000  35.9   2950

Houdini +100 points


    Program                            Score     %     Elo 

  1 Houdini 3  4 threads          : 642.5/1000  64.2   3051
  2 Komodo 5.1 4 threads          : 357.5/1000  35.8   2949

Houdini +102 points
The effectiveness is very comparable, within error margins.
Kai, thanks, very interesting results.

To isolate the impact of the SMP implementation from any TC scaling effects, you might consider comparing (5"+0.1" with 4 CPU) to (15"+0.3" with 1 CPU) - or whatever TC that produces approximately the same average depth for Houdini.

The fact that Houdini maintains its 100 point lead while you're effectively tripling the TC suggests that Houdini's SMP implementation is marginally more effective (if results were confirmed with a larger number of games).

Here it is:
15'' + 0.3''
1 thread


    Program                             Score     %     Elo

  1 Houdini 3  1 thread           : 1258.5/2000  62.9   3046
  2 Komodo  5  1 thread           :  741.5/2000  37.1   2954

Houdini +92 Elo points

5'' + 0.1''
4 threads


    Program                            Score     %     Elo 

  1 Houdini 3  4 threads          : 642.5/1000  64.2   3051
  2 Komodo 5.1 4 threads          : 357.5/1000  35.8   2949

Houdini +102 Elo points

So, from 1 to 4 threads Houdini gains 10 +/- 20 Elo points more than Komodo. That is within error margins, but it is a suggestion (80% or so) that Houdini's SMP implementation is a tiny bit better.
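
The Elo gaps quoted in this post follow from the match scores through the standard logistic Elo model. A minimal sketch of that conversion, assuming the usual 400-point logistic scale; the function name `elo_diff` is only illustrative, and no error margin is computed because the draw counts are not given in the post:

```python
import math

def elo_diff(points: float, games: int) -> float:
    """Elo advantage implied by a match score under the logistic Elo model."""
    p = points / games  # score fraction, must be strictly between 0 and 1
    return -400.0 * math.log10(1.0 / p - 1.0)

# Scores reported in the post (Houdini's side of each match)
one_thread_fast = elo_diff(640.5, 1000)   # 5''+0.1'', 1 thread
one_thread_slow = elo_diff(1258.5, 2000)  # 15''+0.3'', 1 thread
four_threads = elo_diff(642.5, 1000)      # 5''+0.1'', 4 threads

print(round(one_thread_fast), round(one_thread_slow), round(four_threads))
# Gap between the 4-thread match and the time-equivalent 1-thread match
print(round(four_threads) - round(one_thread_slow))
```

Run on the reported scores, this reproduces the +100, +92 and +102 figures and the 10-point difference between the last two.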

bob · Post by **bob** » Thu Jun 27, 2013 5:43 pm

duncan wrote:
bob wrote:
The test did not address your speculation.

The correct test is to take a group of parallel programs and play 'em in a large match, using 1 core per engine. Then take each engine, one at a time, and run 'em at 4 cores. Measure the Elo improvement. If Komodo gains more than the others, I'll stand ready to eat my hat. the test, as run, doesn't show a single thing about this topic, however...
Is your hat safe? Or were you not referring to equivalent gain?

duncan

I was referring to Elo gain, period. And it seems my hat is safe...

bob · Post by **bob** » Thu Jun 27, 2013 5:46 pm

michiguel wrote:
bob wrote:
Don wrote:
syzygy wrote:
bob wrote:So using "SMP efficiency" was the wrong term to use in the first post in this thread.
So this whole regrettable discussion turns around your personal definition of "SMP efficiency" that you have decided to impose upon all of us.

Any reasonable person understood perfectly well how the term was used in the first post.
This is typical of Bob's style: when he is backed into a corner, he finds some technicality and digs in. It would be far easier for him just to yield a little ground, and he could do this and still maintain his dignity, but he will keep going until he cannot back down.
Don, did you not go to graduate school? And study parallel computing among other things?

Please provide ANY citation you want where "SMP efficiency" is defined any way other than what I quoted. Ever heard of Amdahl's law? Know what it is based on? That SAME definition.

Not all problems are governed by Amdahl's law (fixed task, time variable). Some are time constant, task variable (Gustafson's law).
http://en.wikipedia.org/wiki/Gustafson%27s_Law

There, with Gustafson-type problems, the speedup is measured not by the time to complete a constant task ("time to depth") but by how much extra work (EDIT: or quality!!) you can fit into a constant time.

Chess engines, when they become parallel, do not necessarily follow Amdahl's law, since they do not do the exact same thing in MP mode and do not traverse the exact same tree (and this could be engine-dependent, as was demonstrated in this thread). In addition, same depth does not imply same quality of work.

The goal in most problems is to solve them quicker, but not in chess. The goal is to play stronger; playing quicker is one of the parameters that control strength, but not the only one. At least, not necessarily the only one. So, to be pedantic, since that looks like what this thread has become about, SMP efficiency is a concept that cannot (necessarily) be used in the "traditional" way here. Do we apply Gustafson's, Amdahl's, or neither?

Miguel

Jeez, why so sanctimonious here? The term has a specific meaning. It was used in the first post in this thread. I pointed out it was wrong. It is STILL wrong. I'm not in any sort of corner at all, just you guys that want to completely misuse a standard CIS term...

You completely missed the point. The term "SMP efficiency" is very specific. It is the basis for Amdahl's law. Doesn't matter WHAT the discipline or application, SMP efficiency always means the same thing.

The simplest thing to measure is time-to-depth for chess engines. However, one COULD measure "time to correct move" which would account for the point being over-discussed in this thread (wider, etc. search).

But we have NEITHER of those two things used in the post that started this discussion...
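
For reference, the two laws being contrasted in this exchange can be written down directly. A minimal sketch, assuming an idealized workload in which a fraction `f` of the work parallelizes perfectly over `n` cores; the value 0.95 is purely illustrative and is not a measurement from any engine:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Fixed task, variable time.

    f is the parallelizable fraction of the serial run time.
    """
    return 1.0 / ((1.0 - f) + f / n)

def gustafson_speedup(f: float, n: int) -> float:
    """Fixed time, variable task (scaled speedup).

    f is the fraction of the parallel run time spent in parallel work.
    """
    return (1.0 - f) + f * n

f, n = 0.95, 4  # illustrative values only
print(f"Amdahl:    {amdahl_speedup(f, n):.2f}x")
print(f"Gustafson: {gustafson_speedup(f, n):.2f}x")
```

Neither formula models a search that visits a different tree when run in parallel, which is the point under dispute here.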

Don · Post by **Don** » Thu Jun 27, 2013 5:47 pm

bob wrote:
duncan wrote:
bob wrote:
The test did not address your speculation.

The correct test is to take a group of parallel programs and play 'em in a large match, using 1 core per engine. Then take each engine, one at a time, and run 'em at 4 cores. Measure the Elo improvement. If Komodo gains more than the others, I'll stand ready to eat my hat. the test, as run, doesn't show a single thing about this topic, however...
Is your hat safe? Or were you not referring to equivalent gain?

duncan
I was referring to Elo gain, period. And it seems my hat is safe...

In other words Komodo has to gain more than ANY other program before you will eat your hat. If it's about the same, then you feel it has horrible parallel scalability. Nothing in between. Is that about right?

bob · Post by **bob** » Thu Jun 27, 2013 5:50 pm

Cumnor wrote:
bob wrote:
Don wrote:
syzygy wrote:
bob wrote:So using "SMP efficiency" was the wrong term to use in the first post in this thread.
So this whole regrettable discussion turns around your personal definition of "SMP efficiency" that you have decided to impose upon all of us.

Any reasonable person understood perfectly well how the term was used in the first post.
This is typical of Bob's style: when he is backed into a corner, he finds some technicality and digs in. It would be far easier for him just to yield a little ground, and he could do this and still maintain his dignity, but he will keep going until he cannot back down.
Don, did you not go to graduate school? And study parallel computing among other things?

Please provide ANY citation you want where "SMP efficiency" is defined any way other than what I quoted. Ever heard of Amdahl's law? Know what it is based on? That SAME definition.

Jeez, why so sanctimonious here? The term has a specific meaning. It was used in the first post in this thread. I pointed out it was wrong. It is STILL wrong. I'm not in any sort of corner at all, just you guys that want to completely misuse a standard CIS term...
Actually I was just reading about Amdahl's law http://www.clustermonkey.net/Parallel-P ... r-law.html

One of many good explanations of the problem. He's giving a classic "SMP efficiency" number when he uses (say) 40 mowers to complete the job in 21 minutes. Just under 3:1. It is a very specific measurement.

bob · Post by **bob** » Thu Jun 27, 2013 5:55 pm

syzygy wrote:
bob wrote:
syzygy wrote:That Komodo's SMP implementation is effective in terms of playing strength / elo gain per core follows from other threads.
Nope. Nothing to base that on.
What in "follows from other threads" is unclear to you?

bob wrote:Elo is based on time, where both players have equal time to play the game. This test does not even approximate that standard.
"Follows from other threads." I am sorry, but I am not going to spell it out any further.

Please don't. This thread is where this is being discussed. Nothing of interest regarding parallel performance here. And an incorrect usage of a precise term to boot...

syzygy wrote:
bob wrote:Ergo <zero information>
Nope. It does show that Komodo's SMP implementation is different. Komodo's 4-core search to a particular (reported) depth is clearly of higher quality than Komodo's 1-core search to the same (reported) depth.
Is a search without LMR "higher quality"? Most of us equate "quality" with "Elo". But we have ZERO data about the Elo gain... Because time is a part of the equation and it was left out.
This is a little bit like I am talking to a very small kid.
Do you know what was tested? Komodo at fixed depth with 4 threads versus Komodo at fixed depth with 1 thread. Komodo with 4 threads scored way better. This shows that "Komodo's 4-core search to a particular (reported) depth is clearly of higher quality than Komodo's 1-core search to the same (reported) depth".

How does it show that? Suppose the 4 cpu test takes much longer than the 1 cpu test (no data was given regarding time)?? So is a much slower search "higher quality" just because it scores better? Even though if you play it in normal timed games it loses on time nearly every time. As I said, no data to conclude anything. Feel free to drop the "small kid" nonsense, unless that is the equivalent to a C "self-referential structure pointer".

If you insist on quibbling about these words then you're just slightly too sore for my taste.

And people were complaining about Henk...
I simply want to see (a) useful information and (b) correct usage of a commonly-mentioned term. Nothing more, nothing less. You, on the other hand, seem solely interested in arguing.

bob · Post by **bob** » Thu Jun 27, 2013 5:57 pm

Laskos wrote:
Modern Times wrote:
Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.
I think it was clear to everybody, Bob included, what was meant. Going from 1 to N cores, how large is the factor time(1cpu)/time(Ncpu) at equal _strength_. Bob started this "misunderstanding" game after 60-70 posts in this thread, and that shows his character, no more.

The fact is time-to-depth is not a measure of the effectiveness of SMP implementation (as defined above), because in parallel one doesn't search the same tree as sequentially.

And again, WITHOUT TIME, your data says nothing about quality or quantity. If there is an 80 elo gain due to wider search, and a 120 elo loss caused by slowing the search down, one is not "winning" there. And without any reference to time, one can't conclude anything at all, other than the parallel search is "bigger". But is "bigger" the same as "better"? Not without a time reference.

bob · Post by **bob** » Thu Jun 27, 2013 7:41 pm

Don wrote:
bob wrote:
duncan wrote:
bob wrote:
The test did not address your speculation.

The correct test is to take a group of parallel programs and play 'em in a large match, using 1 core per engine. Then take each engine, one at a time, and run 'em at 4 cores. Measure the Elo improvement. If Komodo gains more than the others, I'll stand ready to eat my hat. the test, as run, doesn't show a single thing about this topic, however...
Is your hat safe? Or were you not referring to equivalent gain?

duncan
I was referring to Elo gain, period. And it seems my hat is safe...
In other words Komodo has to gain more than ANY other program before you will eat your hat. If it's about the same, then you feel it has horrible parallel scalability. Nothing in between. Is that about right?

What part of "If komodo gains more than the others" don't you understand? The claim here has been that somehow your search doesn't gain as much depth as others, but it gains more from this "widening."

It is an easy enough claim to back up, obviously. I know where my gains come from, and it is not from searching more nodes than the serial version. Searching more nodes is a loss for me, not a gain. If your SMP algorithm scales better than the rest due to your widening, it ought to be easy enough to prove. I might be able to set up a user account for you where you can test this hypothesis on big hardware, if you can't do it on what you have... Then we can decide whether I need to eat my hat or not.

At the moment, I am unconvinced. Real data can convince me, as always.

Hand-waving, not so much...

Don · Post by **Don** » Thu Jun 27, 2013 7:49 pm

bob wrote:
Don wrote:
bob wrote:
duncan wrote:
bob wrote:
The test did not address your speculation.

The correct test is to take a group of parallel programs and play 'em in a large match, using 1 core per engine. Then take each engine, one at a time, and run 'em at 4 cores. Measure the Elo improvement. If Komodo gains more than the others, I'll stand ready to eat my hat. the test, as run, doesn't show a single thing about this topic, however...
Is your hat safe? Or were you not referring to equivalent gain?

duncan
I was referring to Elo gain, period. And it seems my hat is safe...
In other words Komodo has to gain more than ANY other program before you will eat your hat. If it's about the same, then you feel it has horrible parallel scalability. Nothing in between. Is that about right?

What part of "If komodo gains more than the others" don't you understand? The claim here has been that somehow your search doesn't gain as much depth as others, but it gains more from this "widening."

I never made such a claim though.

It is an easy enough claim to back up, obviously. I know where my gains come from, and it is not from searching more nodes than the serial version. Searching more nodes is a loss for me, not a gain. If your SMP algorithm scales better than the rest due to your widening, it ought to be easy enough to prove.

I never made such a claim and I don't remember anyone else making it either, but it's possible that someone has. If so I'm not responsible for that.

I think I said right after release, on some thread here, that we don't have a strong sense of exactly how well it scales, which is how I still feel. I am actually happy with the reports so far, but I am under no illusion that Komodo scales better with cores than other programs.

I might be able to set up a user account for you where you can test this hypothesis on big hardware, if you can't do it on what you have... Then we can decide whether I need to eat my hat or not.

At the moment, I am unconvinced. Real data can convince me, as always.

Hand-waving, not so much...

How much CPU power do you have? I would love to do a big study. Do I have to compile it on your system? I can do that of course.

Don

Laskos · Post by **Laskos** » Thu Jun 27, 2013 8:44 pm

bob wrote:
Laskos wrote:
Modern Times wrote:
Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.
I think it was clear to everybody, Bob included, what was meant. Going from 1 to N cores, how large is the factor time(1cpu)/time(Ncpu) at equal _strength_. Bob started this "misunderstanding" game after 60-70 posts in this thread, and that shows his character, no more.

The fact is time-to-depth is not a measure of the effectiveness of SMP implementation (as defined above), because in parallel one doesn't search the same tree as sequentially.
And again, WITHOUT TIME, your data says nothing about quality or quantity. If there is an 80 elo gain due to wider search, and a 120 elo loss caused by slowing the search down, one is not "winning" there. And without any reference to time, one can't conclude anything at all, other than the parallel search is "bigger". But is "bigger" the same as "better"? Not without a time reference.

First, to match your brand of unnecessary pedantry when at a loss for arguments: it's not "elo". Do your homework if you talk of chess; it's "Elo", from Arpad Emrick Elo, the creator of the Elo rating system for two-player games such as chess.

If those +80 Elos from the first test don't give a clue of what's happening, then how did I predict that the time-to-depth factor from 1 to 4 cores is 1.5-2 for Komodo instead of 3-3.2 for typical engines? It turned out to be 1.68. Read my earlier post, where I predicted that without evidence, being away from my desktop. If you do not think further than pedantic definitions, when everything I stated in my first post is pretty clear, sure you will be stuck in absurd arguments. "To think is hard" (Goethe).
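
The time-to-depth factors in this post translate into the classic efficiency number (speedup divided by core count) as follows. A minimal sketch; the helper name is illustrative, and the 3.0 to 3.2 range is the "typical engine" figure claimed in the post rather than a measurement reproduced here:

```python
def efficiency_from_speedup(speedup: float, cores: int) -> float:
    """Parallel efficiency = time-to-depth speedup / number of cores."""
    return speedup / cores

cores = 4
cases = [
    ("Komodo (reported)", 1.68),
    ("typical engine, low end", 3.0),
    ("typical engine, high end", 3.2),
]
for label, speedup in cases:
    print(f"{label}: {efficiency_from_speedup(speedup, cores):.2f}")
```

This number only captures time-to-depth. As argued in the thread, it says nothing by itself about the Elo gained if the parallel search also examines a wider tree.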
