Peculiarity of Komodo 5.1MP

Modern Times · Post by **Modern Times** » Tue Jun 25, 2013 1:42 pm

Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.

For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.

Don · Post by **Don** » Tue Jun 25, 2013 2:10 pm

Modern Times wrote:
Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.

Bob's definition may not be practical for real world applications but it is a valid concept. In computer science you often study some specific thing in isolation to understand its theoretical properties. Still, I don't believe that is what was happening here.

syzygy · Post by **syzygy** » Tue Jun 25, 2013 2:29 pm

Don wrote:
Modern Times wrote:
Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.
Bob's definition may not be practical for real world applications but it is a valid concept. In computer science you often study some specific thing in isolation to understand its theoretical properties. Still, I don't believe that is what was happening here.

It certainly wasn't. Otherwise he could (and should) have simply stated that Kai was using an incorrect technical term but really meant <whatever term Bob would have liked better>. He could have been constructive. Instead he chose the path of "you are all wrong".

Note that for about the first half of the thread he insisted on "time-to-depth is the only way to measure smp efficiency" and even "time-to-depth is the only way to measure the COMPLETE smp implementation" rather than "time-to-depth is the technical definition of smp efficiency even though it may not capture elo gain at all" which he has turned to now to escape what he sees as "defeat".

Don · Post by **Don** » Tue Jun 25, 2013 2:51 pm

syzygy wrote:
Don wrote:
Modern Times wrote:
Laskos wrote:I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
For an end user, yes that would be the definition. But Bob may call that "SMP Effectiveness" or something like that, which he views as different from his narrower technical definition of SMP efficiency.
Bob's definition may not be practical for real world applications but it is a valid concept. In computer science you often study some specific thing in isolation to understand its theoretical properties. Still, I don't believe that is what was happening here.
It certainly wasn't. Otherwise he could (and should) have simply stated that Kai was using an incorrect technical term but really meant <whatever term Bob would have liked better>. He could have been constructive. Instead he chose the path of "you are all wrong".

Note that for about the first half of the thread he insisted on "time-to-depth is the only way to measure smp efficiency" and even "time-to-depth is the only way to measure the COMPLETE smp implementation" rather than "time-to-depth is the technical definition of smp efficiency even though it may not capture elo gain at all" which he has turned to now to escape what he sees as "defeat".

This is his style - in fact not just his but most arguments on this forum go that way. Two people take a stand on some issue and then you will notice that both sides try to "close the box", which is term I made up which means they will try to re-define the other persons position in terms they can argue against better. Try to bend it and shape it their way.

It sometimes works if one of the parties is skilled enough because they will do it very gradually just by little changes in the wording of the other party and sometime the other party will accept the re-definition of their stance without realizing it. But the other side of this is that when one of the parties sees they are losing they will re-frame THEIR OWN statements. You have to be perceptive sometimes because that can be legitimate - especially if you were sloppy about what you said and it doesn't reflect what you actually meant. However it's difficult to convince someone that you meant something other than what you said even if it's true. But Bob is especially skilled at re-framing his position, you will see him swear nothing has changed as he gradually re-defines his statement in a way that you cannot pin him down. For him it comes in the forms of "revelations", he was talking about that the whole time and evidently just never bothered to say it. In reality we know better.

The thing that really give him away in this case is simply the fact that he started out being extremely critical. Once that happened nothing he said afterwards really mattered.

Rein Halbersma · Post by **Rein Halbersma** » Tue Jun 25, 2013 3:38 pm

Don,

I have been re-reading the old Cilk-Chess papers lately, in which you and Larry are frequently mentioned (and you wrote one yourself with prof. Leiserson!). I wonder if you still apply the concepts of Work and Span (or Critical Length) to measure the scaling properties of Komodo? AFAICS, the big misunderstanding (deliberate or otherwise) in this thread can be framed in terms of these concepts.

E.g. in one of the Cilk-Chess papers (I think the thesis by Kuszmaul) it was mentioned that making the program less parallel (e.g. by searching the first 2 moves serially and then doing the parallel stuff) was better for a small number of cores, but scaled worse for lots of cores. Rephrased: doing less Work at the cost of a longer Critical Length is good for a serial / 1-core program, but not necessarily for a parallel 4-core program. Or: searching wider might not be bad for a 4-core program if that extra width can be done in parallel (which a serial/1-core program cannot).

This behavior should only be observable if the available parallelism was already mostly exhausted at 4 cores. So I was thinking that todays super-selective super-reducing programs have less parallelism in the tree than in the old days (because the branching factor is much smaller), but maybe that is offset by the higher depths that they reach. Did you do any measurements on this?

Rein

Don · Post by **Don** » Tue Jun 25, 2013 4:08 pm

Rein Halbersma wrote:Don,

I have been re-reading the old Cilk-Chess papers lately, in which you and Larry are frequently mentioned (and you wrote one yourself with prof. Leiserson!). I wonder if you still apply the concepts of Work and Span (or Critical Length) to measure the scaling properties of Komodo? AICS, the big misunderstanding (deliberate or otherwise) in this thread can be framed in terms of these concepts.

E.g. in one of the Cilk-Chess papers (I think the thesis by Kuszmaul) it was mentioned that making the program less parallel (e.g. by searching the first 2 moves serially and then doing the parallel stuff) was better for a small number of cores, but scaled worse for lots of cores. Rephrased: doing less Work at the cost of a longer Critical Length is good for a serial / 1-core program, but not necessarily for a parallel 4-core program. Or: searching wider might not be bad for a 4-core program if that extra width can be done in parallel (which a serial/1-core program cannot).

This behavior should only be observable if the available parallelism was already mostly exhausted at 4 cores. So I was thinking that todays super-selective super-reducing programs have less parallelism in the tree than in the old days (because the branching factor is much smaller), but maybe that is offset by the higher depths that they reach. Did you do any measurements on this?

Rein

Those papers provided a great theoretical frameworks for talking about parallelism where the critical path is work that cannot be done in parallel. So no matter how many processors you have there is a bound on how fast you will get there. But those papers do not really apply to this discussion because it assumes that all efforts are applied strictly to speeding up a very specific calculation.

There is more than one way to skin a cat and we intend to at least TRY to find some innovations here. I'm not saying we will succeed because we are not the first to have such ideas nor will we be the last. Many ideas may not be workable, but they should be explored because ANYTHING a chess program does could be sped up. One example of this is the evaluation. If you have a complicated enough evaluation could could (at least in theory) have 2 processors working together on this if you can split this into separate functional pieces. Now I'm not saying that is easy to do or even possible (in fact I seriously doubt this is workable), but it does make the point that you do not have to always do things the way someone (such as Bob or anyone else) says it has to be done.

It has often been reported that beyond something like 128 or 256 processors there is almost no gain. If that is true, and I don't know that it is, one could consider the feasibility of a very old idea which is to conduct multiple searches with different aspiration windows or perhaps different algorithms. It might even work to simply let 256 processors do a search and the other 256 do the same search and go with the one that searches the deepest. The reason this can work is that parallelism is non-deterministic and it's extremely unlikely both searches would finish at exactly the same time and actually quite likely that one will finish significantly faster. Even this simple-minded "trick" would be better than using 512 cores if there is no speedup at all.

It's not out of the question to dedicate a portion of the machine to doing a zero width window search to prove the score won't drop too much and to provide a cue that it might have to be dealt with. Things like this can work if the processors are mostly wasted anyway. But even with 4 cores there is a lot of waste - as you can see that the ELO / node per second ratio is not favorable.

Don

Laskos · Post by **Laskos** » Wed Jun 26, 2013 2:40 am

bob wrote:
Time-to-depth IS the correct measure. If Komodo searches wider, then it searches less efficiently. One wants to measure the COMPLETE SMP implementation, not just how the tree is split.

I am back to my desktop i7, 4 physical cores. I computed the COMPLETE SMP implementation of Komodo 5.1MP as following:
30 positions repeated 5 times to _fixed_ depth 21 on 1 core and 4 cores. Total time is given below:

Code: Select all

Komodo 5.1MP 4 cores 108:50 minutes
Komodo 5.1MP 1 core  183:05 minutes

ratio: 1.68

If I am not missing another "Bob technicality", Komodo's factor time-to-depth from one to 4 cores is 1.68 compared to 3.0-3.2 time-to-depth factors from 1 to 4 cores of typical engines Houdini, Stockfish, Critter, Crafty, etc. (Crafty's given by you is 3.2). Therefore "the COMPLETE SMP implementation" is worse in Komodo Elo-wise (if I am not missing some technicality) by a factor of log(3.0~3.2)/log(1.68)=2.1~2.2. If a typical engine gains 100 points from 1 to 4 cores on 40/4 TC, then Komodo, according to you, should gain only 45-47 points. It's gaining the same 100 points.

Time-to-depth is a technical notion for idealized "SMP efficiency" which does not apply in a real world of chess engines. You could have objected to my use of "SMP efficiency" from the beginning or not use "COMPLETE SMP implementation" term, which is not in favor of your later "technicality" argument.

bob · Post by **bob** » Wed Jun 26, 2013 6:51 am

syzygy wrote:
bob wrote:So using "SMP efficiency" was the wrong term to use in the first post in this thread.
So this whole regrettable discussion turns around your personal definition of "SMP efficiency" that you have decided to impose upon all of us.

Any reasonable person understood perfectly well how the term was used in the first post.

"Personal definition" -> the definition given in EVERY discussion of parallel performance. How many direct citations would you like for me to post?

SMP efficiency has a direct meaning. If you don't like the definition, then how about using some other term that you can define to mean whatever you want.

I'm not "imposing" anything at all. I just want standard definitions of common terms or conversation is completely impossible.

bob · Post by **bob** » Wed Jun 26, 2013 6:53 am

Don wrote:
syzygy wrote:
bob wrote:So using "SMP efficiency" was the wrong term to use in the first post in this thread.
So this whole regrettable discussion turns around your personal definition of "SMP efficiency" that you have decided to impose upon all of us.

Any reasonable person understood perfectly well how the term was used in the first post.
This is typical of Bob's style when he is backed into a corner, he finds some technicality and digs in. It would be far easier for him to just to just yield a little ground and he could do this and still maintain his dignity but he will keep going until he cannot back down.

Don, did you not go to graduate school? And study parallel computing among other things?

Please provide ANY citation you want where "SMP efficiency" is defined any way other than what I quoted. Ever heard of Amdahl's law? Know what it is based on? That SAME definition.

Jeez, why so sanctimonious here? The term has a specific meaning. It was used in the first post in this thread. I pointed out it was wrong. It is STILL wrong. I'm not in any sort of corner at all, just you guys that want to completely misuse a standard CIS term...

bob · Post by **bob** » Wed Jun 26, 2013 6:54 am

Laskos wrote:
syzygy wrote:
bob wrote:So using "SMP efficiency" was the wrong term to use in the first post in this thread.
So this whole regrettable discussion turns around your personal definition of "SMP efficiency" that you have decided to impose upon all of us.

Any reasonable person understood perfectly well how the term was used in the first post.
Maybe one should say that time-to-depth is applicable to "SMP efficiency" engines, as defined by Bob, and doesn't apply to non-"SMP efficiency" engines like Komodo.

Or may we should just use the term as it is defined in CS literature? There's no such thing as "non-SMP efficiency".

Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP

Re: Peculiarity of Komodo 5.1MP