Peculiarity of Komodo 5.1MP

Don · Post by **Don** » Mon Jun 24, 2013 9:29 pm

Laskos wrote:
Joerg Oster wrote:
bob wrote: If you want to post elo, just play 1 cpu against a gauntlet and then repeat with 4 cpus. Then you get actual Elo.
OK. Here it is.

Both gauntlets with TC = 15+0.2 sec, 128 MB Hash each, 999 games. Elo computed with Bayeselo.

Gauntlet 1 (Komodo 5.1 MP with 1 Thread)
Rank Name          Elo    +    - games score oppo. draws 
   1 Critter1.6a    13   23   23   333   51%     6   45% 
   2 Komodo-1T       6   13   13   999   51%    -2   42% 
   3 Houdini1.5a     6   23   23   333   50%     6   40% 
   4 Stockfish     -25   23   23   333   45%     6   41%
Gauntlet 2 (Komodo 5.1 MP with 4 Threads)
Rank Name          Elo    +    - games score oppo. draws 
   1 Komodo-4T     107   14   14   999   72%   -36   36% 
   2 Critter1.6a   -29   24   24   333   29%   107   37% 
   3 Houdini1.5a   -30   24   25   333   30%   107   35% 
   4 Stockfish     -49   24   25   333   27%   107   37%
Both gauntlets combined
Rank Name          Elo    +    - games score oppo. draws 
   1 Komodo-4T     112   15   15   999   72%   -30   36% 
   2 Critter1.6a   -19   18   18   666   40%    45   41% 
   3 Komodo-1T     -22   14   14   999   51%   -30   42% 
   4 Houdini1.5a   -23   18   18   666   40%    45   37% 
   5 Stockfish     -48   18   18   666   36%    45   39%
So approximately +130 Elo.
I hope you, and others as well, find this useful.
And this +130 Elos going from 1 to 4 cores compares rather well with 90-110 Elos shown by "typical" engines like Houdini, Critter, Stockfish, Crafty, etc. on CCRL and CEGT 40/4. I don't have my 4-core i7 available right now, but I bet the time-to-depth of Komodo from 1 to 4 cores is around a factor of 1.5 to 2, compared to 3-3.2 for typical engines, and Bob would conclude that the SMP efficiency of Komodo is bad. The hard data shows that the SMP efficiency of Komodo is very good while having a much lower time-to-depth factor.

At longer time controls however we should keep in mind that Komodo will not achieve 130 ELO either so this is not really a fair comparison. But that does not invalidate your fine argument.

Larry I believe that Komodo MP scales no better than most of the other programs and we are just happen that it's in the same general ballpark. We do expect to be able to improve on it though and that is one of the things we will be working on for the next version.

Don

Houdini · Post by **Houdini** » Mon Jun 24, 2013 9:36 pm

Laskos wrote:And this +130 Elos going from 1 to 4 cores compares rather well with 90-110 Elos shown by "typical" engines like Houdini, Critter, Stockfish, Crafty, etc. on CCRL and CEGT 40/4.

The gain highly depends on the TC, 15"+0.2 is an order of magnitude faster than CEGT 40/4.

From preliminary results Komodo 5.1 with 4 cores is rated around 3115, whereas the 1 core engine will be situated between Komodo CCT and Komodo 5.0 at around 3025. This means about 90 Elo gain from 1 to 4 cores.

As comparison, on the CEGT 40/4 list Houdini gains about 105 Elo from 1 to 4 cores (3188 compared to 3082).
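The "order of magnitude" remark about the two time controls can be checked with a rough per-move budget. This is only a sketch: it assumes "40/4" means 40 moves in 4 minutes, looks at the first 40 moves only, and ignores hardware differences between testers. The function names are made up for illustration.

```python
# Rough per-move time budgets over the first 40 moves of a game.
# Assumption: "40/4" means 40 moves in 4 minutes; hardware is ignored.

def seconds_per_move_increment(base_s, inc_s, moves=40):
    """Average time per move for a base+increment control over `moves` moves."""
    return (base_s + inc_s * moves) / moves

def seconds_per_move_classical(moves, minutes):
    """Average time per move for a moves-in-minutes control."""
    return minutes * 60 / moves

fast = seconds_per_move_increment(15, 0.2)   # 15+0.2 sec
slow = seconds_per_move_classical(40, 4)     # 40 moves in 4 minutes
ratio = slow / fast

print(f"{fast:.3f} s/move vs {slow:.1f} s/move, ratio {ratio:.1f}")
```

Under these assumptions the slower control gives roughly ten times more time per move, which is consistent with the remark.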

Don · Post by **Don** » Mon Jun 24, 2013 10:02 pm

I really need to proofread what I write before submitting. I'm a very fast typist and usually do not proofread - just type and move on.

Here is the corrected version:

Don wrote:
At longer time controls however we should keep in mind that Komodo will not achieve 130 ELO either so this is not really a fair comparison. But that does not invalidate your fine argument.

Larry and I believe that Komodo MP scales no better than most of the other programs and we are just happy that it's in the same general ballpark. We do expect to be able to improve on it though and that is one of the things we will be working on for the next version.

Don

bob · Post by **bob** » Mon Jun 24, 2013 10:22 pm

Joerg Oster wrote:
bob wrote: If you want to post elo, just play 1 cpu against a gauntlet and then repeat with 4 cpus. Then you get actual Elo.
OK. Here it is.

Both gauntlets with TC = 15+0.2 sec, 128 MB Hash each, 999 games. Elo computed with Bayeselo.

Gauntlet 1 (Komodo 5.1 MP with 1 Thread)
Rank Name          Elo    +    - games score oppo. draws 
   1 Critter1.6a    13   23   23   333   51%     6   45% 
   2 Komodo-1T       6   13   13   999   51%    -2   42% 
   3 Houdini1.5a     6   23   23   333   50%     6   40% 
   4 Stockfish     -25   23   23   333   45%     6   41%
Gauntlet 2 (Komodo 5.1 MP with 4 Threads)
Rank Name          Elo    +    - games score oppo. draws 
   1 Komodo-4T     107   14   14   999   72%   -36   36% 
   2 Critter1.6a   -29   24   24   333   29%   107   37% 
   3 Houdini1.5a   -30   24   25   333   30%   107   35% 
   4 Stockfish     -49   24   25   333   27%   107   37%
Both gauntlets combined
Rank Name          Elo    +    - games score oppo. draws 
   1 Komodo-4T     112   15   15   999   72%   -30   36% 
   2 Critter1.6a   -19   18   18   666   40%    45   41% 
   3 Komodo-1T     -22   14   14   999   51%   -30   42% 
   4 Houdini1.5a   -23   18   18   666   40%    45   37% 
   5 Stockfish     -48   18   18   666   36%    45   39%
So approximately +130 Elo.
I hope you, and others as well, find this useful.

That is certainly "real data" for a change... The number of games is a tad small giving +/-15...
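For reference, the "approximately +130" figure and the effect of the +/-15 margins can be sketched from the combined table. Adding the two margins in quadrature is an assumption made here for illustration; Bayeselo's intervals are not strictly independent, so treat the result as a rough guide only.

```python
import math

# Values copied from the "Both gauntlets combined" table.
komodo_4t = (112, 15)   # (Elo, +/- margin)
komodo_1t = (-22, 14)

gain = komodo_4t[0] - komodo_1t[0]

# Assumption: treat the two margins as independent and add them in quadrature.
# Bayeselo's intervals are not strictly independent, so this is only a rough guide.
margin = math.hypot(komodo_4t[1], komodo_1t[1])

print(f"gain = {gain} Elo, rough margin = +/-{margin:.1f}")
```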

Laskos · Post by **Laskos** » Mon Jun 24, 2013 10:22 pm

Houdini wrote:
Laskos wrote:And this +130 Elos going from 1 to 4 cores compares rather well with 90-110 Elos shown by "typical" engines like Houdini, Critter, Stockfish, Crafty, etc. on CCRL and CEGT 40/4.
The gain highly depends on the TC, 15"+0.2 is an order of magnitude faster than CEGT 40/4.

From preliminary results Komodo 5.1 with 4 cores is rated around 3115, whereas the 1 core engine will be situated between Komodo CCT and Komodo 5.0 at around 3025. This means about 90 Elo gain from 1 to 4 cores.

As comparison, on the CEGT 40/4 list Houdini gains about 105 Elo from 1 to 4 cores (3188 compared to 3082).

Yes, the comparison is unfair; the point was just that Komodo scales with the number of threads comparably with the best and "typical" engines regarding SMP. If one goes with Bob's argument, Komodo's SMP performance would be abysmal. I am away from my i7 to show that purely on time-to-depth Komodo would gain much less than what it is getting, but factually, it gains points from 1 to 4 cores comparably to the gains of Houdini. And again, I am not saying that Komodo's SMP implementation is the best. It's different.

bob · Post by **bob** » Mon Jun 24, 2013 10:31 pm

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:1. Kai's data does NOT show anything relative to parallel search Elo gain.
Did I write that it did? Let's see:
syzygy wrote:I think the problem is that you have not bothered to read what this thread is about.

What Kai showed is ONLY that Komodo's SMP behaviour is different from SMP behaviour of other engines. This does not mean that Komodo's SMP implementation is any good or any bad. It does mean that it is different.
So no, I did not write that it did.

2. There has been ZERO evidence to show that such a "wider search" is stronger.
Hasn't there been? Do you realise that the only reason that we assume Komodo's search is wider is that Kai's experiment has shown Komodo's 4-core search to be stronger than its 1-core search at the same depth?
It should be intuitively obvious to anyone familiar with computer chess that a wider search is more accurate, when using fixed depth. It is also slower. There is no way to measure whether it is better or worse at fixed depth, because chess is timed. Ergo, the data at the beginning of this thread shows nothing other than suggesting that the komodo parallel search looks at significantly more nodes for a given depth than a serial search. It doesn't show whether it is better or not, because the time loss from the extra width could cost a ply or two, which is not accounted for.
This is what I wrote. So we agree. That Komodo's SMP implementation is effective in terms of playing strength / elo gain per core follows from other threads.

Nope. Nothing to base that on. Elo is based on time, where both players have equal time to play the game. This test does not even approximate that standard. Elo gain with slower search might end up less. Who knows? The data doesn't shed any light on that at all..

Ergo <zero information>
Nope. It does show that Komodo's SMP implementation is different. Komodo's 4-core search to a particular (reported) depth is clearly of higher quality than Komodo's 1-core search to the same (reported) depth.

Is a search without LMR "higher quality"? Most of us equate "quality" with "Elo". But we have ZERO data about the Elo gain... Because time is a part of the equation and it was left out.

If Komodo had been Rybka, this would have put the reported depth in doubt. Given that Komodo is not Rybka, it seems very reasonable to assume that the reported depths are accurate.

Never said nor implied that the reported depths were wrong. Just that the test was based on the WRONG measurement. When you do fixed depth searches, you can't slow one program down significantly and still compare the Elo results. Of course a program with 2-1 or 3-1 time odds will have a higher Elo. While the search is no better (and is actually worse in traditional measurements if the slowdown offsets the gain)...

Eh? No context switches whatsoever. Inside the program you just search a move on subtree 1, then a move on subtree 2, and repeat. Same program, no cache issues, no nothing.
Switches of search context, not the same program. Of course this decreases the effectiveness of caches and brings all kinds of other overhead that work out to a constant > 1. You know this.

EXACT same program. It is not hard to write such a program at all, I tested like this for years. One copy of the program, using two search states, making a move on one, then a move on the other, etc. Works well. No cache problems. The overhead is so low it is almost immeasurable.

bob wrote:
syzygy wrote:More importantly: nobody is saying that Komodo's 4-core search is better than a 4x faster 1-core search. Where did you get that from?
Did you bother to read the first post??? I quote:

So, time to depth is an incorrect way of calculating Komodo's MP efficiency.
SMP efficiency is ALWAYS defined as "time (1cpu) / time (Ncpus)"

ALWAYS.
First: how does that line from the first post contradict what I wrote? It does not.

Second: it should be clear that the majority of posters in this thread consider increased playing strength from using more cores THE measure of SMP efficiency. How much stronger is a 4-core search compared to a 1-core search at the same time control?

Kai's test does NOT show that Komodo gains more from 4 cores than other engines at the same time control. Kai's test does NOT show that Komodo's 4-core search is better than a 4x faster 1-core search.

From other threads we know that Komodo's elo gain from going from 1 to 4 cores is at least comparable to the gain in other engines.

Kai's test shows that part of this gain for Komodo does not come from increased depth in the same time, but from a higher quality search at the same depth. Most likely (but Kai's test does not show this), another part of the gain comes from increased depth.

So, WHAT does it actually show? Not a thing I can see other than Komodo is doing something different and that searching a bigger tree is better if time is not measured. Not anything new there I can see.
Combine it with the fact observed in other threads that Komodo's SMP implementation is competitive with that of other engines...

bob · Post by **bob** » Mon Jun 24, 2013 10:35 pm

Laskos wrote:
bob wrote:
SMP efficiency is ALWAYS defined as "time (1cpu) / time (Ncpus)"

ALWAYS.

I can provide citations if you want. The book I use in my parallel programming course this summer has it in chapter 2.

You seem to be confused by "time (1cpu) / time (Ncpus)". The definition "time (1cpu) / time (Ncpus)" is correct in the sense that time ratio needed for 1cpu to get THE STRENGTH of Ncpus is defining the effectiveness of SMP implementation. Not time-to-depth, time-to-depth in general is useless.

I used a precise term, the exact one you used, "SMP efficiency". And that is ALWAYS defined as time(1cpu) / time(ncpus) when both do exactly the same computational assignment (same data, etc, or in the case of chess, the same position to the same depth).

Time to depth is the ONLY way one can calculate SMP efficiency. I can't repeat that often enough. It is the ONLY way. Comparing 1 cpu to 4 cpu elo measures something else entirely. Not just the SMP efficiency (speed-up) but also other qualitative issues about the SMP search that do not just affect the speed. And even includes the presence of bugs that don't show up frequently, but do show up every now and then...
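As a minimal sketch of the time-to-depth ratio being discussed: the timings below are hypothetical, and the division by the CPU count is the usual textbook normalization added here for illustration, not something stated in the post.

```python
def speedup(time_1cpu, time_ncpus):
    """Time-to-depth ratio: time(1cpu) / time(Ncpus), same position, same depth."""
    return time_1cpu / time_ncpus

def per_core_efficiency(time_1cpu, time_ncpus, ncpus):
    """Speedup divided by the number of CPUs (common textbook normalization)."""
    return speedup(time_1cpu, time_ncpus) / ncpus

# Hypothetical timings in seconds for one position searched to one fixed depth.
t1, t4 = 64.0, 20.0
s = speedup(t1, t4)
e = per_core_efficiency(t1, t4, 4)
print(f"speedup = {s:.2f}, per-core efficiency = {e:.2f}")
```

In practice one would average such ratios over many positions, since a single position can be very noisy.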

Don · Post by **Don** » Mon Jun 24, 2013 10:39 pm

Houdini wrote:
Laskos wrote:And this +130 Elos going from 1 to 4 cores compares rather well with 90-110 Elos shown by "typical" engines like Houdini, Critter, Stockfish, Crafty, etc. on CCRL and CEGT 40/4.
The gain highly depends on the TC, 15"+0.2 is an order of magnitude faster than CEGT 40/4.

From preliminary results Komodo 5.1 with 4 cores is rated around 3115, whereas the 1 core engine will be situated between Komodo CCT and Komodo 5.0 at around 3025. This means about 90 Elo gain from 1 to 4 cores.

As comparison, on the CEGT 40/4 list Houdini gains about 105 Elo from 1 to 4 cores (3188 compared to 3082).

One of my testers claims that MP scaling is better than Stockfish and Critter but puts Houdini and Sjeng up there as having excellent MP scaling. I have no real experience with that so I cannot comment on it.

I previously had this idea that Stockfish was particularly good at MP scaling but part of this may be illusion because Stockfish improves very rapidly with increasing CPU time (more than any other program) up to a certain point so some of that apparent gain even happens in single processor mode when you double the time, at least when testing at relatively fast time controls.

Don · Post by **Don** » Mon Jun 24, 2013 10:44 pm

bob wrote:
Laskos wrote:
bob wrote:
SMP efficiency is ALWAYS defined as "time (1cpu) / time (Ncpus)"

ALWAYS.

I can provide citations if you want. The book I use in my parallel programming course this summer has it in chapter 2.

You seem to be confused by "time (1cpu) / time (Ncpus)". The definition "time (1cpu) / time (Ncpus)" is correct in the sense that time ratio needed for 1cpu to get THE STRENGTH of Ncpus is defining the effectiveness of SMP implementation. Not time-to-depth, time-to-depth in general is useless.
I used a precise term, the exact one you used, "SMP efficiency". And that is ALWAYS defined as time(1cpu) / time(ncpus) when both do exactly the same computational assignment (same data, etc, or in the case of chess, the same position to the same depth).

Ok, based on that definition you are correct. Can we end this conversation now?

Let's all tell Bob he is correct so that we can move on.

Time to depth is the ONLY way one can calculate SMP efficiency. I can't repeat that often enough. It is the ONLY way. Comparing 1 cpu to 4 cpu elo measures something else entirely. Not just the SMP efficiency (speed-up) but also other qualitative issues about the SMP search that do not just affect the speed. And even includes the presence of bugs that don't show up frequently, but do show up every now and then...

Laskos · Post by **Laskos** » Mon Jun 24, 2013 11:19 pm

bob wrote:
Laskos wrote:
bob wrote:
SMP efficiency is ALWAYS defined as "time (1cpu) / time (Ncpus)"

ALWAYS.

I can provide citations if you want. The book I use in my parallel programming course this summer has it in chapter 2.

You seem to be confused by "time (1cpu) / time (Ncpus)". The definition "time (1cpu) / time (Ncpus)" is correct in the sense that time ratio needed for 1cpu to get THE STRENGTH of Ncpus is defining the effectiveness of SMP implementation. Not time-to-depth, time-to-depth in general is useless.
I used a precise term, the exact one you used, "SMP efficiency". And that is ALWAYS defined as time(1cpu) / time(ncpus) when both do exactly the same computational assignment (same data, etc, or in the case of chess, the same position to the same depth).

Time to depth is the ONLY way one can calculate SMP efficiency. I can't repeat that often enough.

It's getting pretty absurd.

Say, from 1 core to 4 cores time-to-depth gain is
Crafty: 3.2 (you yourself got this factor)
Komodo: 1.8

Elo points gain
Crafty: 100 points
Komodo: 100 points

1.8 is approximately sqrt(3.2), so the gain of Komodo should have been 100/2=50 points. What's more important, your absurd insistence on time-to-depth or the real Elo increase?

I think we should agree that SMP efficiency in chess is measured in Elos, not depths.
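The arithmetic behind the "should have been 50 points" step can be spelled out. It rests on an assumption not stated explicitly in the post: that Elo gain grows with the logarithm of the time-to-depth factor, so a factor equal to the square root of Crafty's would be worth half of Crafty's gain.

```python
import math

# Figures from the post: time-to-depth factors and Crafty's Elo gain.
crafty_ttd, komodo_ttd = 3.2, 1.8
crafty_elo_gain = 100

# Assumption: Elo gain is proportional to the logarithm of the speedup.
fraction = math.log(komodo_ttd) / math.log(crafty_ttd)
predicted_komodo_gain = crafty_elo_gain * fraction

print(f"sqrt(3.2) = {math.sqrt(crafty_ttd):.2f}")
print(f"predicted Komodo gain = {predicted_komodo_gain:.0f} Elo")
```

If Komodo really gains about 100 points, the gap to this prediction is the part that time-to-depth alone does not explain.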

It is the ONLY way. Comparing 1 cpu to 4 cpu elo measures something else entirely. Not just the SMP efficiency (speed-up) but also other qualitative issues about the SMP search that do not just affect the speed. And even includes the presence of bugs that don't show up frequently, but do show up every now and then...
