Peculiarity of Komodo 5.1MP

bob · Post by **bob** » Thu Jun 20, 2013 7:40 pm

Joerg Oster wrote:What settings for 'Min Split Depth' did you use for Houdini and Stockfish? Same for both?

Depending on the setting you compared a 1-core to an almost 1-core engine. More or less.

Interestingly enough, Komodo5.1MP doesn't have such a parameter ...

Note that this is a TINY tuning parameter. It might take at least tens of thousands of games to measure the change after altering min split depth, unless you go way too low. Within +/- 2 of the default for any program, there is little gain/loss.
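To give a rough sense of why a small tuning change needs so many games, here is a minimal sketch using the standard logistic Elo model. The function names, the 5 Elo and 2 Elo example differences, and the 1.96 threshold (about 95% confidence) are illustrative assumptions, not figures from this thread.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff: float, z: float = 1.96, draw_rate: float = 0.0) -> int:
    """Games needed before the score edge over 50% exceeds z standard errors.

    draw_rate is an assumed fraction of drawn games; draws lower the
    per-game variance and therefore the number of games required.
    """
    p = expected_score(elo_diff)
    win = p - draw_rate / 2.0                   # probability of a win
    variance = win + 0.25 * draw_rate - p * p   # E[X^2] - E[X]^2, score X in {0, 0.5, 1}
    edge = p - 0.5
    return math.ceil((z * math.sqrt(variance) / edge) ** 2)

if __name__ == "__main__":
    for diff in (5.0, 2.0):  # hypothetical Elo differences
        print(diff, games_needed(diff), games_needed(diff, draw_rate=0.5))
```

Under these assumptions a 5 Elo difference already needs on the order of twenty thousand games when draws are ignored, and a 2 Elo difference needs well over a hundred thousand.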

Don · Post by **Don** » Thu Jun 20, 2013 8:01 pm

bob wrote:
Joerg Oster wrote:What settings for 'Min Split Depth' did you use for Houdini and Stockfish? Same for both?

Depending on the setting you compared a 1-core to an almost 1-core engine. More or less.

Interestingly enough, Komodo5.1MP doesn't have such a parameter ...
Note that this is a TINY tuning parameter. It might take at least tens of thousands of games to measure the change after altering min split depth, unless you go way too low. Within +/- 2 of the default for any program, there is little gain/loss.

Bob is correct on this. But for Komodo there is no explicit split depth although there are 2 parameters that we have pre-tuned and one of them has similar functionality. But even those two have only a small impact on the search.

Joerg Oster · Post by **Joerg Oster** » Thu Jun 20, 2013 9:04 pm

bob wrote:
Joerg Oster wrote:What settings for 'Min Split Depth' did you use for Houdini and Stockfish? Same for both?

Depending on the setting you compared a 1-core to an almost 1-core engine. More or less.

Interestingly enough, Komodo5.1MP doesn't have such a parameter ...
Note that this is a TINY tuning parameter. It might take at least tens of thousands of games to measure the change after altering min split depth, unless you go way too low. Within +/- 2 of the default for any program, there is little gain/loss.

Yes, in general you are right.
But if you run fixed-depth matches, it may have a big influence... Let's say I set it to 10 and run some games at a fixed depth of 11. I would not expect a big difference between the 1-core and 4-core versions. Right?
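A minimal sketch of the point, assuming the simplest possible rule (a node may be split only if its remaining depth is at least the minimum split depth). Real engines apply further conditions, and the function names here are hypothetical.

```python
def may_split(remaining_depth: int, min_split_depth: int) -> bool:
    """Assumed rule: split only when enough depth remains below the node."""
    return remaining_depth >= min_split_depth

def splittable_depths(search_depth: int, min_split_depth: int) -> list[int]:
    """Remaining depths, from the root downwards, at which a split is allowed."""
    return [d for d in range(search_depth, 0, -1) if may_split(d, min_split_depth)]

if __name__ == "__main__":
    print(splittable_depths(11, 10))  # [11, 10]
    print(splittable_depths(11, 4))   # [11, 10, 9, 8, 7, 6, 5, 4]
```

With a fixed depth of 11 and a minimum split depth of 10, only the root and the ply directly below it qualify, so under this rule the extra threads would have very little to work on.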

syzygy · Post by **syzygy** » Thu Jun 20, 2013 10:13 pm

bob wrote:
syzygy wrote:
bob wrote:Time-to-depth IS the correct measure.
Give one reason why decreased time-to-depth is more important than elo gain from using more cores.
Absolutely trivial to do. When you change the selectivity of a program, but continue to search to the SAME depth, you take more time. That program, at that depth, searches a larger tree, and will play better. A full-width 12 ply search will whip the snot out of a current 12 ply search with LMR, pruning and such enabled. It will also take FAR longer to complete that 12 ply search.

This really is simple to understand I would think. If you make the tree wider, you gain less depth with 4 cores. If the wider search is more important than the extra depth 4x speed should give, then why not gain that extra width in the one-core search.

Time to get serious here and not start an argument that is totally pointless and founded on flawed reasoning.

Let me get this straight. You say that time-to-depth is more important than engine strength?

I think the problem is that you have not bothered to read what this thread is about.

What Kai showed is ONLY that Komodo's SMP behaviour is different from SMP behaviour of other engines. This does not mean that Komodo's SMP implementation is any good or any bad. It does mean that it is different.

Other tests have shown that Komodo's SMP implementation even with 12 cores is not bad in terms of elo gained per core. Maybe it is not the best available, maybe it is the best available, but certainly it is competitive.

The latter is what really counts. That Komodo's implementation is different is not good or bad, but it is interesting.

dbxuau · Post by **dbxuau** » Fri Jun 21, 2013 1:53 am

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:Time-to-depth IS the correct measure.
Give one reason why decreased time-to-depth is more important than elo gain from using more cores.
Absolutely trivial to do. When you change the selectivity of a program, but continue to search to the SAME depth, you take more time. That program, at that depth, searches a larger tree, and will play better. A full-width 12 ply search will whip the snot out of a current 12 ply search with LMR, pruning and such enabled. It will also take FAR longer to complete that 12 ply search.

This really is simple to understand I would think. If you make the tree wider, you gain less depth with 4 cores. If the wider search is more important than the extra depth 4x speed should give, then why not gain that extra width in the one-core search.

Time to get serious here and not start an argument that is totally pointless and founded on flawed reasoning.
Let me get this straight. You say that time-to-depth is more important than engine strength?

I think the problem is that you have not bothered to read what this thread is about.

What Kai showed is ONLY that Komodo's SMP behaviour is different from SMP behaviour of other engines. This does not mean that Komodo's SMP implementation is any good or any bad. It does mean that it is different.

Other tests have shown that Komodo's SMP implementation even with 12 cores is not bad in terms of elo gained per core. Maybe it is not the best available, maybe it is the best available, but certainly it is competitive.

The latter is what really counts. That Komodo's implementation is different is not good or bad, but it is interesting.

I have to agree with this. It appears to be conclusive from my test results as well.

Laskos · Post by **Laskos** » Fri Jun 21, 2013 11:25 am

Testing to fixed depth has the advantage that I can use as many threads as I wish. I tested Komodo to fixed depth 15 on 1, 4, 16, and 64 threads, and the node count shows that with each quadrupling of threads, the node count at fixed depth increases by a factor of 2-4. In other words, most of the resources from quadrupling threads go to widening in Komodo, and less to depth. In most other engines the node count at fixed depth increases only moderately per quadrupling, by a factor of 1.2 or so, probably accounting for overhead, and most of the resources go to deepening.

  1 thread
  n/s: 1.329.027  
  TotTime: 71:30m    SolTime: 71:30m
  Ply: 0   Positions: 30   Avg Nodes:       0   Branching = 0.00
  Ply: 1   Positions: 30   Avg Nodes:     208   Branching = 0.00
  Ply: 2   Positions: 30   Avg Nodes:     475   Branching = 2.28
  Ply: 3   Positions: 30   Avg Nodes:    1162   Branching = 2.45
  Ply: 4   Positions: 30   Avg Nodes:    2369   Branching = 2.04
  Ply: 5   Positions: 30   Avg Nodes:    4022   Branching = 1.70
  Ply: 6   Positions: 30   Avg Nodes:    7420   Branching = 1.84
  Ply: 7   Positions: 30   Avg Nodes:   15120   Branching = 2.04
  Ply: 8   Positions: 30   Avg Nodes:   31341   Branching = 2.07
  Ply: 9   Positions: 30   Avg Nodes:   65919   Branching = 2.10
  Ply:10   Positions: 30   Avg Nodes:  138817   Branching = 2.11
  Ply:11   Positions: 30   Avg Nodes:  249815   Branching = 1.80
  Ply:12   Positions: 30   Avg Nodes:  486700   Branching = 1.95
  Ply:13   Positions: 30   Avg Nodes: 1025584   Branching = 2.11
  Ply:14   Positions: 30   Avg Nodes: 1761740   Branching = 1.72
  Ply:15   Positions: 30   Avg Nodes: 3037256   Branching = 1.72


  4 threads
  n/s: 4.426.797  
  TotTime: 51:18m    SolTime: 51:18m
  Ply: 0   Positions: 30   Avg Nodes:       0   Branching = 0.00
  Ply: 1   Positions: 30   Avg Nodes:     658   Branching = 0.00
  Ply: 2   Positions: 30   Avg Nodes:    1454   Branching = 2.21
  Ply: 3   Positions: 30   Avg Nodes:    3023   Branching = 2.08
  Ply: 4   Positions: 30   Avg Nodes:    5522   Branching = 1.83
  Ply: 5   Positions: 30   Avg Nodes:   10040   Branching = 1.82
  Ply: 6   Positions: 30   Avg Nodes:   19588   Branching = 1.95
  Ply: 7   Positions: 30   Avg Nodes:   37536   Branching = 1.92
  Ply: 8   Positions: 30   Avg Nodes:   70835   Branching = 1.89
  Ply: 9   Positions: 30   Avg Nodes:  150130   Branching = 2.12
  Ply:10   Positions: 30   Avg Nodes:  308354   Branching = 2.05
  Ply:11   Positions: 30   Avg Nodes:  569434   Branching = 1.85
  Ply:12   Positions: 30   Avg Nodes: 1048122   Branching = 1.84
  Ply:13   Positions: 30   Avg Nodes: 2170028   Branching = 2.07
  Ply:14   Positions: 30   Avg Nodes: 3712354   Branching = 1.71
  Ply:15   Positions: 30   Avg Nodes: 6877139   Branching = 1.85


  16 threads
  n/s: 6.327.252  
  TotTime: 2:22m    SolTime: 2:22m
  Ply: 0   Positions: 30   Avg Nodes:       0   Branching = 0.00
  Ply: 1   Positions: 30   Avg Nodes:    2477   Branching = 0.00
  Ply: 2   Positions: 30   Avg Nodes:    5747   Branching = 2.32
  Ply: 3   Positions: 30   Avg Nodes:   10780   Branching = 1.88
  Ply: 4   Positions: 30   Avg Nodes:   18771   Branching = 1.74
  Ply: 5   Positions: 30   Avg Nodes:   32023   Branching = 1.71
  Ply: 6   Positions: 30   Avg Nodes:   51858   Branching = 1.62
  Ply: 7   Positions: 30   Avg Nodes:   82638   Branching = 1.59
  Ply: 8   Positions: 30   Avg Nodes:  141090   Branching = 1.71
  Ply: 9   Positions: 30   Avg Nodes:  299991   Branching = 2.13
  Ply:10   Positions: 30   Avg Nodes:  644381   Branching = 2.15
  Ply:11   Positions: 30   Avg Nodes: 1222440   Branching = 1.90
  Ply:12   Positions: 30   Avg Nodes: 2516255   Branching = 2.06
  Ply:13   Positions: 30   Avg Nodes: 4823625   Branching = 1.92
  Ply:14   Positions: 30   Avg Nodes: 9469390   Branching = 1.96
  Ply:15   Positions: 30   Avg Nodes:17233024   Branching = 1.82


  64 threads
  n/s: 8.158.950  
  TotTime: 6:21m    SolTime: 6:21m
  Ply: 0   Positions: 30   Avg Nodes:       0   Branching = 0.00
  Ply: 1   Positions: 30   Avg Nodes:   19872   Branching = 0.00
  Ply: 2   Positions: 30   Avg Nodes:   43371   Branching = 2.18
  Ply: 3   Positions: 30   Avg Nodes:   71755   Branching = 1.65
  Ply: 4   Positions: 30   Avg Nodes:  121875   Branching = 1.70
  Ply: 5   Positions: 30   Avg Nodes:  499406   Branching = 4.10
  Ply: 6   Positions: 30   Avg Nodes:  691717   Branching = 1.39
  Ply: 7   Positions: 30   Avg Nodes: 1130866   Branching = 1.63
  Ply: 8   Positions: 30   Avg Nodes: 1616625   Branching = 1.43
  Ply: 9   Positions: 30   Avg Nodes: 2690179   Branching = 1.66
  Ply:10   Positions: 30   Avg Nodes: 4141765   Branching = 1.54
  Ply:11   Positions: 30   Avg Nodes: 7359012   Branching = 1.78
  Ply:12   Positions: 30   Avg Nodes:12926636   Branching = 1.76
  Ply:13   Positions: 30   Avg Nodes:22077431   Branching = 1.71
  Ply:14   Positions: 30   Avg Nodes:38625510   Branching = 1.75
  Ply:15   Positions: 30   Avg Nodes:67232624   Branching = 1.74

It's hard to say how it will behave at higher plies; 64-threaded seems to gain plies a bit faster than 1-threaded in the later plies. But up to ply 15 at _fixed_time_, 64-threaded Komodo on a 64-core machine will be only 1-1.5 plies deeper than 1-threaded Komodo, but 20 times wider. A typical engine like Stockfish, Houdini, Critter, etc. would gain 4-6 plies, without widening.
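The arithmetic behind these estimates can be sketched as follows. The ply-15 node counts are taken from the table above. The effective branching factor of 1.8 is read off roughly from the Branching column. The nodes-per-second speedups of 45x and 64x on a 64-core machine are assumptions made only for illustration, since the thread does not state them. The n/s lines of the table are not used.

```python
import math

# Average nodes to complete ply 15, keyed by thread count (from the table).
PLY15_NODES = {1: 3037256, 4: 6877139, 16: 17233024, 64: 67232624}

def widening(threads: int, base: int = 1) -> float:
    """How many times more nodes are searched to reach the same depth."""
    return PLY15_NODES[threads] / PLY15_NODES[base]

def depth_gain(nps_speedup: float, widen: float, ebf: float) -> float:
    """Extra plies at fixed time: what is left of the speedup after widening,
    converted to plies via the effective branching factor."""
    return math.log(nps_speedup / widen) / math.log(ebf)

if __name__ == "__main__":
    for prev, cur in [(1, 4), (4, 16), (16, 64)]:
        print(f"{prev:>2} -> {cur:>2} threads: x{widening(cur, prev):.2f} nodes")
    print(f"1 -> 64 threads: x{widening(64):.1f} nodes")

    EBF = 1.8  # assumed, roughly the Branching column
    for speedup in (45.0, 64.0):  # assumed nps speedups on 64 cores
        komodo = depth_gain(speedup, widening(64), EBF)
        typical = depth_gain(speedup, 1.2 ** 3, EBF)  # x1.2 per quadrupling
        print(f"nps x{speedup:.0f}: Komodo +{komodo:.1f} ply, typical +{typical:.1f} ply")
```

Under these assumptions the per-quadrupling widening falls in the 2-4 range, the total widening from 1 to 64 threads is a little over 20x, and the depth gain comes out between one and two plies for Komodo against roughly five to six for an engine that widens by only 1.2 per quadrupling.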

Yes, seems a different SMP implementation.

Suj · Post by **Suj** » Fri Jun 21, 2013 3:36 pm

The split depth limitation and one other parameter were the biggest scaling issues for Stockfish if you tried to run it on more cores. "More" here means 16 and above.

I remember doing a beta test around the time of SF 1.6. Marco sent me a few betas where I could configure more, and this vastly improved the scaling.

bob · Post by **bob** » Fri Jun 21, 2013 4:03 pm

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:Time-to-depth IS the correct measure.
Give one reason why decreased time-to-depth is more important than elo gain from using more cores.
Absolutely trivial to do. When you change the selectivity of a program, but continue to search to the SAME depth, you take more time. That program, at that depth, searches a larger tree, and will play better. A full-width 12 ply search will whip the snot out of a current 12 ply search with LMR, pruning and such enabled. It will also take FAR longer to complete that 12 ply search.

This really is simple to understand I would think. If you make the tree wider, you gain less depth with 4 cores. If the wider search is more important than the extra depth 4x speed should give, then why not gain that extra width in the one-core search.

Time to get serious here and not start an argument that is totally pointless and founded on flawed reasoning.
Let me get this straight. You say that time-to-depth is more important than engine strength?

I said "Time to depth is the MOST IMPORTANT thing influencing engine strength." Which is NOT what you are suggesting. I simply claim that with a parallel search, at least for reasonable numbers of processors, effort spent on going deeper returns more in terms of Elo gain, than making the tree shape different (wider, more extensions, whatever).

I think the problem is that you have not bothered to read what this thread is about.

What Kai showed is ONLY that Komodo's SMP behaviour is different from SMP behaviour of other engines. This does not mean that Komodo's SMP implementation is any good or any bad. It does mean that it is different.

And NOWHERE will you find where I have written anything else. I simply stated, clearly and precisely, "This test shows absolutely NOTHING regarding Elo. NOTHING." Nothing more, nothing less. The suggestion was that Komodo's search gains from getting wider. It doesn't gain, it loses. That's a simple-to-understand concept.

Other tests have shown that Komodo's SMP implementation even with 12 cores is not bad in terms of elo gained per core. Maybe it is not the best available, maybe it is the best available, but certainly it is competitive.

The latter is what really counts. That Komodo's implementation is different is not good or bad, but it is interesting.

However, as I said, Elo strength comes from depth, not width, assuming your single-thread search has been tested enough that you are convinced your width is optimal for the current approach.

bob · Post by **bob** » Fri Jun 21, 2013 4:04 pm

dbxuau wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:Time-to-depth IS the correct measure.
Give one reason why decreased time-to-depth is more important than elo gain from using more cores.
Absolutely trivial to do. When you change the selectivity of a program, but continue to search to the SAME depth, you take more time. That program, at that depth, searches a larger tree, and will play better. A full-width 12 ply search will whip the snot out of a current 12 ply search with LMR, pruning and such enabled. It will also take FAR longer to complete that 12 ply search.

This really is simple to understand I would think. If you make the tree wider, you gain less depth with 4 cores. If the wider search is more important than the extra depth 4x speed should give, then why not gain that extra width in the one-core search.

Time to get serious here and not start an argument that is totally pointless and founded on flawed reasoning.
Let me get this straight. You say that time-to-depth is more important than engine strength?

I think the problem is that you have not bothered to read what this thread is about.

What Kai showed is ONLY that Komodo's SMP behaviour is different from SMP behaviour of other engines. This does not mean that Komodo's SMP implementation is any good or any bad. It does mean that it is different.

Other tests have shown that Komodo's SMP implementation even with 12 cores is not bad in terms of elo gained per core. Maybe it is not the best available, maybe it is the best available, but certainly it is competitive.

The latter is what really counts. That Komodo's implementation is different is not good or bad, but it is interesting.
I have to agree with this. It appears to be conclusive from my test results as well.

WHAT is "conclusive"? Certainly not an Elo gain without depth increase. As I said, turn LMR off, turn null-move off, double all search extensions. At the SAME depth, this new program will be far stronger. AND far slower.
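A back-of-the-envelope sketch of the "far slower" part, using the usual approximation that a search to depth d visits about b^d nodes for effective branching factor b. Both branching factors are assumptions: 1.9 is roughly what the Branching column in the Komodo table above shows for a selective search, and 6.0 is a commonly quoted ballpark (about the square root of 35 legal moves) for well-ordered alpha-beta with no pruning or reductions.

```python
def tree_nodes(ebf: float, depth: int) -> float:
    """Approximate node count of a search with effective branching factor ebf."""
    return ebf ** depth

if __name__ == "__main__":
    DEPTH = 12
    selective = tree_nodes(1.9, DEPTH)   # assumed: LMR, null move, pruning enabled
    full_width = tree_nodes(6.0, DEPTH)  # assumed: plain alpha-beta, good move ordering
    print(f"selective : {selective:,.0f} nodes")
    print(f"full width: {full_width:,.0f} nodes")
    print(f"ratio     : {full_width / selective:,.0f}x")
```

With these assumed numbers the full-width 12-ply tree is close to a million times larger, which is why equal depth says little about equal cost.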

bob · Post by **bob** » Fri Jun 21, 2013 4:08 pm

Don wrote:
bob wrote:
Joerg Oster wrote:What settings for 'Min Split Depth' did you use for Houdini and Stockfish? Same for both?

Depending on the setting you compared a 1-core to an almost 1-core engine. More or less.

Interestingly enough, Komodo5.1MP doesn't have such a parameter ...
Note that this is a TINY tuning parameter. It might take at least tens of thousands of games to measure the change after altering min split depth, unless you go way too low. Within +/- 2 of the default for any program, there is little gain/loss.
Bob is correct on this. But for Komodo there is no explicit split depth although there are 2 parameters that we have pre-tuned and one of them has similar functionality. But even those two have only a small impact on the search.

I don't even use a min split depth parameter any longer, and haven't for several years. There are better ways that don't require tuning for each different processor type one runs on.
