does houdini3 scale better than komodo?

Uri Blass · Post by **Uri Blass** » Wed Oct 17, 2012 12:43 pm

I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?

lucasart · Post by **lucasart** » Wed Oct 17, 2012 1:03 pm

Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?

I don't use either of these programs, so I haven't done any testing with them. But I do have some experience with testing in general, as an engine developer. So I don't have an answer to your question, but rather some food for thought:
* I suppose testers will play thousands of games, and will all conclude that Houdini 3 is stronger than Komodo 4. Period.
* In general, when you increase the time control, engines play better chess, so the draw rate increases and the Elo difference decreases. Perfect chess games are just draws, you know...
* So for any pair of programs (A, B): if A is stronger than B, ill-advised people will argue that B scales better than A, because experimentally they notice that the Elo difference shrinks with time control.

In other words, I don't know if that's even a meaningful question. Rather, "how much stronger is H3 at time control t1? t2? t3?" are three meaningful questions. But "scales better" seems to suggest: who is better asymptotically? And that question doesn't make sense, because asymptotically we all play perfectly and draw all the time, so we score equally.

Laskos · Post by **Laskos** » Wed Oct 17, 2012 1:12 pm

lucasart wrote:
Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?
I don't use either of these programs, so I haven't done any testing with them. But I do have some experience with testing in general, as an engine developer. So I don't have an answer to your question, but rather some food for thought:
* I suppose testers will play thousands of games, and will all conclude that Houdini 3 is stronger than Komodo 4. Period.
* In general, when you increase the time control, engines play better chess, so the draw rate increases and the Elo difference decreases. Perfect chess games are just draws, you know...
* So for any pair of programs (A, B): if A is stronger than B, ill-advised people will argue that B scales better than A, because experimentally they notice that the Elo difference shrinks with time control.

In other words, I don't know if that's even a meaningful question. Rather, "how much stronger is H3 at time control t1? t2? t3?" are three meaningful questions. But "scales better" seems to suggest: who is better asymptotically? And that question doesn't make sense, because asymptotically we all play perfectly and draw all the time, so we score equally.

Uri was subtler: he asked for the time advantage that Komodo needs for a 50% result against Houdini. It is speculated that this remains pretty constant across time controls for identically scaling engines (although the ratings would get closer at longer TC).
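To make the time-handicap framing concrete, here is a minimal sketch. It assumes that every doubling of thinking time is worth a fixed number of Elo points, which is a simplification, and the figure of 60 Elo per doubling is a hypothetical placeholder rather than a measurement from this thread.

```python
def time_ratio_for_equal_score(elo_gap, elo_per_doubling):
    """Time multiplier the weaker engine needs to reach a 50% score.

    Assumption (not from the thread): a constant Elo gain per doubling
    of thinking time, so elo_gap = elo_per_doubling * log2(ratio).
    """
    if elo_per_doubling <= 0:
        raise ValueError("elo_per_doubling must be positive")
    return 2.0 ** (elo_gap / elo_per_doubling)


# Hypothetical numbers: a 60 Elo gap at 60 Elo per doubling needs 2x the time.
print(time_ratio_for_equal_score(60, 60))
```

Under this model, if the Elo gained per doubling shrinks at longer time controls, the same time ratio corresponds to a smaller Elo gap, which matches the remark that ratings get closer while the required handicap stays roughly constant.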

Uri Blass · Post by **Uri Blass** » Wed Oct 17, 2012 1:15 pm

lucasart wrote:
Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?
I don't use either of these programs, so I haven't done any testing with them. But I do have some experience with testing in general, as an engine developer. So I don't have an answer to your question, but rather some food for thought:
* I suppose testers will play thousands of games, and will all conclude that Houdini 3 is stronger than Komodo 4. Period.
* In general, when you increase the time control, engines play better chess, so the draw rate increases and the Elo difference decreases. Perfect chess games are just draws, you know...
* So for any pair of programs (A, B): if A is stronger than B, ill-advised people will argue that B scales better than A, because experimentally they notice that the Elo difference shrinks with time control.

In other words, I don't know if that's even a meaningful question. Rather, "how much stronger is H3 at time control t1? t2? t3?" are three meaningful questions. But "scales better" seems to suggest: who is better asymptotically? And that question doesn't make sense, because asymptotically we all play perfectly and draw all the time, so we score equally.

When I say "scale better", it is not about the rating difference but about the time advantage that the weaker program needs to score 50%.

The results that I have seen do not show a significant reduction in score at longer time controls, so I suspect that the rating difference to Komodo is going to be nearly the same at all time controls, and if that is the case it suggests that Houdini 3 scales better.

IPON showed
Houdini 3 STD - Komodo 5 (3012) 95.5 - 54.5 63.67%
The 90+30 time control gave 60% for Houdini against Komodo
CEGT blitz showed 59-41 for Houdini against Komodo
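For comparison with rating lists, the three scores above can be converted to Elo differences with the standard logistic Elo formula. This is only a sketch: the labels in `results` are shorthand introduced here, and no error bars are computed, so small differences between the three figures should not be over-read.

```python
import math


def elo_diff_from_score(score):
    """Elo difference implied by a match score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Scores quoted in the post; the dictionary keys are labels chosen for this sketch.
results = {
    "IPON": 95.5 / (95.5 + 54.5),
    "90+30": 0.60,
    "CEGT blitz": 59 / (59 + 41),
}

for name, score in results.items():
    print(f"{name}: {score:.2%} -> {elo_diff_from_score(score):+.1f} Elo")
```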

bupalo · Post by **bupalo** » Wed Oct 17, 2012 1:31 pm

I use both, and I don't believe the gap between Komodo and Houdini narrows at long time controls,
so I don't think scaling will benefit either of them.
The two products differ in their search.
However, Houdini 3 is much better than Houdini 2 for analysis purposes, and I think it has caught up with Komodo in quality of analysis (Houdini 2 was much worse), so the Komodo team will have to work hard to catch Houdini again (for me, Komodo 5 was better than Houdini 2).

lkaufman · Post by **lkaufman** » Wed Oct 17, 2012 4:51 pm

Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?

We don't know the answer, because we would have to tie up a 12- or 16-core machine for several days to find out, and we would rather use those machines to improve Komodo. I do know beyond reasonable doubt that current Komodo scales better than H 1.5, and since H 1.5 seemed to scale better than H 2.0 (based on CEGT and CCRL), Komodo surely scales better than H 2.0. As for H3, I suggest you wait for the CCRL 40/40 and CEGT 40/20 data and compare the Elo gap to Komodo 5 with the IPON gap. You can make some allowance for the fact that Elo gaps always decline a bit with more time, given an equal gap in terms of time handicap, as your question stated.

Don · Post by **Don** » Wed Oct 17, 2012 5:15 pm

Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?

Houdini 3 just came out, so we do not have any data on this. Robert said somewhere that Houdini 3 has scaling improvements, so if I had to guess I would say this is no longer an issue. I think it was an issue with Houdini 2.0 more than Houdini 1.5, and even though he never admitted it, I think he realized it was an issue and has probably done something about it this time.

I personally believe the best TOP program for scaling properties is Stockfish. The jury is still out on Houdini 3 with respect to this. It's altogether likely, in my opinion, that Stockfish is stronger than Houdini 2.0 if you could run it at long enough time controls (and the same with Komodo vs Houdini), but the cross-over point might be much higher for Stockfish than is practical. Perhaps if you could run a 10,000 game match at a 24 hour time control you would see Stockfish on top. I am only speculating. Maybe it won't need nearly that much, or maybe it will need more. But if I wanted to use a program for correspondence chess and my only consideration was Elo, I would pick Stockfish or Komodo. I have no opinion on Houdini 3, but its lead is substantial enough that I doubt Stockfish would outperform it at a move in 24 hours, even if it does scale better than Houdini 3, which is not known. So don't ask me to speculate on something I don't know about.

From an idealistic computer science perspective I believe the BEST program is the one that scales best, not the one that is best at some time control you specify. In Big O notation, algorithms are described in terms of their performance characteristics, with unimportant details such as how the code was compiled abstracted away. Thus quicksort is superior to insertion sort, and it's not even a close call. If you were only sorting 10 items, quicksort would not be nearly as good, but for analyzing algorithms you have to have a consistent model. The algorithm with better Big O characteristics is always going to out-perform if the problem is big enough. Quicksort on my old TRS-80 Z80 computer will outperform insertion sort on my i7 given enough records to sort (and if the TRS-80 had enough disk space to accommodate this calculation, of course).
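The sorting analogy can be sketched as a toy cost model: an n log n algorithm on a machine that is `slowdown` times slower versus an n² algorithm on the fast machine. Both cost functions ignore constant factors, and the slowdown values are hypothetical; the point is only that a crossover size always exists.

```python
import math


def crossover_size(slowdown):
    """Smallest n >= 3 at which slowdown * n*log2(n) < n*n.

    Toy model: constant factors are ignored and `slowdown` is a
    hypothetical speed ratio between the two machines.
    """
    def slow_machine_wins(n):
        return slowdown * n * math.log2(n) < n * n

    lo, hi = 3, 4
    if slow_machine_wins(lo):
        return lo
    # Grow the bracket until the n log n side wins at hi.
    while not slow_machine_wins(hi):
        lo, hi = hi, hi * 2
    # Binary search; n / log2(n) is increasing for n >= 3.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if slow_machine_wins(mid):
            hi = mid
        else:
            lo = mid
    return hi


for ratio in (10, 1_000, 1_000_000):
    print(f"{ratio}x slower machine wins from n = {crossover_size(ratio)}")
```

The crossover grows only slightly faster than the speed ratio itself, which is the sense in which the better-scaling algorithm wins "if the problem is big enough".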

So you can add 50 Elo to any program by running it on a faster computer, but that does not change how good the program is. How well it scales (in some idealistic sense) is the only important consideration, and I think all chess programmers (or most of them) think that way even if they are not conscious of it. We approach Komodo that way. Of course it's important to optimize the program and make it as fast as possible, because that is a practical consideration which cannot be ignored. It doesn't matter if your program scales a little better but is 200 Elo weaker than program X at game in 5 minutes - it might take game in 1 year for it to pass program X due to superior scaling, and nobody will want that program.

Trying to measure scalability is difficult beyond the 5 minute "barrier" because it simply takes a very long time to get enough data to measure scalability when the effect is usually fairly minor anyway. A great deal of caution is in order too, because most programs (that do not scale horribly) will tend to converge with depth. What you have to do is not observe them getting closer together (they will all probably do that) but see them "cross over." This is easy to do: you simply handicap the stronger program by a constant amount. For example, you might play several matches at various time controls where the stronger program is always handicapped by 50%, giving it half as much time. If it is still stronger at a fast time control but weaker at a longer time control, then it's not scaling as well.
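A minimal sketch of that crossover check, assuming you already have one match score per time control for the handicapped (stronger) engine. The function name and the sample numbers are hypothetical, and a real test would need error bars before calling any score above or below 50%.

```python
def find_crossover(results):
    """Return the first time control at which the handicapped engine falls below 50%.

    `results` is a list of (base_time_seconds, score) pairs, where score is the
    handicapped engine's match score in [0, 1]. A crossover is only reported if
    the engine was at or above 50% at some shorter time control.
    """
    was_ahead = False
    for base_time, score in sorted(results):
        if score >= 0.5:
            was_ahead = True
        elif was_ahead:
            return base_time
    return None


# Hypothetical match scores for the stronger engine playing with half the time.
sample = [(10, 0.55), (60, 0.52), (300, 0.48)]
print(find_crossover(sample))
```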

Another danger is that some programs may have very slow startup times per move. For example, if program A spends 1/10 of a second building tables before searching every move, it will look artificially weak at hyper blitz time controls and will appear to scale very well. If you make a fast all-purpose runner wait 5 seconds before starting to run in any race, he will look slow at the 100 meter level, but it will appear that he gets better and better with distance, when in fact the 5 second handicap is simply less and less of a factor. At the marathon level runners have stopped to tie their shoes and still won the race, but that will never happen in the 100 meter sprint.
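The startup-cost effect can be put in rough numbers. The sketch below assumes a constant Elo gain per doubling of search time; that model and the 60 Elo figure are hypothetical, and only the 0.1 second overhead comes from the example above.

```python
import math


def apparent_elo_loss(time_per_move, overhead, elo_per_doubling):
    """Elo lost to a fixed per-move overhead.

    Assumption (not from the thread): strength grows by a constant
    `elo_per_doubling` for each doubling of effective search time.
    """
    if not 0 <= overhead < time_per_move:
        raise ValueError("overhead must be non-negative and below time_per_move")
    effective = time_per_move - overhead
    return elo_per_doubling * math.log2(time_per_move / effective)


# A 0.1 s table-building cost at increasingly long times per move.
for seconds in (0.2, 1.0, 10.0, 100.0):
    loss = apparent_elo_loss(seconds, 0.1, 60)
    print(f"{seconds:6.1f} s/move: about {loss:.1f} Elo lost to startup")
```

The loss shrinks toward zero as the time per move grows, so the program appears to "scale well" even though nothing about its search has changed.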

Yet another issue is that some programs have algorithms that kick in at various depths. They can suddenly increase or decrease in Elo per unit of time when a certain depth kicks in. Imagine having this test in your program:

if (depth > 14) { /* do some funky extension */ }

The program might act noticeably differently once it is doing a 15 ply search and suddenly change its scaling characteristics for better or worse. I think every program has this sort of test, not doing LMR until depth >= 2 or 3 for example. Most of them do not have strange side-effects, but some can.

What really matters for scalability? Good evaluation is probably the most important thing. Overly aggressive extensions can make your program incredibly strong at hyper blitz and much weaker at longer time controls so I think extreme care should be taken with extensions. But the right extensions matter at all levels. You cannot live without some check extensions for example. When it comes to forward pruning it's a difficult question because we have conflicting evidence, so I think it has a lot to do with the gory details - what you are pruning and why and where and so on. Pruning can help or hurt the scaling.

Houdini · Post by **Houdini** » Wed Oct 17, 2012 7:22 pm

Uri Blass wrote: I would like to read an honest reply from the komodo team.

In other words, what is the relative time advantage that Komodo needs to score 50% against Houdini 3, and does Komodo need relatively more time at longer time controls?

It's a bit early to tell, no?

I can show you my own development testing results, based on gauntlets with 9 opponents, and setting Houdini 1.03a arbitrarily at 3000.
At about 10"+0.1" (ultra-short):

Houdini 1.03a   3000
Houdini 1.5a    3046
Houdini 2.0     3071
Houdini 3       3127

At 2'+1.2" (blitz, 12 times slower than above)

Houdini 1.03a   3000
Houdini 1.5a    3053
Houdini 2.0     3077
Houdini 3       3142

This gives +56 (ultra-short) and +65 (blitz) for Houdini 3.
My "+50 Elo" was based on these results, it was a prudent estimate.
My "+25 Elo" for Houdini 2 last year was also based on this, but didn't materialize in all the rating lists.

The data hints at a better scaling at longer TC for Houdini 3, it's unclear whether this is real. Last month's 90 min+30 sec/move long TC test was also encouraging, but again there's insufficient data to be conclusive.

Robert
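The version-to-version gains in the two tables above can be checked with a few lines of arithmetic. The dictionary layout and the `gain` helper are just one way to organize the posted numbers; nothing here adds data beyond what Robert listed.

```python
# Ratings as posted, with Houdini 1.03a fixed arbitrarily at 3000.
ratings = {
    "ultra-short": {
        "Houdini 1.03a": 3000,
        "Houdini 1.5a": 3046,
        "Houdini 2.0": 3071,
        "Houdini 3": 3127,
    },
    "blitz": {
        "Houdini 1.03a": 3000,
        "Houdini 1.5a": 3053,
        "Houdini 2.0": 3077,
        "Houdini 3": 3142,
    },
}


def gain(time_control, newer, older):
    """Elo gain of `newer` over `older` at the given time control."""
    table = ratings[time_control]
    return table[newer] - table[older]


for tc in ratings:
    h3 = gain(tc, "Houdini 3", "Houdini 2.0")
    h2 = gain(tc, "Houdini 2.0", "Houdini 1.5a")
    print(f"{tc}: Houdini 3 over 2.0 = +{h3}, Houdini 2.0 over 1.5a = +{h2}")
```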

bupalo · Post by **bupalo** » Wed Oct 17, 2012 7:54 pm

As a stupid chess player, I have little to say. Still, I believe all the top 5 engines are beautiful, and right now Komodo and Houdini are at the top. Stockfish can beat every engine, but its attacking lines are sometimes much too optimistic.
I think that if both are used for correspondence chess they are equal.
Houdini is a wizard in chaotic tactical positions. Komodo knows a lot about chess: in its analysis I always see it exploit the slightest weakness of the opponent, it knows where the pieces belong, it is a chess teacher. Houdini 2 was horrible at this; Houdini 3 is definitely better at it. Komodo's problem now is MP and code that is a little slow. If you improve on this, Don, Elo points will come like rain. And still, an Elo rating like the IPON one does not do justice to the engines.

lech · Post by **lech** » Wed Oct 17, 2012 8:49 pm

Please let me say why Houdini is stronger than other engines.
Robert is a good chess player and understands chess strategy. This lets him write a very good static evaluation (especially king danger).
It also lets him use an aggressive search.
E.g. Stockfish's static evaluation is weak and, on top of that, tuned. This tuning makes it impossible to change anything. Stockfish reaches high depths, but in many tactical positions it goes the wrong way.