Komodo 4.1

FriedmannC · Post by **FriedmannC** » Wed Jul 18, 2012 2:51 pm

Don, do you think Komodo 5 will be able to compete head to head with Houdini 2z, which with some special settings has somehow proved to be about 20 Elo stronger than Houdini 2.c? And do you have any news about Deep Rybka 5? Is it around the corner, or is it just an illusion, and thus the biggest disappointment of the year? Best regards, Catalin

lkaufman · Post by **lkaufman** » Wed Jul 18, 2012 11:32 pm

Can you post a link to the description of the options for this "z" version, the time limits tested, and for the evidence that it is 20 elo stronger? Frankly, I find it extremely hard to believe that Robert has missed the proper settings on his own program by anywhere near 20 elo. I can tell you that Komodo 5 sp will be stronger than Houdini 1.5 sp at most or all reasonable time limits, which if true would imply that it will be stronger than Houdini 2 at the intermediate and longer time controls at least, since Houdini 2 is rated below Houdini 1.5 on those lists.

Houdini · Post by **Houdini** » Thu Jul 19, 2012 12:06 am

lkaufman wrote:Can you post a link to the description of the options for this "z" version, the time limits tested, and for the evidence that it is 20 elo stronger? Frankly, I find it extremely hard to believe that Robert has missed the proper settings on his own program by anywhere near 20 elo. I can tell you that Komodo 5 sp will be stronger than Houdini 1.5 sp at most or all reasonable time limits, which if true would imply that it will be stronger than Houdini 2 at the intermediate and longer time controls at least, since Houdini 2 is rated below Houdini 1.5 on those lists.

I always find it strange that you claim Komodo results against Houdini 1.5 in a very unclear way ("stronger than at most TC") and then make an extrapolation to claim results against Houdini 2.0, as if there is no way for you to test directly against Houdini 2.0. You're just deceiving yourself and the readers by claiming (or desperately hoping?) that Houdini 1.5 is the strongest Houdini.

My own tests of the "z" settings against standard settings give a score of 50.4 % in a 9,000 game match at 2'+2" TC. Resulting Elo difference is +3 +/- 4 Elo. So maybe a tiny bit stronger, but nothing I can measure with certainty.

For your info, at the same 2'+2" TC the current Houdini 3 DEV version scores 57.2% against Houdini 2.0 (+50 +/- 4 Elo), and 60.2% against Houdini 1.5 (+72 +/- 4 Elo). Again using 9,000 game test matches.

Looking forward to Komodo 5.

Cheers,
Robert
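
For readers who want to check the arithmetic behind the percentages quoted above, here is a minimal sketch. It assumes the standard logistic Elo model (a 400-point scale with a base-10 logarithm); the function name `elo_from_score` is only illustrative and is not taken from any engine or testing tool mentioned in this thread.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score, under the logistic Elo model.

    score is the fraction of points won (0.504 for 50.4 %).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Scores quoted in the post above (2'+2" TC, 9,000 games each).
matches = [
    ("z settings vs. standard settings", 0.504),
    ("Houdini 3 DEV vs. Houdini 2.0", 0.572),
    ("Houdini 3 DEV vs. Houdini 1.5", 0.602),
]

for label, score in matches:
    print(f"{label}: {score:.1%} -> {elo_from_score(score):+.0f} Elo")
```

Under that assumption the three scores come out at roughly +3, +50 and +72 Elo, matching the figures in the post. The error margin is not computed here, because it depends on the draw rate, which the post does not give.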

lkaufman · Post by **lkaufman** » Thu Jul 19, 2012 12:56 am

Houdini wrote:
lkaufman wrote:Can you post a link to the description of the options for this "z" version, the time limits tested, and for the evidence that it is 20 elo stronger? Frankly, I find it extremely hard to believe that Robert has missed the proper settings on his own program by anywhere near 20 elo. I can tell you that Komodo 5 sp will be stronger than Houdini 1.5 sp at most or all reasonable time limits, which if true would imply that it will be stronger than Houdini 2 at the intermediate and longer time controls at least, since Houdini 2 is rated below Houdini 1.5 on those lists.
I always find it strange that you claim Komodo results against Houdini 1.5 in a very unclear way ("stronger than at most TC" ) and then make an extrapolation to claim results against Houdini 2.0, as if there is no way for you to test directly against Houdini 2.0. You're just deceiving yourself and the readers by claiming (or desperately hoping?) that Houdini 1.5 is the strongest Houdini.

My own tests of the "z" settings against standard settings give a score of 50.4 % in a 9,000 game match at 2'+2" TC. Resulting Elo difference is +3 +/- 4 Elo. So maybe a tiny bit stronger, but nothing I can measure with certainty.

For your info, at the same 2'+2" TC the current Houdini 3 DEV version scores 57.2% against Houdini 2.0 (+50 +/- 4 Elo), and 60.2% against Houdini 1.5 (+72 +/- 4 Elo). Again using 9,000 game test matches.

Looking forward to Komodo 5.

Cheers,
Robert

Thanks for confirming that these "z" settings (what are they, btw?) are not provably stronger; you are obviously too good a programmer/developer to make a 20 elo error in your recommended settings.
We have often found that "improvements" based on direct play between versions don't show up as such against Houdini, Critter, and Stockfish. I recall that I measured a 60+ elo gain for Rybka 4 over Rybka 3 in direct play, but it ended up more like +20 elo on the lists. I'm not sure why, but others report the same problem. Any claims we may make for elo gains from now on will be based on testing against other top engines, not self-testing.
We test at many different time controls, including all three types, and so we may not always have huge samples at every one, but overall we do. So I can't be sure we are superior at every possible time control, and we don't even test at really long ones. But probably Komodo 5 will outperform Houdini 1.5 at any control equal to game in one minute or longer, with enough games.
We mostly test against H1.5 rather than 2 just because some testers don't even have H2, and since there is no evidence for H2 being stronger except at blitz, people are reluctant to spend money on it. Frankly I can't guess why you didn't go commercial with 1.5 instead of waiting for 2.0.
If we do outrate H1.5 on the lists, it follows that we will also outrate 2.0 on the intermediate level lists, as it is lower. Perhaps 2.0 is not really weaker than 1.5, I don't claim this, but it is clearly not measurably stronger. Perhaps you improved H based on self-testing but the improvements fail to show up against other programs at longer levels.

Uri Blass · Post by **Uri Blass** » Thu Jul 19, 2012 1:05 am

I read that the strongest version is not 2z but 2s2 that is claimed to be stronger than 2s.

http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=25226

After 1000 games at 5+3 they get
(+318, =473, -210), 55.4 %, in the Rybka forum for 2s, and
Barnard claims there:

"i think and i know that the s2 version will perform better than the s version"

Houdini · Post by **Houdini** » Thu Jul 19, 2012 1:14 am

Larry, the Houdini improvements are benchmarked against 9 different engines. Within the confidence interval of the different tests the Elo differences I quoted above are also verified against these 9 engines.

Robert

Houdini · Post by **Houdini** » Thu Jul 19, 2012 1:17 am

Uri Blass wrote:I read that the strongest version is not 2z but 2s2 that is claimed to be stronger than 2s.

http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=25226

After 1000 games at 5+3 they get
(+318, =473, -210), 55.4 %, in the Rybka forum for 2s, and
Barnard claims there:

"i think and i know that the s2 version will perform better than the s version"

Uri, do you really accept that changing a handful of fairly minor evaluation parameters can result in a 38 +/- 15 Elo improvement of a top engine?
I really, really wish that these results were valid, but unfortunately I cannot reproduce them at all.

Robert
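
The "38 +/- 15" figure can be roughly reconstructed from the quoted win/draw/loss counts. The sketch below is an approximation under stated assumptions: the logistic Elo model, independent games, and a 95 % normal-approximation interval (z = 1.96). The helper names are illustrative only. Note that the quoted counts add up to 1001 games rather than 1000.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, elo, elo_low, elo_high) for a W/D/L record.

    Each game is treated as an independent result worth 1, 0.5 or 0 points;
    z = 1.96 gives an approximate 95 % interval (an assumption made here).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result: E[x^2] - (E[x])^2.
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    std_err = math.sqrt(variance / n)
    return (
        score,
        elo_from_score(score),
        elo_from_score(score - z * std_err),
        elo_from_score(score + z * std_err),
    )

score, elo, low, high = elo_with_interval(318, 473, 210)
print(f"score {score:.1%}, Elo {elo:+.0f}, interval [{low:+.0f}, {high:+.0f}]")
```

With these assumptions the record gives about 55.4 % and +38 Elo, with an interval of roughly +22 to +53, which is in line with the quoted +/- 15. Whether the gain would carry over to another time control or a different set of opponents is a separate question that this calculation cannot answer.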

Red Hood · Post by **Red Hood** » Thu Jul 19, 2012 1:34 am

People use multi-cut! It works, but without the null move! Based on his interviews, Robert probably uses multi-cut in Houdini. He used the RobboLito code base, which is normal: you have to start from somewhere, and open source is the way to go. Stockfish uses something similar (Tord Romstad used multi-cut in Glaurung, but back then it didn't work that well, so that's how Stockfish got multi-cut!). That's where he saw how it was implemented. The programming wiki also has all the information about multi-cut!

Uri Blass · Post by **Uri Blass** » Thu Jul 19, 2012 1:57 am

Houdini wrote:
Uri Blass wrote:I read that the strongest version is not 2z but 2s2 that is claimed to be stronger than 2s.

http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=25226

After 1000 games at 5+3 they get
(+318, =473, -210), 55.4 %, in the Rybka forum for 2s, and
Barnard claims there:

"i think and i know that the s2 version will perform better than the s version"
Uri, do you really accept that changing a handful of fairly minor evaluation parameters can result in a 38 +/- 15 Elo improvement of a top engine?
I really, really wish that these results were valid, but unfortunately I cannot reproduce them at all.

Robert

I can imagine that it is possible. Changing the time control could be responsible for a 20 Elo difference, and there may be a 20 Elo difference between 5+3 and whatever faster time control you test at. (38 +/- 15 may be only 23 Elo, so it is possible that you get only a small improvement in your test at a different time control with different hardware.)

I see differences in results at different time controls (for example, Houdini 2c seems to be better than Houdini 1.5a at blitz but not at longer time controls, based on the rating lists).

Note that I do not know exactly what the changes in 2s relative to 2c are.

lkaufman · Post by **lkaufman** » Thu Jul 19, 2012 2:18 am

Where are the parameters for all these versions (2z, 2s, 2s2) posted? I could probably estimate whether the changes were substantial enough to account for a measurable elo gain. From the discussion it sounds like there were time control changes, which might have favored 5 + 3 over 2 + 2.
