Komodo 10.4 vs Houdini 5

beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Komodo 10.4 vs Houdini 5

Post by beram »

Scacchista1977 wrote:I'm conducting a similar test with i7 5960x, at moment 80 games +14 elo for Houdini. In total will be 500 games.
What time control and how many cores are you using, Anton?

Meanwhile I have +13 -8 =54 (53.33%) for H5 after 75 games at TC 15m+10s,
so Komodo 10.4 performs worse with a 5x longer time control.

K10.4 - H5, Rapid 15m+10s                              
1   Houdini 5 x64-pext   +23  +13/=54/-8 53.33%   40.0/75
2   Komodo 10.4 64-bit   -23  +8/=54/-13 46.67%   35.0/75
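
For anyone who wants to check the +23 figure, here is a minimal Python sketch that turns a win/draw/loss record into an Elo difference with a rough 95% interval. It assumes the standard logistic Elo model and a normal approximation for the score; the function names are made up for illustration and are not taken from any GUI or rating tool.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (score, elo, elo_low, elo_high) for one match.

    Assumption: the interval comes from a normal approximation on the
    mean score, with z=1.96 for roughly 95% coverage.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1 = win, 0.5 = draw, 0 = loss).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    lo, hi = score - z * se, score + z * se
    return score, elo_from_score(score), elo_from_score(lo), elo_from_score(hi)

# Houdini 5 result from the table above: +13 =54 -8
score, elo, elo_lo, elo_hi = match_summary(13, 54, 8)
print(f"score={score:.4f}  elo={elo:+.1f}  95% interval [{elo_lo:+.1f}, {elo_hi:+.1f}]")
```

Under these assumptions the interval for a 75-game match is wide enough to include zero, so the match on its own does not settle which engine is stronger.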
Scacchista1977
Posts: 49
Joined: Thu Nov 10, 2016 11:40 am
Location: Italy
Full name: Aleandro Rossi

Re: Komodo 10.4 vs Houdini 5

Post by Scacchista1977 »

beram wrote:
Scacchista1977 wrote:I'm conducting a similar test with i7 5960x, at moment 80 games +14 elo for Houdini. In total will be 500 games.
What time control and how many cores are you using, Anton?

Meanwhile I have +13 -8 =54 (53.33%) for H5 after 75 games at TC 15m+10s,
so Komodo 10.4 performs worse with a 5x longer time control.

K10.4 - H5, Rapid 15m+10s                              
1   Houdini 5 x64-pext   +23  +13/=54/-8 53.33%   40.0/75
2   Komodo 10.4 64-bit   -23  +8/=54/-13 46.67%   35.0/75
Time control 5'+0", 14 CPUs per engine, 512 MB hash, no ponder, Syzygy 6-men.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Komodo 10.4 vs Houdini 5

Post by Milos »

lkaufman wrote:Komodo 10.4 came out 4 elo above Houdini 5.01 on the IPON list. The total rating is of course vastly more significant statistically than the individual match result. Also no results yet for four threads, which is where I got the statistically significant plus score. But the exact results do seem to be fairly hardware-dependent; just between my Intel xeon machines relative speeds and results vary noticeably.
4 Elo on Ipon is first totally within error margins so K10.4 is not stronger than H5, and second the Ipon list is pretty meaningless by today's standards (it has been well known to favor Komodo and some other commercial engines, so its results should be taken with at least a grain of salt, especially considering that there is absolutely nothing to support them except the word of Ingo).

There is not a single tester on any forum so far in any time control or SMP configuration that has reported K10.4 winning a match against H5. Not a single one out of at least 20 different posts (on multiple forums and lists).
So you are
a) either having some special conditions that favour Komodo against Houdini or
b) cherry picking your results meaning they are totally non-representative or
c) blatantly lying about the results you are reporting in order to advertise your engine (something totally against the charter of this forum).
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Komodo 10.4 vs Houdini 5

Post by mjlef »

Milos wrote:
lkaufman wrote:Komodo 10.4 came out 4 elo above Houdini 5.01 on the IPON list. The total rating is of course vastly more significant statistically than the individual match result. Also no results yet for four threads, which is where I got the statistically significant plus score. But the exact results do seem to be fairly hardware-dependent; just between my Intel xeon machines relative speeds and results vary noticeably.
4 Elo on Ipon is first totally within error margins so K10.4 is not stronger than H5, and second the Ipon list is pretty meaningless by today's standards (it has been well known to favor Komodo and some other commercial engines, so its results should be taken with at least a grain of salt, especially considering that there is absolutely nothing to support them except the word of Ingo).

There is not a single tester on any forum so far in any time control or SMP configuration that has reported K10.4 winning a match against H5. Not a single one out of at least 20 different posts (on multiple forums and lists).
So you are
a) either having some special conditions that favour Komodo against Houdini or
b) cherry picking your results meaning they are totally non-representative or
c) blatantly lying about the results you are reporting in order to advertise your engine (something totally against the charter of this forum).
"4 Elo on Ipon is first totally within error margins so K10.4 is not stronger than H5"??? A result showing Komodo 10.4 stronger than Houdini, but still within the error margins does not prove K 10.4 is stronger, but it also does not prove it is weaker. You need more games, and they will eventually come.

As usual, we expect Houdini to be very strong at very fast games, with Komodo gaining on it at longer time controls. I have seen quite a few test results showing this already, but how about we wait for enough games to get played before we jump to conclusions?
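
How many games "enough" might be can be estimated roughly. The sketch below is a back-of-the-envelope calculation, not anything used by a rating list: it assumes the logistic Elo model, a normal approximation, and a per-game standard deviation of about 0.26, which is roughly what a draw-heavy result such as the +13 =54 -8 match earlier in this thread would give. The function name and the default values are illustrative assumptions.

```python
import math

def games_needed(elo_diff, per_game_sd=0.26, z=1.96):
    """Approximate games needed before an Elo gap of `elo_diff`
    exceeds z standard errors of the mean score.

    Assumptions: logistic Elo model, independent games, and a fixed
    per-game standard deviation (lower when there are many draws).
    """
    expected = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    delta = expected - 0.5  # score shift caused by the Elo gap
    return math.ceil((z * per_game_sd / delta) ** 2)

n_4_elo = games_needed(4)
n_23_elo = games_needed(23)
print(f"4 Elo gap:  about {n_4_elo} games")
print(f"23 Elo gap: about {n_23_elo} games")
```

With these inputs a 4 Elo gap needs thousands of games to resolve, while a gap in the low twenties needs only a few hundred.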

As for Ingo's integrity, I find him to be totally fair, completely independent and totally trustworthy. I think you owe him an apology.

Mark
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Komodo 10.4 vs Houdini 5

Post by mjlef »

I wanted to add that over years of working with Larry, I have never seen him lie or exaggerate any claims. I have never seen him cherry-pick. He always combines all the testing information he has when making a rating estimate. We always post our test results upon releases, including the time controls, conditions and number of games. We let the user decide.

Mark
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Komodo 10.4 vs Houdini 5

Post by JJJ »

I trust Mark and Larry, and I agree that we need to wait for more games before drawing any conclusion. fast.gm will provide enough games soon, and then we'll know :)

And so far, we see +26 elo on bullet, so it's a promising start to me.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Komodo 10.4 vs Houdini 5

Post by IWB »

If my list is meaningless (and it might be), please check the CEGT 40/20 released today.
It is simple mathematics that an engine can be rated better than another engine even with a lower score in their direct comparison (and the 220 direct games in my case are pretty much useless statistically for chess). It happens all the time in pretty much any sport ... but it might be hard for some to understand.

I wanted to write more in detail about your strange accusations but as my results were and are always in line with the big ones (see above, and that speaks for me or means that everyone is biased just not you) I really think that you are stuck in a paranoid world I don't want to be dragged into!
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Komodo 10.4 vs Houdini 5

Post by Milos »

IWB wrote:If my list is meaningless (and it might be), please check the CEGT 40/20 released today.
It is simple mathematics that an engine can be rated better than another engine even with a lower score in their direct comparison (and the 220 direct games in my case are pretty much useless statistically for chess). It happens all the time in pretty much any sport ... but it might be hard for some to understand.
K is not rated better than H since the difference is only 4 Elo, and error bars are 10. So basically K is ahead of H just by sheer luck. You seem to have difficulty understanding written text and also this simple statistical fact.
But K/H difference aside, what is really outrageous is the SF/K difference on your list. It is an indication of clear bias and a simply impossible result.
I wanted to write more in detail about your strange accusations but as my results were and are always in line with the big ones (see above, and that speaks for me or means that everyone is biased just not you) I really think that you are stuck in a paranoid world I don't want to be dragged into!
Your results were always suspicious. But I am not talking about results. First, you play with ponder on, which, even though it is more efficient in terms of computing power burned, introduces a bias toward whichever engine has the better ponder implementation. Second, you play on outdated hardware and use weaker compiles of some engines (on multiple occasions with SF). Third, you use unknown openings, which immediately disqualifies you from any serious testing discussion. And finally, you have never published any PGN or actual proof of any of the games played, so sorry, but it is hard to believe in your testing "methods", and I am certainly not the only one who doubts them. Many on this forum do, and you know it, no matter how much you pretend not to and turn your head away.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: Komodo 10.4 vs Houdini 5

Post by kranium »

IWB wrote: I wanted to write more in detail about your strange accusations but as my results were and are always in line with the big ones (see above, and that speaks for me or means that everyone is biased just not you) I really think that you are stuck in a paranoid world I don't want to be dragged into!
Speaking of paranoia, I understand IPON doesn't allow download of test games because you believe authors will specifically tune their engines to play the secret IPON opening set well, thereby getting a better rating on your list... LOL

I am quite intrigued and amused by this particular IPON 'Condition':

"13. Each test run with 3300 games uses about 35kWh of energy. If you are willing to pay for
your favorite engine PLEASE send me an email! We will find an arangement :-)
"

:shock:
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Komodo 10.4 vs Houdini 5

Post by mjlef »

Milos wrote:
IWB wrote:If my list is meaningless (and it might be), please check the CEGT 40/20 released today.
It is simple mathematics that an engine can be rated better than another engine even with a lower score in their direct comparison (and the 220 direct games in my case are pretty much useless statistically for chess). It happens all the time in pretty much any sport ... but it might be hard for some to understand.
K is not rated better than H since the difference is only 4 Elo, and error bars are 10. So basically K is ahead of H just by sheer luck. You seem to have difficulty understanding written text and also this simple statistical fact.
But K/H difference aside, what is really outrageous is the SF/K difference on your list. It is an indication of clear bias and a simply impossible result.
I wanted to write more in detail about your strange accusations but as my results were and are always in line with the big ones (see above, and that speaks for me or means that everyone is biased just not you) I really think that you are stuck in a paranoid world I don't want to be dragged into!
Your results were always suspicious. But I am not talking about results. First, you play with ponder on, which, even though it is more efficient in terms of computing power burned, introduces a bias toward whichever engine has the better ponder implementation. Second, you play on outdated hardware and use weaker compiles of some engines (on multiple occasions with SF). Third, you use unknown openings, which immediately disqualifies you from any serious testing discussion. And finally, you have never published any PGN or actual proof of any of the games played, so sorry, but it is hard to believe in your testing "methods", and I am certainly not the only one who doubts them. Many on this forum do, and you know it, no matter how much you pretend not to and turn your head away.
Ponder on is exactly what humans do in tournaments, and what is done in the World Computer Chess Championship. I respect all the rating lists, but I especially admire the ones that include pondering on, since it better approximates human and computer chess tournaments. I do not have any specific data that Komodo gets more ponderhits than other engines. It would be interesting to study. So I am not sure it would have much of an effect on elo compared with no ponder testing. If you have more data on this, I would love to see it.

A great deal of the rating lists are at very fast time controls. Larry and I know that Komodo just does not do as well at these fast time controls. Larry has written here about this many times. Our goal is to make the strongest chess engine we can at standard time controls, even if it means Komodo is not the best at bullet/blitz. The results Larry reported and some much longer time control matches running now look pretty good so far, but we have to wait for more games to draw meaningful conclusions.

BTW, the 4 elo lead you mention does not mean "luck". 4 elo is roughly 1 Standard Deviation of the error margin. You can see on the Ipon list a column marked "CFS(next)". The error margins and results are used to determine a confidence that one program is stronger than the one below it in the list. It shows 70% for Komodo 10.4 right now. This is not proof, just the likelihood that Komodo is stronger with these settings. More games will raise CFS. If you are interested in how this works, you can google it or study the Ordo source code.
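
To make the idea concrete, here is a rough Python sketch of a likelihood-of-superiority calculation. It is a plain normal approximation and should not be read as how Ordo computes CFS. It assumes that the ±10 error bars mentioned in this thread are roughly 95% intervals and that the two engines' rating errors are independent; both are assumptions, and the function name is made up for illustration.

```python
import math

def likelihood_of_superiority(elo_diff, err_a, err_b, z=1.96):
    """P(engine A is really stronger than engine B), normal approximation.

    err_a, err_b: reported +/- error bars in Elo, assumed here to be
    z-sigma intervals with independent errors (an assumption).
    """
    sd_a = err_a / z
    sd_b = err_b / z
    sd_diff = math.hypot(sd_a, sd_b)  # SD of the rating difference
    # Standard normal CDF evaluated at elo_diff / sd_diff, via erf.
    return 0.5 * (1.0 + math.erf(elo_diff / (sd_diff * math.sqrt(2.0))))

p = likelihood_of_superiority(4, 10, 10)
print(f"P(K10.4 stronger than H5) is about {p:.0%}")
```

Under these assumptions a 4 Elo gap is a bit more than half a standard deviation of the difference, and the result lands close to the 70% shown in the CFS column. Halving the error bars (about four times as many games) pushes the same 4 Elo gap much closer to certainty.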

Basically, Ingo uses sound statistical methods. Scientists never work in absolutes, just high probabilities. I find Ingo's methods and rating list admirable, even when we sometimes come out with disappointing results. I find Ingo not publishing his openings very interesting. It helps prevent programs from "booking up" or being tuned for those openings. Like I mentioned before, Ingo is as fair as anyone can be.