Milos wrote:
IWB wrote: If my list is meaningless (and it might be), please check the CEGT 40/20 released today.
It is simple mathematics to be rated better than other engines even with a lower score in direct comparison (and 220 direct games in my case are pretty much useless statistically for chess). It happens all the time in pretty much any sport ... but might be hard to understand for some.
K is not rated better than H, since the difference is only 4 Elo and the error bars are 10. So basically K is ahead of H by sheer luck. You seem to have difficulty understanding written text, and also this simple statistical fact.
But the K/H difference aside, what is really outrageous is the SF/K difference on your list. It is an indication of clear bias and a simply impossible result.
I wanted to write in more detail about your strange accusations, but as my results were and are always in line with the big ones (see above, which speaks for me, or else means that everyone is biased except you), I really think that you are stuck in a paranoid world I don't want to be dragged into!
Your results were always suspicious. But I am not talking about results. First, you play with ponder on, which, even though it is more efficient in terms of burnt computing power, introduces a bias toward whichever engine has the better ponder implementation. Second, you play on outdated hardware and use weaker compiles of some engines (on multiple occasions with SF). Third, you use unknown openings, which immediately disqualifies you from any serious testing discussion. And finally, you have never published any PGNs or actual proof of any of the games played, so sorry, but it is hard to believe in your testing "methods", and I am certainly not the only one who doubts them. Many on this forum do, and you know it, no matter how much you pretend not to and turn your head away.
Ponder on is exactly what humans do in tournaments, and what is done in the World Computer Chess Championship. I respect all the rating lists, but I especially admire the ones that include pondering, since it better approximates human and computer chess tournaments. I do not have any specific data that Komodo gets more ponder hits than other engines; it would be interesting to study. So I am not sure it would have much of an effect on Elo compared with no-ponder testing. If you have more data on this, I would love to see it.
Many of the rating lists are played at very fast time controls. Larry and I know that Komodo just does not do as well at these fast time controls; Larry has written here about this many times. Our goal is to make the strongest chess engine we can at standard time controls, even if it means Komodo is not the best at bullet/blitz. The results Larry reported, and some much longer time control matches running now, look pretty good so far, but we have to wait for more games to draw meaningful conclusions.
BTW, the 4 Elo lead you mention does not mean "luck". 4 Elo is roughly one standard deviation of the error margin. You can see on the Ipon list a column marked "CFS(next)". The error margins and results are used to compute a confidence that one program is stronger than the one below it in the list; it currently shows 70% for Komodo 10.4. This is not proof, just the likelihood that Komodo is stronger with these settings. More games will sharpen the CFS. If you are interested in how this works, you can google it or study the Ordo source code.
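To make the idea concrete, here is a minimal sketch of how a "confidence of superiority" figure can be derived from an Elo lead and an error margin. It assumes the reported error bar is a 95% interval on the rating difference (about 1.96 standard deviations); the actual conventions of the Ipon list and Ordo's CFS computation differ in detail (Ordo works from the full game results, not just two summary numbers), so treat this as illustrative only.

```python
import math

def cfs(elo_diff, error_margin_95):
    """Rough likelihood that the leading engine is truly stronger.

    Assumes elo_diff is normally distributed with a 95% error bar of
    error_margin_95 Elo on the difference itself -- an assumption for
    illustration, not the exact method any rating list uses.
    """
    sigma = error_margin_95 / 1.96          # std. deviation implied by the margin
    z = elo_diff / sigma                    # lead measured in standard deviations
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF at z

# A 4 Elo lead with a 10 Elo error bar is a lead of under one standard
# deviation -- confidence well above 50%, but nowhere near proof.
print(round(cfs(4, 10) * 100))   # roughly 78% under these assumptions
```

The point of the exercise is the same one made above: a small lead inside the error bars does not mean the ranking is random luck, it means the list is, say, 70-80% confident rather than certain, and more games narrow the margin.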
Basically, Ingo uses sound statistical methods. Scientists never work in absolutes, just high probabilities. I find Ingo's methods and rating list admirable, even when we sometimes come out with disappointing results. I also find it interesting that Ingo does not publish his openings: it helps prevent programs from "booking up" or being tuned for those openings. Like I mentioned before, Ingo is as fair as anyone can be.