Milos wrote:
IWB wrote: If my list is meaningless (and it might be), please check the CEGT 40/20 released today.
It is simple mathematics to be rated better than other engines even with a lower score in direct comparison (and 220 direct games in my case are pretty much useless statistically for chess). It happens all the time in pretty much any sport ... but might be hard to understand for some.
K is not rated better than H, since the difference is only 4 Elo and the error bars are 10. So basically K is ahead of H by sheer luck. You seem to have difficulty understanding written text, and also this simple statistical fact.
But the K/H difference aside, what is really outrageous is the SF/K difference on your list. It is an indication of clear bias and a simply impossible result.
I wanted to write in more detail about your strange accusations, but as my results were and are always in line with the big ones (see above, which speaks for me, or else means that everyone is biased except you), I really think that you are stuck in a paranoid world I don't want to be dragged into!
Your results were always suspicious. But I am not talking about results. First, you play with ponder on, which, even though it is more efficient in terms of burnt computing power, introduces a bias toward whichever engine has the better ponder implementation. Second, you play on outdated hardware and use weaker compiles of some engines (on multiple occasions with SF). Third, you use unknown openings, which immediately disqualifies you from any serious testing discussion. And finally, you have never published any PGNs or actual proof of any of the games played, so sorry, but it is hard to believe in your testing "methods", and I am certainly not the only one who doubts them. Many on this forum do, and you know it, no matter how much you pretend not to and turn your head away.
Ponder on is exactly what humans do in tournaments, and what is done in the World Computer Chess Championship. I respect all the rating lists, but I especially admire the ones that include pondering, since it better approximates human and computer chess tournaments. I do not have any specific data that Komodo gets more ponder hits than other engines; it would be interesting to study. So I am not sure it would have much of an effect on Elo compared with no-ponder testing. If you have more data on this, I would love to see it.
Many of the rating lists are played at very fast time controls. Larry and I know that Komodo just does not do as well at these fast time controls; Larry has written here about this many times. Our goal is to make the strongest chess engine we can at standard time controls, even if it means Komodo is not the best at bullet/blitz. The results Larry reported, and some much longer time control matches running now, look pretty good so far, but we have to wait for more games to draw meaningful conclusions.
BTW, the 4 Elo lead you mention does not mean "luck". 4 Elo is roughly one standard deviation of the error margin. You can see on the Ipon list a column marked "CFS(next)". The error margins and results are used to compute a confidence that one program is stronger than the one below it in the list; it currently shows 70% for Komodo 10.4. This is not proof, just the likelihood that Komodo is stronger with these settings. More games will sharpen the CFS. If you are interested in how this works, you can google it or study the Ordo source code.
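To make the idea concrete, here is a minimal sketch of how a "confidence of superiority" figure can be derived from an Elo lead and an error margin. It assumes the reported error bar is a 95% interval on the rating difference (about 1.96 standard deviations); the actual conventions of the Ipon list and Ordo's CFS computation differ in detail (Ordo works from the full game results, not just two summary numbers), so treat this as illustrative only.

```python
import math

def cfs(elo_diff, error_margin_95):
    """Rough likelihood that the leading engine is truly stronger.

    Assumes elo_diff is normally distributed with a 95% error bar of
    error_margin_95 Elo on the difference itself -- an assumption for
    illustration, not the exact method any rating list uses.
    """
    sigma = error_margin_95 / 1.96          # std. deviation implied by the margin
    z = elo_diff / sigma                    # lead measured in standard deviations
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF at z

# A 4 Elo lead with a 10 Elo error bar is a lead of under one standard
# deviation -- confidence well above 50%, but nowhere near proof.
print(round(cfs(4, 10) * 100))   # roughly 78% under these assumptions
```

The point of the exercise is the same one made above: a small lead inside the error bars does not mean the ranking is random luck, it means the list is, say, 70-80% confident rather than certain, and more games narrow the margin.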
Basically, Ingo uses sound statistical methods. Scientists never work in absolutes, just high probabilities. I find Ingo's methods and rating list admirable, even when we sometimes come out with disappointing results. I also find it interesting that Ingo does not publish his openings: it helps prevent programs from "booking up" or being tuned for those openings. Like I mentioned before, Ingo is as fair as anyone can be.