I meant no disrespect here, and I apologize if you feel offended. But I could not leave this unchallenged because you basically reported a 50 game result in a highly visible way, with banners and a thread devoted to it. I had to challenge this just as visibly and publicly because it's not representative and would lead many to reach a conclusion that is completely wrong. Despite what you say, you are presenting your data "sensationally" and leading people to believe that MY program is highly inferior to Ivanhoe, and I take that personally. If there were any truth to that I would keep my mouth shut and not draw attention to it.
And I am completely sincere when I say it is possible to be biased inadvertently, and this is not meant to be disparaging to you. As I say, even respected scientists know this and have to take precautions, and I have done this myself, later finding out that what I did was flawed. It could be due to actual bias or it could be purely by accident, for example setting something incorrectly. In the cases where the experimental setup was incorrect it is impossible to say for sure whether bias was involved, because humans have very limited self-diagnostics. So when that happens I just shrug it off, correct the mistake and move on.
What protects Larry and me is that we have no reason to bias our results, because it would just stand in the way of our progress. What point is there in thinking we did better than we actually did on some test? There is no test that is useful to us unless it's correct, whether we like the result or not.
And NO, I don't think you are stupid, but both Larry and I have at times set up tests wrong and it can happen to anyone. Your test here was so lopsided (and in the wrong direction) that you either did something wrong or IvanHoe really was quite fortunate. The error margins for such a match would be something like 85 ELO, so it's possible that the only thing you did "wrong" was report a result based on a 50 game sample. But then you implied that IvanHoe should have won by a much bigger margin and was just getting started, in other words that Komodo was "lucky" to even do as well as it did.
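To make the size of that error margin concrete, here is a minimal sketch that turns the 50 game result quoted further down (+21/-8/=21 for IvanHoe) into an Elo difference with a rough 95% interval. It assumes the standard logistic Elo model and a normal approximation for the score; the function names are made up for illustration, and a rating tool may well compute its margins differently.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (standard logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, low, high) using a normal approximation for the score.

    Assumes score +/- z * std_error stays strictly between 0 and 1,
    otherwise the logarithm is undefined.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * 1.0 + draws * 0.25) / games - score ** 2
    std_error = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return elo_from_score(score), low, high

elo, low, high = match_elo_with_margin(wins=21, losses=8, draws=21)
print(f"Elo difference: {elo:+.0f} (95% interval roughly {low:+.0f} to {high:+.0f})")
```

Under these assumptions the interval is many tens of Elo wide on either side, which is the same ballpark as the margin mentioned above.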
Anyway, in science the thing that can protect your integrity and reputation is that when you publish a result, you do it in a verifiable way, which you have done. Since you explained all the testing conditions I am trying to duplicate your results, and I would request that other people reading this do the same so that we don't have to argue about whether your test is correct.
The primary difference is that I am running my test on a lower-powered machine than you are, but this favors Ivanhoe. I am willing to share the games with anyone who requests them. I am going to try to get at least 1000 games (which is not nearly enough but all I can muster in a reasonable time) because I want to run the same test with more recent, stronger versions of Komodo:
Rank  Name        Elo     +     -  games  score   oppo.  draws
   1  Komodo4  3025.7  36.4  36.4    312  54.3%  3000.0  45.8%
   2  IvanHoe  3000.0  36.4  36.4    312  45.7%  3025.7  45.8%

   TIME   RATIO   log(r)   NODES   log(r)   ave DEPTH   GAMES   PLAYER
 ------  ------  -------  ------  -------  ----------  ------  -------
 4.7163   0.987   -0.014   3.100   -0.679     17.1137     312  Komodo4
 4.7805   1.000    0.000   6.112    0.000     17.9475     312  IvanHoe
To be honest, Ivanhoe did better than I expected here, but that might be explained by the small sample, or by Ivanhoe having improved since I last looked at it, or by some combination of these things. The version of Ivanhoe I tested is the same one you tested, IvanHoe999946e.
I will post the final result when I get 1000 games. I would like to kindly request that we get more verification from a third party running a similar test under the same conditions. Anyone?
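As a rough guide to what 1000 games would buy, the sketch below scales the 36.4 margin from the 312 game table. It assumes the margin shrinks like 1/sqrt(games) and that the draw rate stays about the same; the function names are hypothetical, and the rating tool that produced the table may compute its intervals differently.

```python
import math

def projected_margin(margin_now, games_now, games_target):
    """Scale an error margin, assuming it shrinks like 1/sqrt(games)."""
    return margin_now * math.sqrt(games_now / games_target)

def games_needed(margin_now, games_now, margin_target):
    """Games required to reach a target margin under the same assumption."""
    return math.ceil(games_now * (margin_now / margin_target) ** 2)

# Figures taken from the 312 game table: margin of 36.4 Elo.
margin_at_1000 = projected_margin(36.4, 312, 1000)
games_for_10 = games_needed(36.4, 312, 10.0)
print(f"Projected margin at 1000 games: about {margin_at_1000:.1f} Elo")
print(f"Games needed for a 10 Elo margin: about {games_for_10}")
```

On this estimate 1000 games still leaves a margin of around 20 Elo, and getting down to 10 Elo would take several thousand games, which is why 1000 is "not nearly enough".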
Don wrote:
George,
You are obviously doing something wrong here. I suspect that you have Ivanhoe set to use the default number of processors, which is 8 on the version I am using.
Please check your results because if it's set up fairly, Komodo 4 should WIN such a match at the time control you are testing and the hardware you are testing on. Komodo 4 should win by a small but definite margin.
I'm running a similar match myself and I will publish the results. My hardware is a notebook but it's 64 bit, and I am also running Ivanhoe 46e and using the same time control you are using. After 10 games Komodo 4 is a game up, showing a score of 55% and about 25 ELO. Ten games is a ridiculously low sample but that is roughly in line with what we would expect.
Your ridiculously distorted results would put Ivanhoe well over 100 ELO ahead of Houdini. Do the math - if Houdini is about 40-50 ELO ahead of Komodo and Ivanhoe is 180 ahead of Komodo, then it's clear that Ivanhoe is much stronger than Houdini. But that is obviously not the case, which you yourself have admitted.
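Spelled out, the arithmetic looks like this. It takes the figures exactly as stated in the post and assumes Elo differences simply add along a chain of engines, which is only approximately true in practice; the variable names are illustrative.

```python
# Figures as stated in the post, all relative to Komodo.
houdini_over_komodo = (40, 50)   # Houdini's estimated lead, low and high
ivanhoe_over_komodo = 180        # lead implied by the disputed match, as stated

# Assumption: rating differences are transitive, so the leads can be subtracted.
ivanhoe_over_houdini = [ivanhoe_over_komodo - lead for lead in houdini_over_komodo]

print(f"Implied Ivanhoe lead over Houdini: "
      f"{min(ivanhoe_over_houdini)} to {max(ivanhoe_over_houdini)} Elo")
```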
It's understood that your sample of 50 or 60 games is ridiculously low, and a match this unlucky for Komodo is possible but unlikely. So I think that either your setup is incorrect somehow, or that you are inadvertently reporting only the matches that are strongly in favor of Ivanhoe. Even though I have no question about your integrity and honesty, this can happen due to human error and psychology, so even respected scientists have to be diligent so as not to report biased results inadvertently. You have never failed to report a bad result for Ivanhoe so I suspect this, but I suspect your setup even more because 180 ELO is unlikely, even after only 50 games.
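How unlikely can be estimated with a small sketch. It assumes Komodo's true edge is the roughly 25 Elo mentioned above, a draw rate of 45%, independent games, and a normal approximation for the match score; all of these are assumptions for illustration, not measured facts, and the function names are made up.

```python
import math

def expected_score(elo_diff):
    """Expected score for a side that is elo_diff stronger (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def prob_score_at_most(observed_score, games, elo_diff, draw_rate):
    """Approximate chance of scoring observed_score or less over a match."""
    p = expected_score(elo_diff)
    # Per-game variance when a fraction draw_rate of games is drawn.
    variance = p * (1.0 - p) - draw_rate / 4.0
    std_error = math.sqrt(variance / games)
    z = (observed_score - p) / std_error
    # Normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Komodo scored 18.5/50 = 37% in the reported match.
p_if_ahead = prob_score_at_most(0.37, 50, elo_diff=25, draw_rate=0.45)
p_if_equal = prob_score_at_most(0.37, 50, elo_diff=0, draw_rate=0.45)
print(f"If Komodo is 25 Elo stronger: {p_if_ahead:.4f}")
print(f"If the engines are equal:     {p_if_equal:.4f}")
```

Under these assumptions the chance comes out well below 1% in both cases, so a result like this is possible but would be a rare one.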
So can you please check your setup? Also, it would be good if we could get a neutral party to run this same test to see if your results can be duplicated. I'm running the test myself but I'm not a neutral party obviously.
I actually now have 20 games, and here is what I get so far: i7-2630QM at 2.00 GHz notebook, 64 bit Linux, Komodo 4 vs IvanHoe 64 bit (IvanHoe999946e), both running just 1 core, 128 meg hash, 40 moves in 180 seconds repeating time control, ponder off. The games are available on request:
Rank  Name        Elo      +      -  games  score   oppo.  draws
   1  Komodo4  3000.0  135.7  135.7     20  57.5%  2958.9  45.0%
   2  IvanHoe  2958.9  135.7  135.7     20  42.5%  3000.0  45.0%
geots wrote:
Ivanhoe B46e x64 vs Komodo64 SSE Version 4
This Ivanhoe version is firmly placed in the "Top 4" of Ivanhoe versions. And I would imagine Komodo hopes so. Thing is, it was a much worse beating than the score indicates. I forgot to set the match for 50 games. When I woke up and checked, 59 games had been played. Of the 9 I had to remove to keep it at 50 games, Ivanhoe had really begun to turn it on- winning 6 of the last 9 with 3 games drawn. You can do the math.............. This makes 2 Ivanhoe versions checked.
Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
40/3 Repeating
Match=50 games
1  Ivanhoe B46e x64        +92  +21/-8/=21  63.00%  31.5/50
2  Komodo64 SSE Version 4  -92  +8/-21/=21  37.00%  18.5/50
To post another match-
george