A "Top 4" Ivanhoe v Komodo64- OUCH!!!

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: A "Top 4" Ivanhoe v Komodo64- OUCH!!!

Post by Don »

Hi George,

I meant no disrespect here, and I apologize if you feel offended. But I could not leave this unchallenged because you basically reported a 50-game result in a highly visible way, with banners and a thread devoted to it. I had to challenge this just as visibly and publicly because it's not representative and would lead many to reach a conclusion that is completely wrong. Despite what you say, you are presenting your data "sensationally" and leading people to believe that MY program is highly inferior to Ivanhoe - and I take that personally. If there were any truth to that I would keep my mouth shut and not draw attention to it.

And I am completely sincere when I say it is possible to inadvertently be biased, and this is not meant to be disparaging to you. As I say, even respected scientists know this and have to take precautions, and I have done this myself, later finding out what I did was flawed. It could be due to actual bias or it could be purely by accident, for example setting something incorrectly. In the cases where the experimental setup was incorrect it is impossible to say for sure whether bias was involved because humans have very limited self-diagnostics. So when that happens I just shrug it off, correct the mistake and move on.

What protects Larry and me is that we have no reason to bias our results because it would just stand in the way of our progress. What point is there in thinking we did better than we actually did on some test? There is no test that is useful to us unless it's correct, whether we like the result or not.

And NO, I don't think you are stupid, but both Larry and I have at times set up tests wrong and it can happen to anyone. Your test here was so lopsided (and in the wrong direction) that you either did something wrong or IvanHoe really was quite fortunate. The error margins for such a match would be something like 85 ELO, so it's possible that the only thing you did "wrong" was report a result based on a 50-game sample. But then you implied that IvanHoe should have won by a much bigger margin and was just getting started, in other words Komodo was "lucky" to even do as well as it did.
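To make the error-margin point concrete, here is a minimal sketch that turns the reported 50-game result (+21/-8/=21 for Ivanhoe) into an Elo estimate with a rough 95% interval. It assumes the usual logistic Elo formula and a normal approximation on the per-game score; the rating tools used in this thread may model draws differently, so the numbers are only indicative. The function names are made up for this illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model, 0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_interval(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high) for one side of a match.

    Assumption: normal approximation on the per-game score (1, 0.5, 0),
    which is crude for small samples and breaks down near 0% or 100%.
    """
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    # Variance of a single game's score around the mean.
    var = (wins * 1.0 + draws * 0.25) / games - mean ** 2
    stderr = math.sqrt(var / games)
    low, high = mean - z * stderr, mean + z * stderr
    return elo_from_score(mean), elo_from_score(low), elo_from_score(high)

# Reported 50-game result from Ivanhoe's side: +21/-8/=21.
elo, elo_low, elo_high = match_interval(wins=21, losses=8, draws=21)
print(f"Elo {elo:+.0f}, rough 95% interval {elo_low:+.0f} to {elo_high:+.0f}")
```

Under these assumptions the estimate comes out near +92 Elo with an interval spanning well over 100 Elo in width, which is in the same ballpark as the margin quoted above.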

Anyway, in science the thing that can protect your integrity and reputation is that when you publish a result, you do it in a verifiable way - which you have done. Since you explained all the testing conditions, I am trying to duplicate your results, and I would request that other people reading this do the same so that we don't have to argue about whether your test is correct.

The primary difference is that I am running my test on a lower-powered machine than you are, but this favors Ivanhoe. I am willing to share the games with anyone who requests them. I am going to try to get at least 1000 games (which is not nearly enough but all I can muster in a reasonable time) because I want to run the same test with more recent, stronger versions of Komodo:

Rank Name       Elo      +      -    games   score   oppo.   draws 
   1 Komodo4  3025.7   36.4   36.4     312   54.3%  3000.0   45.8% 
   2 IvanHoe  3000.0   36.4   36.4     312   45.7%  3025.7   45.8% 


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -------
    4.7163       0.987    -0.014     3.100    -0.679    17.1137      312   Komodo4
    4.7805       1.000     0.000     6.112     0.000    17.9475      312   IvanHoe

I also show the average depth and node counts per move - a statistic my tester supports which helps us analyze differences in programs. As Larry and I already know, most of the top ten programs search deeper and look at more nodes per second even though they are weaker than Komodo. In this case Ivanhoe searches on average almost a ply extra. (But it's not about depth, it's about quality.)
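For readers wondering how the RATIO and log(r) columns relate to the raw figures, the sketch below reproduces them from the TIME and NODES columns. It assumes the ratios are taken relative to the IvanHoe row and that log(r) is a natural logarithm; both are inferred from the numbers in the table, not stated by the tester, and the helper name is hypothetical.

```python
import math

# Per-move averages copied from the table: (time, nodes).
# Units are whatever the tester reports; the post does not state them.
stats = {
    "Komodo4": (4.7163, 3.100),
    "IvanHoe": (4.7805, 6.112),
}

def relative_to(reference, stats):
    """Return {name: (time_ratio, log_time_ratio, log_nodes_ratio)}.

    Assumption: ratios are relative to `reference` and logs are natural logs.
    """
    ref_time, ref_nodes = stats[reference]
    out = {}
    for name, (time, nodes) in stats.items():
        time_ratio = time / ref_time
        out[name] = (
            time_ratio,
            math.log(time_ratio),
            math.log(nodes / ref_nodes),
        )
    return out

rows = relative_to("IvanHoe", stats)
for name, (ratio, log_time, log_nodes) in rows.items():
    print(f"{name:8s} ratio={ratio:.3f} log(r)={log_time:.3f} nodes log(r)={log_nodes:.3f}")
```

With these assumptions the Komodo4 row works out to a time ratio of about 0.987 and a node log-ratio of about -0.679, matching the table, so Komodo is looking at roughly half as many nodes as IvanHoe in about the same time.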

To be honest, Ivanhoe did better than I expected here, but that might be explained by the low sample, or by Ivanhoe having improved since I last looked at it, or some combination of these things. The version of Ivanhoe I tested is the same one you tested, IvanHoe999946e.

I will post the final result when I get 1000 games. I would like to kindly request that we get more verification from a third party running a similar test under the same conditions. Anyone?

Don wrote: George,

You are obviously doing something wrong here. I suspect that you have Ivanhoe set to use the default number of processors, which is 8 on the version I am using.

Please check your results because if it's set up fairly, Komodo 4 should WIN such a match at the time control you are testing and the hardware you are testing on. Komodo 4 should win by a small but definite margin.

I'm running a similar match myself and I will publish the results. My hardware is a notebook but it's 64 bit, and I am also running Ivanhoe 46e and using the same time control you are using. After 10 games Komodo 4 is a game up, showing a score of 55% and about 25 ELO. Ten games is a ridiculously low sample but that is roughly in line with what we would expect.

Your ridiculously distorted results would put Ivanhoe well over 100 ELO ahead of Houdini. Do the math - if Houdini is about 40-50 ELO ahead of Komodo and Ivanhoe is 180 ahead of Komodo, then it's clear that Ivanhoe is much stronger than Houdini. But that is obviously not the case, which you yourself have admitted.
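The arithmetic behind "do the math" can be sketched as follows. This treats Elo gaps as additive along a chain of engines and uses the standard logistic expected-score formula; both are simplifying assumptions, and the input figures are simply the ones quoted in the post. The variable and function names are illustrative only.

```python
def expected_score(elo_diff):
    """Expected score for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Figures as quoted in the post.
houdini_over_komodo = (40, 50)   # "about 40-50 ELO ahead of Komodo"
ivanhoe_over_komodo = 180        # "180 ahead of Komodo"

# Assumption: rating gaps simply subtract along the chain.
ivanhoe_over_houdini = [ivanhoe_over_komodo - gap for gap in houdini_over_komodo]
print("Implied Ivanhoe lead over Houdini:", ivanhoe_over_houdini)

# What a given gap means in match terms.
for gap in (92, 180):
    print(f"{gap} Elo -> expected score {expected_score(gap):.1%}")
```

On these inputs the implied lead over Houdini is 130 to 140 Elo. For comparison, the same formula puts a 180 Elo gap at roughly a 74% expected score, and a 63% score at roughly 92 Elo.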

It's understood that your sample of 50 or 60 games is ridiculously low but a match this unlucky for Komodo is possible but unlikely. So I think that either your setup is incorrect somehow, or that you are inadvertently reporting only the matches that are strongly in favor of Ivanhoe. Even though I have no question about your integrity and honesty, this can happen due to human error and psychology, so even respected scientists have to be diligent so as not to report biased results inadvertently. You have never failed to report a bad result for Ivanhoe so I suspect this, but I suspect your setup even more because 180 ELO is unlikely, even after only 50 games.

So can you please check your setup? Also, it would be good if we could get a neutral party to run this same test to see if your results can be duplicated. I'm running the test myself but I'm not a neutral party obviously.

I actually now have 20 games and here is what I get so far: i7-2630QM at 2.00 GHz notebook, 64-bit Linux, Komodo 4 vs IvanHoe (64-bit IvanHoe999946e), both running just 1 core, 128 MB hash, 40 moves in 180 seconds repeating time control, ponder off. The games are available on request:

Rank Name       Elo      +      -    games   score   oppo.   draws 
   1 Komodo4  3000.0  135.7  135.7      20   57.5%  2958.9   45.0% 
   2 IvanHoe  2958.9  135.7  135.7      20   42.5%  3000.0   45.0% 


geots wrote: Ivanhoe B46e x64 vs Komodo64 SSE Version 4


This Ivanhoe version is firmly placed in the "Top 4" of Ivanhoe versions. And I would imagine Komodo hopes so. Thing is, it was a much worse beating than the score indicates. I forgot to set the match for 50 games. When I woke up and checked, 59 games had been played. Of the 9 I had to remove to keep it at 50 games, Ivanhoe had really begun to turn it on- winning 6 of the last 9 with 3 games drawn. You can do the math.............. This makes 2 Ivanhoe versions checked.



Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit

128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
40/3 Repeating
Match=50 games


1   Ivanhoe B46e x64         +92    +21/-8/=21   63.00%   31.5/50
2   Komodo64 SSE Version 4   -92    +8/-21/=21   37.00%   18.5/50
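
Since the post invites readers to "do the math", here is a small sketch of what the full 59-game tally would look like if the 9 removed games (6 Ivanhoe wins, 3 draws, as described above) were added back to the reported 50. The helper name is hypothetical and the figures come straight from the post.

```python
def tally(wins, losses, draws):
    """Return (points, games, score_fraction) for one side."""
    games = wins + losses + draws
    points = wins + 0.5 * draws
    return points, games, points / games

# All figures are from Ivanhoe's side: (wins, losses, draws).
reported = (21, 8, 21)   # the 50 games in the table
removed = (6, 0, 3)      # the 9 games dropped to keep the match at 50

full = tuple(a + b for a, b in zip(reported, removed))

for label, result in (("reported", reported), ("all 59", full)):
    points, games, score = tally(*result)
    print(f"{label}: {points}/{games} = {score:.1%}")
```

That gives 31.5/50 for the reported match and 39/59 if every game played is counted, which is still a small sample either way.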


To post another match-

george
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: A "Top 4" Ivanhoe v Komodo64- OUCH!!!

Post by Dr.Wael Deeb »

So you have the ability to apologize Don but you didn't apply it when you once called me Dweeb....

The human nature is so messed up most of the time regards,
Dr.D

P.S.And yes....your reply to George as a disrespect is severely underestimated.....
No one can hit as hard as life.But it ain’t about how hard you can hit.It’s about how hard you can get hit and keep moving forward.How much you can take and keep moving forward….
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: A "Top 4" Ivanhoe v Komodo64- OUCH!!!

Post by Don »

Dr.Wael Deeb wrote: So you have the ability to apologize Don but you didn't apply it when you once called me Dweeb....

The human nature is so messed up most of the time regards,
Dr.D

P.S.And yes....your reply to George as a disrespect is severely underestimated.....

I tried to apologize to you, you just didn't accept it and your response was that I was lying. And now your response is that I did not even apologize so there is nowhere else to go with people who will not accept an apology and who proudly hold a grudge until the end of time. Are we going to be reading about that incident 5 or 10 years from now too, something that happened long ago?

If you think I have no right to defend a clear misrepresentation of Komodo's strength (which is personal to me) that is your privilege, but I would hope (and I believe) that the vast majority of people on this forum would respect my right to make this correction. YOU seemed to think it was YOUR right to correct me when I inadvertently called you D Weeb but I don't have the same right?
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: A "Top 4" Ivanhoe v Komodo64- OUCH!!!

Post by Dr.Wael Deeb »

Ok Don, maybe you're right.... It's not that important to bring it up over and over again for the next ten years or so....

As for your right to defend Komodo, only a messed up person would deny you this right, but you were a little bit too harsh on George.....

Anyways, it's time to concentrate on Komodo and get it released ASAP....
Dr.D