Komodo 4 running for the IPON

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

MM

Re: Komodo 4 running for the IPON

Post by MM »

Houdini wrote:
Uri Blass wrote:I see it, but a difference that large from the average performance of Komodo 4 seems strange to me.

I do not believe that the engine that beat Houdini 2.0 and every other engine in a direct match is 40 Elo weaker than Houdini 2.0, especially since I remember that it got many results with a performance above 3000.

Based on the results I expected it to get an Elo near 3000, unless the performances of the single matches are clearly lower than what is shown.

I wonder what rating a hypothetical engine would get if it had a performance of 3000 against every engine.
It was exactly the same with Houdini 2.0.
All the individual match performances were averaging about 3045, yet the final rating was 3020.
Apparently engines are penalized for being near or at the top of the pack :-).

Robert
+1

That is funny. I think that the average of the performances of all matches (100 games each) should be the real rating.

Regards
MM
IWB

Re: Komodo 4 running for the IPON

Post by IWB »

FIXED.

Problems with the transmission right after I left the computer... bad luck.

Bye
Ingo
Uri Blass
Location: Tel-Aviv Israel

Re: Komodo 4 running for the IPON

Post by Uri Blass »

MM wrote:
+1

That is funny. I think that the average of the performances of all matches (100 games each) should be the real rating.

Regards

In that case an engine that beats one engine 100-0 would get an infinite rating, even if it loses against the best engines.

I do not think that should be the case, but I do think the rating should be closer to the average than what I see (at least when there are no extreme results and the best performance of Komodo was only slightly above 3100).
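Uri's objection can be made concrete with a small Python sketch. It assumes the logistic Elo curve (the thread does not say which curve the rating list uses) and hypothetical match scores against equally rated opponents: a 100-0 result maps to an infinite performance, so a plain average of match performances becomes infinite as soon as one such match is included.

```python
import math


def perf_diff(points: float, games: int) -> float:
    """Elo difference implied by one match score, using the logistic curve."""
    p = points / games
    if p <= 0.0:
        return -math.inf   # 0% score: no finite performance
    if p >= 1.0:
        return math.inf    # 100% score: no finite performance
    return 400.0 * math.log10(p / (1.0 - p))


# Hypothetical match scores (points out of 100 games) against equally rated opponents.
matches = [(50, 100), (76, 100), (100, 100)]
diffs = [perf_diff(pts, n) for pts, n in matches]
print(diffs)
print(sum(diffs) / len(diffs))  # inf: the single 100-0 match swamps the average
```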
Bram Visser
Location: NL

Re: Komodo 4 running for the IPON

Post by Bram Visser »

I still don't see Komodo 4 on the rating list, not even after pressing F5...
Sven
Location: Berlin, Germany
Full name: Sven Schüle

Re: Komodo 4 running for the IPON

Post by Sven »

Uri Blass wrote:
In that case an engine that beats one engine 100-0 would get an infinite rating, even if it loses against the best engines.

I do not think that should be the case, but I do think the rating should be closer to the average than what I see (at least when there are no extreme results and the best performance of Komodo was only slightly above 3100).

Arithmetic averaging of match performances, or of Elo ratings in general, is not applicable when precise ratings are required, since the Elo rating system is based on a non-linear formula. Example: two matches A-B and A-C, both with the same number of games.
A-B 500:500 => A and B have identical "match performance".
A-C 760:240 => A has a "match performance" roughly 200 Elo points better than C.
The arithmetic average would be +100 for A, but the actual rating will only be around +94 due to the non-linearity (the average of the 50% and 76% expected scores is 63%, which corresponds to about +94 Elo, not +100).
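Sven's numbers can be checked with a short Python sketch. It assumes two common forms of the expectancy curve, the logistic curve and a Gaussian (normal) curve with sigma = 200*sqrt(2) for the rating difference; which one a given rating tool uses is an assumption here. Both put 76% at about +200. For 63%, the Gaussian curve gives about +94 (Sven's figure) and the logistic curve about +92, so either way the result stays below the arithmetic +100.

```python
import math
from statistics import NormalDist


def elo_logistic(p: float) -> float:
    """Elo difference for expected score p under the logistic curve."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_gaussian(p: float) -> float:
    """Elo difference for expected score p under a normal curve, sigma = 200*sqrt(2)."""
    return NormalDist(0.0, 200.0 * math.sqrt(2.0)).inv_cdf(p)


scores = [0.50, 0.76]                    # A-B and A-C from the example
mean_score = sum(scores) / len(scores)   # 0.63

for name, f in (("logistic", elo_logistic), ("gaussian", elo_gaussian)):
    naive = sum(f(p) for p in scores) / len(scores)   # average of the Elo numbers
    pooled = f(mean_score)                            # Elo of the average score
    print(f"{name}: naive average {naive:.1f}, from the 63% score {pooled:.1f}")
```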

Establishing a rating from games against a variety of opponents is even a bit more complex than my tiny example above, so an Elo rating cannot be explained in just a few lines. But at least one can say that averaging match performances is generally not the way to go.
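For the multi-opponent case, here is a minimal Python sketch of one way to get a single performance rating: hold the opponents' ratings fixed and solve "expected total score = actual total score" by bisection. The logistic curve, the fixed opponent ratings and the example numbers are all assumptions; it is not meant to reproduce EloStat or how the IPON list is calculated.

```python
from typing import List, Tuple

Match = Tuple[float, int, float]  # (opponent rating, games, points scored)


def expected(diff: float) -> float:
    """Logistic expected score for a rating difference (an assumed curve)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def performance(results: List[Match]) -> float:
    """Rating R whose expected total score equals the actual total score.

    Opponent ratings are held fixed, which a real rating tool would not do.
    """
    total_points = sum(pts for _, _, pts in results)
    total_games = sum(n for _, n, _ in results)
    if not 0 < total_points < total_games:
        raise ValueError("a 0% or 100% total score has no finite performance")
    lo, hi = -4000.0, 8000.0  # bracket wide enough for engine-level ratings
    for _ in range(100):
        mid = (lo + hi) / 2.0
        exp_points = sum(n * expected(mid - r) for r, n, _ in results)
        if exp_points < total_points:
            lo = mid  # expected score too low: the rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical: 50/100 against a 3000-rated engine, 90/100 against a 2900-rated one.
matches = [(3000.0, 100, 50.0), (2900.0, 100, 90.0)]
singles = [performance([m]) for m in matches]
print("single-match performances:", singles)
print("their average:", sum(singles) / len(singles))
print("combined performance:", performance(matches))
```

Here the single-match performances are 3000 and about 3282 (average about 3141), while the combined performance is about 3100: the close 50% result pulls the combined figure down.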

Sven
Laskos
Full name: Kai Laskos

Re: Komodo 4 running for the IPON

Post by Laskos »

Sven Schüle wrote:
Arithmetic averaging of match performances, or of Elo ratings in general, is not applicable when precise ratings are required, since the Elo rating system is based on a non-linear formula. Example: two matches A-B and A-C, both with the same number of games.
A-B 500:500 => A and B have identical "match performance".
A-C 760:240 => A has a "match performance" roughly 200 Elo points better than C.
The arithmetic average would be +100 for A, but the actual rating will only be around +94 due to the non-linearity (the average of the 50% and 76% expected scores is 63%, which corresponds to about +94 Elo, not +100).

Establishing a rating from games against a variety of opponents is even a bit more complex than my tiny example above, so an Elo rating cannot be explained in just a few lines. But at least one can say that averaging match performances is generally not the way to go.

Sven
Yes, I tried to explain this differently some time ago, but no one followed:

A 500:500 result has a smaller error in Elo points than a 760:240 result, so when averaging, close results like 500:500 get a higher weight than mismatches like 900:100. In other words, a difference of 0 points has a higher weight than a difference of +/-200 points (for an equal number of games), as you showed (+/-94 instead of +/-100, a bit closer to 0, i.e. a higher weight for the 0 result). For top engines, this means that the close results at the top have higher weights than the mismatches at the bottom.
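A rough Python sketch of why the error in Elo points grows for lopsided results. Assumptions: the logistic curve, independent games with a per-game score variance of p(1-p) (draws, which reduce the variance, are ignored), and the delta method. The Elo curve is steeper away from 50%, so the same uncertainty in the score turns into a larger uncertainty in Elo.

```python
import math
from typing import Tuple


def elo_and_error(points: float, games: int) -> Tuple[float, float]:
    """Elo difference of one match and its approximate standard error.

    Assumptions: logistic curve, per-game score variance p*(1-p) (draws ignored),
    and the delta method: SE(Elo) = |dElo/dp| * SE(p).
    """
    p = points / games
    elo = 400.0 * math.log10(p / (1.0 - p))
    se_p = math.sqrt(p * (1.0 - p) / games)            # standard error of the score fraction
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))   # dElo/dp, steeper away from 50%
    return elo, slope * se_p


for pts in (500, 760, 900):
    elo, se = elo_and_error(pts, 1000)
    print(f"{pts}:{1000 - pts}  Elo {elo:+.0f}  +/- {se:.1f}")
```

For 1000 games this gives roughly +/-11 Elo for 500:500, +/-12.9 for 760:240 and +/-18.3 for 900:100.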

I was still wondering, because even the top 5 or 10 results were averaging a bit higher, but I do trust EloStat.

Kai
Sven
Location: Berlin, Germany
Full name: Sven Schüle

Re: Komodo 4 running for the IPON

Post by Sven »

Laskos wrote:A 500:500 result has a smaller error in Elo points than a 760:240 result, so when averaging, close results like 500:500 get a higher weight than mismatches like 900:100. In other words, a difference of 0 points has a higher weight than a difference of +/-200 points (for an equal number of games), as you showed (+/-94 instead of +/-100, a bit closer to 0, i.e. a higher weight for the 0 result). For top engines, this means that the close results at the top have higher weights than the mismatches at the bottom.
I am not sure about your "Elo points error" reasoning; my key point is simply the percentage expectancy curve, which is already non-linear and thus makes linear averaging incorrect. The error bars of the "match performances" do not affect the final rating, only the total error bars, IMO.

Sven
Laskos
Full name: Kai Laskos

Re: Komodo 4 running for the IPON

Post by Laskos »

Sven Schüle wrote:
I am not sure about your "Elo points error" reasoning; my key point is simply the percentage expectancy curve, which is already non-linear and thus makes linear averaging incorrect. The error bars of the "match performances" do not affect the final rating, only the total error bars, IMO.

Sven
Yes, this non-linearity is exactly the reason why a 500:500 result has smaller errors than a 900:100 result. The relevance of a result in the statistical average is determined by its weight, which is proportional to 1/error^2. As for the errors, the total error must be smaller than each individual error, and it looks like 1/sqrt(1/error1^2 + 1/error2^2 + ...), if I am not wrong.
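The weighting Laskos describes is the standard inverse-variance weighted mean. A minimal Python sketch with hypothetical estimates and error bars: each estimate gets weight 1/error^2, and the combined error is 1/sqrt(sum of 1/error_i^2), which is never larger than the smallest individual error. This is a generic statistical tool, not a claim about what EloStat or the IPON list does.

```python
import math


def inverse_variance_mean(estimates, errors):
    """Weighted mean with weights 1/error^2, plus the combined error."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    combined_error = 1.0 / math.sqrt(total)   # 1/sqrt(1/e1^2 + 1/e2^2 + ...)
    return mean, combined_error


# Hypothetical: a close result (0 +/- 11) and a mismatch (+200 +/- 13).
mean, err = inverse_variance_mean([0.0, 200.0], [11.0, 13.0])
print(f"weighted mean {mean:.1f}, combined error {err:.1f}")
```

With these numbers the weighted mean comes out at about +83 with a combined error of about 8.4, below the plain average of +100 and with a smaller error than either input. It only illustrates the intuition; it does not reproduce Sven's +94.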

Kai

ps You are correct that non-linearity of the Elo curve is the culprit of everything. I was just trying to show Uri how to get an intuitive feeling for those averages. Never mind, I myself was a bit surprised lol
ernest

Re: Komodo 4 running for the IPON

Post by ernest »

Laskos wrote:ps You are correct that non-linearity of the Elo curve is the culprit of everything,
Remember Rémi's answer (Posted: Sun Sep 04, 2011): :)

http://www.talkchess.com/forum/viewtopi ... 320#422320
Uri Blass
Location: Tel-Aviv Israel

Re: Komodo 4 running for the IPON

Post by Uri Blass »

Sven, I agree that the arithmetic average is wrong, but even the median always seems to be higher than the rating estimate, and that is not to be expected.

Something is clearly wrong in the calculation of rating.

Note that Komodo beat Houdini, and later I see that Critter also scored 50% against Houdini, so I wonder which better results of Houdini cause Houdini 2 to get a better rating.