Komodo 4 running for the IPON

MM · Post by **MM** » Wed Dec 28, 2011 2:16 am

Houdini wrote:
Uri Blass wrote:I see it but it seems strange for me difference that is so high from the average performance of komodo4

I do not believe that the engine that beat houdini2.0 and every different engine in a direct match is 40 elo weaker than houdini2.0 considering the fact that I remember that it got many results with performance above 3000.

I expected it to get elo near 3000 based on the results or maybe the performance based on a single match are clearly lower than what is written.

I wonder what is going to be the rating of a hypotetical engine that get a performance of 3000 against every engine.
It was exactly the same with Houdini 2.0.
All the individual match performances were averaging about 3045, yet the final rating was 3020.
Apparently engines are penalized for being near or at the top of the pack .

Robert

+1

That is funny. I think that the average of the performances of all matches (100 games each) should be the real rating.

Regards

IWB · Post by **IWB** » Wed Dec 28, 2011 8:51 am

FIXED.

Problems with transmission right after I left the comp ... bad luck.

Bye
Ingo

Uri Blass · Post by **Uri Blass** » Wed Dec 28, 2011 11:21 am

MM wrote:
Houdini wrote:
Uri Blass wrote:I see it but it seems strange for me difference that is so high from the average performance of komodo4

I do not believe that the engine that beat houdini2.0 and every different engine in a direct match is 40 elo weaker than houdini2.0 considering the fact that I remember that it got many results with performance above 3000.

I expected it to get elo near 3000 based on the results or maybe the performance based on a single match are clearly lower than what is written.

I wonder what is going to be the rating of a hypotetical engine that get a performance of 3000 against every engine.
It was exactly the same with Houdini 2.0.
All the individual match performances were averaging about 3045, yet the final rating was 3020.
Apparently engines are penalized for being near or at the top of the pack .

Robert
+1

That is funny, i think that the avarage of the performances of all matches (100 games each) should be the real rating.

Regards

In that case, an engine that beats one engine 100-0 is going to get an infinite rating even if it loses against the best engines.

I do not think that should be the case, but I think the rating should be closer to the average than what I see (at least when there are no extreme results; the best performance of Komodo was only slightly above 3100).
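
The 100-0 objection can be checked numerically. What follows is only a minimal sketch: it assumes the common logistic Elo expectancy formula (the rating list's tool may use a different curve), and `performance_diff` and the two match results are hypothetical names and numbers chosen for illustration.

```python
import math

def performance_diff(score, games):
    """Elo difference implied by a match score, using the logistic curve (an assumption)."""
    p = score / games
    if p >= 1.0:
        return math.inf    # a clean sweep implies an unbounded difference
    if p <= 0.0:
        return -math.inf
    return -400.0 * math.log10(1.0 / p - 1.0)

# Hypothetical results: a 100-0 sweep of a weak engine and a 40-60 loss to a strong one.
diffs = [performance_diff(100, 100), performance_diff(40, 100)]
average = sum(diffs) / len(diffs)
print(diffs, average)
```

A single sweep makes the plain average infinite regardless of the other results, which is why a straight average of match performances cannot serve as the rating.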

Bram Visser · Post by **Bram Visser** » Wed Dec 28, 2011 12:03 pm

I still don't see Komodo 4 on the rating list. Also not after pressing F5 ...

Sven · Post by **Sven** » Wed Dec 28, 2011 3:40 pm

Uri Blass wrote:
MM wrote:
Houdini wrote:
Uri Blass wrote:I see it but it seems strange for me difference that is so high from the average performance of komodo4

I do not believe that the engine that beat houdini2.0 and every different engine in a direct match is 40 elo weaker than houdini2.0 considering the fact that I remember that it got many results with performance above 3000.

I expected it to get elo near 3000 based on the results or maybe the performance based on a single match are clearly lower than what is written.

I wonder what is going to be the rating of a hypotetical engine that get a performance of 3000 against every engine.
It was exactly the same with Houdini 2.0.
All the individual match performances were averaging about 3045, yet the final rating was 3020.
Apparently engines are penalized for being near or at the top of the pack .

Robert
+1

That is funny, i think that the avarage of the performances of all matches (100 games each) should be the real rating.

Regards
In that case an engine that beat one engine 100-0 is going to get an infinite rating even if it lose against the best engines.

I do not think that it should be the case but I think that it should be closer to the average relative to what I see.(at least when I see no extreme results and the best performance of komodo was only slightly above 3100)

Arithmetic averaging of match performances, or of Elo rating numbers in general, is not applicable when precise ratings are required, since the Elo rating system is based on a non-linear formula. Example: two matches A-B and A-C, both with the same number of games.
A-B 500:500 => A and B have identical "match performance".
A-C 760:240 => A has a "match performance" roughly +200 Elo points better than C.
The arithmetic average would be +100 for A, but the actual rating will be only around +94 due to non-linearity (the average of 50% and 76% expected wins is 63%, which corresponds to about +94 Elo, not +100).
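
The arithmetic in this example can be reproduced with a short sketch. It rests on assumptions not stated in the thread: two textbook expectancy curves are tried (the logistic formula and the normal-distribution model with a standard deviation of 200·√2 for the rating difference), and the function names are made up for illustration. It is not a description of how any particular rating tool works.

```python
import math
from statistics import NormalDist

def elo_logistic(p):
    """Elo difference for expected score p under the logistic curve (assumed)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_normal(p):
    """Elo difference for expected score p under the normal-curve model (assumed)."""
    return NormalDist().inv_cdf(p) * 200.0 * math.sqrt(2.0)

p_ab = 500 / 1000              # A-B 500:500
p_ac = 760 / 1000              # A-C 760:240
p_pooled = (p_ab + p_ac) / 2   # 63% over all games

for name, curve in (("logistic", elo_logistic), ("normal", elo_normal)):
    naive = (curve(p_ab) + curve(p_ac)) / 2   # average of the two match performances
    pooled = curve(p_pooled)                  # performance of the pooled score
    print(f"{name}: averaged performances {naive:+.1f}, pooled score {pooled:+.1f}")
```

Under both curves the pooled figure comes out several points below the averaged +100, and the normal-curve variant lands close to the +94 quoted above; the exact number depends on which curve is assumed.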

Establishing a rating from games against a variety of opponents is even more complex than in my tiny example above, so an Elo rating cannot be explained in just a few lines. But at least you can say that averaging match performances is usually not the way to go.

Sven

Laskos · Post by **Laskos** » Wed Dec 28, 2011 4:41 pm

Sven Schüle wrote:
Uri Blass wrote:
MM wrote:
Houdini wrote:
Uri Blass wrote:I see it but it seems strange for me difference that is so high from the average performance of komodo4

I do not believe that the engine that beat houdini2.0 and every different engine in a direct match is 40 elo weaker than houdini2.0 considering the fact that I remember that it got many results with performance above 3000.

I expected it to get elo near 3000 based on the results or maybe the performance based on a single match are clearly lower than what is written.

I wonder what is going to be the rating of a hypotetical engine that get a performance of 3000 against every engine.
It was exactly the same with Houdini 2.0.
All the individual match performances were averaging about 3045, yet the final rating was 3020.
Apparently engines are penalized for being near or at the top of the pack .

Robert
+1

That is funny, i think that the avarage of the performances of all matches (100 games each) should be the real rating.

Regards
In that case an engine that beat one engine 100-0 is going to get an infinite rating even if it lose against the best engines.

I do not think that it should be the case but I think that it should be closer to the average relative to what I see.(at least when I see no extreme results and the best performance of komodo was only slightly above 3100)
Arithmetic averaging of match performances, or of ELO rating numbers in general, is not applicable when precise ratings are required since the ELO rating system is based on a non-linear formula. Example: two matches A-B and A-C, both with the same number of games.
A-B 500:500 => A and B have identical "match performance".
A-C 760:240 => A has a "match performance" of roughly +200 ELO points better than C.
Arithmetic average would be +100 for A but the actual rating will be something around +94 only, due to non-linearity (the average of 50% and 76% expected wins is 63% which corresponds to about +94 ELO, not to +100).

Establishing a rating from games against a variety of opponents is even a bit more complex than in my tiny example above, so explaining an ELO rating cannot be done within few lines only. But at least you can say that averaging match performances is usually never the way to go.

Sven

Yes, I tried to explain this differently some time ago, but no one followed:

A 500:500 result has smaller Elo point errors than a 760:240 result; therefore, when averaging, the weight of close results like 500:500 is higher than that of mismatches like 900:100. In other words, a 0 points difference has a higher weight than a +/-200 points difference (for an equal number of games), as you showed (+/-94 instead of +/-100, a bit closer to 0, a higher weight for the 0 result). For top engines, it means that the top, pretty equal results have higher weights than the bottom mismatches.

I was still wondering, because even top 5 or 10 results were averaging a bit higher, but I do trust EloStat.

Kai

Sven · Post by **Sven** » Wed Dec 28, 2011 4:56 pm

Laskos wrote:500:500 result has smaller Elo points errors than 760:240 result, therefore the weight of close results like 500:500 is higher than mismatches like 900:100, when averaging. In other words, 0 points difference has a higher weight than +/-200 points difference (for equal number of games), as you showed (+/-94 instead of +/-100, a bit closer to 0, a higher weight for 0 result). For top engines, it means that the top, pretty equal results have higher weights that the bottom mismatches.

I am not sure about your "elo points error" reasoning. My key point is simply that the percentage expectancy curve is already non-linear, which makes linear averaging incorrect. The error bars of "match performances" do not affect the final rating result but only the total error bars, IMO.

Sven

Laskos · Post by **Laskos** » Wed Dec 28, 2011 5:13 pm

Sven Schüle wrote:
Laskos wrote:500:500 result has smaller Elo points errors than 760:240 result, therefore the weight of close results like 500:500 is higher than mismatches like 900:100, when averaging. In other words, 0 points difference has a higher weight than +/-200 points difference (for equal number of games), as you showed (+/-94 instead of +/-100, a bit closer to 0, a higher weight for 0 result). For top engines, it means that the top, pretty equal results have higher weights that the bottom mismatches.
I am not sure about your "elo points error" reasoning, my key point is simply the percentage expectancy curve that is already non-linear and thus makes linear averaging incorrect. The error bars of "match performances" do not affect the final rating result but only the total error bars, IMO.

Sven

Yes, this non-linearity is exactly the reason why a 500:500 result has smaller errors than a 900:100 result. The relevance of a result in the statistical average is determined by its weight, which is proportional to 1/error^2. About the errors, the total error must be less than each individual error, and it looks like 1/sqrt(1/error1^2 + 1/error2^2 + ...), if I am not wrong.
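
The weighting described here can be sketched as an inverse-variance weighted mean. The sketch makes several assumptions that are not in the thread: the logistic Elo curve, independent win/loss games with no draws, and a delta-method error estimate; all function names are hypothetical. It illustrates the formula only and is not a claim about how EloStat computes ratings.

```python
import math

def elo_diff(p):
    """Elo difference for score fraction p under the logistic curve (assumed)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_std_error(p, games):
    """Delta-method standard error in Elo, assuming independent win/loss games (no draws)."""
    return 400.0 / (math.log(10) * math.sqrt(games * p * (1.0 - p)))

def combine(estimates, errors):
    """Weighted mean with weights 1/error^2; total error is 1/sqrt(sum of weights)."""
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    total_error = 1.0 / math.sqrt(sum(weights))
    return mean, total_error

matches = [(0.50, 1000), (0.76, 1000)]   # 500:500 and 760:240
estimates = [elo_diff(p) for p, _ in matches]
errors = [elo_std_error(p, n) for p, n in matches]
mean, total_error = combine(estimates, errors)
print(errors, mean, total_error)
```

Under these assumptions the balanced match has the smaller error and therefore the larger weight, so the weighted mean sits below the plain average of the two performances, and the combined error is smaller than either individual error.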

Kai

PS: You are correct that the non-linearity of the Elo curve is the culprit of everything; I was just trying to show Uri how to get an intuitive feeling for those averages. Never mind, I myself was a bit surprised lol

ernest · Post by **ernest** » Wed Dec 28, 2011 6:10 pm

Laskos wrote:ps You are correct that non-linearity of the Elo curve is the culprit of everything,

Remember Rémi's answer (Posted: Sun Sep 04, 2011):

http://www.talkchess.com/forum/viewtopi ... 320#422320

Uri Blass · Post by **Uri Blass** » Wed Dec 28, 2011 8:57 pm

Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.

Something is clearly wrong in the calculation of the rating.

Note that I see that Komodo beat Houdini, and I see later that Critter also has 50% against Houdini, so I wonder which better results cause Houdini 2 to get a better rating.
