I remember it; I answered in the same thread. While correct, his answer basically just says that taking the average is wrong, which was not a secret, I think.
If you consider only the games against Houdini 2, the rating will be more than 3020. As you include more engines, the calculation becomes more complex.
An engine (let's call it Anti-Houdini) can win against Houdini and perform very badly against all other engines, so its final Elo will be low, because the rating takes into account the performance against the whole group of best engines, not only the one at the top.
When Houdini's IPON test was done, it managed to beat all opponents, and it really crushed the engines rated under 2800. Komodo 4's performance is more like the Anti-Houdini example.
(I am exaggerating just to show why it is possible to score more than 50% against every opponent and still not get to the top.)
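To make the Anti-Houdini idea concrete, here is a minimal sketch with made-up numbers. Everything in it is an assumption for illustration: the engines "A" and "B", the scores, the 100 games per pairing, and the rating model itself (a plain logistic/Bradley-Terry fit where draws count as half a point). It is not what EloStat or any particular rating tool does internally.

```python
import math

# Hypothetical round robin: points scored by the FIRST engine out of 100 games.
GAMES = 100
results = {
    ("AntiHoudini", "Houdini"): 60,  # Anti-Houdini wins the head-to-head
    ("Houdini", "A"): 80,
    ("Houdini", "B"): 80,
    ("AntiHoudini", "A"): 50,
    ("AntiHoudini", "B"): 50,
    ("A", "B"): 50,
}

def fit_ratings(results, games, iterations=500):
    """Fit logistic-model ratings so expected total score matches actual score."""
    players = sorted({p for pair in results for p in pair})
    # Total points per player over all games.
    points = {p: 0.0 for p in players}
    for (a, b), score in results.items():
        points[a] += score
        points[b] += games - score
    # Bradley-Terry strengths, updated simultaneously (MM iteration).
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = 0.0
            for a, b in results:
                if p in (a, b):
                    other = b if p == a else a
                    denom += games / (strength[p] + strength[other])
            updated[p] = points[p] / denom
        strength = updated
    # Convert to the Elo scale and centre the ratings on zero.
    ratings = {p: 400 * math.log10(s) for p, s in strength.items()}
    mean = sum(ratings.values()) / len(ratings)
    return {p: r - mean for p, r in ratings.items()}

ratings = fit_ratings(results, GAMES)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {rating:+7.1f}")
```

With these invented results Anti-Houdini beats Houdini 60-40 head to head, yet Houdini still comes out on top, because it scores 200/300 overall against Anti-Houdini's 160/300.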
Uri Blass wrote: Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.
Something is clearly wrong in the calculation of the rating.
Note that I see that Komodo beat Houdini, and I see later that Critter also has 50% against Houdini, and I wonder what the better results of Houdini are that cause Houdini 2 to get a better rating.
Ernest has provided a link to a statement by Rémi. Have you read it, and also the subsequent posts in that thread?
I think that match performance ratings between pairs of engines are almost irrelevant for any estimate of total ratings. Neither the arithmetic mean nor the median says anything here. The overall score from all games is most probably a better indicator of the overall rating than any number derived from match performances.
Why is it so difficult to understand that if Komodo scores 70% against the rest of the pack (everyone except Houdini), and Houdini scores 80% against the same pack, then Houdini still has a higher rating even though it loses the head-to-head match against Komodo?
Sounds to me like some readers here are in need of a basic statistics course.
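For a rough feel of the size of that gap, here is a small sketch using the standard logistic Elo formula. It assumes, purely for simplicity, that "the pack" can be treated as one opponent with a single rating, and it ignores the head-to-head games entirely; a real rating tool works on all games at once.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400 * math.log10(score / (1 - score))

komodo_vs_pack = elo_diff(0.70)   # Komodo scores 70% against the pack
houdini_vs_pack = elo_diff(0.80)  # Houdini scores 80% against the same pack

print(f"Komodo  is about {komodo_vs_pack:.0f} Elo above the pack")
print(f"Houdini is about {houdini_vs_pack:.0f} Elo above the pack")
print(f"Gap in Houdini's favour: {houdini_vs_pack - komodo_vs_pack:.0f} Elo")
```

Under those assumptions Komodo lands roughly 147 Elo above the pack and Houdini roughly 241, so a narrow head-to-head loss has a lot of ground to make up.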
Of course I understand it. My last question was what the better results are that Houdini got.
Unfortunately, I cannot see the results of old matches.
A reply to my question could be something like:
Houdini-Crafty 90-10, Komodo-(same Crafty) 88-12, and the same for other programs.
I wanted to see the Houdini and Komodo results side by side, and now the same for Critter.
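As an aside, here is a quick sketch of what the hypothetical 90-10 versus 88-12 lines above would mean in rating terms. It uses the standard logistic Elo formula and only those two made-up scores from the post; no real match data is involved.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400 * math.log10(score / (1 - score))

houdini_vs_crafty = elo_diff(0.90)  # hypothetical 90-10
komodo_vs_crafty = elo_diff(0.88)   # hypothetical 88-12
gap = houdini_vs_crafty - komodo_vs_crafty

print(f"Houdini over Crafty: {houdini_vs_crafty:.0f} Elo")
print(f"Komodo  over Crafty: {komodo_vs_crafty:.0f} Elo")
print(f"Difference: {gap:.0f} Elo")
```

Two points out of a hundred in a lopsided match is already worth about 35 Elo, so small-looking differences against weaker engines can add up.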
Why is it so difficult to understand that if Komodo scores 70% against the rest of the pack (everyone except Houdini), and Houdini scores 80% against the same pack, then Houdini still has a higher rating even though it loses the head-to-head match against Komodo?
Sounds to me like some readers here are in need of a basic statistics course.
Yes, then do it: first by hand, using the example you presented as clear, and second with EloStat.