guyhaw wrote: I recommend John Beasley's book 'The Mathematics of Games'. There, he gives a critical review of the concept of rating, Elo, etc.
He gives a scenario involving a circular running track (the usual kind) where, quite legitimately, A beats B, B beats C and C beats A: a counterexample to the assumption that players can be ordered along a single linear scale.
Abstract: The Bradley–Terry model is widely and often beneficially used to rank objects from paired comparisons. The underlying assumption that makes ranking possible is the existence of a latent linear scale of merit or, equivalently, of a kind of transitivity of the preference. However, in some situations, such as sensory comparisons of products, this assumption can be unrealistic. In these contexts, although the Bradley–Terry model still appears to be of considerable interest, the linear ranking does not make sense. Our aim is to propose a 2-dimensional extension of the Bradley–Terry model that accounts for interactions between the compared objects. From a methodological point of view, this proposal can be seen as a multidimensional scaling approach in the context of a logistic model for binomial data. Maximum likelihood estimation is investigated, and asymptotic properties are derived in order to construct confidence ellipses on the diagram of the 2-dimensional scores. An illustrative example based on real sensory data shows how to use the 2-dimensional model to inspect the lack of fit of the Bradley–Terry model.
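To make the lack-of-fit point concrete, here is a minimal sketch of the ordinary one-dimensional Bradley–Terry model (not the 2-dimensional extension from the paper), fitted with the classic MM/Zermelo iteration. The 7-3 win counts form a hypothetical cycle in the spirit of Beasley's running track: A beats B, B beats C, C beats A. Because every player has the same record, the fitted strengths come out equal, so the model predicts 50% in every pairing while the data show 70%.

```python
import math

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths with the MM (Zermelo) iteration.

    wins[i][j] = number of times item i beat item j.
    Assumes every item has at least one win (otherwise the MLE diverges).
    Returns strengths normalised to geometric mean 1.
    """
    k = len(wins)
    p = [1.0] * k
    for _ in range(n_iter):
        new_p = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(k) if j != i)
            new_p.append(total_wins / denom)
        # Only ratios p_i / p_j matter, so fix the arbitrary overall scale.
        g = math.exp(sum(math.log(x) for x in new_p) / k)
        p = [x / g for x in new_p]
    return p

def bt_prob(p, i, j):
    """Model probability that item i beats item j: p_i / (p_i + p_j)."""
    return p[i] / (p[i] + p[j])

# Hypothetical cyclic results: A beats B, B beats C, C beats A, each 7-3.
A, B, C = 0, 1, 2
wins = [[0, 7, 3],
        [3, 0, 7],
        [7, 3, 0]]
p = fit_bradley_terry(wins)
for i, j in [(A, B), (B, C), (C, A)]:
    observed = wins[i][j] / (wins[i][j] + wins[j][i])
    print(f"{'ABC'[i]} vs {'ABC'[j]}: observed {observed:.2f}, "
          f"model {bt_prob(p, i, j):.2f}")
```

No single linear scale can reproduce a cycle, so the residuals (0.70 observed vs 0.50 predicted in each pairing) are exactly the kind of lack of fit a second dimension is meant to capture.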
The lack of fit of the Elo rating model to computer-Go data is particularly strong, since different kinds of programs use radically different algorithms.
Rémi, it's good to have your contributions to this board. I could not have put it better myself.
It would be interesting to see the proposals applied to John Beasley's circular running track.
guy
Consider, say, three features, and let A, A', A" and B, B', B" be the ratings of two players with respect to these three features. Formulas that are functions of the differences B-A, B'-A' and B"-A" cannot be right when the differences are not uniquely comparable across features. Often a rating essentially does not change when it is multiplied by a positive constant. Then it is not clear whether one should use the said differences or, for instance, 5*(B-A), B'-A' and B"-A".
However, I will show that, in the case of the m-ad rating, the quotients A/B and similar have a meaning that is comparable across different features, and in general. Thus the inequality:
A/B + A'/B' + A"/B" > 3
would show that, in a sense, the first player is indeed stronger than the second if the three features are independent and about equally important.
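The rescaling problem can be checked with a few numbers. In this minimal sketch the ratings are hypothetical, and the plain sum of differences stands in for "a formula that is a function of the differences" (it is only one such formula). Multiplying the first feature's scale by 5 for both players flips which player the difference sum favours, while the sum of quotients from the inequality above is unchanged.

```python
def sum_of_differences(first, second):
    """Sum over features of (second - first); one hypothetical difference-based formula."""
    return sum(b - a for a, b in zip(first, second))

def sum_of_ratios(first, second):
    """Sum over features of first / second (ratings assumed positive)."""
    return sum(a / b for a, b in zip(first, second))

def rescale_feature(ratings, index, factor):
    """Multiply the rating of one feature by a positive constant."""
    return [r * factor if k == index else r for k, r in enumerate(ratings)]

# Hypothetical ratings (A, A', A") and (B, B', B") on three features.
first = [10.0, 20.0, 20.0]
second = [12.0, 18.0, 18.0]

# Change the scale of the first feature by a factor of 5 for both players.
first5 = rescale_feature(first, 0, 5.0)
second5 = rescale_feature(second, 0, 5.0)

print(sum_of_differences(first, second))    # -2.0: first player ahead
print(sum_of_differences(first5, second5))  # 6.0: now second player ahead
print(sum_of_ratios(first, second), sum_of_ratios(first5, second5))
```

Both ratio sums are about 3.056, above 3, so the quotient-based verdict favours the first player on either scale.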
I don't really understand your thinking about using ratios instead of differences. It is essentially the same thing: it all depends on whether you use a log scale (i.e., log(a/b) = log(a) - log(b)). So being a function of the ratio is the same as being a function of the difference of the logarithms. But maybe there is something you meant that I did not understand.
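The equivalence can be written out as a short derivation. Writing each positive rating as the exponential of a log-scale rating (the symbols x, y, f, g and the indices 1 to 3 are introduced here only for illustration, with x_1 = log A, x_2 = log A', x_3 = log A" and likewise y_k for B, B', B"), a function of a ratio becomes a function of a log-scale difference, and the inequality A/B + A'/B' + A"/B" > 3 becomes a condition on those differences:

```latex
\begin{aligned}
a = e^{x},\ b = e^{y} \quad&\Longrightarrow\quad \frac{a}{b} = e^{x-y}, \qquad \log\frac{a}{b} = x - y,\\
f\!\left(\frac{a}{b}\right) &= g(x - y) \quad\text{with}\quad g(t) = f\!\left(e^{t}\right),\\
\frac{A}{B} + \frac{A'}{B'} + \frac{A''}{B''} > 3 \quad&\Longleftrightarrow\quad e^{x_1-y_1} + e^{x_2-y_2} + e^{x_3-y_3} > 3.
\end{aligned}
```

The last condition is still a function of the log-scale differences x_k - y_k, though not of their plain sum.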
Rémi Coulom wrote: This is a short French version of the paper: http://www.agro-montpellier.fr/sfds/CD/ ... usson1.pdf
I am sure that if you ask the authors by e-mail, they would be glad to send you the English version.
Perhaps. Thank you for your suggestion. Unfortunately, I have very little time and energy.
Rémi Coulom wrote: I don't really understand your thinking about using ratios instead of differences.
Rémi
In general there is no difference, as you have observed, of course. But, as you allowed, in the case of my m-ad rating the quotient is meaningful: it does not change under any change of an arbitrary parameter such as the list's average rating M. The meaning is that the quotient of the m-ad ratings of two players is expected (statistically speaking) to equal the quotient of their scores in a long match. And yes, from the beginning, for psychological reasons, I was planning for a rating company to also provide users with the logarithmic variant of the same rating (see, if it's important, my old posts on rgc*). I'll start presenting the m-ad basic rating function in a few hours (I hope).
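As a rough sketch of what the stated property implies (this is not the m-ad rating function itself, which the post has yet to present): if the ratio of two players' expected scores equals the ratio of their ratings, and each game distributes one point (draws counted as half each), then A's expected per-game score is R_A / (R_A + R_B). This has the same form as the Bradley–Terry probability, it is unchanged when every rating is multiplied by the same constant, and on a logarithmic scale it becomes the familiar logistic curve. The 400 * log10 scale below is a hypothetical choice made only to match Elo's convention.

```python
import math

def expected_score_from_ratio(r_a, r_b):
    """Expected per-game score of A, assuming score_A / score_B = r_a / r_b
    and score_A + score_B = 1 (draws counted as half a point each)."""
    return r_a / (r_a + r_b)

def to_log_scale(r, scale=400.0):
    """Hypothetical logarithmic variant: scale * log10(r), as on the Elo scale."""
    return scale * math.log10(r)

def expected_score_from_log(x_a, x_b, scale=400.0):
    """The same expectation expressed on the log scale (Elo-style logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((x_b - x_a) / scale))

r_a, r_b = 3.0, 1.0  # hypothetical ratio-scale ratings
e = expected_score_from_ratio(r_a, r_b)
print(e)  # 0.75: A expected to score 3 points for every 1 of B's

# Multiplying all ratings by one constant (e.g. shifting the list's
# average rating) leaves the expectation unchanged.
print(expected_score_from_ratio(10 * r_a, 10 * r_b))  # 0.75

# The logarithmic variant gives the same expectation, up to rounding.
print(expected_score_from_log(to_log_scale(r_a), to_log_scale(r_b)))
```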