rating system

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

wlod

rating system

Post by wlod »

Once again, let me start (and end?) with a formal question. Is this forum a proper place to discuss a rating system?

Let me mention that I have encountered a surprisingly hostile reaction in the past from some statisticians(?) and "Elo experts", while years ago I received strong support from a researcher who was able to understand my simple idea. Thus I am a bit hesitant to even propose this topic.

My basic function is simple and mathematically clean. It allows one to compute ("predict") things which are difficult or hopeless to compute with other functions. Since all reasonable functions will work pretty well and in roughly comparable ways, Ockham's razor says that the simplest is the best (when other aspects are about even).

Actually, I have developed my system further, so that it solves problems such as the presence of new players, or of players who haven't played much recently. It also handles well the results of games between players of very different levels.

We could consider just one universal rating. Actually, in addition to one such function there is also a continuous spectrum of ratings, one for each strength level (but each particular rating applies to all players). In practice, one would pay attention to the general function (for fun) and to one of, say, twelve functions, namely the one associated roughly with your own strength, as it would be the most meaningful one. The idea is that, say, Morozevich "kills" weaker players while Ivanchuk does better against super grandmasters. Thus, possibly, Morozevich may have a higher general rating than Ivanchuk, while Ivanchuk may have a higher superGM rating (I am not saying that this is the case; it would have to be tested with the respective ratings).

Over the years I wrote about my rating function on rgc* at different times, but got very little feedback (and almost zero constructive feedback), except from the one person mentioned above.

The topic is interesting to me for at least two reasons. It relates to many areas like economics, politics, etc. And, on the chess side, I feel that there is no need for degeneration-prone, bureaucratic central organizations like FIDE and USCF, which attempt to monopolize chess life. Instead, there should be some rating companies (in effect, a few major ones), independent of any other companies and institutions. Then there might be some loose associations for chess judges, separate ones for tournament organizers, etc. Some standards will be proposed and the best (or just some) standards will win, because it is convenient for everybody to have standards. They will be adopted freely, without being forced. A self-organizing, decentralized (multi-dimensional) chess world would be so much nicer than what we have seen over the past sixty years.

Digression. It is possible to have ONE joint rating list for people playing all kinds of different games. I would see it as something for fun and trivia and even (especially! :)) for ego trips. :) I wish someone would turn this idea into a successful business; I'd like to see it (I am not going to do it myself; I am not capable of building a business like this). It'd be interesting to compare chess players with tennis players and mathematics students, etc. It'd be free for all in a sense.

***

Regards,
  • Wlod
James Constance
Posts: 358
Joined: Wed Mar 08, 2006 8:36 pm
Location: UK

Re: rating system

Post by James Constance »

wlod wrote:Once again, let me start (and end?) with a formal question. Is this forum a proper place to discuss a rating system?
We recently discussed Elo calculations in the programming and technical forum.
wlod

Re: rating system

Post by wlod »

James Constance wrote:
wlod wrote:Once again, let me start (and end?) with a formal question. Is this forum a proper place to discuss a rating system?
We recently discussed Elo calculations in the programming and technical forum.
For non-trivial mathematical constructions there is a trade-off: either the proof that the construction has the desired properties is straightforward, and then the construction is unnatural and contorted (because it was guided by the requirements of an easy proof), or the construction is simple and natural but its properties are not obvious, and the proof of its correctness can be difficult. Of the two, the second kind of construction is much nicer and preferable. But Elo made one of the first kind, and the world of organized chess has bought it. From an algorithmic (as opposed to set-theoretic) point of view, he didn't define rating as a function but as an inverse of a function.
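To illustrate the "inverse of a function" remark, here is a small sketch assuming the common logistic form of the Elo curve with a 400-point scale (an assumption; Elo's own tables were based on the normal distribution). The expected score is an explicit function of the rating difference, while a rating difference is only obtained by inverting that function, e.g. to turn a score fraction into a performance difference. The function names are hypothetical.

```python
import math

def expected_score(diff: float) -> float:
    """Expected score of the side that is `diff` Elo points stronger (logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def rating_diff(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

d = rating_diff(0.6)                # a 60% score, e.g. 3 out of 5
print(round(d, 1))                  # 70.4
print(round(expected_score(d), 3))  # 0.6
```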

There should be a very general theorem that all rating functions which satisfy certain natural axioms are basically fine, meaning that they order players according to their strength reasonably well. And still, chess tournament organizers should not be overly impressed by a single rating method; they should remember that there are Morozeviches and Ivanchuks, and that neither has to be absolutely better than the other.

Remark: certain features are arbitrary. How fast should the ranking (not rating) of players change? Say, player A used to get 3 points out of 5 games against player B. How many games in a row should B win against A in order to catch up with A on the rating list? That's an arbitrary decision. In fact, one could have different ratings for different dynamic sensitivities.
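A minimal sketch of why this is arbitrary, assuming the familiar logistic Elo curve and the usual incremental update with factor K (not the m-ad system, whose details are not given here): a 3/5 score corresponds to a gap of about 70 points, and the number of consecutive wins B needs to reach A depends entirely on the chosen K. The function name is hypothetical.

```python
import math

def wins_to_catch_up(gap: float, k: float) -> int:
    """Count consecutive wins B needs to reach A's rating, starting `gap` Elo points behind.

    Assumes the logistic Elo model and the usual update: the winner gains
    k * (1 - expected score) and the loser loses the same amount.
    """
    wins = 0
    while gap > 0:
        expected_b = 1.0 / (1.0 + 10.0 ** (gap / 400.0))  # B's expected score while trailing
        gap -= 2.0 * k * (1.0 - expected_b)  # B gains and A loses the same number of points
        wins += 1
    return wins

# A scored 3 out of 5 against B, i.e. an expected score of 0.6 for A.
gap = 400.0 * math.log10(0.6 / 0.4)  # roughly 70 points
for k in (10, 20, 40):
    print(f"K={k}: {wins_to_catch_up(gap, k)} wins in a row")
```

A larger K makes the list react faster; nothing in the model itself says which K is "right".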

Sorry to state well-known truisms.

***

Now I know that the Elo rating is welcome on this portal, but is there also room for a discussion of a different rating system, which has never been used anywhere?

Regards,
  • Wlod
gerold
Posts: 10121
Joined: Thu Mar 09, 2006 12:57 am
Location: van buren,missouri

Re: rating system

Post by gerold »

Go to the tech forum; I am sure there are people there who can
help you. :) If not there, CTF is the place to put it.
Nimzovik
Posts: 1831
Joined: Sat Jan 06, 2007 11:08 pm

Re: rating system

Post by Nimzovik »

By all means continue... I, for one, am interested.
wlod

Re: rating system

Post by wlod »

Nimzovik wrote:By all means continue ..... I for one am interested............
Thank you, Alex. Following James' suggestion and gerold's advice, I'll open a new thread in the P&TD forum. I'll call my system
    • m-ad rating system
where m stands for multiplication, and ad for addition (while the whole thing is pronounced mad).

Best regards,
  • Wlod
wlod

HIQ - my only cynical business idea

Post by wlod »

All my life I have had only socially positive business ideas. In particular, I was never interested in stocks. The idea of making money without creating something good for society somehow never appealed to me. The only exception is the one described below, a fluke.

***

A company called, say, HIQ, offers nothing but ratings to its account holders. Anybody may open an account or even any number of accounts (say, one under their true name and several under fictitious names); that's ok, it's part of the game. For each account there is, say, a one-time $5 opening fee and a yearly $1 maintenance fee. Finally, for each rated "game" there is a rating fee of 5 cents per "player", i.e. a total of 10 cents. (Accounts would perhaps pay for rating in bundles, say $5 for 100 "games".)
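The example prices add up simply. A small sketch (constant and function names are hypothetical) for one account that rates 100 games in its first year, computed in cents to avoid floating-point rounding:

```python
# Example prices from the post, in cents.
OPENING_FEE = 500          # one-time, per account
YEARLY_FEE = 100           # per account per year
RATING_FEE_PER_PLAYER = 5  # per account per rated game

def account_cost(years: int, games: int) -> int:
    """Total cents one account pays over `years` years while rating `games` games."""
    return OPENING_FEE + YEARLY_FEE * years + RATING_FEE_PER_PLAYER * games

def hiq_revenue_per_game() -> int:
    """Both accounts pay the per-player fee, so HIQ collects twice that per game."""
    return 2 * RATING_FEE_PER_PLAYER

print(account_cost(years=1, games=100) / 100)  # 11.0 dollars
```

So that account pays $5 + $1 + $5 = $11 in its first year, and HIQ takes 10 cents per rated game.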

All that users have to do, from the point of view of HIQ, is to pay for rated games and to send in the results of the games, where each result is agreed upon by the two accounts involved. For example, Peter and Mary send HIQ a message saying that they played a game and that the result is .4 (results are decimal numbers between 0 and 1). Such a message would have to be electronically signed by both accounts. Then HIQ would update the ratings of Peter's and Mary's accounts.

Now you may play tennis or chess, or you may try the 100 m dash or swimming... HIQ does not care; just send scores between 0 and 1, that's all.
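The post does not say how HIQ would compute ratings. Purely as an illustration, here is a sketch that plugs a fractional result into a standard Elo-style update. It assumes the reported .4 is Peter's share (so Mary gets .6), uses a hypothetical K of 20 and starting rating of 1500, and leaves out the electronic signatures entirely.

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    rating: float = 1500.0  # hypothetical starting rating

def record_game(a: Account, b: Account, score_a: float, k: float = 20.0) -> None:
    """Update both ratings for one agreed result; score_a in [0, 1] is a's share, b gets 1 - score_a."""
    if not 0.0 <= score_a <= 1.0:
        raise ValueError("result must lie between 0 and 1")
    expected_a = 1.0 / (1.0 + 10.0 ** ((b.rating - a.rating) / 400.0))
    delta = k * (score_a - expected_a)
    a.rating += delta  # a gains (or loses) exactly what b loses (or gains)
    b.rating -= delta

peter, mary = Account("Peter"), Account("Mary")
record_game(peter, mary, 0.4)
print(f"{peter.name}: {peter.rating:.1f}, {mary.name}: {mary.rating:.1f}")
```

Because the score is just a number in [0, 1], the same update works whatever "game" produced it.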

In particular, someone may create extra accounts which will lose "games" against their main account, to pump up the HIQ rating of the main account. Fine. That's part of the game. If there are snobs with too much money, HIQ does not mind.

I feel that this silly idea could somehow become very popular; it may be mentioned by the media, it may end up in the Guinness Book of Records, ... There is something psychological about it, and once it achieves critical mass there will be no stopping it.

And still, I don't feel like actually implementing it. To me it is like a practical joke. I can think about practical jokes but I don't play them in reality, only in my mind. If you would like to implement it, go ahead, it's yours :)

Regards,
  • Wlod
wlod

A model of inconsistent pair comparisons

Post by wlod »

Some cases of cyclic domination among certain top chess players are well documented. They remind us of the
  • scissors > paper > stone > scissors
game. Ironically, it seems common among specialists in the pairwise comparison method and the consistency issue to think that if someone evaluates some items (say, politicians :)) in a cyclic or, more generally, inconsistent way, then something is wrong with that party's evaluation process. But chess alone contradicts them.

There can be different objective reasons behind the cycles. I'll present just one of them, i.e. one possible model, which may be correct in some cases.

Let's take some items, say an apple, a pear and an orange. We may evaluate them with respect to some features. One can even call a feature simple if it allows for a consistent evaluation. But when we compare two items overall, we may count the number of features with respect to which one item is better than the other. Here is a possible ranking of the three fruits with respect to their nourishing value, aroma, and mildness (3 = best, 1 = worst):


            | nour | aroma | mild
    --------|------|-------|------
    orange  |  1   |   3   |  2
    apple   |  3   |   2   |  1
    pear    |  2   |   1   |  3

Feature-wise, orange beats apple 2:1, apple wins 2:1 against pear, and pear wins 2:1 against orange. An absolute linear rating and ranking of these fruits is impossible when what counts for us is the number of feature wins.
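The feature-count comparison is easy to mechanize. This sketch encodes the fruit table (reading the numbers as ranks with 3 = best, which is what the 2:1 results require) and reproduces the cycle. With an odd number of features and no tied ranks, every pair has a clear winner. The names are illustrative only.

```python
from itertools import combinations

# Ranks from the fruit table (3 = best within each feature).
RANKS = {
    "orange": {"nour": 1, "aroma": 3, "mild": 2},
    "apple":  {"nour": 3, "aroma": 2, "mild": 1},
    "pear":   {"nour": 2, "aroma": 1, "mild": 3},
}

def feature_wins(x: str, y: str) -> tuple[int, int]:
    """Count the features on which x beats y, and those on which y beats x."""
    fx, fy = RANKS[x], RANKS[y]
    wins_x = sum(fx[f] > fy[f] for f in fx)
    wins_y = sum(fy[f] > fx[f] for f in fx)
    return wins_x, wins_y

for x, y in combinations(RANKS, 2):
    wx, wy = feature_wins(x, y)
    winner = x if wx > wy else y  # majority of features decides the pair
    print(f"{x} vs {y}: {wx}:{wy}, {winner} wins")
```

The output shows orange over apple, apple over pear, and pear over orange, so no single number per fruit can reproduce these results.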

Possibly, just possibly, we have a similar situation in chess to some extent. The mastery of chess may mean the mastery of several roughly independent features of the game. If the features are well chosen, then we might be able to predict the winner of a chess match as the one who prevails with respect to the greater number of features. This method may work better than any 1-dimensional rating. Indeed, it is potentially capable of predicting cycles, something which is impossible for any 1-dimensional rating.

If that were the case, then players and chess-playing programs should learn to be good with respect to the considered features. A student's progress would be measured by a collection of rating functions, one per feature.

Consider this post to be just a signal rather than a complete discussion.

Regards,
  • Wlod
hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A model of inconsistent pair comparisons

Post by hgm »

Well, it is obvious that Chess-playing skill is not a one-dimensional quantity, while ratings are. One should thus not overestimate the significance of ratings, in particular with respect to their power to predict the outcome of a match between individual players. The best one could hope for is good predictive power for the case where one player plays a number of other players, sufficiently dispersed over the space of Chess skills to be considered a representative sample of a certain Elo range. This avoids rock-scissors-paper-like paradoxes, as there will always be similar numbers of rocks and papers to cancel out each other's anomalous scores when meeting a scissors.
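The cancellation argument can be checked on the pure rock-paper-scissors case, a toy model in which an identical choice counts as a draw. Against a balanced pool every "player" averages exactly 50%, while an unbalanced pool (here, heavy in scissors) produces the anomalous scores. The pools are made-up examples.

```python
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}  # key beats value

def game_score(player: str, opponent: str) -> float:
    """1 for a win, 0.5 for a draw (same choice), 0 for a loss."""
    if player == opponent:
        return 0.5
    return 1.0 if BEATS[player] == opponent else 0.0

def average_score(player: str, pool: list[str]) -> float:
    """Mean score of `player` against every opponent in `pool`."""
    return sum(game_score(player, opp) for opp in pool) / len(pool)

balanced = ["rock", "paper", "scissors"] * 4
skewed = ["scissors"] * 6 + ["rock"] * 3 + ["paper"] * 3

for p in BEATS:
    print(p, average_score(p, balanced), average_score(p, skewed))
```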

I think existing rating systems are well equipped to deal with this. The main open questions are which rating model to use for the probability distribution of results, and whether this distribution needs to be parametrized by more than one parameter (the rating), e.g. whether the width of the distribution should be considered a player-dependent quantity as well. (For computer Chess it seems that this could be very useful, as some engines definitely have a much larger capacity for producing 'surprises' than others.) Another question is how to properly account for statistical flukes under conditions of minimal data (e.g. only a dozen or so games per player, but very many players).
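One way to make the width player-dependent, sketched here under the assumption of a Thurstone-style normal model (not any specific existing rating system): give each player a mean and a spread, and compare their per-game performances. With the same rating gap, a more erratic underdog wins noticeably more often. All numbers are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(mu_a: float, sd_a: float, mu_b: float, sd_b: float) -> float:
    """P(A's performance in a game exceeds B's).

    Each player's per-game performance is modeled as an independent normal
    variable with its own mean (the rating) and its own spread; draws are ignored.
    """
    return normal_cdf((mu_a - mu_b) / math.hypot(sd_a, sd_b))

# Two hypothetical engines rated 200 points below the same opponent.
steady = win_probability(2400, 150, 2600, 150)
erratic = win_probability(2400, 350, 2600, 150)
print(f"steady: {steady:.3f}, erratic: {erratic:.3f}")
```

A single-parameter system would give both engines the same chance; the second parameter is what captures the capacity for 'surprises'.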
guyhaw

Re: A model of inconsistent pair comparisons

Post by guyhaw »

I recommend John Beasley's book 'The Mathematics of Games'. There, he gives a critical review of the concept of rating, Elo, etc.
He gives a scenario involving a circular running track (the usual kind) where, quite legitimately, A beats B, B beats C and C beats A ... a counterexample to the assumption that players can be ordered in a linear spectrum.
He also shows that, to some degree, rating systems are self-fulfilling prophecies.
The Elo system was an honest effort at the time, with at least one 'finger in the air' choice of parameter value.
There are better models now, e.g. Whole-History Rating, which does 'proper' Bayesian inference using a model of how a player's actual rating diffuses over a period of inactivity: it could be better, could be worse, but it is less and less certain to be the nominal value as time passes.
See http://remi.coulom.free.fr/WHR/ re WHR.
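WHR models each player's true rating as a Wiener process (Brownian motion) in time, so the prior variance of a rating grows linearly with the length of inactivity. The sketch below shows only that prior, not WHR's actual inference; the current deviation and the variance rate `w2_per_day` are made-up values, and the function name is hypothetical.

```python
import math

def prior_rating_sd(sd_now: float, inactive_days: float, w2_per_day: float) -> float:
    """Standard deviation of a player's true rating after a period of inactivity.

    Under a Wiener-process model the variance grows linearly with elapsed time:
    var(t) = sd_now**2 + w2_per_day * t.
    """
    return math.sqrt(sd_now ** 2 + w2_per_day * inactive_days)

# Hypothetical values: current uncertainty 50 Elo, variance rate 14 Elo^2 per day.
for days in (0, 30, 365):
    print(days, round(prior_rating_sd(50.0, days, 14.0), 1))
```

The nominal rating does not move; only the uncertainty around it grows with inactivity.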