m-ad rating system

wlod · Post by **wlod** » Mon Jul 28, 2008 5:43 am

The m in the name m-ad rating stands for multiplication, and ad stands for addition.

M-ad rating can be applied to any game or even any collection of competitions, between two players, where the result of each game or encounter can be represented by a real number a between 0 and 1. The result of the game is understood to be the result of the first player. The second player result of the same game is 1-a.

Rating may depend on different factors, but the basic m-ad rating function

- Df = Df(A B a)

depends only on the pregame ratings A B of the two players, and on the result of the game a. I will provide an especially suitable, specific basic function, but at this moment let me just insist, that it satisfies the 0-sum axiom, namely:

Df(A B a) + Df(B A 1-a) = 0

Then, if we were satisfied with a crude but (still pretty) good rating system, we would define the post-game ratings as:

- A' := A + Df(A B a)
  B' := B + Df(B A 1-a)

(you may consider the second equality, for B', to be a consequence of the first one, but never mind). We see that the basic m-ad rating satisfies the stability condition:

- A' + B' = A + B

Of course, Df stands for the Difference.

To keep things simple and clear, each new to the rating list player will get the same initial rating, and this initial rating will always and forever be the average rating of the whole given list, i.e. it will be the average mean of all ratings at any time, it will be always the same. Thus a m-ad chess list open for all players may have the initial, i.e. average rating set to, say, 1400 points. Another list, which admits only strong players, may have the initial/average rating set higher, say to 2000 points, just to tell it easily apart from the other list. Or, perhaps, all lists can set their average rating to 1000, and that's it.

I'll write more about the related issues, which are partly social, perhaps on the other forum.

***

When we think about the games between chess players then we feel that some games are more important than others. And for the purpose of rating, the most important games are between the players who are similar with respect to their degree of being in practice, or rusty, or new; and also with respect to their relative playing strength. The games between players of about equal strength should have more weight than between players whose strength is widely apart.

Thus a m-ad rating system for chess players has an auxiliary function

- 0 < wgt(P Q) \< 1[/list
And the more sophisticated difference function is defined as:
- - diff(P Q a) := wgt(P Q) * Df(A B a)
where A B are the pre-game ratings of players P Q, and a is the result of the game. Thus the post-game ratings according to the complete m-ad system are actually as follows:
- - A' := A + diff(P Q game)
    B' := B + diff(Q P 1-a)
Function wgt(P Q) is defined as a product of two functions, namely of the intensity function I(P Q), and of the playing strength function L(P Q).

I need to run now, but I will continue (I hope ),
- Wlod

wlod · Post by **wlod** » Mon Jul 28, 2008 8:36 am

Let me first get out of the way the important issue of the weight of a game:

- wgt(P Q) := I(P Q) * L(P Q)

where I(P Q) reflects the difference of the activity, and L(P Q) the difference of the playing strength of players P and Q:

- 0 < I(P Q) \< 1 and 0 < L(P Q) \< 1

Value 1 of I(P Q) or L(P Q) corresponds to the equal activity (intensity) or strength respectively of players P Q.

Function intensity has two components: the total experience and the recent activities:

- I(P Q) := tot(P Q) + act(P Q)

defined in terms of the number of all rated games all(P) all(Q), and of the recent ones, for the past 120 days: rec(P) rec(Q).

First let me modify all(P) as folows:

- All(P) := min(200, all(P)+8)

Now:

- tot(P Q) := (1/2) * min(All(P)/All(Q), All(Q)/All(P))

Let's also modify rec(P Q):

- Rec(P) := min(30, rec(P)+1)

Now, like earlier, let:

- act(P Q) := (1/2) * min(Rec(P)/Rec(Q), Rec(Q)/Rec(P))

***

The issue of defining L(P Q) is more delicate because it should depend on the meaning of the rating. In the m-ad rating case when ratings of P and Q players are A and B respectively then (statistically speaking) you expect that in a long match of n games the scores of the players will be about

- n*A/(A+B) and n*B/(A+B)

Thus let me define:

- L(P Q) := 1/64 + (63/64)*min(A/B B/A)

REMARK - I am considering here the main (ezponential) m-ad rating function, which is always positive (for every player, at any time).

This concludes the definition of the weight coefficient.

Regards,

Wlod

wlod · Post by **wlod** » Mon Jul 28, 2008 10:40 am

wlod wrote:

Df(A B a) + Df(B A 1-a) = 0

Then, if we were satisfied with a crude but (still pretty) good rating system, we would define the post-game ratings as:

A' := A + Df(A B a)
B' := B + Df(B A 1-a)

[...]We see that the basic m-ad rating satisfies the stability condition:
A' + B' = A + B
***
0 < wgt(P Q) \< 1
And the more sophisticated difference function is defined as:
diff(P Q a) := wgt(P Q) * Df(A B a)

As for the basic difference function Df, here too we have:

- diff(P Q a) + diff(Q P 1-a) = 0

A' := A + diff(P Q a)
B' := B + diff(Q P 1-a)

And once again, the new ratings satisfy the stability condition:

- A' + B' = A + B

Regards,

Wlod

PS. It's such a pity(!) that we cannot edit our posts 15 or more minutes after the original posting.

Vinvin · Post by **Vinvin** » Mon Jul 28, 2008 11:39 am

wlod wrote:PS. It's such a pity(!) that we cannot edit our posts 15 or more minutes after the original posting.

I think the same, it should be somewhere between 3h to 12h ...

Matthias Gemuh · Post by **Matthias Gemuh** » Mon Jul 28, 2008 12:13 pm

Vinvin wrote:
wlod wrote:PS. It's such a pity(!) that we cannot edit our posts 15 or more minutes after the original posting.
I think the same, it should be somewhere between 3h to 12h ...

I am of same opinion.
And it is very bad that other users see a different edited text than the poster.

Matthias.

.

hgm · Post by **hgm** » Mon Jul 28, 2008 12:35 pm

You seem to base your rating system on differential update of pre-existing ratings. For players whose strength can be assumed to vary over time this might be a natural system, as the differential update automatically performs the function of a moving-average filter.

For computer programs such a system is not very attractive, though. Their strength is a time-independent quantity. So what we usually do to determine the rating of computer programs is to feed all the games all programs ever played to a rating program, and let it extract the ratings that best explain the observed results from there.

wlod · Post by **wlod** » Mon Jul 28, 2008 2:08 pm

hgm wrote:You seem to base your rating system on differential update of pre-existing ratings. For players whose strength can be assumed to vary over time this might be a natural system, as the differential update automatically performs the function of a moving-average filter.

For computer programs such a system is not very attractive, though. Their strength is a time-independent quantity. So what we usually do to determine the rating of computer programs is to feed all the games all programs ever played to a rating program, and let it extract the ratings that best explain the observed results from there.

If each program under the consideration played each other a sufficient number of games then we are dealing with a pairwise comparisons situation under exceptionally pure circumstances. If some programs didn't play each other (or very little) then we have an additional challenge of facing an incomplete graph. It should be sufficiently connected to allow for a fully meaningful global rating, but the same is true in the case of people.

The m-ad rating can still be helpful, and possibly optimal, when applied as follows - Simon Read was the first and only to post this idea on rgcm:

Order the games in any way you like. Run the m-ad algoritm on their sequence, then again and again, periodically, but keep lowering properly the dynamic coefficient, so that it will approach zero. Then the ratings will converge to some limit values (they will slowly freeze). The limit values will not depend on the selected permutation (order of the games), nor on the schedule of freezing (of lowering the dynamic constant, within the proper assumptions). I'll try to cover this process in detail, perhaps on the technical forum. Now the mathematical challenge is to get the limit values without the infinite iteration, while in practice finite iterations may numerically approximate the solution quite efficiently anyway (?). In any case, we see that a good rating function can be a general tool of solving the classical problem of the pairwise comparison method of ranking. It motivated me over years, since the beginning (but not enough to overcome my laziness

).

Regards,

Wlod

hgm · Post by **hgm** » Mon Jul 28, 2008 3:58 pm

Yes, one could use such a process to iterate to self-consistency. But this is not really the major stumbling block in obtaining ratings.

The main problem, as you already mention, is the incompleteness of the graph. BayesElo performs very poorly on incomplete graphs if the players are spread out over a very wide Elo range. (E.g. the ChessWar promo division.)

I think that in principle Bayesian estimation is the way to go, but the problem is that BayesElo uses a maximum-likelyhood estimate for the ratings, in stead of calculating the expectation value under the likelyhood function.

Rémi Coulom · Post by **Rémi Coulom** » Mon Jul 28, 2008 10:07 pm

wlod wrote: The m-ad rating can still be helpful, and possibly optimal, when applied as follows - Simon Read was the first and only to post this idea on rgcm:

Order the games in any way you like. Run the m-ad algoritm on their sequence, then again and again, periodically, but keep lowering properly the dynamic coefficient, so that it will approach zero. Then the ratings will converge to some limit values (they will slowly freeze). The limit values will not depend on the selected permutation (order of the games), nor on the schedule of freezing (of lowering the dynamic constant, within the proper assumptions). I'll try to cover this process in detail, perhaps on the technical forum. Now the mathematical challenge is to get the limit values without the infinite iteration, while in practice finite iterations may numerically approximate the solution quite efficiently anyway (?).

Hi Wlod,

This idea was suggested many times before. If you don't know, you could take a look at:

I don't have time to read your post carefully now, but I will do it later.

Rémi

wlod · Post by **wlod** » Tue Jul 29, 2008 1:08 am

A sophisticated rating system may assist players and programs in learning better chess (I'll try to write about it). Thus a discussion of rating can be more than just about rating and ranking. I'd like to continue it. Thank you Rémi and HGM for your comments and links. I'd like to propose to move the discussion portion which goes beyond m-ad system to the General Topics forum, to the "rating system" thread, where the titles of the posts can still reflect a more specific issue at hand. I plan to provide there an inconsistency models. If we got right model then we would be able to make further progress, I believe. Would the same model work for programs and for human players? Some priorities must be different while in principle it should be the same (a muddy answer

).

Please, write in this thread rather than not at all. I just wanted an easier (for me) organization of posts but it's not crucial.

Regards,

Wlod

m-ad rating system

m-ad rating system

The game weight coefficient

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system

Re: m-ad rating system