Ordo release (rating software, ELO-like)

michiguel · Post by **michiguel** » Thu Jan 12, 2012 5:10 am

michiguel wrote:
Rémi Coulom wrote:I see, thanks. It is like bayeselo, except there is no prior, no confidence intervals, no LOS, and the model is Gaussian instead of logistic.

Rémi
Hi Remi,

It is logistic. The procedure has evolved since the beginning, when I used a Gaussian (there are other things that I do not need to do for engines, like giving weights for "newer" games etc.Home field advantage, or neutral). In fact, this is a simpler rewrite in C of the original Pascal code. I still do not have White-Black bonus calculation, for instance.

I just gave the volleyball link for historical purpose and out of laziness. I will try to give more details soon. The fact that is logistic is a "mathematical convergence" of my "theoretical" frame work. I did not pick the equation to mimic a Gaussian as I recently mention days ago here. As Kai told me, it may look very similar to BayesELO mathematically.

Miguel

Let’s imagine that two players differ in strength, and that this strength could be represented as “levels of energy”. The stronger the player, the lower (more stable) level of energy represents this player. Let’s suppose that these two levels of “energy” are fighting for an element (a win). The lower the level, there are more chances to “attract” that element. For instance, a valley is more likely to attract water than a mountain top. In chemistry, a particle or a molecule that could be in two different energetic levels could be predicted to be in one or the other with a certain probability.

Code: Select all

Level A ----------- (weaker player)
           /\
           || Delta Energy = Ea - Eb
           \/ 
Level B ----------- (stronger player)

If Na = population of particles in Level A, and Nb = population of particles in Level B, the distribution of Boltzmann describes the relative probabilities between states A and B as

Code: Select all

Na/Nb = exp(-beta*(Ea-Eb))

Beta is a constant of the system (that depends on the temperature, which is noise in our case, but I have messed with it).
We can treat the probabilities of a particle to be in A or B, as analogous of a win to fall in A or B.
Therefore, after reordering the equation, the fraction of wins (f) of player B out of the total played (Nb/(Na+Nb)) will be

Code: Select all

f  = 1 / (1 + exp(-beta*(Ea-Eb)))

if we define strength “S” as the negative value of “energy”, then, Sa = -Ea, and

Code: Select all

f  = 1 / (1 + exp(-beta*(Sb-Sa)))

This equation has the same form as the logistic equation.
With this equation we can calculate the predicted fraction of wins between two players if the model is appropriate. The predicted performance Px, or number of wins of player X among a pool of other players will be the summation of each of the predicted fractions“f” for each game (similar to the number of particles or molecules that will be in state “X” in presence of many other energy levels or states).

Code: Select all

Px = fa + fb + fc … + fn

or

Code: Select all

Px = 1 / (1 + exp(-beta*(Sx-Sa))) + 1 / (1 + exp(-beta*(Sx-Sb))) + 1 / (1 + exp(-beta*(Sx-Sc))) … etc.

The most likely distribution of strengths “S” for each player will be the one that satisfies the condition that each predicted performance Px equals the actual performance of the player X (i.e. the actual number of wins). So, if we find the appropriate numbers for Sx to satisfiy all the Px equations, we are done. That is the tricky part, because the system does not always converge nicely trying to find this optimal solution. The way it is fitted is in discrete steps (sort of hill climbing), and making those steps smaller and smaller once the convergence was reached. I found this procedure to be the safest and faster after many different tests. The volleyball ranking were really challenging because there were not many games (~30 for each team) and the number of teams were ~300.

Beta was chosen to make sure that the scale looks like ELO, for the people that are accustomed to that type of scale (chess players). When the difference of ELO is 200 points, IIRC, I chose that in my scale 200 points give the same probability of win.

Miguel
PS: I never read what other rating programs do and I apologize for it. I really wanted to learn what BayesELO do but I never got the time to read about it.

Michel · Post by **Michel** » Thu Jan 12, 2012 1:48 pm

PS: I never read what other rating programs do and I apologize for it. I really wanted to learn what BayesELO do but I never got the time to read about it.

BayesElo uses a straightforward maximum likelihood approach.

In essence you compute the probability of the observation in terms of the parameters of the model (elo's in our case) and then you choose the parameters in such a way that this probabilty is maximal (BayesElo actually uses a Bayesian variant which gives a bit more flexibility in the sense that it allows for a prior, but that is not really important).

The big advantage of this is that there is a theoretical underpinning. Maximum likelihood
estimators are asymptotically optimal.

The distributions used in statistical mechanics (Boltzman, Fermi, Bose Einstein) are also obtained by the maximum likelihood principle.

Of course the theoretical results assume that the Elo model is correct, which of course isn't.

Rémi Coulom · Post by **Rémi Coulom** » Thu Jan 12, 2012 3:00 pm

Michel wrote:
PS: I never read what other rating programs do and I apologize for it. I really wanted to learn what BayesELO do but I never got the time to read about it.
BayesElo uses a straightforward maximum likelihood approach.

In essence you compute the probability of the observation in terms of the parameters of the model (elo's in our case) and then you choose the parameters in such a way that this probabilty is maximal (BayesElo actually uses a Bayesian variant which gives a bit more flexibility in the sense that it allows for a prior, but that is not really important).

The big advantage of this is that there is a theoretical underpinning. Maximum likelihood
estimators are asymptotically optimal.

The distributions used in statistical mechanics (Boltzman, Fermi, Bose Einstein) are also obtained by the maximum likelihood principle.

Of course the theoretical results assume that the Elo model is correct, which of course isn't.

In the case of the logistic model with win/loss outcome, it turns out that Miguel's method is formally identical to maximum likelihood.

Othello people use this method, and call it "Jech". A description in French is there:
http://www.ffothello.org/classement/jech.php

Rémi

hgm · Post by **hgm** » Thu Jan 12, 2012 4:17 pm

Rémi Coulom wrote:In the case of the logistic model with win/loss outcome, it turns out that Miguel's method is formally identical to maximum likelihood.

Yet another amazing property of the logistic!

Michel · Post by **Michel** » Fri Jan 13, 2012 12:33 pm

Rémi Coulom wrote: In the case of the logistic model with win/loss outcome, it turns out that Miguel's method is formally identical to maximum likelihood.

Rémi

Seems correct. Thanks for pointing that out.

Vinvin · Post by **Vinvin** » Fri Jan 13, 2012 3:21 pm

I tested with a merge version of CCRL, CEGT and WBEC (I'll produce a list next week with bayeselo.exe) : a little bit more than 1 Millions games and 3500 different players.

The program crashed ... is it because numbers are too big ?

michiguel wrote:https://sites.google.com/site/gaviotachessengine/ordo

Based on a recent discussion on the IPON rankings, I decided to clean up the command line interface and release it. It may be an alternative to BayesELO and ELOSTAT.

Miguel

michiguel · Post by **michiguel** » Fri Jan 13, 2012 7:15 pm

Vinvin wrote:I tested with a merge version of CCRL, CEGT and WBEC (I'll produce a list next week with bayeselo.exe) : a little bit more than 1 Millions games and 3500 different players.

The program crashed ... is it because numbers are too big ?

michiguel wrote:https://sites.google.com/site/gaviotachessengine/ordo

Based on a recent discussion on the IPON rankings, I decided to clean up the command line interface and release it. It may be an alternative to BayesELO and ELOSTAT.

Miguel

It could certainly be the problem. I hardwired a limit, but the program should warn about it. Whatever it is, it is my fault. I am sorry, I rushed to get this out, otherwise it would not have happened. I will check and fix it tonight. Crashing is unacceptable.

Miguel

michiguel · Post by **michiguel** » Sat Jan 14, 2012 3:53 am

michiguel wrote:
Vinvin wrote:I tested with a merge version of CCRL, CEGT and WBEC (I'll produce a list next week with bayeselo.exe) : a little bit more than 1 Millions games and 3500 different players.

The program crashed ... is it because numbers are too big ?

michiguel wrote:https://sites.google.com/site/gaviotachessengine/ordo

Based on a recent discussion on the IPON rankings, I decided to clean up the command line interface and release it. It may be an alternative to BayesELO and ELOSTAT.

Miguel
It could certainly be the problem. I hardwired a limit, but the program should warn about it. Whatever it is, it is my fault. I am sorry, I rushed to get this out, otherwise it would not have happened. I will check and fix it tonight. Crashing is unacceptable.

Miguel

I fixed the problem (for now).
https://sites.google.com/site/gaviotachessengine/ordo
3 million games are accepted. I will later allow a dynamic allocation, so the limit will be the memory of the computer.

Miguel

jshriver · Post by **jshriver** » Sat Jan 14, 2012 4:26 am

michiguel wrote: It may be an alternative to BayesELO and ELOSTAT.
Miguel

Sounds good, hope all is well. Thanks for the release!

Vinvin · Post by **Vinvin** » Sun Jan 15, 2012 4:12 am

michiguel wrote:
michiguel wrote:
Vinvin wrote:I tested with a merge version of CCRL, CEGT and WBEC (I'll produce a list next week with bayeselo.exe) : a little bit more than 1 Millions games and 3500 different players.

The program crashed ... is it because numbers are too big ?

michiguel wrote:https://sites.google.com/site/gaviotachessengine/ordo

Based on a recent discussion on the IPON rankings, I decided to clean up the command line interface and release it. It may be an alternative to BayesELO and ELOSTAT.

Miguel
It could certainly be the problem. I hardwired a limit, but the program should warn about it. Whatever it is, it is my fault. I am sorry, I rushed to get this out, otherwise it would not have happened. I will check and fix it tonight. Crashing is unacceptable.

Miguel
I fixed the problem (for now).
https://sites.google.com/site/gaviotachessengine/ordo
3 million games are accepted. I will later allow a dynamic allocation, so the limit will be the memory of the computer.

Miguel

Thanks, no more crash

The run last more than 1 hour (I don't know exactly because I let the PC running and I leave).

There's smthg strange at the top of the list ...

Code: Select all

                        ENGINE:  RATING    POINTS  PLAYED    (%)
      Houdini 1.5a 64-bit 4CPU:  3317.0    2396.0    3336   71.8%
            King Of Kings 1.95:  3317.0      28.0      28   100.0%
       Houdini 2.0 64-bit 4CPU:  3317.0    1233.0    1668   73.9%
      Houdini 1.5a 64-bit 6CPU:  3317.0     438.0     600   73.0%
    Deep Rybka 4.1 64-bit 4CPU:  3291.5     646.5    1012   63.9%
...

Why 4 engines rated at 3317 ?

Ordo release (rating software, ELO-like)

Re: Ordo release (rating software, ELO-like).

Re: Ordo release (rating software, ELO-like).

Re: Ordo release (rating software, ELO-like).

Re: Ordo release (rating software, ELO-like).

Re: Ordo release (rating software, ELO-like).

Re: Ordo release (rating software, ELO-like)

Re: Ordo release (rating software, ELO-like)

Re: Ordo release (rating software, ELO-like)

Re: Ordo release (rating software, ELO-like)

Re: Ordo release (rating software, ELO-like)