Ordo vs. Bayeselo

Discussion of chess software programming and technical issues.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Ordo vs. Bayeselo

Post by Laskos »

Michel wrote:
Kai:

The BayesElo score model has a different expectation value from the usual logistic curve in the presence of a non-zero drawelo.

The scale parameter scales the BayesElo score model (the x coordinate) so that it matches the usual logistic curve as closely as possible.
As recently shown, neither the default scale nor scale=1 matches the usual logistic in many cases (it is off by 10% to 50%). If it assigns Bayeselo-specific points, I need a dictionary of what these points mean. I know how to invert a logistic; I don't know what Bayeselo points mean or how to invert them.
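
For readers wondering what "inverting" the two curves would look like, here is a minimal Python sketch. It assumes the commonly described BayesElo score model with White advantage ignored: P(win) = f(diff - drawelo), P(loss) = f(-diff - drawelo), with draws taking the remainder, where f is the usual logistic. The function names are hypothetical and this is not BayesElo's own code. The logistic inverts in closed form; the BayesElo-style expected score is inverted numerically by bisection, and the result is an unscaled rating difference.

```python
import math


def logistic_score(diff):
    """Expected score under the usual logistic curve for an Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def logistic_invert(score):
    """Closed-form inverse of logistic_score, valid for 0 < score < 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def bayeselo_score(diff, drawelo):
    """Expected score under the ASSUMED BayesElo-style model (no White advantage)."""
    p_win = logistic_score(diff - drawelo)
    p_loss = logistic_score(-diff - drawelo)
    p_draw = 1.0 - p_win - p_loss
    return p_win + 0.5 * p_draw


def bayeselo_invert(score, drawelo, lo=-4000.0, hi=4000.0, tol=1e-9):
    """Invert bayeselo_score by bisection; the score increases with diff."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bayeselo_score(mid, drawelo) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Rating difference implied by a 75% score under each curve.
target = 0.75
print(logistic_invert(target))
print(bayeselo_invert(target, drawelo=200.0))
```

With drawelo=0 the assumed model reduces to the plain logistic, so both inversions agree in that case.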
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo vs. Bayeselo

Post by Michel »

Actually, now I am confused. On my copy of BayesElo (0056), scale=1 seems to be the default. So what gives?

SOLVED: OK, it only changes after invoking the mm interface, which makes sense!
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo vs. Bayeselo

Post by Adam Hair »

Michel wrote:
Adam and others have shown that the predicted ratings in this case obey a different logistic than the assumed one.
You mean using the default scale?

Of course it is not precisely the usual logistic but very close. The usual logistic is a (probably very crude) approximation anyway. I am going out on a limb here but I think the default scale is chosen in such a way that the second(?) derivative at zero of the expected score matches the one predicted by the standard logistic (Daniel: can you confirm this?).
I believe that is correct.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Ordo vs. Bayeselo

Post by Laskos »

Adam Hair wrote:
Michel wrote:
Adam and others have shown that the predicted ratings in this case obey a different logistic than the assumed one.
You mean using the default scale?

Of course it is not precisely the usual logistic but very close. The usual logistic is a (probably very crude) approximation anyway. I am going out on a limb here but I think the default scale is chosen in such a way that the second(?) derivative at zero of the expected score matches the one predicted by the standard logistic (Daniel: can you confirm this?).
I believe that is correct.
Isn't the second derivative at zero equal to 0? The curve is neither convex nor concave at 0.

Kai
Last edited by Laskos on Sun Sep 30, 2012 3:39 pm, edited 1 time in total.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo vs. Bayeselo

Post by Michel »

Thanks! Actually I made some plots and it matches the first derivative :D (I could have computed it but I was too lazy).

Anyway for drawelo=200 (which is already fairly large but realistic for self play) with the default scale parameter 0.730126 there is no discernable difference between the usual logistic and the model BayesElo uses unless the elo difference is very large (and even then the difference is small).

We must therefore conclude that the people that say otherwise are simply spreading FUD (Fear, Uncertainty and Doubt).
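
The first-derivative matching mentioned in this post can be checked with a short Python sketch. It assumes the BayesElo-style model with White advantage ignored, where the expected score is 0.5 * (1 + f(diff - drawelo) - f(-diff - drawelo)) and f is the usual logistic, and it assumes that displayed ratings are the internal ones multiplied by the scale. Under those assumptions, equal slopes at zero give scale = 4x / (1 + x)^2 with x = 10^(-drawelo/400), which reproduces the 0.730126 quoted above for drawelo=200. The function names are hypothetical, and whether BayesElo computes its default this way internally is not confirmed here.

```python
def logistic_score(diff):
    """Expected score under the usual logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def bayeselo_score(diff, drawelo):
    """Expected score under the ASSUMED BayesElo-style model (no White advantage)."""
    return 0.5 * (1.0 + logistic_score(diff - drawelo)
                  - logistic_score(-diff - drawelo))


def slope_matching_scale(drawelo):
    """Scale s for which bayeselo_score(D / s) and logistic_score(D) share a slope at D = 0."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def slope_at_zero(score_fn, h=1e-3):
    """Central-difference estimate of the slope of score_fn at zero."""
    return (score_fn(h) - score_fn(-h)) / (2.0 * h)


drawelo = 200.0
scale = slope_matching_scale(drawelo)


def scaled_bayeselo_score(displayed_diff):
    """Expected score as a function of the scaled (displayed) rating difference."""
    return bayeselo_score(displayed_diff / scale, drawelo)


print(scale)
print(slope_at_zero(logistic_score), slope_at_zero(scaled_bayeselo_score))

# Side-by-side values, so the size of the gap at larger differences can be inspected.
for displayed_diff in (50, 100, 200, 400):
    print(displayed_diff, logistic_score(displayed_diff),
          scaled_bayeselo_score(displayed_diff))
```

The loop at the end prints both curves at a few rating differences; how large a gap counts as discernible is left to the reader.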
Last edited by Michel on Sun Sep 30, 2012 3:43 pm, edited 1 time in total.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Ordo vs. Bayeselo

Post by Daniel Shawul »

Michel wrote:Thanks! Actually I made some plots and it matches the first derivative :D
(I could have computed it but I was too lazy).

Anyway for drawelo=200 (which is already fairly large but realistic for self play) with the default scale parameter there is no discernable difference
between the usual logistic and the model BayesElo uses unless the elo difference is very large (and even then the difference is small).

We must therefore conclude that the people that say otherwise are simply spreading FUD (Fear, Uncertainty and Doubt).
Michel, don't be so brutally honest :)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Ordo vs. Bayeselo

Post by Laskos »

Michel wrote:Thanks! Actually I made some plots and it matches the first derivative :D
(I could have computed it but I was too lazy).

Anyway for drawelo=200 (which is already fairly large but realistic for self play) with the default scale parameter there is no discernable difference
between the usual logistic and the model BayesElo uses unless the elo difference is very large (and even then the difference is small).

We must therefore conclude that the people that say otherwise are simply spreading FUD (Fear, Uncertainty and Doubt).
Come on, you have to take a look at Adam's plots. The differences from default are some 20%.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo vs. Bayeselo

Post by Michel »

Come on, you have to take a look at Adam's plots. The differences from default are some 20%.
Could you then give a reference to these plots?
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo vs. Bayeselo

Post by Adam Hair »

Michel wrote:
No, the problem with Bayeselo is that it does not give the correct predictions or predictions which would obey the logistic model used by Bayeselo.
I think this is precisely what the scale parameter corrects. So one should _not_ set scale=1 but use the default instead (I wonder why people would fiddle with defaults if they don't know what they are doing).
A question arose about the ratings produced by Bayeselo, so I did a simple experiment with Bayeselo, Ordo, and Elostat. I computed the ratings for various databases (IPON, CCRL 40/4, ChessWar), plotted White score % versus Elo difference, fitted a regression model of the same form as each rating tool's own model, and compared the fit with that tool's model. I assumed that the model for Elostat was the logistic model. At that time, that was also the model for Ordo. And, of course, the model equation for Bayeselo is the logistic with a correction for White advantage and drawElo. In each case, I left the denominator in the exponent as the unknown.

In the resulting graphs, it is apparent that Elostat and Ordo were in need of a location parameter (White advantage). Also, the regression model for Elostat was of a smaller scale than its model (the denominator was smaller). According to the output from Elostat, a particular White score was given a smaller ratings difference than the (supposed) Elostat model predicts.

I found that, in general, the ratings produced by Ordo corresponded well with its model when corrected for White advantage.

For Bayeselo, the ratings produced using the default values did not match the Bayeselo model. The ratings difference for a particular White score was smaller than expected. And the main reason was the scale parameter. When scale is set to 1 and the maximum likelihood values for White advantage and drawElo for the given database are used, then the ratings produced correspond very well with the Bayeselo model (with the correct White advantage and drawElo values inserted).
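
The fitting step described above (leaving the denominator in the exponent as the unknown) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it uses the plain logistic without White advantage or drawElo, a simple golden-section search instead of whatever regression tool was actually used, and synthetic points generated from a denominator of 300 rather than data from any of the databases mentioned. All names are hypothetical.

```python
import math


def model_score(diff, denom):
    """Logistic expected score with an adjustable denominator in the exponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / denom))


def sum_squared_error(denom, points):
    """Squared error between the model and observed (diff, score) pairs."""
    return sum((model_score(diff, denom) - score) ** 2 for diff, score in points)


def fit_denominator(points, lo=50.0, hi=2000.0, tol=1e-6):
    """Golden-section search for the denominator that minimises the squared error."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        left = b - inv_phi * (b - a)
        right = a + inv_phi * (b - a)
        if sum_squared_error(left, points) < sum_squared_error(right, points):
            b = right
        else:
            a = left
    return 0.5 * (a + b)


# SYNTHETIC illustration data: scores generated with a denominator of 300,
# i.e. rating differences compressed relative to the nominal 400.
rating_diffs = (-300, -200, -100, -50, 50, 100, 200, 300)
points = [(diff, model_score(diff, 300.0)) for diff in rating_diffs]

print(fit_denominator(points))
```

A fitted denominator below the nominal 400 means a given score corresponds to a smaller rating difference than the nominal model predicts, which is the kind of compression described in this post.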

The default method for computing the scale was included because people did not like the scale of the ratings produced by Bayeselo (according to Rémi). This does allow ratings from different databases to be comparable, but it does compress the ratings differences that Bayeselo computes for each database.

The default values for White advantage and drawElo were computed from the WBEC database back in 2005. If the reason for including those parameters in Bayeselo was to use more information to produce more accurate ratings, then it makes sense to use the values computed from the particular database of games whose participants are to be rated.

If I am wrong on any point or am missing the point, by all means let me know. My sole purpose in all of this is to better understand how ratings are computed, to determine the best method for doing so, and to determine what is and what is not comparable (in terms of ratings). I do not care what program is used to do it.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo vs. Bayeselo

Post by Michel »

For Bayeselo, the ratings produced using the default values did not match the Bayeselo model. The ratings difference for a particular White score was smaller than expected. And the main reason was the scale parameter. When scale is set to 1 and the maximum likelihood values for White advantage and drawElo for the given database are used, then the ratings produced correspond very well with the Bayeselo model (with the correct White advantage and drawElo values inserted).
I still have difficulty understanding this (but I am trying).

I am ignoring the white/black issue (I am not entirely sure this is legitimate but the focus of this discussion is drawelo).

The unscaled ratings produced by BayesElo should predict correct winning percentages according to its own ("BayesElo") model.

On the other hand the scaled ratings produced by BayesElo should predict correct scores according to the logistic model (the scaling is done to fit the logistic model).

Are you saying that your experiments contradict this? If I read correctly then they don't.
Last edited by Michel on Sun Sep 30, 2012 5:37 pm, edited 1 time in total.