This discussion throws a lot of unrelated things together. Let me summarize what is relevant.
Statement of facts
(1) Adam's experiments show that both Ordo and BayesElo are good score predictors. This means that their underlying models (versions of the "Elo model") are reasonably sound.
(2) So, in statistical terminology, both Ordo and BayesElo produce estimators for Elo-like ratings (which, by (1), we can assume to exist).
(3) The only way to say which estimator is better is to compare their variances. This has not been done, so the discussion in this long thread is actually moot.
(4) BayesElo uses maximum likelihood estimation (MLE), which in theory produces the most efficient estimators, provided the model used is correct. Since the true model is unknown, this advantage of BayesElo may not materialize.
Conclusion: unless somebody comes up with more data with regard to (3), there is nothing more to be said.
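For concreteness, here is a minimal sketch of what a comparison under (3) could look like: a Monte Carlo estimate of the variance of two toy estimators of an Elo difference. Neither estimator is Ordo or BayesElo, and the true difference, game count and trial count are arbitrary assumptions; the sketch only stands in for what a real Ordo-versus-BayesElo comparison would have to do.

```python
# Sketch (not Ordo or BayesElo): compare two toy estimators of an Elo
# difference by their Monte Carlo variance, as point (3) would require.
import math
import random

TRUE_DIFF = 100.0   # assumed true Elo difference
N_GAMES = 1000      # games per simulated match (assumption)
N_TRIALS = 2000     # Monte Carlo repetitions (assumption)

def expected_score(diff):
    """Expected score under the usual 400-point logistic."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def estimate_logistic(score):
    """Invert the logistic curve (clamped away from 0 and 1)."""
    s = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / s - 1.0)

def estimate_linear(score):
    """Toy linear estimator: the slope of the logistic at 50% is ln(10)/1600 per Elo."""
    return (score - 0.5) * 1600.0 / math.log(10.0)

random.seed(1)
p = expected_score(TRUE_DIFF)
for name, est in (("logistic inversion", estimate_logistic),
                  ("linear approximation", estimate_linear)):
    samples = []
    for _ in range(N_TRIALS):
        wins = sum(random.random() < p for _ in range(N_GAMES))
        samples.append(est(wins / N_GAMES))
    mean = sum(samples) / N_TRIALS
    var = sum((x - mean) ** 2 for x in samples) / (N_TRIALS - 1)
    print(f"{name:22s} mean = {mean:6.1f}  variance = {var:6.1f}")
```

With draws and a full tournament of engines the bookkeeping gets heavier, but the principle is the same: the better estimator is the one whose estimates spread less around the true value.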
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> Adam: I have raised a legitimate concern I have with Bayeselo,
> I hope you have read my earlier mail on this, but let me reiterate.
> Your concern is not valid (and there is no need for a "solution").
> The scaled ratings produced by BayesElo do not fit the BayesElo model (which is your concern). This is normal. They are designed to fit the logistic model.
> The unscaled ratings do fit the BayesElo model (as you observed in your experiments), but they are inflated with respect to the usual ratings. This is normal too, since they measure something different. One could say they are not expressed in elos but in bayeselos. With this terminology the scale parameter converts bayeselos to elos.

I have the same understanding, but if it's all clear to you, could you say what "scale" I have to set in this case:

Ratings
A 2200
B 2400
C 2500
...

to have a prediction that B scores 75% against A? Ordo does this; how do I do that with Bayeselo? It seems that neither the default nor "scale=1" gives this prediction.
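For reference, the "usual 400 logistic" that Kai has in mind maps a 200-point gap to roughly a 75-76% expected score. A minimal sketch of that formula (an illustration only, not Ordo or Bayeselo output):

```python
# Expected score under the usual 400-point logistic (illustration only).
def expected_score(r_a, r_b):
    """Expected score of the player rated r_a against the player rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

print(expected_score(2400, 2200))  # B vs A: ~0.76, roughly the 75% Kai refers to
print(expected_score(2500, 2200))  # C vs A: ~0.85
```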
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Ordo vs. Bayeselo
Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale. The difference in magnitude between Elostat and unscaled Bayeselo ratings can be very big, so you will not be able to compare them without the scale. With the calculated scale they matched to within one Elo.
I ran a comparison showing the effect of the scale, and how it helps for comparison with Elostat, using SCCT data. The first run is Elostat (i.e. the one within Bayeselo). The second is Bayeselo with the calculated scale (which turned out to be 0.7), and the third is with scale = 1, as you use it now. Note that I didn't even need to calculate the ratings again, because the scale is a post-processing parameter, much like the offset. The ratings are magnified by 1/0.7 ≈ 1.4x; that is, a difference of 100 Elo becomes 140 Elo. Clearly list 1 and list 2 are comparable, while the third one has magnified values. Present this example to the CCRL team and ask them if that is what they want. In my opinion it was good before, i.e. using the calculated scale (the Bayeselo default), but changing it to scale = 1 has caused problems for no apparent advantage.
Summary:
Example comparison: Gull and Vitruvius
Elostat: 55 - 7 = 48 Elo
Bayeselo default: 46 - (-3) = 49 Elo
Bayeselo (scale = 1), as used right now by CCRL: 67 - (-4) = 71 Elo
Clearly Elostat and Bayeselo are comparable (~49 Elo difference between the two), but scale = 1 gives 71 Elo. That is 1.4 x 50 = 70 Elo, as I predicted.
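Daniel's remark that the scale is pure post-processing can be illustrated with a short sketch. It assumes, as his description suggests, that changing the scale simply multiplies rating differences, so nothing needs to be recomputed; the helper name rescale is hypothetical.

```python
# Hypothetical sketch: the scale as a pure post-processing multiplier on
# rating differences (an assumption based on Daniel's description).
def rescale(diff, old_scale, new_scale):
    """Convert a rating difference computed at old_scale to new_scale."""
    return diff * new_scale / old_scale

# Gull - Vitruvius gap from the summary: 49 Elo at the calculated scale 0.7.
print(rescale(49.0, 0.7, 1.0))   # ~70, close to the 71 Elo reported at scale = 1
print(rescale(100.0, 0.7, 1.0))  # a 100 Elo gap grows to ~143 (Daniel rounds to 140)
```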
-
- Posts: 3707
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Ordo vs. Bayeselo
Daniel Shawul wrote:
> Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale.

CCRL are dropping "scale 1" and going back to default scaling.
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Ordo vs. Bayeselo
Modern Times wrote:
> Daniel Shawul wrote:
> > Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale.
> CCRL are dropping "scale 1" and going back to default scaling.

Thanks Ray for confirming that.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Daniel Shawul wrote:
> Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale. The difference in magnitude between Elostat and unscaled Bayeselo ratings can be very big, so you will not be able to compare them without the scale. With the calculated scale they matched to within one Elo. I ran a comparison showing the effect of the scale, and how it helps for comparison with Elostat, using SCCT data. The first run is Elostat (i.e. the one within Bayeselo). The second is Bayeselo with the calculated scale (which turned out to be 0.7), and the third is with scale = 1, as you use it now. Note that I didn't even need to calculate the ratings again, because the scale is a post-processing parameter, much like the offset. The ratings are magnified by 1/0.7 ≈ 1.4x; that is, a difference of 100 Elo becomes 140 Elo. Clearly list 1 and list 2 are comparable, while the third one has magnified values. Present this example to the CCRL team and ask them if that is what they want. In my opinion it was good before, i.e. using the calculated scale (the Bayeselo default), but changing it to scale = 1 has caused problems for no apparent advantage.
> Summary:
> Example comparison: Gull and Vitruvius
> Elostat: 55 - 7 = 48 Elo
> Bayeselo default: 46 - (-3) = 49 Elo
> Bayeselo (scale = 1), as used right now by CCRL: 67 - (-4) = 71 Elo
> Clearly Elostat and Bayeselo are comparable (~49 Elo difference between the two), but scale = 1 gives 71 Elo. That is 1.4 x 50 = 70 Elo, as I predicted.

Daniel, I don't care about EloStat, and I know EloStat gives wrong predictions. I don't want Bayeselo to adjust its scale to be in line with EloStat, but as Remi said here and wrote on his site:

> 2005.12.18:
> New scale command to scale ratings. By default, maximum-likelihood ratings are now scaled down so that they look more like Elostat/SSDF ratings.

There must be a scale for which Bayeselo gives correct predictions according to the usual 400 logistic. Neither the default nor "scale=1" (if Larry is correct) seems to work.
Kai
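A minimal sketch of how the scale Kai is asking for could be recovered from game results, assuming the published rating differences only need a single multiplicative correction: pick the factor whose 400-logistic predictions best match the observed scores. The result triples below are made up, and the grid search is a hypothetical illustration, not a Bayeselo feature.

```python
# Hypothetical sketch: estimate a multiplicative scale s such that the
# 400-point logistic applied to s * (published rating difference) best
# matches observed scores. The data below are made up for illustration.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# (published rating difference, observed score, number of games) - fictional
results = [(200.0, 0.80, 500), (100.0, 0.68, 500), (300.0, 0.89, 500)]

best_s, best_err = None, float("inf")
for i in range(50, 201):                 # search s in 0.50 .. 2.00
    s = i / 100.0
    err = sum(n * (expected_score(s * d) - obs) ** 2 for d, obs, n in results)
    if err < best_err:
        best_s, best_err = s, err
print(f"best-fitting scale: {best_s:.2f}")
```

Bayeselo's own scale command (quoted above from Remi's changelog) plays the same role; the sketch only shows that, in principle, the right value is determined by the games themselves.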
-
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo vs. Bayeselo
> Neither the default nor "scale=1" (if Larry is correct) seems to work.

Kai: if you reread the thread, you will see that no evidence has been produced that the default scale does not work.
Please stop stating things which have no basis in fact.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> > Neither the default nor "scale=1" (if Larry is correct) seems to work.
> Kai: if you reread the thread, you will see that no evidence has been produced that the default scale does not work.
> Please stop stating things which have no basis in fact.

Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.

http://www.talkchess.com/forum/viewtopi ... o+bayeselo
-
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo vs. Bayeselo
> Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.
> http://www.talkchess.com/forum/viewtopi ... o+bayeselo

Thanks for the link. But I have already answered Adam. Please reread that. There is no "problem".
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> > Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.
> > http://www.talkchess.com/forum/viewtopi ... o+bayeselo
> Thanks for the link. But I have already answered Adam. Please reread that. There is no "problem".

I see a problem with the "default" ratings: a 200-point difference in Bayeselo default ratings does not predict a 75% performance. I am a bit puzzled reading your and Daniel's statements that it is irrelevant and that adjusting to the wrong EloStat is more important than giving 400-logistic predictions.
I will state it: testing groups using EloStat and "default" Bayeselo (which is adjusted to match EloStat) produce ratings compressed by some 10-30% compared to the usual 400 logistic. Ordo gives correct predictions, in accordance with the usual logistic.
Bayeselo can use the "scale" factor to give correct predictions, but people don't know how to use it.
Kai
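To put Kai's 10-30% compression figure into score terms, here is a small sketch assuming a 20% compression (the exact amount is an assumption): a published 200-Elo gap then corresponds to a true logistic gap of 250, i.e. about an 81% expected score rather than the 76% a naive reading of the list suggests.

```python
# Illustration of what an assumed 20% compression does to score predictions.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

compression = 0.8             # assumed: published gaps are 80% of the true ones
published_gap = 200.0
true_gap = published_gap / compression
print(expected_score(published_gap))  # ~0.76: naive reading of the list
print(expected_score(true_gap))       # ~0.81: what the games would actually show
```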