This discussion throws a lot of unrelated things together. Let me summarize what is relevant.
Statement of facts
(1) Adam's experiments show that both Ordo and BayesElo are good score predictors. This means that their underlying models (versions of the "Elo model") are reasonably sound.
(2) So, in statistical terminology, both Ordo and BayesElo produce estimators for Elo-like ratings (which, by (1), we can assume to exist).
(3) The only way to say which estimator is better is to compare their variances. This has not been done, so the discussion in this long thread is actually moot.
(4) BayesElo uses maximum likelihood estimation (MLE), which in theory produces the most efficient estimators, provided the model used is correct. Since the true model is unknown, this advantage of BayesElo may not materialize.
Conclusion: unless somebody comes up with more data with regard to (3), there is nothing more to be said.
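For concreteness, here is a minimal sketch of what a comparison under (3) could look like: a Monte Carlo estimate of the variance of two toy estimators of an Elo difference. Neither estimator is Ordo or BayesElo, and the true difference, game count and trial count are arbitrary assumptions; the sketch only stands in for what a real Ordo-versus-BayesElo comparison would have to do.

```python
# Sketch (not Ordo or BayesElo): compare two toy estimators of an Elo
# difference by their Monte Carlo variance, as point (3) would require.
import math
import random

TRUE_DIFF = 100.0   # assumed true Elo difference
N_GAMES = 1000      # games per simulated match (assumption)
N_TRIALS = 2000     # Monte Carlo repetitions (assumption)

def expected_score(diff):
    """Expected score under the usual 400-point logistic."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def estimate_logistic(score):
    """Invert the logistic curve (clamped away from 0 and 1)."""
    s = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / s - 1.0)

def estimate_linear(score):
    """Toy linear estimator: the slope of the logistic at 50% is ln(10)/1600 per Elo."""
    return (score - 0.5) * 1600.0 / math.log(10.0)

random.seed(1)
p = expected_score(TRUE_DIFF)
for name, est in (("logistic inversion", estimate_logistic),
                  ("linear approximation", estimate_linear)):
    samples = []
    for _ in range(N_TRIALS):
        wins = sum(random.random() < p for _ in range(N_GAMES))
        samples.append(est(wins / N_GAMES))
    mean = sum(samples) / N_TRIALS
    var = sum((x - mean) ** 2 for x in samples) / (N_TRIALS - 1)
    print(f"{name:22s} mean = {mean:6.1f}  variance = {var:6.1f}")
```

With draws and a full tournament of engines the bookkeeping gets heavier, but the principle is the same: the better estimator is the one whose estimates spread less around the true value.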
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> Adam: I have raised a legitimate concern I have with Bayeselo,
> I hope you have read my earlier mail on this, but let me reiterate.
> Your concern is not valid (and there is no need for a "solution").
> The scaled ratings produced by BayesElo do not fit the BayesElo model (which is your concern). This is normal. They are designed to fit the logistic model.
> The unscaled ratings do fit the BayesElo model (as you observed in your experiments), but they are inflated with respect to the usual ratings. This is normal too, since they measure something different. One could say they are not expressed in elos but in bayeselos. With this terminology the scale parameter converts bayeselos to elos.

I have the same understanding, but if it's all clear to you, could you say what "scale" I have to set in this case:

Ratings
A 2200
B 2400
C 2500
...

to have a prediction that B scores 75% against A? Ordo does this; how do I do that with Bayeselo? It seems that neither the default nor "scale=1" gives this prediction.
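For reference, the "usual 400 logistic" that Kai has in mind maps a 200-point gap to roughly a 75-76% expected score. A minimal sketch of that formula (an illustration only, not Ordo or Bayeselo output):

```python
# Expected score under the usual 400-point logistic (illustration only).
def expected_score(r_a, r_b):
    """Expected score of the player rated r_a against the player rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

print(expected_score(2400, 2200))  # B vs A: ~0.76, roughly the 75% Kai refers to
print(expected_score(2500, 2200))  # C vs A: ~0.85
```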
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Ordo vs. Bayeselo
Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale. The difference in magnitude between Elostat and unscaled Bayeselo ratings can be very big, so you will not be able to compare them without the scale. With the calculated scale they matched to within one Elo.
I ran a comparison showing the effect of the scale, and how it helps for comparison with Elostat, using SCCT data. The first run is Elostat (i.e. the one within Bayeselo). The second is Bayeselo with the calculated scale (which turned out to be 0.7), and the third is with scale = 1, as you use it now. Note that I didn't even need to calculate the ratings again, because the scale is a post-processing parameter, much like the offset. The ratings are magnified by 1/0.7 ≈ 1.4x; that is, a difference of 100 Elo becomes 140 Elo. Clearly list 1 and list 2 are comparable, while the third one has magnified values. Present this example to the CCRL team and ask them if that is what they want. In my opinion it was good before, i.e. using the calculated scale (the Bayeselo default), but changing it to scale = 1 has caused problems for no apparent advantage.
Summary:
Example comparison: Gull and Vitruvius
Elostat: 55 - 7 = 48 Elo
Bayeselo default: 46 - (-3) = 49 Elo
Bayeselo (scale = 1), as used right now by CCRL: 67 - (-4) = 71 Elo
Clearly Elostat and Bayeselo are comparable (~49 Elo difference between the two), but scale = 1 gives 71 Elo. That is 1.4 x 50 = 70 Elo, as I predicted.
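Daniel's remark that the scale is pure post-processing can be illustrated with a short sketch. It assumes, as his description suggests, that changing the scale simply multiplies rating differences, so nothing needs to be recomputed; the helper name rescale is hypothetical.

```python
# Hypothetical sketch: the scale as a pure post-processing multiplier on
# rating differences (an assumption based on Daniel's description).
def rescale(diff, old_scale, new_scale):
    """Convert a rating difference computed at old_scale to new_scale."""
    return diff * new_scale / old_scale

# Gull - Vitruvius gap from the summary: 49 Elo at the calculated scale 0.7.
print(rescale(49.0, 0.7, 1.0))   # ~70, close to the 71 Elo reported at scale = 1
print(rescale(100.0, 0.7, 1.0))  # a 100 Elo gap grows to ~143 (Daniel rounds to 140)
```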
-
- Posts: 3707
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Ordo vs. Bayeselo
Daniel Shawul wrote:
> Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale.

CCRL are dropping "scale 1" and going back to default scaling.
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Ordo vs. Bayeselo
Modern Times wrote:
> Daniel Shawul wrote:
> > Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale.
> CCRL are dropping "scale 1" and going back to default scaling.

Thanks Ray for confirming that.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Daniel Shawul wrote:
> Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using the calculated scale. The difference in magnitude between Elostat and unscaled Bayeselo ratings can be very big, so you will not be able to compare them without the scale. With the calculated scale they matched to within one Elo. I ran a comparison showing the effect of the scale, and how it helps for comparison with Elostat, using SCCT data. The first run is Elostat (i.e. the one within Bayeselo). The second is Bayeselo with the calculated scale (which turned out to be 0.7), and the third is with scale = 1, as you use it now. Note that I didn't even need to calculate the ratings again, because the scale is a post-processing parameter, much like the offset. The ratings are magnified by 1/0.7 ≈ 1.4x; that is, a difference of 100 Elo becomes 140 Elo. Clearly list 1 and list 2 are comparable, while the third one has magnified values. Present this example to the CCRL team and ask them if that is what they want. In my opinion it was good before, i.e. using the calculated scale (the Bayeselo default), but changing it to scale = 1 has caused problems for no apparent advantage.
> Summary:
> Example comparison: Gull and Vitruvius
> Elostat: 55 - 7 = 48 Elo
> Bayeselo default: 46 - (-3) = 49 Elo
> Bayeselo (scale = 1), as used right now by CCRL: 67 - (-4) = 71 Elo
> Clearly Elostat and Bayeselo are comparable (~49 Elo difference between the two), but scale = 1 gives 71 Elo. That is 1.4 x 50 = 70 Elo, as I predicted.

Daniel, I don't care about EloStat, and I know EloStat gives wrong predictions. I don't want Bayeselo to adjust its scale to be in line with EloStat, but as Remi said here and wrote on his site:

> 2005.12.18:
> New scale command to scale ratings. By default, maximum-likelihood ratings are now scaled down so that they look more like Elostat/SSDF ratings.

There must be a scale for which Bayeselo gives correct predictions according to the usual 400 logistic. Neither the default nor "scale=1" (if Larry is correct) seems to work.
Kai
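A minimal sketch of how the scale Kai is asking for could be recovered from game results, assuming the published rating differences only need a single multiplicative correction: pick the factor whose 400-logistic predictions best match the observed scores. The result triples below are made up, and the grid search is a hypothetical illustration, not a Bayeselo feature.

```python
# Hypothetical sketch: estimate a multiplicative scale s such that the
# 400-point logistic applied to s * (published rating difference) best
# matches observed scores. The data below are made up for illustration.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# (published rating difference, observed score, number of games) - fictional
results = [(200.0, 0.80, 500), (100.0, 0.68, 500), (300.0, 0.89, 500)]

best_s, best_err = None, float("inf")
for i in range(50, 201):                 # search s in 0.50 .. 2.00
    s = i / 100.0
    err = sum(n * (expected_score(s * d) - obs) ** 2 for d, obs, n in results)
    if err < best_err:
        best_s, best_err = s, err
print(f"best-fitting scale: {best_s:.2f}")
```

Bayeselo's own scale command (quoted above from Remi's changelog) plays the same role; the sketch only shows that, in principle, the right value is determined by the games themselves.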
-
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo vs. Bayeselo
> Neither the default nor "scale=1" (if Larry is correct) seems to work.

Kai: if you reread the thread, you will see that no evidence has been produced that the default scale does not work.
Please stop stating things which have no basis in fact.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> > Neither the default nor "scale=1" (if Larry is correct) seems to work.
> Kai: if you reread the thread, you will see that no evidence has been produced that the default scale does not work.
> Please stop stating things which have no basis in fact.

Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.

http://www.talkchess.com/forum/viewtopi ... o+bayeselo
-
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo vs. Bayeselo
> Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.
> http://www.talkchess.com/forum/viewtopi ... o+bayeselo

Thanks for the link. But I have already answered Adam. Please reread that. There is no "problem".
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Ordo vs. Bayeselo
Michel wrote:
> > Adam's plots show that. Have you seen them? That was the problem Remi responded to in that thread.
> > http://www.talkchess.com/forum/viewtopi ... o+bayeselo
> Thanks for the link. But I have already answered Adam. Please reread that. There is no "problem".

I see a problem with the "default" ratings: a 200-point difference in Bayeselo default ratings does not predict a 75% performance. I am a bit puzzled reading your and Daniel's statements that it is irrelevant and that adjusting to the wrong EloStat is more important than giving 400-logistic predictions.
I will state it: testing groups using EloStat and "default" Bayeselo (which is adjusted to match EloStat) produce ratings compressed by some 10-30% compared to the usual 400 logistic. Ordo gives correct predictions, in accordance with the usual logistic.
Bayeselo can use the "scale" factor to give correct predictions, but people don't know how to use it.
Kai
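To put Kai's 10-30% compression figure into score terms, here is a small sketch assuming a 20% compression (the exact amount is an assumption): a published 200-Elo gap then corresponds to a true logistic gap of 250, i.e. about an 81% expected score rather than the 76% a naive reading of the list suggests.

```python
# Illustration of what an assumed 20% compression does to score predictions.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

compression = 0.8             # assumed: published gaps are 80% of the true ones
published_gap = 200.0
true_gap = published_gap / compression
print(expected_score(published_gap))  # ~0.76: naive reading of the list
print(expected_score(true_gap))       # ~0.81: what the games would actually show
```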