Ordo vs. Bayeselo

Michel · Post by **Michel** » Tue Oct 02, 2012 7:40 am

Did you take 75% or 200 points as the starter?

Read what I wrote. 200 in scaled BayesElo with drawelo=200 is 210 logistic (both with denominator 400) (or perhaps 212 as you write).

but Adam shows that fitting with a significantly smaller value than 400 logistic fits better Bayeselo predictions,

No, his tests did not show that. He made an (understandable) mistake in the interpretation of his result, which was pointed out to him by Remi. Read my reply to him.

Using scale=1 would create major systematic distortions for small elo differences.

Laskos · Post by **Laskos** » Tue Oct 02, 2012 10:13 am

Michel wrote:
Did you take 75% or 200 points as the starter?
Read what I wrote. 200 in scaled BayesElo with drawelo=200 is 210 logistic (both with denominator 400) (or perhaps 212 as you write).

but Adam shows that fitting with a significantly smaller value than 400 logistic fits better Bayeselo predictions,
No, his tests did not show that. He made an (understandable) mistake in the interpretation of his result, which was pointed out to him by Remi. Read my reply to him.

Using scale=1 would create major systematic distortions for small elo differences.

About your reply to Adam: the scaled ratings are designed to fit the logistic model, as you said, and the model matches the usual logistic at the origin (in its derivative), but away from the origin the fit is not very good.

Which error of Adam's did Remi correct? Remi revealed the "scale" parameter to us and then wrote a bit about it.

Remi:
It is a bit more complicated. In bayeselo, the ratings are "scaled" to match the derivative of the usual Elo formula at Elo = 0. Maybe that explains the "compression" you noticed.

You can get the scaling with the "scale" command in the elo interface. This is how it is computed:

I am not saying that setting scale=1 would resolve things, but the default compresses ratings over large ranges (such as CCRL's) by some 10% (maybe even more). Sure, for testing a 3-point difference this is irrelevant, the ratings preserve ordering, and close to the origin the fit to the usual logistic is good. But if one wants to know the score prediction of Crafty against Houdini from CCRL ratings, the Bayeselo default prediction is pretty badly off: say, instead of 6% it predicts 9% when the Bayeselo default points are transformed with the 400 logistic. Ordo does not suffer from this compression.
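
The derivative matching Remi describes can be sketched as follows. This is only an illustration, not Bayeselo's source: it assumes the Bayeselo model gives a win probability f(delta - drawelo) and a loss probability f(-delta - drawelo), where f is the usual logistic with denominator 400. The function names are made up for the example.

```python
def logistic(x: float) -> float:
    """Usual Elo expected score for a rating difference x (denominator 400)."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def bayeselo_score(delta: float, drawelo: float) -> float:
    """Expected score under the assumed model; delta is in unscaled units."""
    p_win = logistic(delta - drawelo)
    p_loss = logistic(-delta - drawelo)
    p_draw = 1.0 - p_win - p_loss
    return p_win + 0.5 * p_draw


def scale(drawelo: float) -> float:
    """Slope of the assumed model at 0 divided by the logistic slope at 0."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


# Numerical check: the closed form should agree with a finite-difference slope ratio.
h = 1e-3
slope_model = (bayeselo_score(h, 200.0) - bayeselo_score(-h, 200.0)) / (2 * h)
slope_logistic = (logistic(h) - logistic(-h)) / (2 * h)
print(scale(200.0), slope_model / slope_logistic)
```

Under these assumptions the scale for drawelo=200 comes out at roughly 0.73, and it is exactly 1 when drawelo=0, which is consistent with smaller drawelo giving less compression.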

Kai

lucasart · Post by **lucasart** » Tue Oct 02, 2012 3:17 pm

Daniel Shawul wrote:
Kai, I think you simply don't want to accept the need for scaling. Here is my post trying to convince CCRL to go back to using calculated scale.

Modern Times wrote:
CCRL are dropping "scale 1" and going back to default scaling.

Daniel Shawul wrote:
Thanks Ray for confirming that.

Modern Times wrote:
God knows if it is the right thing to do or not. I'm not qualified to judge myself, and even the experts don't agree.

The experts DO AGREE !

The problem is that such threads are not 100% between experts but spammers and FUD spreaders keep writing here... And then more and more people start to think of the spammers as experts and get confused.

My recommendation is to only consider the following people to be experts:
- Remi Coulom
- Daniel Shawul
- Michel Van den Bergh

Laskos · Post by **Laskos** » Tue Oct 02, 2012 4:47 pm

To me it is now pretty clear that the Bayeselo model cannot fit the usual 400 logistic no matter what "scale" is used. The fit for small rating differences is achieved by the "default", and not by scale=1, as the "default" matches the derivative of the usual logistic at 0. In fact, I now think that "default" should be used by rating lists, keeping in mind that it compresses ratings for large rating differences. I plotted the "default" and the true logistic on a logarithmic scale to see the tails, with drawelo=200:

You were right, 200 on the Bayeselo curve is 212 on the logistic, a compression of 6%.

Here are the Bayeselo "default" compressions for several rating differences:

100 Elo points: 2%
200 points: 6%
400 points: 15%
800 points: 25%
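
A rough way to reproduce this kind of list is sketched below. It is not taken from Bayeselo itself: it assumes the model has win probability f(delta - drawelo) and loss probability f(-delta - drawelo) with f the 400 logistic, that the "default" scale is the slope ratio at 0, and that "compression" means how much larger the logistic-equivalent difference is than the scaled Bayeselo difference. `logistic_equivalent` is a hypothetical helper name.

```python
import math

DRAWELO = 200.0  # the large drawelo used for the figures in this post


def logistic(x: float) -> float:
    """Usual Elo expected score for a rating difference x (denominator 400)."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def scale(drawelo: float) -> float:
    """Assumed "default" scale: model slope at 0 over logistic slope at 0."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def logistic_equivalent(scaled_diff: float, drawelo: float = DRAWELO) -> float:
    """Logistic rating difference that predicts the same expected score."""
    delta = scaled_diff / scale(drawelo)  # back to unscaled model units
    p_win = logistic(delta - drawelo)
    p_loss = logistic(-delta - drawelo)
    score = 0.5 * (1.0 + p_win - p_loss)  # win + half of the draws
    return 400.0 * math.log10(score / (1.0 - score))


for diff in (100, 200, 400, 800):
    eq = logistic_equivalent(diff)
    print(f"{diff:4d} Bayeselo -> {eq:7.1f} logistic ({100 * (eq / diff - 1):.0f}%)")
```

With these assumptions the printed percentages should land close to the figures listed above; changing `DRAWELO` to a smaller value shows the compression shrinking.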

Therefore, if one doesn't use Ordo, one should use the Bayeselo "default": it is good for small rating differences, but keep in mind that on large scales Bayeselo compresses ratings by 5-20%. Scale=1 distorts the ratings for small rating differences, giving Bayeselos instead of Elos. I used a large drawelo of 200; a smaller drawelo gives smaller compression.

Kai

Michel · Post by **Michel** » Tue Oct 02, 2012 7:41 pm

You are still sketching an overly pessimistic picture, first by making a graph only for a very large drawelo and secondly by not taking into account the fact that large elo differences are obtained by combining many small elo differences.

To stay with your example: if you measure an elo difference of 800 points through seven intermediate engines, each 100 elo apart from the next, then the difference between logistic and scaled BayesElo will be only 2% and not 25%.

So in other words the 25% elo difference for 800 elo would only hold in the _absence of any other information_, which is completely unrealistic since I strongly doubt you can actually measure such a large elo difference directly.
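
The chaining argument can be checked with a small sketch. The assumptions are mine, not from the thread: the Bayeselo model is taken as win probability f(delta - drawelo) and loss probability f(-delta - drawelo) with f the 400 logistic, drawelo is 200, and seven intermediate engines between the two endpoints give eight gaps of 100 scaled points each.

```python
import math

DRAWELO = 200.0


def logistic(x: float) -> float:
    """Usual Elo expected score for a rating difference x (denominator 400)."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def logistic_equivalent(scaled_diff: float, drawelo: float = DRAWELO) -> float:
    """Logistic difference giving the same expected score as the assumed model."""
    x = 10.0 ** (-drawelo / 400.0)
    scale = 4.0 * x / (1.0 + x) ** 2          # assumed "default" scale
    delta = scaled_diff / scale               # unscaled model units
    score = 0.5 * (1.0 + logistic(delta - drawelo) - logistic(-delta - drawelo))
    return 400.0 * math.log10(score / (1.0 - score))


gaps = [100.0] * 8                            # eight 100-point steps, 800 in total
total = sum(gaps)

# Each neighbouring pair is measured directly; logistic differences then add up.
chained = sum(logistic_equivalent(g) for g in gaps)
# The two extreme engines measured only against each other.
direct = logistic_equivalent(total)

print(f"chained: {chained:.0f} logistic ({100 * (chained / total - 1):.0f}% off)")
print(f"direct:  {direct:.0f} logistic ({100 * (direct / total - 1):.0f}% off)")
```

Under these assumptions the chained estimate stays within a few percent of 800, while the direct comparison shows the large discrepancy, which is the distinction being drawn here.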
