SCCT Rating List - Calculation by EloStat 1.3

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 28, 2012 2:05 pm

Hello Adam
I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel

Laskos · Post by **Laskos** » Tue Aug 28, 2012 2:56 pm

Daniel Shawul wrote:Hello Adam
I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel

I don't understand how it doesn't change anything else. If 200-75% is not preserved, what is the meaning of those numbers given as rating? Even transitivity is not preserved under arbitrary scaling. As I understand Elo rating, if I pick from the rating list an engine rated 2900 and another rated 2700, then the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Bayeselo's default fails in its prediction, or maybe there are secret tables for deriving the predictions of which I am not aware. So tell me: with default Bayeselo, what is the prediction in % for those 2900 and 2700 rated (by Bayeselo default) engines in a match?

Kai

Modern Times · Post by **Modern Times** » Tue Aug 28, 2012 2:58 pm

Daniel Shawul wrote:Hello Adam
I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary.

Do you mean that using "scale 1" is necessary, or not using it?

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 28, 2012 3:14 pm

Modern Times wrote:
Daniel Shawul wrote:Hello Adam
I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary.
Do you mean that using "scale 1" is necessary, or not using it?

Hello Ray
I meant not using it. By default it calculates its own scale, say 0.834, and applies it to get rating differences similar to EloStat's. If you use scale = 1, the ratings are displayed as they are calculated, in which case you could sometimes get magnified values.
Daniel
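
One way to see where a factor like 0.834 could come from is sketched below in Python. This is only an illustration under an assumption the thread does not confirm: that the default scale is chosen so that the slope of the expected-score curve at zero rating difference matches the plain logistic Elo curve. The function name `slope_matching_scale` and the eloDraw values are hypothetical.

```python
def logistic(x: float) -> float:
    """Plain Elo curve: expected score for a rating advantage of x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def slope_matching_scale(elo_draw: float) -> float:
    """Ratio of the draw-model slope to the plain Elo slope at zero difference.

    Assumed model (eloAdvantage left out for simplicity):
        win  = logistic( d - elo_draw)
        loss = logistic(-d - elo_draw)
        expected score = win + 0.5 * (1 - win - loss)
    At d = 0 the slope is proportional to p * (1 - p) with
    p = logistic(-elo_draw); the plain curve has p = 0.5, i.e. 0.25.
    """
    p = logistic(-elo_draw)
    return 4.0 * p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical eloDraw values; a larger eloDraw means more draws.
    for elo_draw in (0.0, 100.0, 150.0, 200.0):
        print(elo_draw, round(slope_matching_scale(elo_draw), 3))
```

Under this assumption the factor is 1 when there are no draws and falls as eloDraw grows, which would explain why unscaled ratings look magnified in lists with many draws.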

Modern Times · Post by **Modern Times** » Tue Aug 28, 2012 3:22 pm

Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 28, 2012 3:29 pm

Laskos wrote:
Daniel Shawul wrote:Hello Adam
I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel
I don't understand how it doesn't change anything else. If 200-75% is not preserved, what is the meaning of those numbers given as rating?

When you use scale = 1, you are not preserving the 200-75% assumption.
Bayeselo has eloDraw and eloAdvantage, which you need to add to eloDelta to see the 200-75% assumption. But people want to see it with just eloDelta (i.e. when comparing two engines). That is why an arbitrary factor is applied, although it does not have to be. It doesn't change anything else in the sense that the order and the relative Elo differences are preserved.

Code:

Rating  = scale * Original values + offset
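
The formula above is an affine map, and a small Python sketch can show what it does and does not preserve. The engine names, the raw ratings and the offset of 2800 are hypothetical; 0.834 is the example scale mentioned earlier in the thread.

```python
def rescale(raw_ratings: dict[str, float], scale: float, offset: float) -> dict[str, float]:
    """Apply rating = scale * original value + offset to every engine."""
    return {name: scale * value + offset for name, value in raw_ratings.items()}


# Hypothetical unscaled ratings for three engines.
raw = {"A": 150.0, "B": 0.0, "C": -90.0}
shown = rescale(raw, scale=0.834, offset=2800.0)

# The ranking is unchanged because the scale is positive.
assert sorted(raw, key=raw.get) == sorted(shown, key=shown.get)

# Differences still add up: (A - B) + (B - C) equals (A - C).
gap_ab = shown["A"] - shown["B"]
gap_bc = shown["B"] - shown["C"]
gap_ac = shown["A"] - shown["C"]
assert abs(gap_ab + gap_bc - gap_ac) < 1e-9

# Every gap shrinks by the same factor, so ratios between gaps are kept.
assert abs(gap_ab / gap_bc - 150.0 / 90.0) < 1e-9
```

What does change is the expected score that a given displayed gap stands for, which is the point of the objection quoted next.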

Even transitivity is not preserved under arbitrary scaling. As I understand Elo rating, if I pick from the rating list an engine rated 2900 and another rated 2700, then the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Bayeselo's default fails in its prediction, or maybe there are secret tables for deriving the predictions of which I am not aware. So tell me: with default Bayeselo, what is the prediction in % for those 2900 and 2700 rated (by Bayeselo default) engines in a match?
Kai

As I explained above, bayeselo uses the logistic function with two additional parameters:

Code:

logistic(-eloDelta - eloAdvantage + eloDraw);

So you need to add eloAdvantage to the difference and subtract eloDraw from it to see the 200-75%. With another draw model that uses logistic like this

Code:

double f = thetaD * sqrt(logistic(eloDelta + eloAdvantage) * logistic(-eloDelta - eloAdvantage));
return logistic(eloDelta + eloHome) / (1 + f);

the default values you get may be even more magnified. But it is better to ask Remi; this is just my opinion from making Elo ratings on CCRL 40/40 with this model. From what I read, Remi had a reason to apply the scaling by default.

Daniel
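
To make the 2900 versus 2700 question concrete, here is a hedged Python sketch of the two draw models discussed in this post. Several things are assumptions rather than facts from the thread: `logistic` is taken as the rising Elo curve (so signs are flipped relative to the snippet above), `eloHome` in the second snippet is treated as the same term as `eloAdvantage`, the eloDraw of 150 and thetaD of 1 are made-up values, and 0.834 is only the example scale mentioned earlier.

```python
import math


def logistic(x: float) -> float:
    """Plain Elo curve: expected score for a rating advantage of x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def expected_score(elo_delta: float, elo_advantage: float = 0.0,
                   elo_draw: float = 0.0) -> float:
    """Expected score under the two-parameter logistic draw model."""
    win = logistic(elo_delta + elo_advantage - elo_draw)
    loss = logistic(-elo_delta - elo_advantage - elo_draw)
    draw = 1.0 - win - loss
    return win + 0.5 * draw


def davidson_expected_score(elo_delta: float, elo_advantage: float = 0.0,
                            theta_d: float = 1.0) -> float:
    """Expected score under the second, sqrt-based draw model."""
    x = elo_delta + elo_advantage
    f = theta_d * math.sqrt(logistic(x) * logistic(-x))
    win = logistic(x) / (1.0 + f)
    draw = f / (1.0 + f)
    return win + 0.5 * draw


if __name__ == "__main__":
    ELO_DRAW = 150.0  # hypothetical value
    SCALE = 0.834     # example scale from the thread
    print("plain Elo, 200 gap:       ", round(logistic(200.0), 3))
    print("raw 200 gap (scale 1):    ", round(expected_score(200.0, elo_draw=ELO_DRAW), 3))
    print("displayed 200 gap, scaled:", round(expected_score(200.0 / SCALE, elo_draw=ELO_DRAW), 3))
    print("sqrt-based model, 200 gap:", round(davidson_expected_score(200.0), 3))
```

With these made-up parameters, a 200-point gap in unscaled ratings predicts a lower score than the plain curve. Converting a displayed 200-point gap back to raw units (dividing by the scale) before applying the model brings the prediction close to the plain curve again.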

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 28, 2012 3:34 pm

Modern Times wrote:Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.

I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.

Adam Hair · Post by **Adam Hair** » Tue Aug 28, 2012 4:01 pm

'mm 1 1' does cause a difference between the complete list and the pure list ratings. However, 'scale 1' is the cause of the large difference. The default method of computing the scale moderates those differences.

Adam Hair · Post by **Adam Hair** » Tue Aug 28, 2012 4:17 pm

Daniel Shawul wrote:
Modern Times wrote:Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.

Hi Daniel,

I regret recommending that we use 'scale 1'. The great thing about the different lists is to be able to compare the results. No two lists would be comparable if 'scale 1' is used, unless their draw rates were close to being the same.
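
The dependence on draw rate can be illustrated with a short Python sketch. It assumes the two-parameter logistic draw model discussed earlier in the thread with eloAdvantage set to zero; the eloDraw values standing in for "few draws" and "many draws" are hypothetical, and `raw_gap_for_score` is just a helper written for this example.

```python
def logistic(x: float) -> float:
    """Plain Elo curve: expected score for a rating advantage of x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def expected_score(elo_delta: float, elo_draw: float) -> float:
    """Expected score under the logistic draw model, no colour advantage."""
    win = logistic(elo_delta - elo_draw)
    loss = logistic(-elo_delta - elo_draw)
    return win + 0.5 * (1.0 - win - loss)


def raw_gap_for_score(target: float, elo_draw: float,
                      lo: float = 0.0, hi: float = 2000.0) -> float:
    """Bisect for the unscaled rating gap whose expected score is target."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if expected_score(mid, elo_draw) < target:
            lo = mid  # gap too small, search the upper half
        else:
            hi = mid  # gap large enough, search the lower half
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Same 75% result, two hypothetical lists with different draw rates.
    for label, elo_draw in (("few draws", 100.0), ("many draws", 250.0)):
        gap = raw_gap_for_score(0.75, elo_draw)
        print(label, round(gap, 1))
```

Under these assumptions the same 75% score corresponds to a larger unscaled gap in the list with more draws, so 'scale 1' ratings from two such lists would not be directly comparable.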

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 28, 2012 4:46 pm

Adam Hair wrote:
Daniel Shawul wrote:
Modern Times wrote:Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.
Hi Daniel,

I regret recommending that we use 'scale 1'. The great thing about the different lists is to be able to compare the results. No two lists would be comparable if 'scale 1' is used, unless their draw rates were close to being the same.

Hi Adam
No need to regret; in fact your tests motivated good discussions and also led me to look into rating matters more. I would not have brought up the scaling issue if I weren't getting very large numbers with a different draw model. I thought this was going to make it hard to convince people to use it, and maybe the same happens with the default draw model, which you confirmed with the big difference in range between the pure and complete rating lists. I am not sure if it would be completely avoidable, but it is good to know anyway.
Daniel
