Hello Adam
I didn't know you have that problem with the complete/pure list too. Well, in that case I think using the scale becomes even more necessary. For other models, where the formulas don't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel
SCCT Rating List - Calculation by EloStat 1.3
Moderator: Ras
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: SCCT Rating List - Calculation by EloStat 1.3
Daniel Shawul wrote:
  Hello Adam
  I didn't know you have that problem with the complete/pure list too. Well, in that case I think using the scale becomes even more necessary. For other models, where the formulas don't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
  Daniel

I don't understand how it doesn't change anything else. If 200-75% is not preserved, what is the meaning of those numbers given as ratings? Even transitivity is not preserved under arbitrary scaling. By Elo rating I understand that if I pick from the rating list an engine rated 2900 and another rated 2700, the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Default Bayeselo fails in its prediction, or maybe there are secret tables for deriving the predictions that I am not aware of. So, tell me, with default Bayeselo, what is the predicted score in % for those 2900 and 2700 rated (by default Bayeselo) engines in a match?
Kai
-
Modern Times
- Posts: 3782
- Joined: Thu Jun 07, 2012 11:02 pm
Re: SCCT Rating List - Calculation by EloStat 1.3
Daniel Shawul wrote:
  Hello Adam
  I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary.

Do you mean using "scale 1" is necessary, or not using it?
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: SCCT Rating List - Calculation by EloStat 1.3
Modern Times wrote:
  Daniel Shawul wrote:
    Hello Adam
    I didn't know you have that problem with the complete/pure list too. Well in that case I think using scale becomes even more necessary.
  Do you mean using "scale 1" is necessary, or not using it?

Hello Ray
I meant not using it. By default it calculates its own scale, say 0.834, and applies it to get rating differences similar to EloStat's. If you use scale = 1, the ratings are displayed as they are calculated, in which case you can sometimes get magnified values.
Daniel
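To make the effect of the scale concrete, here is a minimal sketch, assuming the displayed rating is a linear map of the computed one (scale times the raw value, plus an offset). The engine names, the raw values and the 2800 offset are hypothetical; only the 0.834 figure comes from the post above, where it is itself just an example.

```python
# Hypothetical raw ratings; names and numbers are illustrative only.
raw = {"EngineA": 120.0, "EngineB": 0.0, "EngineC": -60.0}


def rescale(ratings, scale, offset=0.0):
    """Apply rating = scale * raw + offset to every entry."""
    return {name: scale * value + offset for name, value in ratings.items()}


# 0.834 is the example scale mentioned above; 2800 is an assumed offset.
scaled = rescale(raw, scale=0.834, offset=2800.0)

# A positive scale leaves the ranking order unchanged.
order_raw = sorted(raw, key=raw.get, reverse=True)
order_scaled = sorted(scaled, key=scaled.get, reverse=True)

# Every rating difference is multiplied by the same factor.
gap_raw = raw["EngineA"] - raw["EngineB"]
gap_scaled = scaled["EngineA"] - scaled["EngineB"]
```

With a scale below 1 the gaps shrink, and with scale = 1 they are shown as computed, which is the "magnified values" case described above.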
-
Modern Times
- Posts: 3782
- Joined: Thu Jun 07, 2012 11:02 pm
Re: SCCT Rating List - Calculation by EloStat 1.3
Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: SCCT Rating List - Calculation by EloStat 1.3
Laskos wrote:
  Daniel Shawul wrote:
    Hello Adam
    I didn't know you have that problem with the complete/pure list too. Well, in that case I think using the scale becomes even more necessary. For other models, where the formulas don't give you Elos close to Arpad's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I would think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
    Daniel
  I don't understand how it doesn't change anything else. If 200-75% is not preserved, what is the meaning of those numbers given as rating?

When you use scale = 1, you are not preserving the 200-75% assumption. Bayeselo has eloDraw and eloAdvantage, which you need to add to eloDelta to see the 200-75% assumption. But people want to see that with just eloDelta (i.e. when comparing two engines). That is why an arbitrary factor needed to be applied, but it doesn't have to be. It doesn't change anything else in the sense that the order and the relative Elo differences are preserved.
Code:
Rating = scale * Original values + offset

Laskos wrote:
  Even the transitivity is not preserved using arbitrary scaling. I understand by Elo rating that if I pick from the rating list an engine rated 2900 and another rated 2700, then the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Bayeselo default fails in its prediction or maybe there are secret tables to derive the predictions of which I am not aware. So, tell me, with default Bayeselo, what is the prediction in % for those 2900 and 2700 rated (by Bayeselo default) engines in a match?
  Kai

As I explained above, bayeselo uses the logistic with two more parameters added:
Code:
logistic(-eloDelta - eloAdvantage + eloDraw);
Code:
double f = thetaD * sqrt(logistic(eloDelta + eloAdvantage) * logistic(-eloDelta - eloAdvantage));
return logistic(eloDelta + eloHome) / (1 + f);
Daniel
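As a rough illustration of how a predicted score could be read off from a model of this shape, here is a sketch under stated assumptions. It assumes `logistic(x) = 1 / (1 + 10^(x/400))`, takes the win term from the first snippet above, and assumes the loss term is its mirror image; that symmetry is not stated in the thread. The function names are hypothetical and are not taken from the bayeselo source.

```python
def logistic(x):
    """Assumed convention: 1 / (1 + 10^(x / 400))."""
    return 1.0 / (1.0 + 10.0 ** (x / 400.0))


def outcome_probabilities(elo_delta, elo_advantage, elo_draw):
    """Return (win, draw, loss) for the side that is elo_delta ahead."""
    # Win term as in the snippet above.
    p_win = logistic(-elo_delta - elo_advantage + elo_draw)
    # Loss term: assumed to be the mirrored expression.
    p_loss = logistic(elo_delta + elo_advantage + elo_draw)
    # Whatever probability is left over is assigned to the draw.
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss


def expected_score(elo_delta, elo_advantage, elo_draw):
    """Expected score in [0, 1], counting a draw as half a point."""
    p_win, p_draw, _ = outcome_probabilities(elo_delta, elo_advantage, elo_draw)
    return p_win + 0.5 * p_draw
```

With `elo_draw = 0` and `elo_advantage = 0` this reduces to the plain logistic curve in `elo_delta` alone. With a nonzero `elo_draw`, the score predicted for the same 200-point gap changes, which is why a rating difference cannot be read against the usual table without also knowing those two parameters.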
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: SCCT Rating List - Calculation by EloStat 1.3
Modern Times wrote:
  Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.

I don't think so. That was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the use of the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.
-
Adam Hair
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: SCCT Rating List - Calculation by EloStat 1.3
'mm 1 1' does cause a difference between the complete list and the pure list ratings. However, 'scale 1' is the cause of the large difference. The default method of computing the scale modulates those differences.
-
Adam Hair
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: SCCT Rating List - Calculation by EloStat 1.3
Daniel Shawul wrote:
  Modern Times wrote:
    Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
  I don't think so. That was what was originally suspected for the 'compression' effect but after remi pointed out use of scaling parameter the problem went away. So ccrl decided to use scale = 1 to avoid compression but I am not sure it is the right thing to do if you want to compare to other rating lists.

Hi Daniel,
I regret recommending that we use 'scale 1'. The great thing about the different lists is to be able to compare the results. No two lists would be comparable if 'scale 1' is used, unless their draw rates were close to being the same.
-
Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: SCCT Rating List - Calculation by EloStat 1.3
Adam Hair wrote:
  Daniel Shawul wrote:
    Modern Times wrote:
      Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
    I don't think so. That was what was originally suspected for the 'compression' effect but after remi pointed out use of scaling parameter the problem went away. So ccrl decided to use scale = 1 to avoid compression but I am not sure it is the right thing to do if you want to compare to other rating lists.
  Hi Daniel,
  I regret recommending that we use 'scale 1'. The great thing about the different lists is to be able to compare the results. No two lists would be comparable if 'scale 1' is used, unless their draw rates were close to being the same.

Hi Adam
No need to regret. In fact, your tests motivated good discussions and also led me to look into rating matters more. I would not have brought up the scaling issue if I hadn't been getting very large numbers with a different draw model. I thought this was going to make it hard to convince people to use it, and maybe the same thing also happens with the default draw model, which you confirmed with the big difference in range between the pure and complete rating lists. I am not sure whether it is completely avoidable, but it is good to know anyway.
Daniel