SCCT Rating List - Calculation by EloStat 1.3

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Hello Adam
I didn't know you had that problem with the complete/pure list too. Well, in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elo values close to Arpad Elo's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote: Hello Adam
I didn't know you had that problem with the complete/pure list too. Well, in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elo values close to Arpad Elo's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel
I don't understand how it doesn't change anything else. If the 200-75% relation is not preserved, what is the meaning of the numbers given as ratings? Even transitivity is not preserved with arbitrary scaling. By Elo rating I understand that if I pick from the rating list an engine rated 2900 and another rated 2700, the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Default Bayeselo fails in this prediction, or maybe there are secret tables for deriving the predictions that I am not aware of. So tell me: with default Bayeselo, what is the predicted score in % for those 2900 and 2700 rated (by default Bayeselo) engines in a match?

Kai
Modern Times
Posts: 3782
Joined: Thu Jun 07, 2012 11:02 pm

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Modern Times »

Daniel Shawul wrote: Hello Adam
I didn't know you had that problem with the complete/pure list too. Well, in that case I think using scale becomes even more necessary.
Do you mean using "scale 1" is necessary, or not using it?
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Modern Times wrote:
Daniel Shawul wrote: Hello Adam
I didn't know you had that problem with the complete/pure list too. Well, in that case I think using scale becomes even more necessary.
Do you mean using "scale 1" is necessary, or not using it?
Hello Ray
I meant not using it. By default it calculates its own scale, say 0.834, and applies it to get rating differences similar to EloStat's. If you use scale = 1, the ratings are displayed as they are calculated, in which case you can sometimes get magnified values.
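For illustration only, here is a small C++ sketch of how a default factor of this kind could be derived from the draw parameter. It matches the slope of the expected-score curve (win + draw/2) at zero rating difference to the slope of a plain logistic curve. The function name slopeMatchingScale is hypothetical, and whether Bayeselo computes its default scale exactly this way is an assumption, not something confirmed in this thread.

```cpp
#include <cmath>

// Hypothetical helper (an assumption, not confirmed Bayeselo code):
// returns the factor that makes the expected-score curve of a
// logistic-with-draws model as steep at zero difference as a plain
// logistic curve, once raw rating differences are multiplied by it.
double slopeMatchingScale(double eloDraw) {
    // x = 10^(-eloDraw/400); x is 1 when eloDraw is 0 and shrinks toward 0
    // as eloDraw grows.
    const double x = std::pow(10.0, -eloDraw / 400.0);
    // 4x/(1+x)^2 equals 1 for eloDraw = 0 and drops below 1 otherwise,
    // so displayed differences are compressed relative to the raw ones.
    return 4.0 * x / ((1.0 + x) * (1.0 + x));
}
```

With eloDraw = 0 the factor is exactly 1, which is the same as asking for scale = 1; any positive eloDraw gives a factor below 1.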
Daniel
Modern Times
Posts: 3782
Joined: Thu Jun 07, 2012 11:02 pm

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Modern Times »

Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Laskos wrote:
Daniel Shawul wrote: Hello Adam
I didn't know you had that problem with the complete/pure list too. Well, in that case I think using scale becomes even more necessary. For other models, where the formula doesn't give you Elo values close to Arpad Elo's assumption, scaling would be even more appropriate. Also, Remi preferred the use of scaling (he made it the default), so I think using it would be safer. It doesn't change anything other than the magnitude of the Elo differences.
Daniel
I don't understand how it doesn't change anything else. If the 200-75% relation is not preserved, what is the meaning of the numbers given as ratings?
When you use scale = 1, you are not preserving the 200-75% assumption.
Bayeselo has eloDraw and eloAdvantage, which you need to combine with eloDelta to see the 200-75% assumption. But people want to see it with just eloDelta (i.e. when comparing two engines). That is why a scaling factor is applied by default, although it doesn't have to be. It doesn't change anything else in the sense that the order and the relative Elo differences are preserved.

Rating  = scale * Original values + offset
Laskos wrote: Even transitivity is not preserved with arbitrary scaling. By Elo rating I understand that if I pick from the rating list an engine rated 2900 and another rated 2700, the prediction is that the engine rated 2900 will score 75% in a match against the engine rated 2700. Default Bayeselo fails in this prediction, or maybe there are secret tables for deriving the predictions that I am not aware of. So tell me: with default Bayeselo, what is the predicted score in % for those 2900 and 2700 rated (by default Bayeselo) engines in a match?
Kai
As I explained above, Bayeselo uses the logistic function with two additional parameters:

logistic(-eloDelta - eloAdvantage + eloDraw);
So you need to add eloAdvantage to the difference and subtract eloDraw from it to see the 200-75% relation. With another draw model that uses the logistic function like this:

double f = thetaD * sqrt(logistic(eloDelta + eloAdvantage) * logistic(-eloDelta - eloAdvantage));
return logistic(eloDelta + eloHome) / (1 + f);
the values you get may be even more magnified. But it is better to ask Remi; this is just my opinion from computing Elo ratings on CCRL 40/40 with this model. From what I have read, Remi had a reason to apply the scaling by default.
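Coming back to the 2900 versus 2700 question: the first formula above can be turned into a prediction once eloAdvantage and eloDraw are known. The following is a minimal sketch under stated assumptions: that logistic(x) means 1/(1 + 10^(x/400)), that the loss probability is the mirror image of the win probability, and that eloDelta is in unscaled units (so a displayed gap would first be divided by the scale). The names Outcome, logistic400, predictOutcome and expectedScore are hypothetical.

```cpp
#include <cmath>

// Win, draw and loss probabilities for the side that is eloDelta ahead.
struct Outcome {
    double win;
    double draw;
    double loss;
};

// Assumed meaning of logistic() in the snippet above: 1 / (1 + 10^(x/400)).
double logistic400(double x) {
    return 1.0 / (1.0 + std::pow(10.0, x / 400.0));
}

// eloDelta: raw rating difference (stronger minus weaker), before scaling.
// eloAdvantage: first-move advantage; eloDraw: draw parameter (>= 0).
Outcome predictOutcome(double eloDelta, double eloAdvantage, double eloDraw) {
    Outcome o;
    o.win = logistic400(-eloDelta - eloAdvantage + eloDraw);
    // Assumption: the loss term mirrors the win term with the signs of
    // eloDelta and eloAdvantage flipped.
    o.loss = logistic400(eloDelta + eloAdvantage + eloDraw);
    o.draw = 1.0 - o.win - o.loss;
    return o;
}

// Expected score in a match: a win counts 1, a draw counts 1/2.
double expectedScore(const Outcome& o) {
    return o.win + 0.5 * o.draw;
}
```

With eloAdvantage = 0 and eloDraw = 0 this reduces to the plain logistic curve. With a positive eloDraw, the same raw difference of 200 gives a lower expected score, which is the gap that the scaling factor is meant to compensate for.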

Daniel
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Modern Times wrote: Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Adam Hair »

'mm 1 1' does cause a difference between the complete list and the pure list ratings. However, 'scale 1' is the cause of the large difference. The default method of computing the scale moderates those differences.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Adam Hair »

Daniel Shawul wrote:
Modern Times wrote: Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.
Hi Daniel,

I regret recommending that we use 'scale 1'. The great thing about having different lists is being able to compare the results. No two lists would be comparable if 'scale 1' were used, unless their draw rates were close to being the same.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Adam Hair wrote:
Daniel Shawul wrote:
Modern Times wrote: Thanks. Adam knows better, but I think it is "mm 1 1" that causes the differences between our pure and complete databases, not scale 1.
I don't think so. That was what was originally suspected as the cause of the 'compression' effect, but after Remi pointed out the scaling parameter the problem went away. So CCRL decided to use scale = 1 to avoid compression, but I am not sure it is the right thing to do if you want to compare with other rating lists.
Hi Daniel,

I regret recommending that we use 'scale 1'. The great thing about having different lists is being able to compare the results. No two lists would be comparable if 'scale 1' were used, unless their draw rates were close to being the same.
Hi Adam
No need for regret. In fact, your tests motivated good discussions and also led me to look into rating matters more. I would not have brought up the scaling issue if I hadn't been getting very large numbers with a different draw model. I thought it was going to be a problem to convince people to use it, and maybe the same thing also happens with the default draw model, as you confirmed with the big difference in range between the pure and complete rating lists. I am not sure it is completely avoidable, but it is good to know anyway.
Daniel