SCCT Rating List - Calculation by EloStat 1.3

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Hello Sedat
I didn't mean to criticize the way you calculate ratings. Bayeselo should be used with mm 1 1 because both parameters are unique to it and are important for correct Elo modeling. The white advantage and the draw ratio (the two 1s) are better estimated from your own data; default values taken from elsewhere may not be appropriate. In your case the difference between mm 1 1 and mm 0 1 was small because the defaults happened to be the same as the values obtained from your data, but in general it is better to always run mm 1 1.
Anyway, the reason I commented in this thread is that I was surprised you switched from Bayeselo to EloStat 1.3. I can assure you that you gain nothing by going that way :) There is no white advantage, no draw model, no prior, no LOS, etc. Bayeselo wins hands down if we don't make it a popularity contest. Still, for most situations all of them do a fairly good job. Using different Elo estimators in different lists causes confusion about how many Elo points an engine has improved. For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
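For concreteness, here is how I would sketch the model in Python. This is my reading of the parametrization; the 32.8 / 97.3 values are what I recall as the defaults, and everything here is illustrative, not Bayeselo's actual code:

```python
# A minimal sketch of a Bayeselo-style result model (my reading of the
# documentation; the 32.8 / 97.3 defaults are recalled, not verified, so
# treat everything here as illustrative).

def f(x):
    """Logistic expected score for an Elo difference x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def wdl_probabilities(elo_white, elo_black, elo_advantage=32.8, elo_draw=97.3):
    """Win/draw/loss probabilities for White.

    elo_advantage: bonus for having the white pieces.
    elo_draw: offset controlling the draw tendency.
    These are the two parameters that 'mm 1 1' estimates from the data
    instead of keeping at their defaults.
    """
    d = elo_white - elo_black + elo_advantage
    p_win = f(d - elo_draw)
    p_loss = f(-d - elo_draw)
    return p_win, 1.0 - p_win - p_loss, p_loss

w, d, l = wdl_probabilities(2800, 2800)
print(w, d, l)  # equal players: White's expected score is still above 50%
```

With mm 0 1 only the draw parameter would be refitted; mm 1 1 refits both, which is why it matters when your data's white advantage differs from the default.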
cheers
Daniel
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

Daniel Shawul wrote:Hello Sedat
I didn't mean to criticize the way you calculate ratings. Bayeselo should be used with mm 1 1 because both parameters are unique to it and are important for correct Elo modeling. The white advantage and the draw ratio (the two 1s) are better estimated from your own data; default values taken from elsewhere may not be appropriate. In your case the difference between mm 1 1 and mm 0 1 was small because the defaults happened to be the same as the values obtained from your data, but in general it is better to always run mm 1 1.
Anyway, the reason I commented in this thread is that I was surprised you switched from Bayeselo to EloStat 1.3. I can assure you that you gain nothing by going that way :) There is no white advantage, no draw model, no prior, no LOS, etc. Bayeselo wins hands down if we don't make it a popularity contest. Still, for most situations all of them do a fairly good job. Using different Elo estimators in different lists causes confusion about how many Elo points an engine has improved. For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
cheers
Daniel
Dear Daniel,

No, no... it's quite clear that you are not criticizing my work :)

I really appreciate your useful comments, and I'd like to thank you again

Actually, there is nothing wrong with criticism either, as long as it is done in a friendly way

I mean, that is how we can learn and improve our work

Btw,i like this saying:
“He has a right to criticize, who has a heart to help.”
- Abraham Lincoln

OK... in the next calculations I will use only mm 1 1

For example, the latest SCCT list is calculated with mm 1 1:
http://www.sedatcanbaz.com/chess/scct-rating/


Best Regards,
Sedat
Last edited by Sedat Canbaz on Mon Aug 27, 2012 11:17 pm, edited 1 time in total.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
cheers
Daniel
Why was this scale parameter there at all? For years it caused rating deflation in most computer chess rating lists. As I understood it, matching the derivative of the logistic at 0 gives a scale parameter of 1, which is what should be used, not 0.87 or who knows what. And the difference was not small, often 15%: for a difference of 100 Elo, some 15 Elo, and for 400, some 60. Maybe everybody should be advised to use mm 1 1 and scale 1; that would be the most meaningful way to do it. By the way, Ordo agrees very well with Bayeselo in this case.
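To make the sizes concrete, a sketch (0.87 is just an example value, not a factor taken from any particular list):

```python
# Reproducing the numbers above (illustrative only; 0.87 is an example
# compression factor, not one taken from a real rating list).

def expected_score(elo_diff):
    """Standard logistic expected score."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(expected_score(200))  # ~0.76, close to the conventional 75%

# Multiplying every rating by a factor below 1 compresses differences:
for true_diff in (100, 400):
    reported = 0.87 * true_diff
    print(true_diff, "->", reported)  # deflation of 13 and 52 Elo here
```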

Kai
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Laskos wrote:
Daniel Shawul wrote:For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
cheers
Daniel
Why was this scale parameter there at all? For years it caused rating deflation in most computer chess rating lists. As I understood it, matching the derivative of the logistic at 0 gives a scale parameter of 1, which is what should be used, not 0.87 or who knows what. And the difference was not small, often 15%: for a difference of 100 Elo, some 15 Elo, and for 400, some 60. Maybe everybody should be advised to use mm 1 1 and scale 1; that would be the most meaningful way to do it. By the way, Ordo agrees very well with Bayeselo in this case.

Kai
Hello Kai
I am undecided about the use of the scale parameter. On the one hand, you want to compare different computer rating lists, and also human rating lists. For that you need something that translates to the convention that a 200 Elo difference means a 75% score. The scaling serves only that purpose; otherwise it distorts a result obtained through a legitimate procedure. On the other hand, if you don't use it, your rating differences will be bigger. In fact, the default Bayeselo model as it stands doesn't need much scaling, but with a changed draw model the effect can be big, I mean really big. I can't see CCRL using that without scaling, the way it is right now. It needs to be scaled down to make any comparisons with others, IMO. In essence, it doesn't matter what numbers you assign as Elo ratings as long as you have a model that fits the data. But we also want to make comparisons, so it is a dilemma.
Daniel
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
cheers
Daniel
Why was this scale parameter there at all? For years it caused rating deflation in most computer chess rating lists. As I understood it, matching the derivative of the logistic at 0 gives a scale parameter of 1, which is what should be used, not 0.87 or who knows what. And the difference was not small, often 15%: for a difference of 100 Elo, some 15 Elo, and for 400, some 60. Maybe everybody should be advised to use mm 1 1 and scale 1; that would be the most meaningful way to do it. By the way, Ordo agrees very well with Bayeselo in this case.

Kai
Hello Kai
I am undecided about the use of the scale parameter. On the one hand, you want to compare different computer rating lists, and also human rating lists. For that you need something that translates to the convention that a 200 Elo difference means a 75% score. The scaling serves only that purpose; otherwise it distorts a result obtained through a legitimate procedure. On the other hand, if you don't use it, your rating differences will be bigger. In fact, the default Bayeselo model as it stands doesn't need much scaling, but with a changed draw model the effect can be big, I mean really big. I can't see CCRL using that without scaling, the way it is right now. It needs to be scaled down to make any comparisons with others, IMO. In essence, it doesn't matter what numbers you assign as Elo ratings as long as you have a model that fits the data. But we also want to make comparisons, so it is a dilemma.
Daniel
I am a bit at a loss with this argument. Doesn't matching the derivative at the origin of the model's normalized, single-parameter logistic give the full logistic (e.g. 75% at 200)? In fact, as I understood it, the default Bayeselo (when calculating scale after mm) was compressing the default logistic so that 75% meant 150 or 180 instead of 200, and when the scale was set to 1, 75% meant exactly 200. Also, haven't you shown that the draw model used by Bayeselo is not the optimal one (although this is not very relevant to the discussion; the model differences are small compared to those huge absolute scale variations)?
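Numerically, the argument looks like this (a sketch that assumes only the standard logistic, nothing from Bayeselo's source):

```python
import math

# Numeric check of the derivative-matching argument.
def f(d):
    """Standard logistic expected score."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

h = 1e-6
slope = (f(h) - f(-h)) / (2 * h)
print(slope, math.log(10) / 1600)  # both about 0.00144: scale 1 matches

# The 75% point of the uncompressed curve:
d75 = 400 * math.log10(3)  # solves f(d75) == 0.75 exactly
print(d75)                 # ~190.8 Elo, essentially the 200-75% rule

# A compression factor like 0.87 would move that point down to ~166,
# into the "150 or 180 instead of 200" territory:
print(0.87 * d75)
```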

Kai
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

I am a bit at loss with this argument. Doesn't the derivative set at the origin for the given by the model normalized, single-parameter logistic gives the full logistic (e.g. 75% at 200)? In fact, as I understood, the default Bayeselo (when calculating scale after mm) was compressing the default logistic to something that 75% meant 150 or 180 instead of 200, and when setting the scale to 1, 75% meant exactly that 200.
I think the scaling is required to bring ratings to the conventional Arpad Elo assumption of a 200 Elo - 75% system. What you get from Bayeselo originally is something larger, so you scale it down so that the slopes match at a 50% winning percentage. So in the end you do not have 150 or 180 but 200 Elo. I haven't followed the 'compression' topic closely, but I assumed that was what was causing the problem. The problem with using scale=1 is that for models other than the default, the rating numbers may be bigger at first. I am not even sure we get comparable numbers with the default model and scale=1.
For example, for the current CCRL list I get -1031 to 731 Elo using the default model, and -1534 to 1171 Elo using the modified model, both with scale=1. In fact, at first I thought my modified model was wrong, but it turns out it fits the data even better. But for comparison it clearly needs to be scaled by some 66%, if we assume the first result is close to the Elo assumption. Btw, eloDraw and eloAdvantage modify the model, which is why we don't get 200 Elo - 75% out of the box.
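The 66% comes straight from those two ranges (illustrative arithmetic only):

```python
# Ratio of the two rating spans quoted above.
default_span = 731 - (-1031)    # 1762 Elo with the default model, scale=1
modified_span = 1171 - (-1534)  # 2705 Elo with the modified draw model, scale=1

factor = default_span / modified_span
print(factor)  # ~0.65, i.e. the modified ratings need scaling by about 66%
```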
Also, haven't you shown that the draw model used by Bayeselo is not the optimal one (although this is not very relevant to the discussion; the model differences are small compared to those huge absolute scale variations)?
Kai
Well, I would not call what I did anything formal, and it still needs a lot of work to show that the model is indeed an improvement. Maybe if I can find some big databases with a huge draw ratio, as in reversi, the effect of the draw model would be measurable to a significant result.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Adam Hair »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:For example, CCRL stopped using the scale parameter, which means the improvement it reports for an engine is slightly magnified compared to what CEGT will report, even though the engine performed the same in both. Nothing extraordinary, but a few Elo points of difference, maybe.
cheers
Daniel
Why was this scale parameter there at all? For years it caused rating deflation in most computer chess rating lists. As I understood it, matching the derivative of the logistic at 0 gives a scale parameter of 1, which is what should be used, not 0.87 or who knows what. And the difference was not small, often 15%: for a difference of 100 Elo, some 15 Elo, and for 400, some 60. Maybe everybody should be advised to use mm 1 1 and scale 1; that would be the most meaningful way to do it. By the way, Ordo agrees very well with Bayeselo in this case.

Kai
Hello Kai
I am undecided about the use of the scale parameter. On the one hand, you want to compare different computer rating lists, and also human rating lists. For that you need something that translates to the convention that a 200 Elo difference means a 75% score. The scaling serves only that purpose; otherwise it distorts a result obtained through a legitimate procedure. On the other hand, if you don't use it, your rating differences will be bigger. In fact, the default Bayeselo model as it stands doesn't need much scaling, but with a changed draw model the effect can be big, I mean really big. I can't see CCRL using that without scaling, the way it is right now. It needs to be scaled down to make any comparisons with others, IMO. In essence, it doesn't matter what numbers you assign as Elo ratings as long as you have a model that fits the data. But we also want to make comparisons, so it is a dilemma.
Daniel
It is definitely a dilemma. I suggested to the other CCRL members that we should change how we compute our ratings. Using 'mm 1 1' would make full use of the information contained in our data. Using 'scale 1' removes the distortion, so that the ratings reflect the model. However, these changes make comparing different databases impossible. In fact, our complete and our pure databases cannot be compared to each other. The different characteristics of our own databases produce distinctly different ratings. The difference between the top and bottom engines is 2300 Elo for the complete list and 2129 for the pure list (the pure list is a best-versions list).

The more I think about this, the less certain I am about what is most desirable, accuracy or comparability.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by michiguel »

Sedat Canbaz wrote:Thank you, Miguel!

I have never used your Ordo calculation program, but I hope to try it later...

Btw, I noticed in your Ordo list that Rybka 4.1 SSE4.2 is rated 4 Elo stronger than Rybka 4.1 NO-SSE
*Note that I mean Rybka 4.1 NO-SSE with 1000 games per player

EloStat and the Fritz GUI calculate a 4 Elo difference too

For example, BayesElo calculates exactly a 1 Elo difference

And if we check all the calculation utilities, we see that some of the players have slightly different Elo results...

So... I wonder a lot which tool calculates more accurately?!

Greetings,
Sedat
The rating difference is so small that it is not worth agonizing over it. According to Ordo, the difference is 4.0 points +/- 16.7 (obtained by the simulations, in a table which I did not post).
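To see how inconclusive 4.0 +/- 16.7 is, a rough sketch (assuming the interval is a 95% one and the estimate is roughly normal, neither of which the numbers above guarantee):

```python
import math

# A rough likelihood-of-superiority for the 4.0 +/- 16.7 result.
# Assumptions (not stated in the post): the +/- 16.7 is a 95% interval
# and the estimated difference is approximately normal.
diff = 4.0
sigma = 16.7 / 1.96                   # implied standard error

z = diff / sigma
los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(los)  # roughly 0.68: nowhere near conclusive
```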

BTW, I just released this version in the programming sub-forum.

Miguel
Modern Times
Posts: 3786
Joined: Thu Jun 07, 2012 11:02 pm

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Modern Times »

Adam Hair wrote: It is definitely a dilemma. I suggested to the other CCRL members that we should change how we compute our ratings. Using 'mm 1 1' would make full use of the information contained in our data. Using 'scale 1' removes the distortion, so that the ratings reflect the model. However, these changes make comparing different databases impossible. In fact, our complete and our pure databases cannot be compared to each other. The different characteristics of our own databases produce distinctly different ratings. The difference between the top and bottom engines is 2300 Elo for the complete list and 2129 for the pure list (the pure list is a best-versions list).

The more I think about this, the less certain I am about what is most desirable, accuracy or comparability.
Yes, I am not certain what is best either!
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

michiguel wrote:
Sedat Canbaz wrote:Thank you, Miguel!

I have never used your Ordo calculation program, but I hope to try it later...

Btw, I noticed in your Ordo list that Rybka 4.1 SSE4.2 is rated 4 Elo stronger than Rybka 4.1 NO-SSE
*Note that I mean Rybka 4.1 NO-SSE with 1000 games per player

EloStat and the Fritz GUI calculate a 4 Elo difference too

For example, BayesElo calculates exactly a 1 Elo difference

And if we check all the calculation utilities, we see that some of the players have slightly different Elo results...

So... I wonder a lot which tool calculates more accurately?!

Greetings,
Sedat
The rating difference is so small that it is not worth agonizing over it. According to Ordo, the difference is 4.0 points +/- 16.7 (obtained by the simulations, in a table which I did not post).

BTW, I just released this version in the programming sub-forum.

Miguel
Many thanks, dear Miguel!!

I will definitely try your tool

Best,
Sedat