SCCT Rating List - Calculation by EloStat 1.3

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Sedat Canbaz wrote: Dear Experts,

I have a few questions:


I don't understand why BayesElo calculated such strange Elo results:
1) I wonder why Houdini 2.0t3's Elo went down from 3363 to 3359.
*Note that Houdini 2.0t3's Elo fell without it playing a single game.

The other strange result: Fruit 090705's Elo increased by 16.
*Note also that Fruit 090705's Elo increased after playing only 50 more games (against Rybka 4.1 NO-SSE x64 6c).

Is a +16 Elo increase really possible after such a poor performance by Fruit?

Individual statistics:
Fruit 090705 x64 6c  vs Rybka 4.1 NO-SSE x64 6c 
 50 (+  0,=  8,- 42),  8.0 % 


Rank Name                          Elo    +    - games score oppo. draws 
   1 Houdini 2.0t3 Pro x64 6c     3363   12   12  1700   70%  3212   39% 
  39 Fruit 090705 x64 6c          2965   15   15  1150   23%  3178   29% 

Rank Name                          Elo    +    - games score oppo. draws 
   1 Houdini 2.0t3 Pro x64 6c     3359   14   14  1700   70%  3217   39% 
  39 Fruit 090705 x64 6c          2981   18   18  1200   23%  3190   29%

Thanks in advance,
Sedat
Hello Sedat,
My guess is that Houdini performed well against Rybka. Now that Fruit has done better against Rybka than expected, Fruit's Elo increases and Rybka's decreases, so Houdini's good result against Rybka counts for less (Houdini is now judged to have played a weaker opponent). I am sure other tools would also adjust along those lines.
If you have both game collections, from before and after the Fruit games were added, I would be happy to do comparisons for you.
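
To make the mechanism described above concrete, here is a minimal sketch in Python of a joint rating fit. It is not BayesElo's actual algorithm (BayesElo additionally models draws and uses a prior); it is a plain logistic maximum-likelihood fit in which a draw counts as half a point. The players H, R and F and all game counts are hypothetical stand-ins, not the SCCT data. The sketch only illustrates that adding games between R and F can move H's rating although H played no new games.

```python
import math

def expected(r_a, r_b):
    """Expected score of a player rated r_a against one rated r_b (logistic Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def fit(results, mean=3000.0, iterations=2000):
    """Fit ratings so that each player's predicted points match the actual points.

    results: list of (player_a, player_b, games, points_for_a).
    The average rating is pinned to `mean`, because only differences are determined.
    """
    players = sorted({p for a, b, _, _ in results for p in (a, b)})
    rating = {p: mean for p in players}
    scale = 400.0 / math.log(10)
    for _ in range(iterations):
        for p in players:
            actual = predicted = weight = 0.0
            for a, b, games, pts in results:
                if p == a:
                    opp, score = b, pts
                elif p == b:
                    opp, score = a, games - pts
                else:
                    continue
                e = expected(rating[p], rating[opp])
                actual += score
                predicted += games * e
                weight += games * e * (1.0 - e)
            # Damped, capped Newton step for this player's rating.
            step = 0.5 * scale * (actual - predicted) / weight
            rating[p] += max(-100.0, min(100.0, step))
        # Re-center so the average stays at `mean`.
        shift = mean - sum(rating.values()) / len(rating)
        for p in players:
            rating[p] += shift
    return rating

# Hypothetical data (NOT the SCCT games): H is the top engine, F the weakest.
before_games = [("H", "R", 100, 60.0), ("H", "F", 100, 90.0), ("R", "F", 100, 85.0)]
# 50 extra R-F games in which F scores 30%, better than its earlier 15% against R.
after_games = before_games + [("R", "F", 50, 35.0)]

before = fit(before_games)
after = fit(after_games)
for name in ("H", "R", "F"):
    print(f"{name}: {before[name]:7.1f} -> {after[name]:7.1f}")
```

In this toy data F rises, R falls, and H's rating also moves (it falls slightly) even though H's own results are unchanged.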
Daniel
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

A few more notes...


1) Ordo calculation = Fruit 090705 x64 6c: 1150 games per player

RANK PARTICIPANT               : RATING    POINTS   GAMES    (%)
39 Fruit 090705 x64 6c         : 2934.8     266.0    1150   23.1%
--------------------------------------------------------------------------------

2) Ordo calculation = Fruit 090705 x64 6c: 1200 games per player

RANK PARTICIPANT                 : RATING    POINTS   GAMES    (%)
  40 Fruit 090705 x64 6c         : 2932.3     270.0    1200   22.5%
*Ordo Elo difference for Fruit (between 1150 and 1200 games): -2.5 Elo


It seems that the Ordo calculation is better than BayesElo's (in the current situation)


**********************************************************

Btw, even Elostat seems to be better than BayesElo (in the current situation)


3) Elostat calculation = Fruit 090705 x64 6c: 1150 games per player:

   Program                           Elo    +   -   Games   Score   Av.Op.  Draws
 39 Fruit 090705 x64 6c            : 2971   18  18  1150    23.1 %   3180   29.0 %
--------------------------------------------------------------------------------

4) Elostat calculation = Fruit 090705 x64 6c: 1200 games per player:

   Program                           Elo    +   -   Games   Score   Av.Op.  Draws
 40 Fruit 090705 x64 6c            : 2968   18  18  1200    22.5 %   3183   28.5 %
*Elostat Elo difference (between Fruit: 1150 and 1200 games): -3 Elo


**********************************************************

5) BayesElo calculation = Fruit 090705 x64 6c: 1150 games per player:

Rank Name                          Elo    +    - games score oppo. draws 
  39 Fruit 090705 x64 6c          2965   15   15  1150   23%  3178   29% 
--------------------------------------------------------------------------------


6) BayesElo calculation = Fruit 090705 x64 6c: 1200 games per player:

Rank Name                          Elo    +    - games score oppo. draws
  39 Fruit 090705 x64 6c          2981   18   18  1200   23%  3190   29%
*BayesElo Elo difference (between Fruit: 1150 and 1200 games): +16 Elo

And finally:
-How is a +16 Elo improvement for Fruit possible after only 50 games (and after such a poor performance against Rybka NO-SSE)?


*Note also that BayesElo's Elo change for Fruit differs by about 19 Elo from Ordo's and Elostat's

Best,
Sedat
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

If an engine scores few points against a strong opponent, its overall score is sure to fall! That doesn't mean its Elo will fall. Here Fruit played a strong opponent, Rybka, and its score fell as expected. You would get the same thing if it played against Houdini, but its Elo should only be penalized when it scores below its expected score. BayesElo probably judged Fruit to have performed better than expected, but I can't say that for sure until you provide the game collections from before and after the new games. But you are definitely wrong to expect a drop in Elo just because the overall score dropped.
As a side note, Elostat and Ordo agree because they both use simplistic methods to calculate Elo. BayesElo is far more advanced than both for realistic Elo predictions. This (BayesElo vs. Elostat) has been researched a lot, so I urge you to look into it yourself if you are interested.
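
The distinction between overall score and expected score can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in this thread (the 50-game result +0 =8 -42, and Ordo's point totals of 266.0/1150 and 270.0/1200) together with the standard logistic Elo expectation formula. That formula is an assumption here: BayesElo, Ordo and Elostat each apply their own model. Rybka's rating is not given in the thread, so the two rating gaps tried at the end are hypothetical.

```python
import math

def expected_score(gap):
    """Expected score for a player rated `gap` Elo above the opponent
    (negative gap = weaker player), using the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))

def implied_gap(score):
    """Rating gap at which `score` (strictly between 0 and 1) is the expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Fruit vs Rybka as quoted in the thread: +0 =8 -42.
wins, draws, losses = 0, 8, 42
games = wins + draws + losses
points = wins + 0.5 * draws
match_score = points / games

# Overall score before and after, from the Ordo output quoted in the thread.
overall_before = 266.0 / 1150
overall_after = 270.0 / 1200

print(f"match: {points} / {games} = {match_score:.1%}")
print(f"overall: {overall_before:.1%} -> {overall_after:.1%}")
print(f"gap at which {match_score:.0%} is par: {implied_gap(match_score):.0f} Elo")

# Hypothetical rating gaps (Rybka's rating is not stated in the thread).
for gap in (-400, -450):
    par = expected_score(gap)
    verdict = "above" if match_score > par else "below"
    print(f"gap {gap}: expected {par:.1%}, so {match_score:.0%} is {verdict} expectation")
```

Whether such a match result pulls the rating up or down therefore depends on the rating gap the tool assumes, not on the change in the overall percentage.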
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

Hello Daniel,

Thanks for your reply...

About the Houdini Elo difference:
Surprisingly, even without Houdini playing a single game, we see a 4 Elo difference from BayesElo.
Interestingly, Ordo gives Houdini the same Elo in both situations.

But the strangest thing:
We see almost a 20 Elo difference between Ordo and BayesElo.

And it's quite clear that Ordo and Elostat calculate more accurately than BayesElo (in the current Fruit situation).

Once more I'd like to point out that the 50 games ended with a clear win for Rybka:
[Image: results of the 50 games]

So... I wonder:
-How can we trust BayesElo (in future calculations)?


Best,
Sedat
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »


You are wrong. See my other reply for why you shouldn't expect a drop in Elo just because the score falls. I will give you a very simple example.
Say Scorpio has a 50% overall score and is expected to score 20% against Houdini. Now let's say we play 50 games against Houdini and it scores 23%.

Result: The overall score will definitely drop (say from 50% to 45%).
Elo: On the other hand, it should increase. Why? Because it scored 23% where 20% was expected.

This is what you are missing, and hence your conclusions are invalid.
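
The Scorpio example above can be worked through numerically. In the sketch below, the 220 prior games are an assumption, chosen only so that the overall score lands on the 45% mentioned in the post; the logistic Elo curve used to convert scores into rating gaps is likewise an assumption, not necessarily what BayesElo does internally.

```python
import math

def gap_for_score(score):
    """Rating gap (own rating minus opponent's) at which `score` is the expected score."""
    return 400.0 * math.log10(score / (1.0 - score))

prior_games = 220   # assumed; not stated in the post
prior_score = 0.50  # overall score before the match
match_games = 50
expected = 0.20     # expected score against Houdini
actual = 0.23       # score actually achieved

# Overall percentage after adding the match.
new_overall = (prior_score * prior_games + actual * match_games) / (prior_games + match_games)

# Points scored above expectation in the match.
surplus = (actual - expected) * match_games

print(f"overall score: {prior_score:.0%} -> {new_overall:.0%}")
print(f"points above expectation: {surplus:+.1f}")
print(f"gap implied by expectation: {gap_for_score(expected):.0f} Elo")
print(f"gap implied by result:      {gap_for_score(actual):.0f} Elo")
```

The overall percentage falls while the match result sits above expectation, which is the situation in which a rating would be expected to rise.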
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

I can be wrong of course... maybe you are wrong?!

Sorry, but I don't want to be an advanced BayesElo user... :)

The reason is very simple:
Fruit gained +16 Elo just by playing 50 games (with such poor results) :wink:

In short,
I prefer reality to dreams!

Greetings,
Sedat
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Sure I can be wrong; I don't know why you said that. You claimed that BayesElo is inferior and gave an example, but if you balk at the first challenge then what is there to discuss? Clearly the notion you have, that a drop in overall score should result in a drop in Elo, seems wrong to me at least. Like I said from the beginning, this is not a popularity contest for me. I can't say my arguments helped people use BayesElo more (they seem to have had the opposite effect, in fact). But I want to see fair comparisons, which didn't happen in the past with all the 'compression' stuff.
But hey, The King is probably more popular than Rybka in the chess community. That is life, Sedat :)
I guess we just agree to disagree.
Daniel
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

Dear Daniel,

Just my two cents more on this issue:
Looking at the current calculations, we noticed some strange or maybe wrong results from BayesElo.

And to be honest,
this is the main reason why I switched to Ordo for the next SCCT calculations.

Sure, no work is perfect (including mine).
However, I personally believe that Ordo is a great tool and easy to use...
With Ordo, most of the participants get accurate Elo standings.
And what more can a tester need from a rating calculation tool?

Best Wishes,
Sedat
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Dear Sedat
People see what they want to see; that doesn't make it the truth.
Best Wishes
Daniel
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

I agree with you...

One more thing:
-Some of us are well experienced in theory
-Some of us are well experienced in practice

But as far as I know:
-'Practice' is more important than 'theory'

And usually the winner is 'practice'

Btw, I like this saying:
“In theory, theory and practice are the same. In practice, they are not.”
-Albert Einstein


Best,
Sedat