SCCT Rating List - Calculation by EloStat 1.3

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote: This is a gigantic waste of time. I redid his calculation with his data and I get a change of exactly 1 Elo in the difference between Fruit and Rybka.
Good, an increase of the order of 1-3 Elo in the difference is what was expected.
Why exactly? You even said it should decrease, which it didn't. I see so many ridiculous claims that it is not funny anymore... Like I have said so many times, it is not a popularity contest.
No, I said that Fruit performed worse than expected against Rybka, so I expected a slight increase in the difference between Rybka and Fruit. That is what happened with all rating tools.

Kai
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Laskos wrote:
Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote: This is a gigantic waste of time. I redid his calculation with his data and I get a change of exactly 1 Elo in the difference between Fruit and Rybka.
Good, an increase of the order of 1-3 Elo in the difference is what was expected.
Why exactly? You even said it should decrease, which it didn't. I see so many ridiculous claims that it is not funny anymore... Like I have said so many times, it is not a popularity contest.
No, I said that Fruit performed worse than expected against Rybka, so I expected a slight increase in the difference between Rybka and Fruit. That is what happened with all rating tools.

Kai
I really can't tell what Sedat was doing. He was saying that since Fruit's score decreased, its Elo should decrease too. Then I duly pointed out to him that it is the expected score that should decrease for a drop in Elo. I requested the data from before and after the Fruit games were added, since that is the only way to tell what was going on; only now did he provide it. But he ignored me and went on a rampage for a long time ... until now, when he vanishes :)
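For reference, the point about expected score can be illustrated with the standard logistic Elo curve. This is only a minimal sketch, not the exact model of EloStat or BayesElo (BayesElo additionally models draws and the first-move advantage); the function name `expected_score` and the example rating differences are hypothetical.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score of a player rated `rating_diff` Elo above (or, if
    negative, below) the opponent, using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Hypothetical rating differences, chosen only for illustration.
for diff in (0, -100, -200, -300):
    print(f"diff {diff:+5d} Elo -> expected score {expected_score(diff):.3f}")
```

The lower a rating is relative to the opposition, the lower the expected score; a rating tool compares the actual score against this expectation rather than looking at the raw score alone.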
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

Daniel Shawul wrote: And here is EloStat's output, i.e. using the elostat tool inside Bayeselo. Guess what difference I got? Yes, it is a 1 Elo increment, which is exactly the same as Bayeselo's. Enough said...

Before

version 0056, Copyright (C) 1997-2007 Remi Coulom.
compiled Jan 30 2007 20:30:07.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>readpgn scct1.pgn
29250 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elostat
Unknown command: elostat
type '?' for help
ResultSet>elo
ResultSet-EloRating>elostat
16 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank Name                          Elo    +    - games score oppo. draws
   1 Houdini 2.0t3* Pro x64 6c     164   18   17  1000   75%   -24   37%
   2 Houdini 2.0t3 Pro x64 6c      158   13   13  1700   70%    11   39%
   3 Houdini 2.0s2 Pro x64 6c      154   19   18  1000   74%   -30   34%
   4 Houdini 2.0z Pro x64 6c       151   15   14  1550   71%    -8   36%
   5 Houdini 2.0Bar2 x64 6c        149   17   16  1000   73%   -23   43%
   6 Houdini 2.0Higgs Pro x64 6c   140   17   16  1000   71%   -14   42%
   7 Houdini 2.0c Pro x64 6c       138   15   14  1450   71%   -18   39%
   8 Houdini 1.5a x64 6c           138   16   16  1100   68%    11   41%
   9 Houdini2Bar1 Pro x64 6c       128   15   15  1100   69%    -8   46%
  10 Critter 1.6 x64 6c             98   11   11  1900   63%     9   53%
  11 Critter 1.4 x64 6c             86   15   14  1150   67%   -36   47%
  12 Rybka 4.1 79DT v1 x64 6c       82   17   16  1100   66%   -33   38%
  13 Stockfish 120430P x64 6c       79   11   11  1850   60%     8   50%
  14 Rybka 4.1 NO-SSE x64 6c        72   16   15  1000   63%   -20   49%
  15 Stockfish 2.2.2 JA x64 6c      72   15   14  1200   62%   -16   47%
  16 Deep Rybka 4.1 x64 6c          72   12   12  1750   60%     4   48%
  17 Ivanhoe B46fE.02 x64 6c        71   11   11  1900   59%     9   53%
  18 Ivanhoe B46fC x64 6c           69   14   14  1200   64%   -28   47%
  19 Stockfish VE09 x64 6c          65   16   15  1000   63%   -31   48%
  20 Fire 2.2 xTreme x64 6c         56   11   11  1900   57%    10   52%
  21 Vitruvius 1.11C x64 6c         55   11   11  1900   56%    10   51%
  22 Gull II beta2 x64 6c            7   13   13  1400   50%     4   51%
  23 Strelka 5.5 x64 1c            -13   12   12  1650   45%    23   48%
  24 Bouquet 1.4 x64 6c            -25   14   14  1250   46%     0   47%
  25 Naum 4.2 x64 6c               -32   12   12  1900   44%    12   44%
  26 Komodo 4.0 x64 1c             -52   12   12  1900   41%    12   42%
  27 Equinox 1.35 x64 6c           -82   13   14  1550   40%   -14   40%
  28 Deep Fritz 13 w32 6c          -83   12   12  1900   36%    13   43%
  29 Spike 1.4 Leiden w32 6c      -102   12   13  1900   34%    14   38%
  30 Chiron 1.1a x64 6c           -104   12   13  1900   34%    14   39%
  31 Deep Fritz 12 w32 6c         -119   15   16  1150   37%   -27   42%
  32 Deep Junior 13.3 x64 6c      -120   13   14  1700   31%    22   36%
  33 Protector 1.4.0 x64 6c       -126   13   13  1900   31%    14   36%
  34 Deep Junior 13 x64 6c        -127   15   16  1300   35%   -20   36%
  35 Spark 1.0 x64 6c             -128   12   13  1850   31%    11   39%
  36 Deep Shredder 12 x64 6c      -132   13   13  1900   30%    15   37%
  37 Hiarcs 13.2 w32 6c           -144   13   14  1900   29%    15   32%
  38 Zappa Mexico II x64 6c       -161   14   15  1550   29%    -4   34%
  39 Fruit 090705 x64 6c          -231   18   19  1150   23%   -23   29%
ResultSet-EloRating>
After

version 0056, Copyright (C) 1997-2007 Remi Coulom.
compiled Jan 30 2007 20:30:07.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>read scct2.pgn
Unknown command: read
type '?' for help
ResultSet>readpgn scct2.pgn
30408 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elo
ResultSet-EloRating>elostat
16 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank Name                          Elo    +    - games score oppo. draws
   1 Houdini 2.0t3* Pro x64 6c     164   18   17  1000   75%   -24   37%
   2 Houdini 2.0t3 Pro x64 6c      159   13   13  1735   70%     9   39%
   3 Houdini 2.0s2 Pro x64 6c      154   19   18  1000   74%   -30   34%
   4 Houdini 2.0z Pro x64 6c       150   14   14  1600   71%    -5   36%
   5 Houdini 2.0Bar2 x64 6c        150   16   15  1050   72%   -19   44%
   6 Houdini 2.0Higgs Pro x64 6c   140   17   16  1050   70%   -10   42%
   7 Houdini 1.5a x64 6c           138   16   16  1100   68%    11   41%
   8 Houdini 2.0c Pro x64 6c       137   14   14  1500   71%   -15   39%
   9 Houdini2Bar1 Pro x64 6c       128   15   15  1100   69%    -8   46%
  10 Critter 1.6 x64 6c             99   11   10  1935   63%     7   53%
  11 Critter 1.4 x64 6c             86   14   14  1200   66%   -32   47%
  12 Rybka 4.1 79DT v1 x64 6c       84   16   16  1134   66%   -33   38%
  13 Stockfish 120430P x64 6c       78   11   11  1884   60%     6   50%
  14 Stockfish 2.2.2 JA x64 6c      72   15   14  1200   62%   -16   47%
  15 Rybka 4.1 SSE42 x64 6c         72   12   11  1800   59%     5   49%
  16 Ivanhoe B46fE.02 x64 6c        71   11   11  1935   59%     8   52%
  17 Rybka 4.1 NO-SSE x64 6c        70   13   13  1500   60%    -4   49%
  18 Ivanhoe B46fC x64 6c           69   14   14  1250   63%   -24   48%
  19 Stockfish VE09 x64 6c          65   16   15  1000   63%   -31   48%
  20 Fire 2.2 xTreme x64 6c         56   11   11  1935   57%     8   52%
  21 Vitruvius 1.11C x64 6c         55   11   11  1934   57%     8   51%
  22 Gull II beta2 x64 6c            8   13   13  1435   51%     3   51%
  23 Strelka 5.5 x64 1c            -13   12   12  1684   45%    21   48%
  24 Bouquet 1.4 x64 6c            -25   14   14  1285   46%    -1   47%
  25 Naum 4.2 x64 6c               -32   12   12  1935   44%    11   44%
  26 Komodo 4.0 x64 1c             -52   12   12  1935   41%    11   42%
  27 Deep Hiarcs 14 WCSC w32 6c    -53   20   20   658   45%   -16   44%
  28 Equinox 1.35 x64 6c           -82   13   14  1550   40%   -14   40%
  29 Deep Fritz 13 w32 6c          -83   12   12  1935   37%    12   44%
  30 Spike 1.4 Leiden w32 6c      -102   12   13  1934   34%    13   38%
  31 Chiron 1.1a x64 6c           -104   12   12  1935   34%    13   39%
  32 Deep Junior 13.3 x64 6c      -120   13   14  1735   31%    20   36%
  33 Deep Fritz 12 w32 6c         -121   15   15  1200   36%   -24   42%
  34 Protector 1.4.0 x64 6c       -127   13   13  1934   31%    13   37%
  35 Deep Junior 13 x64 6c        -127   15   16  1300   35%   -20   36%
  36 Spark 1.0 x64 6c             -129   12   13  1884   31%    10   39%
  37 Deep Shredder 12 x64 6c      -132   12   13  1935   30%    13   37%
  38 Hiarcs 13.2 w32 6c           -144   13   14  1900   29%    15   32%
  39 Zappa Mexico II x64 6c       -160   14   15  1600   29%    -2   34%
  40 Fruit 090705 x64 6c          -234   18   19  1200   23%   -19   29%
ResultSet-EloRating>
Difference:
Diff1 = 72 - (-231) = 303
Diff2 = 70 - (-234) = 304
Increment = 1 elo!
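
The same arithmetic as a small script, with the Elo values copied from the two tables above. This is only a sketch: it assumes the Rybka entry meant here is "Rybka 4.1 NO-SSE x64 6c" (72 before, 70 after), and the variable and function names are made up for illustration.

```python
# Elo values copied from the "Before" and "After" tables above.
# Assumption: the Rybka entry used in the comparison is the NO-SSE build.
RYBKA = "Rybka 4.1 NO-SSE x64 6c"
FRUIT = "Fruit 090705 x64 6c"

before = {RYBKA: 72, FRUIT: -231}
after = {RYBKA: 70, FRUIT: -234}


def gap(ratings: dict, stronger: str, weaker: str) -> int:
    """Rating difference between two engines within one rating list."""
    return ratings[stronger] - ratings[weaker]


diff_before = gap(before, RYBKA, FRUIT)
diff_after = gap(after, RYBKA, FRUIT)
increment = diff_after - diff_before

print(f"Diff1 = {diff_before}, Diff2 = {diff_after}, increment = {increment}")
```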

Bye
Daniel
Thanks for your reply...

However,
wait a minute please...
Did you use the same database that I calculated with on 27.08.2012?
Do you know how many games I used for the calculation on 27.08.2012?

So now what? Does it look like I made a mistake, or maybe BayesElo made a mistake?

My feeling is that there is something strange in BayesElo 0056.

To be honest,
I am no longer sure how many games were included in my calculation on 27.08.2012.
But one thing is true: I noticed that BayesElo rated Fruit 16 Elo higher.
And I still strongly believe that on that date (27.08.2012) BayesElo rated Fruit 16 Elo stronger and Houdini 4 Elo weaker.

Rank Name                          Elo    +    - games score oppo. draws 
   1 Houdini 2.0t3 Pro x64 6c     3359   14   14  1700   70%  3217   39% 
   2 Houdini 2.0t3* Pro x64 6c    3359   19   19  1000   75%  3185   37% 
   3 Houdini 2.0z Pro x64 6c      3356   15   15  1600   71%  3202   36% 
   4 Houdini 2.0s2 Pro x64 6c     3355   19   19  1000   74%  3179   34% 
   5 Houdini 1.5a x64 6c          3342   17   17  1100   68%  3218   41% 
   6 Houdini 2.0Bar2 x64 6c       3342   18   18  1050   72%  3190   44% 
   7 Houdini 2.0c Pro x64 6c      3341   15   15  1500   71%  3193   39% 
   8 Houdini 2.0Higgs Pro x64 6c  3338   18   18  1050   70%  3198   42% 
   9 Houdini2Bar1 Pro x64 6c      3328   17   17  1100   69%  3200   46% 
  10 Critter 1.6 x64 6c           3300   13   13  1900   63%  3215   53% 
  11 Critter 1.4 x64 6c           3290   16   16  1200   66%  3177   47% 
  12 Rybka 4.1 79DT v1 x64 6c     3287   17   17  1100   66%  3176   38% 
  13 Stockfish 120430P x64 6c     3284   13   13  1850   60%  3214   50% 
  14 Rybka 4.1 SSE42 x64 6c       3276   13   13  1800   59%  3212   49% 
  15 Ivanhoe B46fC x64 6c         3276   16   16  1250   63%  3185   48% 
  16 Ivanhoe B46fE.02 x64 6c      3276   13   13  1900   59%  3216   53% 
  17 Stockfish 2.2.2 JA x64 6c    3275   16   16  1200   62%  3192   47% 
  18 Rybka 4.1 NO-SSE x64 6c      3275   14   14  1500   60%  3204   49% 
  19 Fire 2.2 xTreme x64 6c       3263   12   12  1900   57%  3216   52% 
  20 Stockfish VE09 x64 6c        3263   17   17  1000   63%  3178   48% 
  21 Vitruvius 1.11C x64 6c       3261   13   13  1900   56%  3216   51% 
  22 Gull II beta2 x64 6c         3215   15   14  1400   50%  3211   51% 
  23 Strelka 5.5 x64 1c           3198   14   14  1650   45%  3229   48% 
  24 Bouquet 1.4 x64 6c           3185   15   15  1250   46%  3207   47% 
  25 Naum 4.2 x64 6c              3178   13   13  1900   44%  3218   44% 
  26 Komodo 4.0 x64 1c            3160   13   13  1900   41%  3219   42% 
  27 Equinox 1.35 x64 6c          3129   14   14  1550   40%  3194   40% 
  28 Deep Fritz 13 w32 6c         3129   13   13  1900   36%  3220   43% 
  29 Spike 1.4 Leiden w32 6c      3110   13   14  1900   34%  3220   38% 
  30 Chiron 1.1a x64 6c           3108   13   13  1900   34%  3220   39% 
  31 Deep Fritz 12 w32 6c         3093   16   17  1200   36%  3185   42% 
  32 Deep Junior 13.3 x64 6c      3092   14   15  1700   31%  3228   36% 
  33 Protector 1.4.0 x64 6c       3087   14   14  1900   31%  3221   36% 
  34 Spark 1.0 x64 6c             3084   14   14  1850   31%  3218   39% 
  35 Deep Junior 13 x64 6c        3082   16   16  1300   35%  3189   36% 
  36 Deep Shredder 12 x64 6c      3080   14   14  1900   30%  3221   37% 
  37 Hiarcs 13.2 w32 6c           3064   14   14  1900   29%  3221   32% 
  38 Zappa Mexico II x64 6c       3053   15   15  1600   29%  3206   34% 
  39 Fruit 090705 x64 6c          2981   18   18  1200   23%  3190   29% 

Btw, here is the BayesElo file (without any modification):
Image

For those who still have suspicions,
the above BayesElo file (without any modification) is available for download:
http://www.sedatcanbaz.com/chess/files/Fruit_16_Elo.rar

One more thing,
next time I should probably record my calculations on video - LIVE :)

Hope this helps,
Sedat
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote: This is a gigantic waste of time. I redid his calculation with his data and I get a change of exactly 1 Elo in the difference between Fruit and Rybka.
Good, an increase of the order of 1-3 Elo in the difference is what was expected.
Why exactly? You even said it should decrease, which it didn't. I see so many ridiculous claims that it is not funny anymore... Like I have said so many times, it is not a popularity contest.
No, I said that Fruit performed worse than expected against Rybka, so I expected a slight increase in the difference between Rybka and Fruit. That is what happened with all rating tools.

Kai
I really can't tell what Sedat was doing. He was saying that since Fruit's score decreased, its Elo should decrease too. Then I duly pointed out to him that it is the expected score that should decrease for a drop in Elo. I requested the data from before and after the Fruit games were added, since that is the only way to tell what was going on; only now did he provide it. But he ignored me and went on a rampage for a long time ... until now, when he vanishes :)
Doesn't matter, no problem with Bayeselo or other tools in this case.

Kai
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Doesn't matter, no problem with Bayeselo or other tools in this case.

Kai
There never was a problem with Bayeselo. People see what they want to see... as has been proved many times: 'compression', 'magnification', etc. I have yet to see anything concrete; all it proved is the incompetence of users.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:
Doesn't matter, no problem with Bayeselo or other tools in this case.

Kai
There never was a problem with Bayeselo. People see what they want to see... as has been proved many times: 'compression', 'magnification', etc. I have yet to see anything concrete; all it proved is the incompetence of users.
No, there is: at the least, Bayeselo points are not exactly Elo points, or the default Bayeselo result for IPON was hard to interpret compared to individual Elo performances.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

Laskos wrote:
Daniel Shawul wrote:
Doesn't matter, no problem with Bayeselo or other tools in this case.

Kai
There never was a problem with Bayeselo. People see what they want to see... as has been proved many times: 'compression', 'magnification', etc. I have yet to see anything concrete; all it proved is the incompetence of users.
No, there is: at the least, Bayeselo points are not exactly Elo points, or the default Bayeselo result for IPON was hard to interpret compared to individual Elo performances.
Which is what I brought up. I told CCRL not to use scale=1, while you and the rest supported it to 'avoid' compression. Do you want to take that away from me now? You know a discussion starts to become one-liners when one side starts to lose...
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Laskos »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:
Doesn't matter, no problem with Bayeselo or other tools in this case.

Kai
There never was a problem with Bayeselo. People see what they want to see... as has been proved many times: 'compression', 'magnification', etc. I have yet to see anything concrete; all it proved is the incompetence of users.
No, there is: at the least, Bayeselo points are not exactly Elo points, or the default Bayeselo result for IPON was hard to interpret compared to individual Elo performances.
Which is what I brought up. I told CCRL not to use scale=1, while you and the rest supported it to 'avoid' compression. Do you want to take that away from me now? You know a discussion starts to become one-liners when one side starts to lose...
The IPON problem came from using default Bayeselo. I think using scale=1 eliminates the IPON problem when comparing with performances. I do not know with what scale the real Elos are shown. I think Adam has shown clearly that default Bayeselo compresses ratings. I do not know what I am losing; I was not winning anything here either.

Kai
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Sedat Canbaz »

Dear Friends,

I don't claim that I am always right; of course I can be wrong...

However,
I have a question for all BayesElo experts:
- Please download my BayesElo files; I urgently need your feedback.

In other words,
I really wonder: where did I make a mistake?

Image

Btw, the above files are also available for download:
http://www.sedatcanbaz.com/chess/files/13_14.rar


Thanks in advance,
Sedat
Last edited by Sedat Canbaz on Wed Aug 29, 2012 11:20 pm, edited 3 times in total.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: SCCT Rating List - Calculation by EloStat 1.3

Post by Daniel Shawul »

The IPON problem came from using default Bayeselo. I think using scale=1 eliminates the problem when comparing with performances. I do not know with what scale the real Elos are shown.

Kai
I do not know of the IPON problem. Neither did I venture to guess what caused the difference between the pure and complete rating lists of CCRL. But I know not using the scale is worse than using scale = 1 to make comparisons between different lists. If you take the above example I did, mm calculated the scale to be around 0.7, which is why the EloStat and Bayeselo rating numbers are more or less equal. If I used scale=1, the Bayeselo output would be magnified by 1/0.7 = 1.43, so a 100 Elo difference may be magnified to about 140 Elo. This definitely would make comparisons difficult.
If you look at using scale=1, there isn't really any advantage. Staying true to the model? Why, anyway, since one can assume the model multiplies by a factor. What advantage does scale=1 bring? I know for sure that using scaled ratings at least makes comparisons somewhat more acceptable.
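
The scale arithmetic above can be sketched as follows. The value 0.7 is the approximate scale reported by mm in the example above; the helper `rescale_diff` is hypothetical and not part of Bayeselo. It only illustrates the assumption that the scale acts as a plain multiplicative factor on rating differences.

```python
def rescale_diff(diff: float, from_scale: float, to_scale: float) -> float:
    """Convert a rating difference reported at `from_scale` to `to_scale`,
    assuming the scale is a plain multiplicative factor on all ratings."""
    return diff * to_scale / from_scale


mm_scale = 0.7  # approximate value taken from the example above
magnification = 1.0 / mm_scale

print(f"magnification at scale=1: {magnification:.2f}")
print(f"100 Elo (scaled) -> {rescale_diff(100, mm_scale, 1.0):.0f} Elo at scale=1")
```

Under that assumption, two lists are only directly comparable when they are reported at the same scale.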

Edit:
To your edited addition
I think Adam has shown clearly that default Bayeselo compresses ratings. I do not know what I am losing; I was not winning anything here either.
Ask Adam about it and see if he still thinks Bayeselo compresses or anything like that. Well, you keep on writing one-liners, so it seems you are interested in keeping me busy when false claims are out of the window. I do not wish to engage until 'another' data comes up... It is amusing, to say the least :)