H4 or S5 !?

IWB · Post by **IWB** » Mon Jun 02, 2014 5:42 pm

Hello all,

This is quite interesting:

The official method for the IPON list is BayesElo with mm 0 1, i.e. taking the draw rate into account. The pure top-16 round robin looks like this:

```
Rank Name                 Elo    +    - games score oppo. draws
   1 Houdini 4           3111    9    9  3300   75%  2921   31%
   2 Stockfish 5         3106    9    8  3300   75%  2921   39%
   3 Komodo 7a           3088    9    9  3300   72%  2922   37%
   4 Gull 3              3057    8    8  3300   68%  2924   41%
   5 Critter 1.4a        2980    8    8  3300   57%  2930   46%
   6 Equinox 2.02        2975    8    8  3300   56%  2930   47%
   7 Deep Rybka 4.1      2959    8    8  3300   54%  2931   45%
   8 Deep Fritz 14       2894    8    8  3300   44%  2935   45%
   9 Chiron 2            2889    8    8  3300   44%  2936   45%
  10 Protector 1.6.0     2870    8    8  3300   41%  2937   44%
  11 Hannibal 1.4b       2870    8    8  3300   41%  2937   43%
  12 Naum 4.2            2838    8    9  3300   36%  2939   41%
  13 Texel 1.04          2838    8    8  3300   37%  2939   38%
  14 Senpai 1.0          2838    8    8  3300   36%  2939   41%
  15 HIARCS 14 WCSC 32b  2812    9    9  3300   33%  2941   37%
  16 Jonny 6.00          2798    9    9  3300   31%  2942   36%
```

The same data set with BayesElo's default settings:

```
Rank Name                 Elo    +    - games score oppo. draws
   1 Houdini 4           3111   11   11  3300   75%  2931   31%
   2 Stockfish 5         3105   10   10  3300   75%  2931   39%
   3 Komodo 7a           3088   10   10  3300   72%  2932   37%
   4 Gull 3              3057   10   10  3300   68%  2934   41%
   5 Critter 1.4a        2984   10    9  3300   57%  2939   46%
   6 Equinox 2.02        2980    9   10  3300   56%  2939   47%
   7 Deep Rybka 4.1      2964   10   10  3300   54%  2940   45%
   8 Deep Fritz 14       2905    9   10  3300   44%  2944   45%
   9 Chiron 2            2900   10   10  3300   44%  2945   45%
  10 Protector 1.6.0     2883   10   10  3300   41%  2946   44%
  11 Hannibal 1.4b       2883   10   10  3300   41%  2946   43%
  12 Naum 4.2            2854   10   10  3300   36%  2948   41%
  13 Texel 1.04          2854   10   10  3300   37%  2948   38%
  14 Senpai 1.0          2853   10   10  3300   36%  2948   41%
  15 HIARCS 14 WCSC 32b  2830   10   10  3300   33%  2949   37%
  16 Jonny 6.00          2816   10   10  3300   31%  2950   36%
```
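For readers wondering what "draw rate consideration" changes: BayesElo models each game with two extra parameters, a White advantage and a draw Elo, and derives win, draw and loss probabilities from the logistic Elo curve. A minimal sketch of those formulas follows. The values 32.8 (advantage) and 97.3 (draw Elo) are the defaults commonly cited for BayesElo; our reading, which is an assumption, is that mm 0 1 keeps the advantage fixed and estimates the draw Elo from the games, while the default run leaves both at their preset values. The function names are hypothetical; this is not BayesElo's source code.

```python
def logistic(delta: float) -> float:
    """Logistic Elo curve: probability-like value for a rating edge of `delta`."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def bayeselo_probs(elo_white: float, elo_black: float,
                   elo_advantage: float = 32.8, elo_draw: float = 97.3):
    """Win/draw/loss probabilities for White under a BayesElo-style draw model.

    Hypothetical helper: the defaults are the commonly cited BayesElo values.
    """
    delta = elo_white - elo_black
    p_win = logistic(delta + elo_advantage - elo_draw)
    p_loss = logistic(-delta - elo_advantage - elo_draw)
    p_draw = 1.0 - p_win - p_loss  # whatever is left over is a draw
    return p_win, p_draw, p_loss


# Same rating gap (Houdini vs Stockfish, first table), colours ignored,
# two different draw parameters.
for draw in (97.3, 200.0):
    w, d, l = bayeselo_probs(3111, 3106, elo_advantage=0.0, elo_draw=draw)
    print(f"eloDraw={draw:6.1f}  win={w:.3f}  draw={d:.3f}  loss={l:.3f}")
```

A larger draw parameter predicts more draws for the same rating gap, which is why fitting it to the actual draw rate can shift close ratings by a few Elo.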

Now with Elostat:

```
    Program                           Elo    +   - Games     Score Av.Op.    Draws
  1 Stockfish 5                    : 3115   10  10  3300    74.9 %   2924   38.6 %
  2 Houdini 4                      : 3111   11  10  3300    74.5 %   2925   30.7 %
  3 Komodo 7a                      : 3091   10  10  3300    72.1 %   2926   37.0 %
  4 Gull 3                         : 3059    9   9  3300    68.0 %   2928   41.0 %
  5 Critter 1.4a                   : 2982    9   9  3300    57.0 %   2933   46.1 %
  6 Equinox 2.02                   : 2978    9   9  3300    56.3 %   2933   46.9 %
  7 Deep Rybka 4.1                 : 2962    9   9  3300    53.9 %   2935   45.2 %
  8 Deep Fritz 14                  : 2899    9   9  3300    44.4 %   2939   44.9 %
  9 Chiron 2                       : 2894    9   9  3300    43.5 %   2939   45.1 %
 10 Protector 1.6.0                : 2877    9   9  3300    40.9 %   2940   44.1 %
 11 Hannibal 1.4b                  : 2875    9   9  3300    40.7 %   2940   42.6 %
 12 Texel 1.04                     : 2846    9   9  3300    36.5 %   2942   38.5 %
 13 Naum 4.2                       : 2845    9   9  3300    36.4 %   2942   40.9 %
 14 Senpai 1.0                     : 2845    9   9  3300    36.3 %   2942   40.7 %
 15 HIARCS 14 WCSC 32b             : 2822   10  10  3300    33.2 %   2944   37.5 %
 16 Jonny 6.00                     : 2808   10  10  3300    31.2 %   2945   35.7 %
```
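The EloStat figures can be sanity-checked with the usual performance-rating formula: a score fraction p against an average opponent rated R corresponds to R + 400 * log10(p / (1 - p)) on the logistic Elo curve. This is a minimal sketch, not EloStat's own code (its iteration over the whole pool is not described here), and the function name is hypothetical. With the point totals from the Ordo list and the average-opponent column above, it lands within roughly one Elo of the listed ratings for the top two engines.

```python
import math


def performance_rating(avg_opponent: float, score: float) -> float:
    """Rating at which a logistic Elo model expects `score` (0 < score < 1)
    against an opponent rated `avg_opponent`. Hypothetical helper."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


GAMES = 3300
# Points from the Ordo list, average opponent from the EloStat list.
for name, points, av_op in [("Stockfish 5", 2473.0, 2924),
                            ("Houdini 4", 2458.5, 2925)]:
    print(f"{name:12s} {performance_rating(av_op, points / GAMES):7.1f}")
```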

And finally with Ordo:

```
   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Stockfish 5           : 3115.1    2473.0    3300   74.9%
   2 Houdini 4             : 3111.0    2458.5    3300   74.5%
   3 Komodo 7a             : 3089.3    2379.0    3300   72.1%
   4 Gull 3                : 3054.9    2245.5    3300   68.0%
   5 Critter 1.4a          : 2968.9    1882.0    3300   57.0%
   6 Equinox 2.02          : 2963.8    1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2945.6    1778.5    3300   53.9%
   8 Deep Fritz 14         : 2875.7    1464.5    3300   44.4%
   9 Chiron 2              : 2869.4    1436.5    3300   43.5%
  10 Protector 1.6.0       : 2850.1    1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2848.3    1343.0    3300   40.7%
  12 Texel 1.04            : 2816.4    1204.5    3300   36.5%
  13 Naum 4.2              : 2815.5    1200.5    3300   36.4%
  14 Senpai 1.0            : 2814.9    1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2790.6    1096.0    3300   33.2%
  16 Jonny 6.00            : 2774.4    1030.0    3300   31.2%
```

That is very good, as everyone can take whichever list they like.

Regards
Ingo

Modern Times · Post by **Modern Times** » Mon Jun 02, 2014 6:46 pm

Very interesting !

Yes indeed, take your pick.

michiguel · Post by **michiguel** » Mon Jun 02, 2014 7:33 pm

Ingo,

You used to provide the pgn file with only the results. Can you do that again? In that way, we can toy around with the rating programs and/or algorithms.

Miguel

michiguel · Post by **michiguel** » Mon Jun 02, 2014 7:43 pm

IWB wrote:
That is very good, as everyone can take the list he likes

There is only one correct answer, and that is that SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played everybody else under the same conditions, and the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, the Ordo output shows the actual points (the others give percentages). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.

```
   1 Stockfish 5           : 3115.1    2473.0    3300   74.9%
   2 Houdini 4             : 3111.0    2458.5    3300   74.5%
```
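The round-robin argument can be illustrated with a small sketch. It assumes a plain logistic Elo model with draws counted as half points and solves for ratings whose expected total score equals each player's actual total, which is, as far as we know, the principle Ordo is built on; Ordo's own code and options are not reproduced. The four players and the cross table are hypothetical. Because every player meets the same opposition in a round robin, a higher rating always means a higher expected total, so the fitted order has to follow the points.

```python
import math

# Hypothetical 4-player round robin: every pairing played GAMES games.
# points[i][j] = points player i scored against player j (draws = 0.5).
GAMES = 10
names = ["A", "B", "C", "D"]
points = [
    [0.0, 5.5, 6.0, 8.0],
    [4.5, 0.0, 7.0, 7.5],
    [4.0, 3.0, 0.0, 6.0],
    [2.0, 2.5, 4.0, 0.0],
]


def expected_score(r_i: float, r_j: float) -> float:
    """Logistic Elo expectation for a player rated r_i against r_j."""
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / 400.0))


def fit_ratings(points, games, anchor=2900.0, iters=1000, tol=1e-9):
    """Find ratings whose expected totals equal the actual totals.

    One (clamped) Newton step per player and sweep; the pool average is
    pinned to `anchor` because the games only determine rating differences.
    """
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            assert points[i][j] + points[j][i] == games, "inconsistent cross table"
    actual = [sum(row) for row in points]
    r = [anchor] * n
    for _ in range(iters):
        largest = 0.0
        for i in range(n):
            exp_total, slope = 0.0, 0.0
            for j in range(n):
                if j == i:
                    continue
                e = expected_score(r[i], r[j])
                exp_total += games * e
                # derivative of the expected total with respect to r[i]
                slope += games * e * (1.0 - e) * math.log(10) / 400.0
            step = max(-200.0, min(200.0, (actual[i] - exp_total) / slope))
            r[i] += step
            largest = max(largest, abs(step))
        shift = anchor - sum(r) / n
        r = [x + shift for x in r]
        if largest < tol:
            break
    return actual, r


actual, ratings = fit_ratings(points, GAMES)
for name, pts, rating in sorted(zip(names, actual, ratings), key=lambda t: -t[2]):
    print(f"{name}  {pts:5.1f} points  {rating:7.1f}")
```

The anchor of 2900 is arbitrary. Note that this sketch ignores draw rates entirely, which is exactly the modelling choice the thread is arguing about.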

Miguel

IWB · Post by **IWB** » Mon Jun 02, 2014 8:33 pm

Hi

michiguel wrote:Ingo,

You used to provide the pgn file with only the results. Can you do that again? In that way, we can toy around with the rating programs and/or algorithms.

Miguel

There was little interest in it, but since this is interesting:

http://www.inwoba.de/TOPRES.7z

I will delete this in a few days.

Bye
Ingo

IWB · Post by **IWB** » Mon Jun 02, 2014 8:50 pm

michiguel wrote:
Basically, SF won this gigantic RR tournament, and should be #1.

Even if I can follow your argumentation, here are three arguments which are valid as well:

1. Your argument has applied for years to No. 6 and 7, or 3 and 4, or 13 and 14, but nobody cared; on the contrary, the draw consideration was an important argument ... and now it is wrong?
2. Humans tend to value a decisive game more than a draw, hence a small reward for 8 percentage points more decisive games is not that bad ...
3. You are talking about 14.5 points out of 3300 games, which is just 8 more wins/losses for one side (8/3300 = 0.24%). 300 games before the end, S5 was 3 Elo below its final rating and gained a lot after that, which is more than the 0.24%. I don't want to complain, I just want to point out the "random" factor.
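The size of the gap in point 3 can be put on the Elo scale with a first-order (linearised) approximation of the logistic curve around the engines' typical score of about 75%. The linearisation is an assumption of this sketch, not how any of the rating tools works; the 3300 games and the two point totals come from the Ordo list.

```python
import math

GAMES = 3300
sf_points, h_points = 2473.0, 2458.5  # Stockfish 5 and Houdini 4, Ordo list

gap = sf_points - h_points                # points between the two engines
p = (sf_points + h_points) / (2 * GAMES)  # typical score level, about 74.7%

# Elo(p) = 400 * log10(p / (1 - p))  =>  dElo/dp = 400 / (ln(10) * p * (1 - p))
elo_per_score = 400.0 / (math.log(10) * p * (1.0 - p))
elo_gap = elo_per_score * gap / GAMES     # linearised Elo equivalent of the gap

print(f"{gap} points is about {elo_gap:.1f} Elo at a {100 * p:.1f}% score level")
```

It comes out at roughly 4 Elo, in line with the 4.1 Elo gap in the Ordo list and well inside the error bars of 8 to 11 Elo printed by BayesElo and EloStat.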

For me personally it doesn't matter. Everything within a +/- 10 Elo range is something no human can notice anyway. I know that most people look at the ranking and not at the error bar (not to mention the conditions), or at how and why something happened, or how likely it is, but that's something I don't mind too much anymore.

Bye
Ingo

michiguel · Post by **michiguel** » Mon Jun 02, 2014 9:06 pm

IWB wrote:
1. Your argument has applied for years to No. 6 and 7, or 3 and 4, or 13 and 14, but nobody cared; on the contrary, the draw consideration was an important argument ... and now it is wrong?

I do not understand what you mean. What argument have I had before? I am confused about those numbers 6,7 3,4 13 and 14. What are those?

If you had exactly the same opposition and got more points, how can you not have a higher rating? Yes, the error bar could be bigger than the difference, but that is a different issue, one of precision, and it says that in practical terms H and SF are about equal. I agree with that.

Miguel


IWB · Post by **IWB** » Mon Jun 02, 2014 9:13 pm

michiguel wrote: I do not understand what you mean. What argument have I had before? I am confused about those numbers 6,7 3,4 13 and 14. What are those?

That was just meant as an example: what is now obvious for No. 1 and 2 might have been the case in the past for the engines ranked 6 and 7, or 3 and 4, or whatever pair you like. Those are just examples where nobody cared ... and now it is suddenly important? (Because of 5 Elo, which are well within one SD? No! It is because of the number in front, whether it is a one or a two.)

My problem is that people usually do not care about conditions, just rankings! Worse, they only look at No. 1, 2 and maybe 3. That's it!

At least we agree that there is very little difference between the top engines.

Bye
Ingo

Ozymandias · Post by **Ozymandias** » Mon Jun 02, 2014 9:40 pm

IWB wrote:My problem is that people usually do not mind conditions but just rankings! Worse, they look for No 1, 2 and maybe 3. Thats it!

And yet, you were asking about H4 or S5, not even Nº 3. Miguel only provided an answer based on the conditions (RR) which had produced those rankings.

IWB · Post by **IWB** » Mon Jun 02, 2014 9:44 pm

Ozymandias wrote:
And yet, you were asking about H4 or S5, not even Nº 3. Miguel only provided an answer based on the conditions (RR) which had produced those rankings.

There is not a single question in my initial post, just some interesting points.

But you are right, much too much attention already.

Bye
Ingo
