H4 or S5 !?

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

H4 or S5 !?

Post by IWB »

Hello all,

This is quite interesting:

The official method for the IPON list is Bayeselo with mm 0 1, which takes the draw rate into account. Counting only the games the top 16 played against each other, it looks like this:

Code: Select all

   1 Houdini 4           3111    9    9  3300   75%  2921   31% 
   2 Stockfish 5         3106    9    8  3300   75%  2921   39% 
   3 Komodo 7a           3088    9    9  3300   72%  2922   37% 
   4 Gull 3              3057    8    8  3300   68%  2924   41% 
   5 Critter 1.4a        2980    8    8  3300   57%  2930   46% 
   6 Equinox 2.02        2975    8    8  3300   56%  2930   47% 
   7 Deep Rybka 4.1      2959    8    8  3300   54%  2931   45% 
   8 Deep Fritz 14       2894    8    8  3300   44%  2935   45% 
   9 Chiron 2            2889    8    8  3300   44%  2936   45% 
  10 Protector 1.6.0     2870    8    8  3300   41%  2937   44% 
  11 Hannibal 1.4b       2870    8    8  3300   41%  2937   43% 
  12 Naum 4.2            2838    8    9  3300   36%  2939   41% 
  13 Texel 1.04          2838    8    8  3300   37%  2939   38% 
  14 Senpai 1.0          2838    8    8  3300   36%  2939   41% 
  15 HIARCS 14 WCSC 32b  2812    9    9  3300   33%  2941   37% 
  16 Jonny 6.00          2798    9    9  3300   31%  2942   36%
The same set of data with the Bayeselo defaults:

Code: Select all

   1 Houdini 4           3111   11   11  3300   75%  2931   31% 
   2 Stockfish 5         3105   10   10  3300   75%  2931   39% 
   3 Komodo 7a           3088   10   10  3300   72%  2932   37% 
   4 Gull 3              3057   10   10  3300   68%  2934   41% 
   5 Critter 1.4a        2984   10    9  3300   57%  2939   46% 
   6 Equinox 2.02        2980    9   10  3300   56%  2939   47% 
   7 Deep Rybka 4.1      2964   10   10  3300   54%  2940   45% 
   8 Deep Fritz 14       2905    9   10  3300   44%  2944   45% 
   9 Chiron 2            2900   10   10  3300   44%  2945   45% 
  10 Protector 1.6.0     2883   10   10  3300   41%  2946   44% 
  11 Hannibal 1.4b       2883   10   10  3300   41%  2946   43% 
  12 Naum 4.2            2854   10   10  3300   36%  2948   41% 
  13 Texel 1.04          2854   10   10  3300   37%  2948   38% 
  14 Senpai 1.0          2853   10   10  3300   36%  2948   41% 
  15 HIARCS 14 WCSC 32b  2830   10   10  3300   33%  2949   37% 
  16 Jonny 6.00          2816   10   10  3300   31%  2950   36%
Now with Elostat:

Code: Select all

  1 Stockfish 5                    : 3115   10  10  3300    74.9 %   2924   38.6 %
  2 Houdini 4                      : 3111   11  10  3300    74.5 %   2925   30.7 %
  3 Komodo 7a                      : 3091   10  10  3300    72.1 %   2926   37.0 %
  4 Gull 3                         : 3059    9   9  3300    68.0 %   2928   41.0 %
  5 Critter 1.4a                   : 2982    9   9  3300    57.0 %   2933   46.1 %
  6 Equinox 2.02                   : 2978    9   9  3300    56.3 %   2933   46.9 %
  7 Deep Rybka 4.1                 : 2962    9   9  3300    53.9 %   2935   45.2 %
  8 Deep Fritz 14                  : 2899    9   9  3300    44.4 %   2939   44.9 %
  9 Chiron 2                       : 2894    9   9  3300    43.5 %   2939   45.1 %
 10 Protector 1.6.0                : 2877    9   9  3300    40.9 %   2940   44.1 %
 11 Hannibal 1.4b                  : 2875    9   9  3300    40.7 %   2940   42.6 %
 12 Texel 1.04                     : 2846    9   9  3300    36.5 %   2942   38.5 %
 13 Naum 4.2                       : 2845    9   9  3300    36.4 %   2942   40.9 %
 14 Senpai 1.0                     : 2845    9   9  3300    36.3 %   2942   40.7 %
 15 HIARCS 14 WCSC 32b             : 2822   10  10  3300    33.2 %   2944   37.5 %
 16 Jonny 6.00                     : 2808   10  10  3300    31.2 %   2945   35.7 %
And finally with Ordo:

Code: Select all

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Stockfish 5           : 3115.1    2473.0    3300   74.9%
   2 Houdini 4             : 3111.0    2458.5    3300   74.5%
   3 Komodo 7a             : 3089.3    2379.0    3300   72.1%
   4 Gull 3                : 3054.9    2245.5    3300   68.0%
   5 Critter 1.4a          : 2968.9    1882.0    3300   57.0%
   6 Equinox 2.02          : 2963.8    1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2945.6    1778.5    3300   53.9%
   8 Deep Fritz 14         : 2875.7    1464.5    3300   44.4%
   9 Chiron 2              : 2869.4    1436.5    3300   43.5%
  10 Protector 1.6.0       : 2850.1    1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2848.3    1343.0    3300   40.7%
  12 Texel 1.04            : 2816.4    1204.5    3300   36.5%
  13 Naum 4.2              : 2815.5    1200.5    3300   36.4%
  14 Senpai 1.0            : 2814.9    1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2790.6    1096.0    3300   33.2%
  16 Jonny 6.00            : 2774.4    1030.0    3300   31.2%
That is very good, as everyone can take the list he likes :-)
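
One reason the four lists can disagree on the order is that the tools treat draws differently. Below is a minimal sketch of the win/draw/loss model generally attributed to Bayeselo, built on the logistic curve and a "drawelo" parameter. The function names and the drawelo value of 100 are illustrative assumptions, not settings taken from the IPON run, and White's first-move advantage is ignored for brevity.

```python
def f(x):
    """Logistic Elo curve: expected result for a rating edge of x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def outcome_probs(delta, draw_elo):
    """Win/draw/loss probabilities for a player rated `delta` Elo above
    the opponent (first-move advantage ignored, a simplification)."""
    win = f(delta - draw_elo)
    loss = f(-delta - draw_elo)
    draw = 1.0 - win - loss
    return win, draw, loss


draw_elo = 100.0  # illustrative value only
win, draw, loss = outcome_probs(190.0, draw_elo)
scale = 10.0 ** (2.0 * draw_elo / 400.0) - 1.0
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
# In this model the draw probability is proportional to win * loss.
print(abs(draw - scale * win * loss) < 1e-9)
```

Because a draw enters this likelihood like one win times one loss, a score built from many decisive games is not weighted the same as an equal score built from draws, whereas a purely points-based view counts a draw as exactly half a win.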

Regards
Ingo
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: H4 or S5 !?

Post by Modern Times »

Very interesting !

Yes indeed, take your pick.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: H4 or S5 !?

Post by michiguel »

Ingo,

You used to provide the PGN file with only the results. Can you do that again? That way we can toy around with the rating programs and/or algorithms.

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: H4 or S5 !?

Post by michiguel »

There is only one correct answer: SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played everybody else under the same conditions, and the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. For reference, the Ordo output shows the actual points (the others give only percentages). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament and should be #1.

1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%
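
The point can be checked with a rough sketch. It assumes the standard logistic Elo curve and treats the field as a single opponent of average strength (a simplification that none of the four tools actually uses). Under that assumption the rating edge depends only on the score fraction, so more points against identical opposition always means a higher rating. The function name is made up for illustration.

```python
import math


def elo_edge(points, games):
    """Rating edge over the average opposition implied by a score,
    using the logistic Elo curve (single-opponent approximation)."""
    p = points / games
    return 400.0 * math.log10(p / (1.0 - p))


sf5 = elo_edge(2473.0, 3300)  # points taken from the Ordo table
h4 = elo_edge(2458.5, 3300)
print(f"SF5 {sf5:+.1f}  H4 {h4:+.1f}  gap {sf5 - h4:.1f} Elo")
```

Under these assumptions the gap comes out at about 4 Elo, in line with the 4.1 that Ordo reports.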

Miguel
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: H4 or S5 !?

Post by IWB »

Hi
michiguel wrote:Ingo,

You used to provide the PGN file with only the results. Can you do that again? That way we can toy around with the rating programs and/or algorithms.

Miguel
There was little interest in it, but since this case is interesting, here it is:

http://www.inwoba.de/TOPRES.7z

I will delete this in a few days.

Bye
Ingo
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: H4 or S5 !?

Post by IWB »

michiguel wrote:
There is only one correct answer: SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played everybody else under the same conditions, and the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. For reference, the Ordo output shows the actual points (the others give only percentages). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament and should be #1.

1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%

Miguel
Even though I can follow your reasoning, here are three arguments that are valid as well:

1. Your argument has applied for years to No. 6 and 7, or 3 and 4, or 13 and 14, but nobody cared. On the contrary, taking the draws into account was considered an important argument ... and now it is wrong?
2. Humans tend to value a decisive game more than a draw, so a small reward for 8% more decisive games is not that bad ...
3. You are talking about 14.5 points out of 3300 games; that is just 8 more wins/losses for one side (8/3300 = 0.24%). 300 games before the end, S5 was 3 Elo below its final rating and gained a lot from there, which is more than the 0.24%. I don't want to complain, I just want to point out the "random" factor :-)
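
The size of that "random" factor can be estimated. Here is a minimal sketch, assuming the games are independent and using the logistic Elo curve, that turns a score and a draw rate (taken from the Ordo and Elostat tables for Stockfish 5) into an approximate 95% error bar. The function name is made up for illustration.

```python
import math


def elo_error_bar(score, draw_rate, games, z=1.96):
    """Approximate error bar in Elo from score, draw rate and game count."""
    win_rate = score - draw_rate / 2.0
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate + draw_rate / 4.0 - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at this score (Elo per unit of score).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return z * std_err * slope


bar = elo_error_bar(score=2473.0 / 3300, draw_rate=0.386, games=3300)
print(f"+/- {bar:.1f} Elo")
```

This comes out a little under +/- 10 Elo, the same order as the error columns in the tables, while the gap between the top two in the Ordo list is only 4.1 Elo.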

For me personally it doesn't matter. Anything within a +/- 10 Elo range is something no human can see anyhow. I know that most people look at the ranking and not at the error bar (not to mention the conditions), or at how and why something happened or how likely it is, but that's something I don't mind too much anymore :-)

Bye
Ingo
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: H4 or S5 !?

Post by michiguel »

IWB wrote:
michiguel wrote:
There is only one correct answer: SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played everybody else under the same conditions, and the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. For reference, the Ordo output shows the actual points (the others give only percentages). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament and should be #1.

1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%

Miguel
Even though I can follow your reasoning, here are three arguments that are valid as well:

1. Your argument has applied for years to No. 6 and 7, or 3 and 4, or 13 and 14, but nobody cared. On the contrary, taking the draws into account was considered an important argument ... and now it is wrong?
I do not understand what you mean. What argument have I made before? I am confused about those numbers: 6 and 7, 3 and 4, 13 and 14. What are those?

If you had exactly the same opposition and got more points, how can you not have a higher rating? Yes, the error bar could be bigger than the difference, but that is a separate issue of precision: it says that, in practical terms, H and SF are about equal. I agree with that.

Miguel
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: H4 or S5 !?

Post by IWB »

michiguel wrote: I do not understand what you mean. What argument have I made before? I am confused about those numbers: 6 and 7, 3 and 4, 13 and 14. What are those?
:-)

That was just meant as an example: what is now obvious for No. 1 and 2 might have been the case in the past for the engines ranked 6 and 7, or 3 and 4, or whatever pair you like. Those are just examples where nobody cared ... and now it is suddenly important? (Because of 5 Elo, which are fully within one SD ... No! It is because of the number in front, whether it is a one or a two ;-) )
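
How much a gap inside one SD says about the order can be put into a number. This is a rough sketch of a likelihood-of-superiority estimate from the points and draw rates in the tables, assuming a normal approximation and treating the two totals as independent (not strictly true here, since both engines met the same opponents and each other). The function names are hypothetical.

```python
import math


def points_variance(points, draw_rate, games):
    """Variance of a points total, from its score and draw rate."""
    score = points / games
    win_rate = score - draw_rate / 2.0
    return games * (win_rate + draw_rate / 4.0 - score ** 2)


def superiority(points_a, draw_a, points_b, draw_b, games):
    """Normal-approximation chance that A is truly the stronger engine."""
    spread = math.sqrt(points_variance(points_a, draw_a, games)
                       + points_variance(points_b, draw_b, games))
    z = (points_a - points_b) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


los = superiority(2473.0, 0.386, 2458.5, 0.307, 3300)
print(f"S5 ahead of H4 with probability ~{los:.2f}")
```

Under these assumptions the estimate is around 0.7, well short of the 0.95 usually asked for before calling a difference significant.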

My problem is that people usually do not mind the conditions, just the rankings! Worse, they look only at No. 1, 2 and maybe 3. That's it!

At least we agree that there is very little difference between the Tops :-)

Bye
Ingo
Ozymandias
Posts: 1533
Joined: Sun Oct 25, 2009 2:30 am

Re: H4 or S5 !?

Post by Ozymandias »

IWB wrote:My problem is that people usually do not mind the conditions, just the rankings! Worse, they look only at No. 1, 2 and maybe 3. That's it!
And yet, you were asking about H4 or S5, not even Nº 3. Miguel only provided an answer based on the conditions (RR) which had produced those rankings.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: H4 or S5 !?

Post by IWB »

Ozymandias wrote:
And yet, you were asking about H4 or S5, not even Nº 3. Miguel only provided an answer based on the conditions (RR) which had produced those rankings.
There is not a single question in my initial post, just some interesting points.

But you are right, this has had much too much attention already.

Bye
Ingo