CEGT - rating lists September 15th 2013

Werner
Posts: 3014
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

CEGT - rating lists September 15th 2013

Post by Werner »

Hi all, :D

our current rating lists are online and can be found under the links below.

40 / 20:
New games: 2058; 38 different engines
Total: 690,174

New engines
575 Arasan 16.1 x64 1CPU: 2566 - 900 games (-6 to v. 16.0)
681 Redqueen 1.1.4 x64 1CPU: 2494 - 796 games (-8 to v. 1.1.3)

Updates
3 Stockfish 4.0 x64 4CPU: 3098 - 667 games (-1)
599 ICE 1.0 x64: 2549 - 936 games (-13)
568 Naum 2.1 x64 1CPU: 2569 - 689 games (+24)

40 / 4: (from Sept. 14th)
New games: 4,700
Total: 1,251,207

New engines
3 Stockfish 4.0 x64 4CPU: 3133 - 1300 games (+50 to v. 2.2.2!)
96 Jonny 6.00 x64 4CPU: 2913 - 1000 games (+116 to 1CPU)
759 Tucano 3.00 x64: 2456 - 1000 games (+76 to v. 2.0 w32)

Updates
191 Jonny 6.00 x64 1CPU: 2797 - 1400 games (-9)
1266 MangoPaola 1.0: 2042 - 800 games (+3)
1278 Maverick 0.2 w32: 2001 - 1000 games (+-0)

40/120
See our new single-version list here:
http://www.husvankempen.de/nunn//40120n ... liste.html.
Last update was August 29th with 12150 games and 44 engines.
We are testing Hannibal 1.3 x64 (-9) and Toga II 3.0 (-37)

40/20 pb=on
Last update was Sept. 9th with 19,440 games and 32 different engines.
We are testing Hannibal 1.3 x64. See more in our forum.

A big „Thank you“ to all testers as usual!!

Links

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
40/20 pb=on: http://www.husvankempen.de/nunn/rating4020PBON.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.jpg

Werner Schuele
CEGT-Team
Werner
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists September 15th 2013

Post by lkaufman »

Would it be easy for you (or someone else) to show what the CEGT 40/20 rating list would look like if we only include games among the top programs (let's say Houdini, Komodo, Stockfish, Critter, and Rybka) (best two versions of each)? I'm curious to see whether that would produce a similar list to the actual one or one that is markedly different. There seems to be quite a disparity between the rating lists and direct play results between top engines; this would be a simple way to determine if this is a real phenomenon or not.

Larry
Werner
Posts: 3014
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists September 15th 2013

Post by Werner »

Hi Larry,
download cegttotal.zip,
copy all games with these 10 engines to a new folder,
delete duplicate games and games against other engines.
Now we have only around 2000 games left in our 40/20 list!
Now run ELOstat and adjust the start rating:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Houdini 3 x64 1CPU             : 3049   21  21   550    58.6 %   2989   48.5 %
  2 Komodo CCT x64                 : 3013   22  22   500    53.9 %   2985   48.2 %
  3 Komodo 5.1r2 x64 1CPU          : 3009   21  21   500    53.4 %   2985   51.6 %
  4 Stockfish 4.0 x64 1CPU         : 3002   23  23   451    49.7 %   3004   50.6 %
  5 Critter 1.6 x64 1CPU           : 2985   20  20   569    47.9 %   3000   51.5 %
  6 Houdini 2.0c x64 1CPU          : 2972   33  33   217    51.4 %   2963   49.3 %
  7 Critter 1.4 x64 1CPU           : 2967   31  31   200    48.5 %   2978   58.0 %
  8 Stockfish 3.0 x64 1CPU         : 2965   22  22   450    45.9 %   2994   53.1 %
  9 Deep Rybka 4.1 x64 1CPU        : 2925   23  23   501    39.6 %   2998   44.5 %
Well, does that tell us more?

Best wishes
Werner
Vinvin
Posts: 5312
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: CEGT - rating lists September 15th 2013

Post by Vinvin »

Hello Werner, when I click on "single version" ( http://www.husvankempen.de/nunn/40_40%2 ... liste.html ),
I get only 1 Houdini version in the list but several versions of other engines. Is it a bug?

Thanks,
Vincent
Werner
Posts: 3014
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists September 15th 2013

Post by Werner »

Here is the same list with the blitz results. The start Elo is tuned to give Houdini 3 the same rating as in our blitz list:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Houdini 3.0 x64 1CPU           : 3076   21  20   600    57.4 %   3024   45.8 %
  2 Stockfish 4.0 x64 1CPU         : 3062   21  21   503    53.7 %   3037   50.9 %
  3 Komodo CCT x64 1CPU            : 3051   22  22   501    53.7 %   3025   48.3 %
  4 Komodo 5.1r2 x64 1CPU          : 3045   22  22   500    52.9 %   3025   48.6 %
  5 Critter 1.6 x64 1CPU           : 3036   17  17   804    51.7 %   3024   52.4 %
  6 Houdini 2.0c x64 1CPU          : 3033   35  35   200    54.0 %   3006   48.0 %
  7 Critter 1.4 x64 1CPU           : 2986   50  50   100    51.5 %   2975   47.0 %
  8 Stockfish 3.0 x64 1CPU         : 2976   21  21   500    41.3 %   3037   51.4 %
  9 Rybka 4.0 x64 1CPU             : 2975   17  18   802    41.7 %   3033   47.3 %
 10 Rybka 4.1 x64 1CPU             : 2974   41  43   102    41.2 %   3036   60.8 %
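
For readers who want to sanity-check the columns: the Elo values in the table above look consistent with the standard logistic relation between score and rating difference, Elo = Av.Op. + 400 * log10(score / (1 - score)). The sketch below assumes that relation; it is not taken from the rating tool itself, and the function name is invented for illustration. Because the Score column is rounded to one decimal, results can be off by about a point.

```python
import math


def elo_from_score(score, avg_opponent):
    """Performance rating from a score fraction (0 < score < 1) and the
    average opponent rating, using the standard logistic Elo formula."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


# Houdini 3.0 row from the blitz table: 57.4 % against an average of 3024.
houdini_blitz = elo_from_score(0.574, 3024)
```

For the Houdini 3.0 row this lands within a point of the listed 3076.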
Werner
Werner
Posts: 3014
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists September 15th 2013

Post by Werner »

Vinvin wrote:Hello Werner, when I click on "single version" ( http://www.husvankempen.de/nunn/40_40%2 ... liste.html ).
I got only 1 Houdini version in the list but several versions of other engines. Is it a bug ?
Thanks,
Vincent
Sorry, this list is not consistent; it could be better.
Werner
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: CEGT - rating lists September 15th 2013

Post by Lyudmil Tsvetkov »

Hi, thanks for the stats.

I do not know to whom I should address this question, and perhaps I should not ask it at all, but still: the CEGT tests show that in blitz Stockfish 4 gained 86 Elo over Stockfish 3, while in rapid (40/20 should count as rapid) it gained just 37 Elo. That is less than half the blitz gain, a considerable difference.
Does that suggest that, under the newly developed testing and development framework, Stockfish has suddenly started to scale less well? Any ideas whether this might be so, and why?

The 3 Champs of Clemens, with a version of Stockfish close to Stockfish 4, compared to Ingo's blitz results, also seems to imply something like this.

Does anyone have additional results at different time controls that would support or reject this hypothesis? (I do not mean the scalability measurements done under the Stockfish framework, where an extremely fast TC is only very slightly increased, nowhere near even blitz.) Is it possible that Stockfish no longer scales so well at normal and very long TC?

Best, Lyudmil
Modern Times
Posts: 3799
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists September 15th 2013

Post by Modern Times »

CCRL shows +35 Elo at 40/40 and +54 Elo at blitz. But with all these numbers, you have to be careful because of the error margins. I'm not a statistics expert, but the Elo gain could just as easily be the same for both.
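
To make the point about error margins concrete, here is a rough sketch using the CEGT head-to-head numbers Werner posted earlier in the thread (Stockfish 4.0 and 3.0, blitz and 40/20). It assumes the +/- columns are symmetric margins at the same confidence level and that the errors are independent, which is not strictly true because all ratings come from the same pool of games. The function name is made up for illustration.

```python
import math


def gain_with_margin(new_elo, new_err, old_elo, old_err):
    """Elo gain between two versions and a combined margin,
    treating the two error margins as independent."""
    return new_elo - old_elo, math.hypot(new_err, old_err)


# Stockfish 4.0 vs Stockfish 3.0, values from the tables in this thread.
blitz_gain, blitz_margin = gain_with_margin(3062, 21, 2976, 21)
rapid_gain, rapid_margin = gain_with_margin(3002, 23, 2965, 22)

# How much bigger is the blitz gain, and how uncertain is that difference?
gain_diff = blitz_gain - rapid_gain
gain_diff_margin = math.hypot(blitz_margin, rapid_margin)

print(f"blitz gain: {blitz_gain} +/- {blitz_margin:.0f}")
print(f"rapid gain: {rapid_gain} +/- {rapid_margin:.0f}")
print(f"difference: {gain_diff} +/- {gain_diff_margin:.0f}")
```

Under these assumptions the blitz gain exceeds the rapid gain by 49 Elo with a margin of about 44, so the interval only narrowly excludes zero. That is weak evidence at best for a real difference in scaling.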
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: CEGT - rating lists September 15th 2013

Post by Lyudmil Tsvetkov »

Modern Times wrote:CCRL shows +35 Elo at 40/40 and +54 Elo at blitz. But with all these numbers, you have to be careful because of the error margins. I'm not a statistics expert, but the Elo gain could just as easily be the same for both.
Thanks Ray.
Do you have by chance the data for the same time control of Stockfish 3 and, say, Stockfish 2.2.1 (or some other earlier version)?
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: CEGT - rating lists September 15th 2013

Post by Adam Hair »

Lyudmil Tsvetkov wrote:
Modern Times wrote:CCRL shows +35 Elo at 40/40 and +54 Elo at blitz. But with all these numbers, you have to be careful because of the error margins. I'm not a statistics expert, but the Elo gain could just as easily be the same for both.
Thanks Ray.
Do you have by chance the data for the same time control of Stockfish 3 and, say, Stockfish 2.2.1 (or some other earlier version)?
Hi Lyudmil,

The ratings for every member of the Glaurung/Stockfish family that has been tested and included in the CCRL database can be found here:

http://www.computerchess.org.uk/ccrl/40 ... +opponents

Clicking on an individual engine will bring up its match statistics.