CEGT - rating lists June 08th 2014

Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

CEGT - rating lists June 08th 2014

Post by Werner »

Hi all, :D

our current rating lists are online and can be found under the links below.

40 / 20:
New games: 1794; 18 different engines
Total: 744.841

NEW Engines
1 Stockfish 5.0 x64 4CPU: 3189 - 243 games (+45 to v. DD)
13 Stockfish 5.0 x64 1CPU: 3088 - 1090 games (+25 to v. DD)

UPDATES
2 Komodo 7.0 x64 4CPU: 3174 - 832 games (+9)
18 Komodo 7.0 x64 1CPU: 3066 - 1252 games (-2)
5 Gull 3.0 x64 4CPU: 3141 - 664 games (-7)

Following situation today:

1 Stockfish 5.0 x64 4CPU 3189 +27 -27 243  
2 Komodo 7.0 x64 4CPU 3174 +15 -15 832  
3 Houdini 4.0 x64 4CPU 3154 +11 -11 2044

1 Stockfish 5.0 x64 1CPU 3088 +15 -15 1090 
2 Houdini 4.0 x64 1CPU 3077 +14 -14 1334 
3 Komodo 7.0 x64 1CPU 3066 +14 -14 1252

40 / 4: from last Saturday
New games = 14.938
Total now = 1.375.809

New engines
1358 Beowulf 2.4: 2051 - 775 games (-)
937 Cheese 1.6 x64: 2419 - 900 games (+44 to v. 1.5)
493 ExChess 7.26b x64 1CPU: 2639 - 1000 games (+38 to v. 7.11b)
477 Fizbo 1.1 x64: 2645 - 1100 games (new entry)
8 Gull 3.0 x64 4CPU: 3130 - 1010 games (+15 to v. 2.8b)
20 Houdini 4.0 w32 1CPU: 3091 - 1000 games (new entry)
2 Komodo 7.0a x64 4CPU: 3190 - 100 games (startrating)
24 Komodo 7.0a x64 1CPU: 3080 - 2000 games (+17 to v. TCEC)
1360 Orion 0.1: 2048 - 1100 games (-)
827 Rodin 7.0: 2464 - 1100 games (+70 to v. 6.0)
1 Stockfish 5.0 x64 4CPU: 3208 - 300 games (startrating)
10 Stockfish 5.0 x64 1CPU: 3125 - 1900 games (+52 to v. DD)
18 Stockfish 5.0 x64 2CPU: 3094 - 310 games (startrating)
180 Texel 1.04 x64 1CPU: 2848 - 1000 games (+67 to v. 1.04)

Updates
15 Houdini 4.0 x64 1CPU: 3098 - 2800 games (-8)
85 Protector 1.6.0 x64 4CPU: 2974 - 1000 games (+23)
162 Protector 1.6.0 x64 1CPU: 2880 - 2100 games (+1)

Following situation today:

1 Stockfish 5.0 x64 4CPU 3208 +28 -28 300 
2 Komodo 7.0a x64 4CPU 3190 +47 -47 100  
3 Houdini 4.0 x64 4CPU 3189 +13 -13 2400 

1 Stockfish 5.0 x64 1CPU 3124 +12 -12 1900 
2 Houdini 4.0 x64 1CPU 3098 +10 -10 2800 
3 Komodo 7.0a x64 1CPU 3079 +11 -11 2000

40/120
See our single list here:
http://www.husvankempen.de/nunn//40120n ... liste.html.
Last update was April 03rd:

5'+3'' pb=on

1 Houdini 4.0 x64 3108 15 15 1700 
2 Stockfish 5.0 x64 3102 out of 1300 games now
3 Komodo 7.0a x64 3070 14 14 1600

A big „Thank you“ to all testers, as usual!

Links

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
40/20 pb=on: http://www.husvankempen.de/nunn/rating4020PBON.htm
5+3 pb=on: http://www.husvankempen.de/nunn/rating5plus3pbon.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.jpg

Werner Schuele
CEGT-Team
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists June 08th 2014

Post by lkaufman »

It is nice to see Komodo 7 leading Houdini 4 by twenty points on your 40/20 four-cpu list, because the margin of error for the difference is 18.6 points (calculated by the square root of 15 squared plus 11 squared, which I believe is the correct way to do this). So we can now claim that we have passed Houdini 4 under these conditions with 95% confidence.
I would like to ask you what is the average actual time limit used in these 40/20 games? I understand that the 40/20 figure is based on obsolete hardware and that faster time limits are used on modern machines.
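
The margin-of-error arithmetic above can be checked with a short sketch. It assumes the two error bars (+15/-15 and +11/-11) are independent intervals at the same confidence level, which is only approximately true for engines rated in the same pool. The function name `combined_margin` is made up for illustration.

```python
import math


def combined_margin(margin_a, margin_b):
    """Combine two independent error margins in quadrature."""
    return math.sqrt(margin_a ** 2 + margin_b ** 2)


# Values taken from the 40/20 4CPU list in this thread.
komodo_elo, komodo_margin = 3174, 15
houdini_elo, houdini_margin = 3154, 11

difference = komodo_elo - houdini_elo
margin = combined_margin(komodo_margin, houdini_margin)

print(f"difference = {difference} Elo, combined margin = {margin:.1f} Elo")
print("lead exceeds combined margin:", difference > margin)
```

If the ratings are positively correlated, the true margin for the difference would be smaller than this estimate, so the quadrature sum is a cautious figure under that assumption.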
Wolfgang
Posts: 895
Joined: Sat May 13, 2006 1:08 am

Re: CEGT - rating lists June 08th 2014

Post by Wolfgang »

Hi Larry,

on my i7-4770K @ 3.4 GHz I use 40/8, and IIRC Werner and Johann (our main 40/20 testers) have similar machines.

Best
Wolfgang
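
As a rough sketch of how a nominal 40/20 control could map to 40/8 on faster hardware: divide the nominal time by a speed factor. The factor of 2.5 below is simply 20/8 taken from this post; it is not an official CEGT benchmark value, and `scaled_minutes` is a hypothetical helper.

```python
def scaled_minutes(nominal_minutes, speed_factor):
    """Shrink a nominal time control for a machine that is speed_factor times faster."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return nominal_minutes / speed_factor


# Assumed factor: 20 nominal minutes / 8 actual minutes on the i7-4770K.
assumed_speed_factor = 20 / 8

print(scaled_minutes(20, assumed_speed_factor), "minutes per 40 moves")
```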
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists June 08th 2014

Post by lkaufman »

Thanks! So it's practically the same level as the 5' plus 3" list if you allow for ponder on in that list. A bit longer, but not significantly so.
This makes the dramatic difference between those two lists most interesting! It suggests to me that there is something wrong with Komodo's time management in increment games, which apparently is not so in 40/x controls. I already wrote that I suspected time management in increment play with SF could be a problem, and this confirms it. Another possibility is that there is something wrong with our time management with ponder on, although I don't think so. Of course the 4 cpu vs 1 cpu lists complicate the picture. It certainly appears that Komodo does better (relative to Houdini) on 4 cpu than on 1.
By the way, maybe it's time to consider changing the name of the 40/20 list to 40/8 if that is what is used on all the i7 machines, since nowadays I think the i7 is very common and is pretty much the standard. It really is a blitz rating now, and the 40/4 list is more like a bullet list. Of course all the lists would have to be appropriately titled.
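
The comparison between 40/8 and 5'+3" can be put in numbers with a small sketch. The game length of 60 moves is an assumption chosen for illustration, and the helper names are hypothetical. The sketch ignores pondering, which gives extra thinking time on the opponent's clock in the pb=on list.

```python
def seconds_per_move_repeating(moves_per_period, minutes_per_period):
    """Average time per move for a repeating control such as 40/8."""
    return minutes_per_period * 60 / moves_per_period


def seconds_per_move_increment(base_minutes, increment_seconds, game_moves):
    """Average time per move for base + increment, given an assumed game length."""
    return base_minutes * 60 / game_moves + increment_seconds


assumed_game_moves = 60  # illustrative assumption, not a measured average

print("40/8  :", seconds_per_move_repeating(40, 8), "s per move")
print("5'+3\" :", seconds_per_move_increment(5, 3, assumed_game_moves), "s per move")
```

Under these assumptions the repeating control gives more own-clock time per move, and ponder on narrows the gap by an amount that depends on how often the predicted reply is played.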
Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: CEGT - rating lists June 08th 2014

Post by Hugo »

The first thing that is wrong in Komodo:
it moves much too fast on the first move out of book.
btw:
here is my private list of 14 engines.
10min + 10 sec per game
ponder OFF
all engines 6 cpu:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish 5 64 SSE4.2 x6       : 3152   22  22   650    78.0 %   2932   36.6 %
  2 Houdini 4 Pro x64 x6           : 3103   22  22   650    72.4 %   2936   35.8 %
  3 Komodo 7 64-bit x6             : 3095   21  21   650    71.4 %   2937   41.8 %
  4 Gull 3 x64 x6                  : 3084   21  21   650    69.9 %   2937   41.7 %
  5 Deep Rybka 4.1 SSE42 x64 x6    : 2980   20  20   650    55.0 %   2945   46.3 %
  6 Fire 3.0 x64 x6                : 2975   19  19   650    54.2 %   2946   48.2 %
  7 Chiron 2 64bit x6              : 2918   19  20   650    45.3 %   2950   46.9 %
  8 Hannibal 1.4bx64 x6            : 2910   19  19   650    44.2 %   2951   49.4 %
  9 Protector 1.6.0 x64 x6         : 2890   20  20   650    41.1 %   2952   45.2 %
 10 Deep HIARCS 14 WCSC x6         : 2864   21  21   650    37.3 %   2954   40.8 %
 11 Senpai 1.0 x6                  : 2850   21  21   650    35.3 %   2955   40.8 %
 12 Jonny 6.00 x6                  : 2829   21  21   650    32.4 %   2957   38.6 %
 13 Deep Junior 13 x6              : 2826   22  22   650    31.9 %   2957   36.5 %
 14 Texel 1.03 x6                  : 2823   21  21   650    31.5 %   2958   41.2 %
regards, Clemens Keck
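
The Score and Av.Op. columns in the list above can be cross-checked with the standard logistic Elo expectation. This is only a sketch: the rating tool used for the list is not named in the post and may use a different model, so small deviations are expected. `expected_score` is an illustrative helper.

```python
def expected_score(elo, opponent_elo):
    """Logistic Elo expectation against a single (average) opponent rating."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - elo) / 400.0))


# (name, Elo, Av.Op., listed score in %) copied from the top three rows.
rows = [
    ("Stockfish 5", 3152, 2932, 78.0),
    ("Houdini 4", 3103, 2936, 72.4),
    ("Komodo 7", 3095, 2937, 71.4),
]

for name, elo, average_opponent, listed in rows:
    predicted = 100 * expected_score(elo, average_opponent)
    print(f"{name:12s} predicted {predicted:5.1f} %   listed {listed:5.1f} %")
```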
NATIONAL12
Posts: 305
Joined: Sat Jan 02, 2010 11:31 pm
Location: bristol,uk

Re: CEGT - rating lists June 08th 2014

Post by NATIONAL12 »

I agree with you, Larry. I would like to see a new 40/40 list based on recent computers. Maybe CCRL and CEGT could do this, as they are well-respected testers.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: CEGT - rating lists June 08th 2014

Post by Milos »

lkaufman wrote:This makes the dramatic difference between those two lists most interesting! It suggests to me that there is something wrong with Komodo's time management in increment games, which apparently is not so in 40/x controls. I already wrote that I suspected time management in increment play with SF could be a problem, and this confirms it. Another possibility is that there is something wrong with our time management with ponder on, although I don't think so.
Lol so many assumptions and all wrong. Actual reason is that Houdini TM is totally screwed for 40/x TCs.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists June 08th 2014

Post by lkaufman »

Milos wrote:
Lol so many assumptions and all wrong. Actual reason is that Houdini TM is totally screwed for 40/x TCs.
Thanks. Could you describe in what way it is bad for 40/x, other than "totally screwed"? Anyway I suppose there is more than one reason for this.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists June 08th 2014

Post by lkaufman »

Hugo wrote:
The first thing that is wrong in Komodo:
it moves much too fast on the first move out of book.
Thanks, we thought so too, but forcing it to take more time on the first move actually lowered the Elo a bit. Strange...
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: CEGT - rating lists June 08th 2014

Post by Milos »

lkaufman wrote:Thanks. Could you describe in what way it is bad for 40/x, other than "totally screwed"? Anyway I suppose there is more than one reason for this.
"Flat" TM that RH didn't bother to change from Robbo, which allocates approximately equal time for each move, i.e. the same time is used for move 1 and move 39. Actually, since there is always some time left over due to buffers, more time is allocated for move 39 and moves 40+ than for move 1, which is totally bogus.
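
The effect described above can be illustrated with a toy simulation. This is a hypothetical model of a "flat" scheme, not code from Houdini or RobboLito: each move is budgeted an equal share of the remaining time minus a safety buffer, and the engine is assumed to spend only a fraction of each budget on average. The buffer size and usage fraction are made-up values.

```python
def flat_allocations(period_seconds, moves_per_period, buffer_seconds, usage_fraction):
    """Per-move budgets under a flat split of (remaining time - buffer)."""
    remaining = period_seconds
    allocations = []
    for moves_to_go in range(moves_per_period, 0, -1):
        budget = (remaining - buffer_seconds) / moves_to_go
        allocations.append(budget)
        # Assumption: on average only part of the budget is actually spent.
        remaining -= usage_fraction * budget
    return allocations


# 40 moves in 8 minutes, 10 s buffer, 90 % of each budget used (all assumed).
allocs = flat_allocations(480, 40, 10, 0.9)

print(f"move  1 budget: {allocs[0]:.2f} s")
print(f"move 39 budget: {allocs[38]:.2f} s")
```

In this model the unspent time accumulates, so the budget grows toward the end of the period, which matches the complaint that late moves get more time than move 1.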