CEGT - rating lists June 30th 2013

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Werner
Posts: 3017
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle


Post by Werner »

Hi all, :D

our current rating lists are online and can be found at the links below.

40 / 20:
New games: 990 ; 13 different engines
Total: 675,176 games

NEW Engines
413 Crafty 23.6 x64 1CPU: 2655 - 300 games (good start - strongest Crafty here)

UPDATES
2 Komodo 5.1 x64 4CPU: 3116 - 646 games (-19; Close to the blitz-rating)
6 Stockfish 3.0 x64 4CPU: 3062 - 1716 games (-4)
8 Equinox 1.70 x64 4CPU: 3050 - 2110 games (-4)

40 / 4:
New games: 8760
Total: 1,211,028 games

New engines
5 Komodo 5.1 x64 4CPU: 3117 - 1500 games (just behind the Houdinis)
48 Gull 2.1 x64 1CPU: 2967 - 1600 games (+15 to Gull II)
529 ICE 1.0 v1619 x64 1CPU: 2566 - 800 games (+264 to v. 0.3!!)
536 Octochess r5132 x64 1CPU: 2560 - 1000 games (+131 to v. r4558!)
899 DanaSah 5.06: 2380 - 500 games (starts with -12 to v. 4.88)

Updates
1152 Betsy 6.51: 2193 - 1011 games (+25)
1190 GreKo 5.9: 2147 - 1020 games (-14)
697 Vajolet 2.03 w32 1CPU: 2472 - 800 games (+15)
1150 Capture R1: 2195 - 969 games (+2)


40/120
See our new single list here:
http://www.husvankempen.de/nunn//40120n ... liste.html.
Last update was May 15th with 11500 games and now 42 engines.

40/20 pb=on
Last update was June 24th.
5 Stockfish 3.0 x64 2973 +15 -15 1160 games
10 Gull II x64 2923 +15 -15 1160 games
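The "+15 -15" after each rating is its error margin. As a rough back-of-the-envelope illustration only (CEGT's figures come from its rating tool, which fits the whole pool of games at once, so this is not how they are computed), the margin for a single win/draw/loss record under the plain logistic Elo model can be approximated with a normal approximation. The W/D/L split below is hypothetical; only the 1160-game total matches the list.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a W/D/L record via a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    lower = elo_from_score(score - z * stderr)
    upper = elo_from_score(score + z * stderr)
    return elo_from_score(score), lower, upper

# Hypothetical split of 1160 games: 400 wins, 360 draws, 400 losses.
elo, lower, upper = elo_with_margin(400, 360, 400)
print(f"{elo:+.1f} Elo, 95% interval {lower:+.1f} .. {upper:+.1f}")
```

With about 31% draws this lands at roughly plus or minus 17 Elo, in the same ballpark as the list's margins.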

A big "thank you" to all testers, as usual!!

Links

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
40/20 pb=on: http://www.husvankempen.de/nunn/rating4020PBON.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.jpg

Werner Schuele
CEGT-Team
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists June 30th 2013

Post by lkaufman »

Werner wrote:Hi all, :D


UPDATES
2 Komodo 5.1 x64 4CPU: 3116 - 646 games (-19; Close to the blitz-rating)

Werner Schuele
CEGT-Team
The above statement, while literally true, is rather misleading as the ratings are only meaningful compared to the competition. At 40/4 Komodo is rated well below all Houdini versions, far below Houdini 3. At 40/20 Komodo is 35 and 40 points above the older Houdini versions, and just 31 below Houdini 3. Even allowing for margin of error, I think this demonstrates beyond reasonable doubt that Komodo scales better than Houdini, and leaves open the question as to whether there might be some time limit at which Komodo might pass Houdini 3.
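The "even allowing for margin of error" argument can be made concrete with a quick check. The sketch below treats each list rating as r ± m (a 95% margin) and, as a simplifying assumption, treats the two ratings as independent (within one rating list they are not fully independent), so the margin on the gap is the root-sum-square of the two margins. The margins in the example are hypothetical, not CEGT's published figures.

```python
import math

def gap_is_significant(r1: float, m1: float, r2: float, m2: float):
    """Compare two ratings given as r +/- m (95% margins), assumed independent.

    Returns the rating gap, the combined 95% margin on that gap, and whether
    the gap exceeds the combined margin."""
    gap = r1 - r2
    combined = math.hypot(m1, m2)  # sqrt(m1**2 + m2**2)
    return gap, combined, abs(gap) > combined

# Hypothetical: a 35-point gap with a +/-20 margin on each rating.
gap, margin, significant = gap_is_significant(35.0, 20.0, 0.0, 20.0)
print(f"gap {gap:+.0f} Elo, combined margin +/-{margin:.1f}, significant: {significant}")
```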

Thanks for your excellent testing work!

Regards,
Larry Kaufman
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: CEGT - rating lists June 30th 2013

Post by mwyoung »

lkaufman wrote:
The above statement, while literally true, is rather misleading as the ratings are only meaningful compared to the competition. At 40/4 Komodo is rated well below all Houdini versions, far below Houdini 3. At 40/20 Komodo is 35 and 40 points above the older Houdini versions, and just 31 below Houdini 3. Even allowing for margin of error, I think this demonstrates beyond reasonable doubt that Komodo scales better than Houdini, and leaves open the question as to whether there might be some time limit at which Komodo might pass Houdini 3.

Thanks for your excellent testing work!

Regards,
Larry Kaufman
The jury is still out. The new Komodo 5.1 vs. Houdini 3 revenge match, played on 11 cores, may tell us something. The conditions are the same except for more CPUs and more hash, so in effect it is a test at a longer time control than the last match, in which Komodo MP was -21 Elo against Houdini 3.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CEGT - rating lists June 30th 2013

Post by IWB »

lkaufman wrote: The above statement, while literally true, is rather misleading as the ratings are only meaningful compared to the competition. At 40/4 Komodo is rated well below all Houdini versions, far below Houdini 3. At 40/20 Komodo is 35 and 40 points above the older Houdini versions, and just 31 below Houdini 3. Even allowing for margin of error, I think this demonstrates beyond reasonable doubt that Komodo scales better than Houdini, and leaves open the question as to whether there might be some time limit at which Komodo might pass Houdini 3.
(All my speculation is for 4 cores)

The numbers are correct, but they leave out the fact that at 40/4 H3 has a 25.1% draw rate and K5.1 35%, while at 40/20 it is already 32.5% for H3 and 51.1%(!) for K5.1.
I conclude the following:
1. All engines get closer as the draw rate increases, and
2. 51.1% is extraordinarily high; I would like to see more opponents here! (H3 had 33 opponents, K5.1 just 7: five very good ones, one "medium" one and one weak one ...)

To me it seems important to see whether K5.1 can score more wins against the weaker opponents than H3 ... and I have some doubts that this is happening (at least in my test, that was where K5.1 lost rating points).
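The first point can be illustrated with the plain logistic Elo model: if the stronger engine keeps winning the same share of the decisive games, every extra draw pulls its score toward 50% and so shrinks the Elo gap. A minimal sketch; the 60% win share is a hypothetical assumption, and only the draw rates come from the post.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_at_draw_rate(draw_rate: float, win_share: float = 0.6) -> float:
    """Elo gap when a fixed share of the decisive games is won.

    win_share is a hypothetical parameter: the fraction of decisive games
    won by the stronger engine, held constant while draws increase."""
    decisive = 1.0 - draw_rate
    score = decisive * win_share + 0.5 * draw_rate
    return elo_from_score(score)

# Draw rates from the post: about 25%, 35% and 51%.
for rate in (0.251, 0.35, 0.511):
    print(f"draw rate {rate:.1%}: {elo_at_draw_rate(rate):+.1f} Elo")
```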

Anyhow, this version brought back competition - and I like that :-)

Bye
Ingo
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists June 30th 2013

Post by lkaufman »

IWB wrote:
(All my speculation is for 4 cores)

The numbers are correct, but they leave out the fact that at 40/4 H3 has a 25.1% draw rate and K5.1 35%, while at 40/20 it is already 32.5% for H3 and 51.1%(!) for K5.1.
I conclude the following:
1. All engines get closer as the draw rate increases, and
2. 51.1% is extraordinarily high; I would like to see more opponents here! (H3 had 33 opponents, K5.1 just 7: five very good ones, one "medium" one and one weak one ...)

To me it seems important to see whether K5.1 can score more wins against the weaker opponents than H3 ... and I have some doubts that this is happening (at least in my test, that was where K5.1 lost rating points).

Anyhow, this version brought back competition - and I like that :-)

Bye
Ingo
It's easy to lower the draw percentage and to raise results against weaker opponents, at the expense of hurting results against the strongest opponents, by using higher contempt values. Houdini uses a much higher contempt value than we (or anyone else as far as I know) do, which might account for what you observe. For optimum results, we could tell Komodo to use zero contempt when opponent's name starts with an "H", otherwise to double it :) .
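Contempt, as described here, makes an engine score a draw as slightly bad for itself, so it steers away from drawish lines. A minimal sketch of the usual idea in a negamax search, plus the tongue-in-cheek opponent-name rule; both functions and their names are hypothetical illustrations, not Komodo's or Houdini's code, and how the engine learns the opponent's name (for example via the UCI protocol's UCI_Opponent option) is outside the sketch.

```python
def draw_value(contempt_cp: int, engine_to_move: bool) -> int:
    """Value of a drawn position in centipawns, from the side to move's view.

    The engine treats a draw as -contempt for itself; because negamax scores
    every node from the side to move, nodes where the opponent is to move
    see the mirror image, +contempt."""
    return -contempt_cp if engine_to_move else contempt_cp

def pick_contempt(opponent_name: str, base_cp: int) -> int:
    """The joke rule from the post: zero contempt against names starting
    with "H", double contempt against everyone else (hypothetical)."""
    return 0 if opponent_name.startswith("H") else 2 * base_cp

for name in ("Houdini 3", "Gull 2.1"):
    c = pick_contempt(name, base_cp=10)
    print(f"{name}: contempt {c} cp, draw seen by engine as {draw_value(c, True)} cp")
```

A higher contempt makes the engine avoid more draws, which helps against weaker opponents but risks overpressing against the strongest ones, exactly the trade-off described above.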
Yes, it is true that rating differences contract with longer time limits. In this case though we go from well below Houdini 1.5 and Houdini 2 at blitz to quite a bit above them at 40/20, so this has nothing to do with draw rates or contraction. One could argue that only Houdini 1.5 and 2 but not Houdini 3 have scaling problems, but I don't think the evidence supports this idea.

Best regards,
Larry
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: CEGT - rating lists June 30th 2013

Post by Don »

lkaufman wrote:
It's easy to lower the draw percentage and to raise results against weaker opponents, at the expense of hurting results against the strongest opponents, by using higher contempt values. Houdini uses a much higher contempt value than we (or anyone else as far as I know) do, which might account for what you observe. For optimum results, we could tell Komodo to use zero contempt when opponent's name starts with an "H", otherwise to double it :) .
Yes, it is true that rating differences contract with longer time limits. In this case though we go from well below Houdini 1.5 and Houdini 2 at blitz to quite a bit above them at 40/20, so this has nothing to do with draw rates or contraction. One could argue that only Houdini 1.5 and 2 but not Houdini 3 have scaling problems, but I don't think the evidence supports this idea.

Best regards,
Larry
I'm not so sure Houdini 3 has scaling problems, but we also have reports that Stockfish is beating or keeping pace with Houdini 3, though I'm not sure at what time controls.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: CEGT - rating lists June 30th 2013

Post by beram »

lkaufman wrote:
It's easy to lower the draw percentage and to raise results against weaker opponents, at the expense of hurting results against the strongest opponents, by using higher contempt values. Houdini uses a much higher contempt value than we (or anyone else as far as I know) do, which might account for what you observe. For optimum results, we could tell Komodo to use zero contempt when opponent's name starts with an "H", otherwise to double it :) .
Yes, it is true that rating differences contract with longer time limits. In this case though we go from well below Houdini 1.5 and Houdini 2 at blitz to quite a bit above them at 40/20, so this has nothing to do with draw rates or contraction. One could argue that only Houdini 1.5 and 2 but not Houdini 3 have scaling problems, but I don't think the evidence supports this idea.

Best regards,
Larry
In fact, saying this implies that you lowered the contempt to get closer to the Houdinis in direct matches, and that this results in higher draw rates and poorer results against the weaker opponents.
Anyway, I am glad that Ingo added some nuance to the interpretation of the Komodo results so far.
We just have to wait for more games at 40/20 against other opponents to see whether the 31 Elo gap will hold, but like Ingo, I doubt it will.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists June 30th 2013

Post by lkaufman »

Don wrote:
I'm not so sure Houdini 3 has scaling problems, but we also have reports that Stockfish is beating or keeping pace with Houdini 3, though I'm not sure at what time controls.
Just to be clear, I'm not saying that Houdini scales badly or does anything wrong. I believe that Houdini (and Ippo, Critter, and other programs with similarities to Ippo) does something that makes it especially strong at fast time limits, and that Komodo and Stockfish have yet to discover how this occurs. But whatever the secret is, it appears to be worthless at long time controls, so the results at blitz do not correlate well with results at long time controls.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists June 30th 2013

Post by lkaufman »

beram wrote:In fact, saying this implies that you lowered the contempt to get closer to the Houdinis in direct matches, and that this results in higher draw rates and poorer results against the weaker opponents.
Anyway, I am glad that Ingo added some nuance to the interpretation of the Komodo results so far.
We just have to wait for more games at 40/20 against other opponents to see whether the 31 Elo gap will hold, but like Ingo, I doubt it will.

Well, we didn't "lower contempt"; it hasn't changed since Komodo 4 (I think), and the last change was an increase. It is just lower than Houdini's.
Yes, we need more games to measure the Elo gap more precisely, but there is no significant chance that the rating will drop to the level of Houdini 2 or 1.5, just as there is little chance that the blitz rating will rise to the level of those engines. This is enough to prove that results at blitz are not a good predictor of results at long time controls with today's top engines.
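Statements like "no significant chance" are often quantified with the likelihood of superiority (LOS). For a direct match between two engines, a common normal approximation uses only wins and losses. A minimal sketch; the match numbers are hypothetical, and a rating-list position (which depends on many opponents) is a different question from a head-to-head LOS.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Probability that engine A is stronger than B, from head-to-head games.

    Normal approximation; draws carry no information about which side is
    stronger here, so only wins and losses enter the formula."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Hypothetical match: 150 wins, 100 losses (draws ignored).
print(f"LOS = {likelihood_of_superiority(150, 100):.4f}")
```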
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: CEGT - rating lists June 30th 2013

Post by Dirt »

Werner wrote: 40 / 20:
New games: 990 ; 13 different engines
Total: 675.176
The best single version at 40/20 is Komodo on four cores. Since everything else uses only one core for the single version, that is quite an advantage.

Thanks for the ratings.