Rating list from Timo's tournaments

Jouni
Posts: 3283
Joined: Wed Mar 08, 2006 8:15 pm

Rating list from Timo's tournaments

Post by Jouni »

After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242

Minor surprise is Critter's 3rd place; I predict it does not "scale" so well.
Jouni
Jouni
Posts: 3283
Joined: Wed Mar 08, 2006 8:15 pm

Re: Rating list from Timo's tournaments

Post by Jouni »

And I think 300 long games mean more than 30,000 superfast ones :)
Jouni
TimoK
Posts: 98
Joined: Sun Jan 03, 2010 12:28 pm
Location: Hamburg

Re: Rating list from Timo's tournaments

Post by TimoK »

Hi Jouni,

thanks for the feedback and for the rating list!

Final results of the Titan match and game downloads on my webpage:
http://www.team-oh.de/Computerschach/Clash.htm

Best regards
Timo
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Elo performance lists from 'Clash of the Titans' tourney.

Post by Ajedrecista »

Hello:
Jouni wrote:After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242

Minor surprise is Critter's 3rd place; I predict it does not "scale" so well.
I have refined the list posted by Jouni a little, splitting it into separate lists according to the number of threads used by each engine. These Elo performance lists cover only the Clash of the Titans tourney (not a huge number of games for narrowing the Elo uncertainties, but given the long TC of the tourney, it is more than enough for me):

1 thread:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
          Houdini 2.0c Pro x64:   27.7      32.0      60   53.3%
        Critter 1.4a SSE42 x64:   10.4      30.5      60   50.8%
        Komodo64 SSE Version 4:    4.6     122.0     240   50.8%
  Stockfish 2.2.2 JA SSE42 x64:  -12.7      28.5      60   47.5%
           Rybka 4.1 SSE42 x64:  -30.1      27.0      60   45.0%

2 threads:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
  Stockfish 2.2.2 JA SSE42 x64:   13.5      31.5      60   52.5%
           Rybka 4.1 SSE42 x64:   -3.8      59.0     120   49.2%
        Critter 1.4a SSE42 x64:   -9.6      29.5      60   49.2%

6 threads:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
          Houdini 2.0c Pro x64:   32.0     101.0     180   56.1%
           Rybka 4.1 SSE42 x64:    8.9      28.0      60   46.7%
        Critter 1.4a SSE42 x64:   -1.0      60.5     120   50.4%
  Stockfish 2.2.2 JA SSE42 x64:  -39.9      50.5     120   42.1%
I downloaded the full PGN and split it into three new PGN files (1 thread, 2 threads and 6 threads). I did this by hand, so it may contain errors, although I do not expect any. The lists were made with Ordo 0.4 by Ballicora, adjusting the average rating of each list to 0.
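As a rough cross-check of the lists above, a score percentage can be turned into an Elo difference with the standard logistic Elo formula. This is only a sketch: the function name `elo_from_score` is made up for illustration, and it is not how Ordo computes its ratings. Ordo's figures appear to take the strength of each engine's opponents into account, which would explain why Rybka is rated +8.9 with 46.7% in the 6-thread list.

```python
import math


def elo_from_score(score_fraction):
    """Elo difference implied by a score fraction under the logistic Elo model.

    Hypothetical helper for illustration; it ignores opponent strength.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


# (points, games) copied from the 6-thread list above.
six_threads = {
    "Houdini 2.0c Pro x64": (101.0, 180),
    "Rybka 4.1 SSE42 x64": (28.0, 60),
    "Critter 1.4a SSE42 x64": (60.5, 120),
    "Stockfish 2.2.2 JA SSE42 x64": (50.5, 120),
}

for engine, (points, games) in six_threads.items():
    fraction = points / games
    print(f"{engine:>30}: {100 * fraction:5.1f}%  {elo_from_score(fraction):+7.1f} Elo")
```

The naive figures differ from the Ordo ratings because each engine met a different mix of opponents, so they are best read as a sanity check on the percentages only.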

Thanks to Ballicora for Ordo, and also thanks to Timo for running this great match! Congratulations.

Regards from Spain.

Ajedrecista.
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Rating list from Timo's tournaments

Post by Houdini »

Jouni wrote:After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242
Interesting stuff, even if it's slightly awkward to create a rating list from a combination of 6-thread and 1-thread matches.
Most striking is that there's no fundamental difference with the IPON list played at about 100 times slower time control, other than some compression of the ratings.
Jouni wrote:Minor surprise is Critter's 3rd place; I predict it does not "scale" so well.
Stop poking fun at Don and Larry. ;)

Thanks to Timo for organizing these matches, there were a lot of interesting games!

Robert
TimoK
Posts: 98
Joined: Sun Jan 03, 2010 12:28 pm
Location: Hamburg

Re: Rating list from Timo's tournaments

Post by TimoK »

Hi Robert,
Houdini wrote:Most striking is that there's no fundamental difference with the IPON list played at about 100 times slower time control, other than some compression of the ratings.
That's true, here are both scores directly compared:

IPON Overall Scores:
Houdini: 327.0/600 (54.5%)
Critter: 304.5/600 (50.75%)
Komodo: 303.5/600 (50.58%)
Stockfish: 289.0/600 (48.17%)
Rybka: 276.0/600 (46.0%)

Titan Overall Scores:
Houdini: 133.0/240 (55.42%)
Komodo: 122.0/240 (50.83%)
Critter: 120.5/240 (50.21%)
Rybka: 114.0/240 (47.50%)
Stockfish: 110.5/240 (46.04%)

Very small differences (Rybka and Stockfish deviate most), all within the mathematical expectation. So this seems to be further evidence that it isn't necessary to play games at long TCs to produce a reliable rating list that is valid for all types of TCs. IPON conditions seem to be fully sufficient for that purpose.
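One way to check "within the mathematical expectation" is to compare each difference with its standard error. The sketch below is an assumption-laden illustration, not part of the original analysis: the helper names are hypothetical, and it uses sqrt(p(1-p)/N) as the standard error of a score fraction. That value is an upper bound, because draws only reduce the per-game variance, so the check is somewhat lenient.

```python
import math


def score_standard_error(score_fraction, games):
    """Upper bound on the standard error of a score fraction.

    Treats every game as a win or a loss; draws make the true error smaller.
    """
    return math.sqrt(score_fraction * (1.0 - score_fraction) / games)


def difference_in_sigmas(points_a, games_a, points_b, games_b):
    """Score difference (b minus a) in units of the combined standard error."""
    p_a = points_a / games_a
    p_b = points_b / games_b
    combined = math.hypot(
        score_standard_error(p_a, games_a),
        score_standard_error(p_b, games_b),
    )
    return (p_b - p_a) / combined


# (points, games) copied from the two score lists above.
ipon = {"Houdini": (327.0, 600), "Critter": (304.5, 600), "Komodo": (303.5, 600),
        "Stockfish": (289.0, 600), "Rybka": (276.0, 600)}
titan = {"Houdini": (133.0, 240), "Komodo": (122.0, 240), "Critter": (120.5, 240),
         "Rybka": (114.0, 240), "Stockfish": (110.5, 240)}

for engine in ipon:
    z = difference_in_sigmas(*ipon[engine], *titan[engine])
    print(f"{engine:>10}: Titan vs IPON = {z:+.2f} sigma")
```

Values well below about 2 sigma in absolute terms are what one would expect if both events measured the same underlying strength; with a realistic draw rate the true sigma values would be somewhat larger than this bound suggests.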
Houdini wrote:Thanks to Timo for organizing these matches, there were a lot of interesting games!
You are welcome. It was also a lot of fun for me, although it was a relatively high expenditure. These matches are time-consuming (setting up and maintaining the computers and matches), and the next electricity bill will be high (~5 euros a day when using all computers). Maybe I should also add a donate button to my homepage...

Best regards
Timo
TimoK
Posts: 98
Joined: Sun Jan 03, 2010 12:28 pm
Location: Hamburg

Re: Elo performance lists from 'Clash of the Titans' tourney

Post by TimoK »

Hello Jesús,

thanks for the feedback and for your interesting statistics. Of course more games would be necessary to draw any stable conclusions, but a tendency is already visible.

Best regards from Hamburg, Germany
Timo
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Rating list from Timo's tournaments

Post by beram »

Fully agree with you, Timo. No need to test at very long time controls.
Besides the IPON list you mentioned, there is also little difference from the CEGT 40/20 and CCRL 40/40 lists, although Komodo scores a little higher at CCRL.

grts Bram