Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 9:12 am
by Jouni
After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242

A minor surprise is Critter's 3rd place - I predicted it would not "scale" so well.

Re: Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 10:34 am
by Jouni
And I think 300 long games mean more than 30,000 superfast ones :)

Re: Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 10:41 am
by TimoK
Hi Jouni,

thanks for the feedback and for the rating list!

Final results of the Titan match and game downloads on my webpage:
http://www.team-oh.de/Computerschach/Clash.htm

Best regards
Timo

Elo performance lists from 'Clash of the Titans' tourney.

Posted: Mon Mar 19, 2012 12:16 pm
by Ajedrecista
Hello:
Jouni wrote:After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242

A minor surprise is Critter's 3rd place - I predicted it would not "scale" so well.
I have refined the list posted by Jouni a little, splitting it according to the number of threads used by each engine. These Elo performance lists cover only the Clash of the Titans tourney (not a huge number of games for narrowing the Elo uncertainties, but given the long TC of the tourney, it is more than enough for me):

1 thread:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
          Houdini 2.0c Pro x64:   27.7      32.0      60   53.3%
        Critter 1.4a SSE42 x64:   10.4      30.5      60   50.8%
        Komodo64 SSE Version 4:    4.6     122.0     240   50.8%
  Stockfish 2.2.2 JA SSE42 x64:  -12.7      28.5      60   47.5%
           Rybka 4.1 SSE42 x64:  -30.1      27.0      60   45.0%

2 threads:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
  Stockfish 2.2.2 JA SSE42 x64:   13.5      31.5      60   52.5%
           Rybka 4.1 SSE42 x64:   -3.8      59.0     120   49.2%
        Critter 1.4a SSE42 x64:   -9.6      29.5      60   49.2%

6 threads:

                        ENGINE:  RATING    POINTS  PLAYED    (%)
          Houdini 2.0c Pro x64:   32.0     101.0     180   56.1%
           Rybka 4.1 SSE42 x64:    8.9      28.0      60   46.7%
        Critter 1.4a SSE42 x64:   -1.0      60.5     120   50.4%
  Stockfish 2.2.2 JA SSE42 x64:  -39.9      50.5     120   42.1%
I downloaded the full PGN and split it into three new PGN files (1 thread, 2 threads and 6 threads). I did this task by hand, so it may contain errors, although I do not expect any. The lists were made with Ordo 0.4 by Ballicora, adjusting each overall average rating to 0.
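
As a rough cross-check of the tables above, a score fraction can be turned into an Elo difference with the standard logistic formula. This is only a sketch under a simplifying assumption: it treats each engine's score as if it were earned against a single average opponent, which is not how Ordo works (Ordo fits all ratings together), so the numbers will not match the RATING column exactly. The function name `elo_from_score` is made up for this illustration; the points and game counts are copied from the 1-thread table.

```python
import math

def elo_from_score(score_fraction):
    """Elo difference implied by a score fraction under the logistic Elo model.

    Assumption for illustration: one notional average opponent per engine.
    This is NOT Ordo's algorithm, only a per-engine approximation.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

# (points, games) copied from the 1-thread table
one_thread = {
    "Houdini 2.0c Pro x64": (32.0, 60),
    "Critter 1.4a SSE42 x64": (30.5, 60),
    "Komodo64 SSE Version 4": (122.0, 240),
    "Stockfish 2.2.2 JA SSE42 x64": (28.5, 60),
    "Rybka 4.1 SSE42 x64": (27.0, 60),
}

for name, (points, games) in one_thread.items():
    print(f"{name:>30}: {elo_from_score(points / games):+6.1f}")
```

The gap between these figures and the Ordo ratings comes from the strength of the opposition each engine actually met, which the simple formula ignores.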

Thanks to Ballicora for Ordo, and also thanks to Timo for running this great match! Congratulations.

Regards from Spain.

Ajedrecista.

Re: Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 12:31 pm
by Houdini
Jouni wrote:After 300 games including Deep Junior matches (note different number of threads):

1 Houdini 2.0c Pro x64___3300
2 Komodo64 SSE Version 4___3272
3 Critter 1.4a x64___3264
4 Rybka 4.1 SSE42 x64___3256
5 Stockfish 2.2.2 SSE42 ___3242
Interesting stuff, even if it's slightly awkward to create a rating list from a combination of 6 thread matches and 1 thread matches.
Most striking is that there's no fundamental difference with the IPON list played at about 100 times faster time control, other than some compression of the ratings.
Jouni wrote:A minor surprise is Critter's 3rd place - I predicted it would not "scale" so well.
Stop poking fun at Don and Larry. ;)

Thanks to Timo for organizing these matches, there were a lot of interesting games!

Robert

Re: Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 1:01 pm
by TimoK
Hi Robert,
Houdini wrote:Most striking is that there's no fundamental difference with the IPON list played at about 100 times faster time control, other than some compression of the ratings.
That's true, here are both scores directly compared:

IPON Overall Scores:
Houdini: 327.0/600 (54.5%)
Critter: 304.5/600 (50.75%)
Komodo: 303.5/600 (50.58%)
Stockfish: 289.0/600 (48.17%)
Rybka: 276.0/600 (46.0%)

Titan Overall Scores:
Houdini: 133.0/240 (55.42%)
Komodo: 122.0/240 (50.83%)
Critter: 120.5/240 (50.21%)
Rybka: 114.0/240 (47.50%)
Stockfish: 110.5/240 (46.04%)

Very small differences (Rybka and Stockfish deviate most), all within the mathematical expectation. So this seems to be another proof that it isn't necessary to play games at long TCs to produce a reliable rating list that is valid for all types of TCs. IPON conditions seem to be fully sufficient for that purpose.
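
The "within the mathematical expectation" remark can be checked roughly as follows. This is a sketch, not the method used in the thread: it assumes independent games and uses the fact that a single game result in {0, 0.5, 1} has variance of at most 0.25, which gives a conservative upper bound on the standard error of each score fraction. The helper names `max_standard_error` and `compare` are invented for this example; the scores are the ones listed above.

```python
import math

# (points, games) copied from the two score lists
ipon = {"Houdini": (327.0, 600), "Critter": (304.5, 600), "Komodo": (303.5, 600),
        "Stockfish": (289.0, 600), "Rybka": (276.0, 600)}
titan = {"Houdini": (133.0, 240), "Komodo": (122.0, 240), "Critter": (120.5, 240),
         "Rybka": (114.0, 240), "Stockfish": (110.5, 240)}

def max_standard_error(games):
    """Upper bound on the standard error of a score fraction.

    Assumes independent games; per-game variance is at most 0.25,
    and draws make the true value smaller.
    """
    return 0.5 / math.sqrt(games)

def compare(first, second):
    """Return {engine: (score difference second - first, combined 1-sigma bound)}."""
    rows = {}
    for name, (p1, n1) in first.items():
        p2, n2 = second[name]
        diff = p2 / n2 - p1 / n1
        bound = math.hypot(max_standard_error(n1), max_standard_error(n2))
        rows[name] = (diff, bound)
    return rows

rows = compare(ipon, titan)
for name, (diff, bound) in rows.items():
    print(f"{name:>9}: Titan - IPON = {100 * diff:+5.2f}%  (1-sigma bound {100 * bound:.2f}%)")
```

Because the bound ignores draws, it overstates the real uncertainty, so a difference inside the bound is only a weak consistency check, not proof that the two lists agree.
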
Houdini wrote:Thanks to Timo for organizing these matches, there were a lot of interesting games!
You are welcome, it was also a lot of fun for me, though it was a relatively high expenditure. These matches are time-consuming (setting up and maintaining the computers and matches) and the next electricity bill will be high (~5 EURO a day when using all computers). Maybe I should also add a donate button to my homepage...

Best regards
Timo

Re: Elo performance lists from 'Clash of the Titans' tourney

Posted: Mon Mar 19, 2012 1:05 pm
by TimoK
Hello Jesús,

thanks for the feedback and for your interesting statistics. Of course more games would be necessary to draw any stable conclusions, but a tendency is already visible.

Best regards from Hamburg, Germany
Timo

Re: Rating list from Timo's tournaments

Posted: Mon Mar 19, 2012 6:26 pm
by beram
Fully agree with you, Timo. No need to test at very long time controls.
Besides the IPON list as you mentioned, there is also little difference with the CEGT 40/20 list and CCRL 40/40 list, although Komodo scores a little bit higher at CCRL.

grts Bram