Top 5 engines. Large, 10,000 games test

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos


Post by Laskos »

GUI: LittleBlitzer
Openings: SWCR.pgn, 3395 openings of 8 moves each
TC: 1s + 0.1s
All engines on 1 core.
TB: none


Games Completed = 10000 of 10000 (Avg game length = 14.615 sec)
Settings = RR/16MB/1000ms+100ms/M 10000cp for 1000 moves, D 200 moves/PGN:K:\Downloads\LittleBlitzer\swcr.pgn(3395)
Time = 163821 sec elapsed

 1.  Houdini 1.5a x64         	2607.0/4000	2032-818-1150  	(tpm=96.2 d=11.3 nps=1304494)
 2.  Deep Rybka 4.1 x64       	2185.0/4000	1533-1163-1304    (tpm=106.0 d=9.3 nps=60255)
 3.  Critter 1.01 64-bit      	1605.5/4000	985-1774-1241  	(tpm=97.0 d=12.7 nps=1199675)
 4.  Ivanhoe B47cBx64         	2161.5/4000	1456-1133-1411    (tpm=101.4 d=12.5 nps=1090165)
 5.  Stockfish 2.0.1 JA 64bit 	1441.0/4000	822-1940-1238  	(tpm=95.8 d=11.9 nps=897037)

Elo (Rybka 4.1 = 0) and 95% confidence intervals:


    Program                            Score       %     Elo    +   -    Draws

  1 Houdini 1.5a x64               : 2607.0/4000  65.2    61    9   9   28.8 %
  2 Deep Rybka 4.1 x64             : 2185.0/4000  54.6     0    9   9   32.6 %
  3 Ivanhoe B47cBx64               : 2161.5/4000  54.0    -4    9   9   35.3 %
  4 Critter 1.01 64-bit            : 1605.5/4000  40.1   -82    9   9   31.0 %
  5 Stockfish 2.0.1 JA 64bit       : 1441.0/4000  36.0  -106    9   9   30.9 %

Individual statistics:


1 Houdini 1.5a x64          :   61  4000 (+2032,=1150,-818), 65.2 %

Deep Rybka 4.1 x64            : 1000 (+418,=310,-272), 57.3 %
Critter 1.01 64-bit           : 1000 (+573,=246,-181), 69.6 %
Ivanhoe B47cBx64-1            : 1000 (+440,=339,-221), 61.0 %
Stockfish 2.0.1 JA 64bit      : 1000 (+601,=255,-144), 72.9 %

2 Deep Rybka 4.1 x64        :    0  4000 (+1533,=1304,-1163), 54.6 %

Houdini 1.5a x64              : 1000 (+272,=310,-418), 42.7 %
Critter 1.01 64-bit           : 1000 (+458,=306,-236), 61.1 %
Ivanhoe B47cBx64-1            : 1000 (+320,=380,-300), 51.0 %
Stockfish 2.0.1 JA 64bit      : 1000 (+483,=308,-209), 63.7 %

3 Ivanhoe B47cBx64-1        :   -4  4000 (+1456,=1411,-1133), 54.0 %

Houdini 1.5a x64              : 1000 (+221,=339,-440), 39.1 %
Deep Rybka 4.1 x64            : 1000 (+300,=380,-320), 49.0 %
Critter 1.01 64-bit           : 1000 (+444,=353,-203), 62.1 %
Stockfish 2.0.1 JA 64bit      : 1000 (+491,=339,-170), 66.0 %

4 Critter 1.01 64-bit       :  -82  4000 (+985,=1241,-1774), 40.1 %

Houdini 1.5a x64              : 1000 (+181,=246,-573), 30.4 %
Deep Rybka 4.1 x64            : 1000 (+236,=306,-458), 38.9 %
Ivanhoe B47cBx64-1            : 1000 (+203,=353,-444), 38.0 %
Stockfish 2.0.1 JA 64bit      : 1000 (+365,=336,-299), 53.3 %

5 Stockfish 2.0.1 JA 64bit  : -106  4000 (+822,=1238,-1940), 36.0 %

Houdini 1.5a x64              : 1000 (+144,=255,-601), 27.2 %
Deep Rybka 4.1 x64            : 1000 (+209,=308,-483), 36.3 %
Critter 1.01 64-bit           : 1000 (+299,=336,-365), 46.7 %
Ivanhoe B47cBx64-1            : 1000 (+170,=339,-491), 34.0 %

General statistics:


Games        :  10000 (finished)

White Wins   :   3674 (36.7 %)
Black Wins   :   3154 (31.5 %)
Draws        :   3172 (31.7 %)
Unfinished   :      0

White Perf.  : 52.6 %
Black Perf.  : 47.4 %

At this time control, Critter 1.01 seems stronger than Stockfish 2.0.1. Ivanhoe B47cB and Rybka 4.1 are equal in strength within the error margins. Rybka 4.1 uses its allotted time most effectively.
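
As a rough check of the "within error margins" claim, the head-to-head result between Rybka 4.1 and Ivanhoe from the individual statistics can be turned into an Elo difference with an approximate 95% interval. This is only a sketch: it assumes the standard logistic Elo formula and a normal approximation for the match score, and the function name `head_to_head_elo` is made up for illustration. The rating tool behind the tables above is not named in the post and may well use a different model.

```python
import math


def head_to_head_elo(wins, draws, losses, z=1.96):
    """Elo difference implied by a match score, with an approximate interval.

    Assumes the logistic Elo curve and a normal approximation of the score.
    Returns (lower bound, point estimate, upper bound) in Elo.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game result (1, 0.5 or 0) around the mean.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(p):
        return -400 * math.log10(1 / p - 1)

    return to_elo(score - z * se), to_elo(score), to_elo(score + z * se)


# Deep Rybka 4.1 vs Ivanhoe B47cB over 1000 games: +320, =380, -300.
low, mid, high = head_to_head_elo(wins=320, draws=380, losses=300)
within_margin = low < 0 < high
print(f"{mid:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}], "
      f"zero inside interval: {within_margin}")
```

For these 1000 games the sketch gives a difference of about 7 Elo with an interval of roughly -10 to +24, so a difference of zero is well inside it.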

Kai

Re: Top 5 engines. Large, 10,000 games test

Post by Laskos »

Meanwhile, the round robin continues, now at 12,000 games.

Elo (Rybka 4.1 = 0) and 95% confidence intervals:


    Program                            Score       %     Elo    +   -    Draws

  1 Houdini 1.5a x64               : 3122.0/4800  65.0    62    8   8   28.8 %
  2 Ivanhoe B47cBx64               : 2629.0/4800  54.8     3    8   8   35.0 %
  3 Deep Rybka 4.1 x64             : 2603.5/4800  54.2     0    8   8   32.3 %
  4 Critter 1.01 64-bit            : 1912.0/4800  39.8   -81    8   8   30.9 %
  5 Stockfish 2.0.1 JA 64bit       : 1733.5/4800  36.1  -103    8   8   30.6 %

Individual statistics:


1 Houdini 1.5a x64          :   62  4800 (+2431,=1382,-987), 65.0 %

Deep Rybka 4.1 x64            : 1200 (+516,=359,-325), 58.0 %
Critter 1.01 64-bit           : 1200 (+689,=296,-215), 69.8 %
Ivanhoe B47cBx64-1            : 1200 (+513,=419,-268), 60.2 %
Stockfish 2.0.1 JA 64bit      : 1200 (+713,=308,-179), 72.2 %

2 Ivanhoe B47cBx64-1        :    3  4800 (+1788,=1682,-1330), 54.8 %

Houdini 1.5a x64              : 1200 (+268,=419,-513), 39.8 %
Deep Rybka 4.1 x64            : 1200 (+378,=448,-374), 50.2 %
Critter 1.01 64-bit           : 1200 (+546,=420,-234), 63.0 %
Stockfish 2.0.1 JA 64bit      : 1200 (+596,=395,-209), 66.1 %

3 Deep Rybka 4.1 x64        :    0  4800 (+1828,=1551,-1421), 54.2 %

Houdini 1.5a x64              : 1200 (+325,=359,-516), 42.0 %
Critter 1.01 64-bit           : 1200 (+545,=372,-283), 60.9 %
Ivanhoe B47cBx64-1            : 1200 (+374,=448,-378), 49.8 %
Stockfish 2.0.1 JA 64bit      : 1200 (+584,=372,-244), 64.2 %

4 Critter 1.01 64-bit       :  -81  4800 (+1171,=1482,-2147), 39.8 %

Houdini 1.5a x64              : 1200 (+215,=296,-689), 30.2 %
Deep Rybka 4.1 x64            : 1200 (+283,=372,-545), 39.1 %
Ivanhoe B47cBx64-1            : 1200 (+234,=420,-546), 37.0 %
Stockfish 2.0.1 JA 64bit      : 1200 (+439,=394,-367), 53.0 %

5 Stockfish 2.0.1 JA 64bit  : -103  4800 (+999,=1469,-2332), 36.1 %

Houdini 1.5a x64              : 1200 (+179,=308,-713), 27.8 %
Deep Rybka 4.1 x64            : 1200 (+244,=372,-584), 35.8 %
Critter 1.01 64-bit           : 1200 (+367,=394,-439), 47.0 %
Ivanhoe B47cBx64-1            : 1200 (+209,=395,-596), 33.9 %

Houdini 1.5a is 62 +/- 8 Elo points stronger than Rybka 4.1; Ivanhoe B47cB and Rybka 4.1 are equal within error margins; Critter 1.01 is stronger than Stockfish 2.0.1, but clearly weaker than the leading trio.

Kai
Jouni
Posts: 3845
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Top 5 engines. Large, 10,000 games test

Post by Jouni »

What explains the difference from Pal Larkin's test? Is Critter's SMP management perhaps worse than Stockfish's? And is this time control actually too short for a real test? Just wondering...

Jouni

Re: Top 5 engines. Large, 10,000 games test

Post by Laskos »

Jouni wrote: What explains the difference from Pal Larkin's test? Is Critter's SMP management perhaps worse than Stockfish's? And is this time control actually too short for a real test? Just wondering...

Jouni

The apparent differences are the number of cores, a different opening suite (albeit both 8-movers) and, most importantly, the smaller number of games per engine in Pal's RR. Frankly, I see more anomalies (which are probably statistical flukes) in his results than in mine. The only "strange" result of my test is the underperformance of SF 2.0.1. It is generally accepted that later IvanHoes are pretty much the same strength as Rybka, which my test confirms. The same goes for the ~60 Elo point advantage of Houdini over Rybka and IvanHoe. Critter 1.01 is, as expected, weaker than this trio. In Pal's test there are many probable statistical anomalies: Rybka overperforms, while IvanHoe and Critter underperform.
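
How much the number of games matters can be sketched with a small calculation. This is an illustration under stated assumptions, not the method of any rating tool: it assumes a 50% score, a draw ratio of 32% (close to the 31.7% in the general statistics above), the logistic Elo curve and a normal approximation. The helper `approx_margin` is a made-up name. The game counts are the per-engine totals of this test, since the number of games in Pal's RR is not given in this thread.

```python
import math


def approx_margin(games, score=0.5, draw_ratio=0.32, z=1.96):
    """Approximate +/- Elo margin after `games` games (assumed model, see text)."""
    win_ratio = score - draw_ratio / 2
    # Variance of one game result: E[X^2] - E[X]^2 with X in {1, 0.5, 0}.
    var = win_ratio + draw_ratio / 4 - score ** 2
    se = math.sqrt(var / games)
    # Slope of the logistic Elo curve at `score`, in Elo per unit of score.
    slope = 400 / (math.log(10) * score * (1 - score))
    return z * se * slope


for games in (4000, 4800, 6000):
    print(f"{games} games per engine: about +/- {approx_margin(games):.1f} Elo")

# The margin shrinks with the square root of the number of games.
print(approx_margin(1000) / approx_margin(4000))
```

With these assumptions the sketch gives roughly 9, 8 and 7 Elo for 4000, 4800 and 6000 games, in line with the margins in the tables, and quadrupling the number of games halves the margin.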

For the underperformance of SF 2.0.1 in my test I have no clear explanation. There were no time losses; the full LittleBlitzer line for SF 2.0.1 is:

5. Stockfish 2.0.1 JA 64bit 2179.0/6000 1274-2916-1810 (L: m=2916 t=0 i=0 a=0) (D: r=999 i=384 f=354 s=25 a=48) (tpm=95.9 d=11.8 nps=887382)

No time losses, no irregular moves, and the time used is 95.9 ms per move, similar to Houdini and Critter. The NPS is not unusual either.
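
The same check can be done mechanically. The sketch below parses the line quoted above and verifies that the loss and draw breakdowns add up to the totals. The layout is inferred only from this one line, and the reading of `t` as time losses and `i` as irregular moves follows the description in this post; the other letters are not interpreted. `parse_breakdown` is a made-up helper name.

```python
import re

line = ("5. Stockfish 2.0.1 JA 64bit 2179.0/6000 1274-2916-1810 "
        "(L: m=2916 t=0 i=0 a=0) (D: r=999 i=384 f=354 s=25 a=48) "
        "(tpm=95.9 d=11.8 nps=887382)")


def parse_breakdown(text, label):
    """Return the key=value counts inside a '(L: ...)' or '(D: ...)' group."""
    group = re.search(r"\(" + label + r": ([^)]*)\)", text).group(1)
    return {key: int(value)
            for key, value in (item.split("=") for item in group.split())}


# The totals are listed as wins-losses-draws.
wins, losses, draws = map(int, re.search(r"(\d+)-(\d+)-(\d+)", line).groups())
loss_causes = parse_breakdown(line, "L")
draw_causes = parse_breakdown(line, "D")

print("losses accounted for:", sum(loss_causes.values()) == losses)
print("draws accounted for:", sum(draw_causes.values()) == draws)
print("time losses:", loss_causes["t"], "irregular moves:", loss_causes["i"])
```

Both breakdowns sum to the reported totals, and the `t` and `i` counters for losses are zero.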

________________________

I finished 15,000 games, this is the final result:

Elo (Rybka 4.1 = 0) and 95% confidence intervals



    Program                             Score      %         Elo    +   -   Draws

  1 Houdini 1.5a x64               : 3891.5/6000  64.9        63    7   7   28.6 %
  2 Ivanhoe B47cBx64               : 3281.0/6000  54.7         4    7   7   34.8 %
  3 Deep Rybka 4.1 x64             : 3241.0/6000  54.0         0    7   7   31.9 %
  4 Critter 1.01 64-bit            : 2407.5/6000  40.1       -78    7   7   30.8 %
  5 Stockfish 2.0.1 JA 64bit       : 2179.0/6000  36.3      -100    7   7   30.2 %

Individual statistics:


1 Houdini 1.5a x64          :   63  6000 (+3032,=1719,-1249), 64.9 %

Deep Rybka 4.1 x64            : 1500 (+643,=444,-413), 57.7 %
Critter 1.01 64-bit           : 1500 (+867,=370,-263), 70.1 %
Ivanhoe B47cBx64              : 1500 (+631,=526,-343), 59.6 %
Stockfish 2.0.1 JA 64bit      : 1500 (+891,=379,-230), 72.0 %

2 Ivanhoe B47cBx64          :    4  6000 (+2236,=2090,-1674), 54.7 %

Houdini 1.5a x64              : 1500 (+343,=526,-631), 40.4 %
Deep Rybka 4.1 x64            : 1500 (+481,=552,-467), 50.5 %
Critter 1.01 64-bit           : 1500 (+675,=529,-296), 62.6 %
Stockfish 2.0.1 JA 64bit      : 1500 (+737,=483,-280), 65.2 %

3 Deep Rybka 4.1 x64        :    0  6000 (+2284,=1914,-1802), 54.0 %

Houdini 1.5a x64              : 1500 (+413,=444,-643), 42.3 %
Critter 1.01 64-bit           : 1500 (+671,=460,-369), 60.1 %
Ivanhoe B47cBx64              : 1500 (+467,=552,-481), 49.5 %
Stockfish 2.0.1 JA 64bit      : 1500 (+733,=458,-309), 64.1 %

4 Critter 1.01 64-bit       :  -78  6000 (+1483,=1849,-2668), 40.1 %

Houdini 1.5a x64              : 1500 (+263,=370,-867), 29.9 %
Deep Rybka 4.1 x64            : 1500 (+369,=460,-671), 39.9 %
Ivanhoe B47cBx64              : 1500 (+296,=529,-675), 37.4 %
Stockfish 2.0.1 JA 64bit      : 1500 (+555,=490,-455), 53.3 %

5 Stockfish 2.0.1 JA 64bit  : -100  6000 (+1274,=1810,-2916), 36.3 %

Houdini 1.5a x64              : 1500 (+230,=379,-891), 28.0 %
Deep Rybka 4.1 x64            : 1500 (+309,=458,-733), 35.9 %
Critter 1.01 64-bit           : 1500 (+455,=490,-555), 46.7 %
Ivanhoe B47cBx64              : 1500 (+280,=483,-737), 34.8 %

The conclusions are the same as before. Maybe I will restart my computer and test again with the 8moves.epd opening suite.

Kai
Frank Quisinsky
Posts: 7285
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Top 5 engines. Large, 10,000 games test

Post by Frank Quisinsky »

Hi Kai,

nice to see that you are using my Shredder Random Book. In the 52,000 games I have now played with this book in SWCR (40-minute games), I had 41 games drawn in under 16 moves. I replayed those games in SWCR.

Short question:
How many games did you have that were drawn in under 16 moves (quick draws)? With those I can improve the PGN file.

Could you send me those games (draws in under 16 moves)?
My e-mail address can be found on my "Impressum" page, via the contact form.

This would be great!

Thanks for your work.
At the moment I don't have much time for computer chess and only look at the others' messages weekly.

Best, and
have a nice Sunday!

Frank