Testing Stockfish 11-03-13. 480 Games.

beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by beram »

Modern Times wrote:Bram, the final result of my 6CPU match was:

KOMODO 1339 vs STOCKFISH 221214

189.0 - 211.0 ( +55, =268, -77 ) 47.25%


Details here:
http://kirill-kryukov.com/chess/discuss ... f=7&t=7907

I've posted it here rather than continue in the TCEC thread in the General section in case the mods start to get annoyed about posting match results there...
Thx for testing Ray,

Nice result for K1339, better than at LTC over 50 games :?

But now I am just curious to know what the result will be for K1339 with the default drawscore :)
Modern Times
Posts: 3755
Joined: Thu Jun 07, 2012 11:02 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Modern Times »

beram wrote: Nice result for K1339, better than at LTC over 50 games :?
Yes I noticed that, although 50 games is a really small sample...
beram wrote: But now I am just curious to know what the result will be for K1339 with the default drawscore :)
It would be interesting to see the difference, if any. I'll re-run it.
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH 010115MZ : 480 Games.

i7 980 3.33 Ghz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2014t – Limit 8 moves. Sedat –
No tablebases. No RTB used.
Hash 512
Relative Speed: 29.54
Knodes per second: 14.177

Time Control = 4+0

SF010115MZ x64 POPCNT_ - Houdini 4 x64_st_X6_CT0 23.0 - 17.0 +11/=24/-5 57.50%
SF010115MZ x64 POPCNT_ - Komodo 8 64-bit_6_NOB 23.5 - 16.5 +14/=19/-7 58.75%
SF010115MZ x64 POPCNT_ - Gull 3 x64 XP 27.0 - 13.0 +16/=22/-2 67.50%

Time Control= 2+2

SF010115MZ x64 POPCNT_ - Houdini 4 x64_st_X6_CT0 22.5 - 17.5 +9/=27/-4 56.25%
SF010115MZ x64 POPCNT_ - Komodo 8 64-bit_6_NOB 21.0 - 19.0 +6/=30/-4 52.50%
SF010115MZ x64 POPCNT_ - Gull 3 x64 XP 26.5 - 13.5 +17/=19/-4 66.25%

Score using 6 cores: 143.5 – 96.5= 59.79%
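
The six-core subtotal can be re-derived from the win/draw/loss counts. This is a minimal Python sketch: the counts are copied from the six result lines above, and the helper name `match_score` is purely illustrative.

```python
# (wins, draws, losses) for SF010115MZ in the six 40-game matches on the 6-core machine
results = [
    (11, 24, 5),   # 4+0 vs Houdini 4
    (14, 19, 7),   # 4+0 vs Komodo 8
    (16, 22, 2),   # 4+0 vs Gull 3
    (9, 27, 4),    # 2+2 vs Houdini 4
    (6, 30, 4),    # 2+2 vs Komodo 8
    (17, 19, 4),   # 2+2 vs Gull 3
]

def match_score(wins, draws, losses):
    """Return (points, games): a win counts 1 point, a draw 0.5."""
    return wins + 0.5 * draws, wins + draws + losses

points = sum(match_score(*r)[0] for r in results)
games = sum(match_score(*r)[1] for r in results)
print(f"{points:.1f} - {games - points:.1f} = {100 * points / games:.2f}%")
# prints: 143.5 - 96.5 = 59.79%
```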

240 Games =
http://www.mediafire.com/download/mx49z ... 0Games.pgn

i7 975 3.33 Ghz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012t –Limit 8 moves. Sedat-
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control: 4+0

SF010115MZ x64 POPCNT_ - Houdini 4 Pro x64-A_OK 25.5 - 14.5 +12/=27/-1 63.75%
SF010115MZ x64 POPCNT_ - Komodo 8 64-bit_4_NOB 23.5 - 16.5 +15/=17/-8 58.75%
SF010115MZ x64 POPCNT_ - Gull 3 x64 XP 26.0 - 14.0 +12/=28/-0 65.00%

Time Control= 2+2

SF010115MZ x64 POPCNT_ - Houdini 4 Pro x64-A_OK 25.5 - 14.5 +13/=25/-2 63.75%
SF010115MZ x64 POPCNT_ - Komodo 8 64-bit_4_NOB 23.5 - 16.5 +8/=31/-1 58.75%
SF010115MZ x64 POPCNT_ - Gull 3 x64 XP 25.0 - 15.0 +14/=22/-4 62.50%

Score using 4 cores: 149.0 – 91.0= 62.08%

240 Games:
http://www.mediafire.com/download/ib5i6 ... 0games.pgn

Segmenting by Time Control:

Fixed TC = 148.5 – 91.5 = 61.87%
Incremental TC = 144.0 – 96.0 = 60.00%

GLOBAL SCORE: 292.5 – 187.5= 60.94%

Against : Houdini 4.0 St. Ct0 (3136) =60.31% ; Komodo 8 (3153) = 57.19%, Gull 3 XP (3103) = 65.31%

Average Estimated Elo Opponents = 3131
Estimated Elo Performance= 3208


Error bars= +/- 23 EEP
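
The post does not say how the Estimated Elo Performance is derived. Assuming the standard logistic Elo model (an assumption, not something stated here), the figures above can be reproduced with a short Python sketch; `elo_diff` is a hypothetical helper name.

```python
import math

def elo_diff(score_fraction):
    """Rating difference implied by a score fraction under the logistic Elo model."""
    return 400 * math.log10(score_fraction / (1 - score_fraction))

opponents = [3136, 3153, 3103]        # Houdini 4, Komodo 8, Gull 3 XP (ratings from the post)
avg_opp = sum(opponents) / len(opponents)
score = 292.5 / 480                   # global score from the post

performance = avg_opp + elo_diff(score)
print(round(avg_opp), round(performance))  # prints: 3131 3208
```

Under that assumption, a 60.94% score is worth about 77 Elo over the opposition, which matches the 3131 and 3208 quoted above.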

Regards,

Tom.
Ozymandias
Posts: 1537
Joined: Sun Oct 25, 2009 2:30 am

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Ozymandias »

Tomcass wrote:THE RANKING OF MY TESTS.

Some computer chess friends asked me to post a ranking of my tests. I have decided to start 2015 with this ranking, under my testing conditions in two computers of 6 and 4 cores.

Testing conditions:

i7 980 3.33 Ghz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2014t – Limit 8 moves. Sedat –
No tablebases. No RTB used.
Hash 512
Relative Speed: 29.54
Knodes per second: 14.177

Time Control = 4+0 and 2+2

And

i7 975 3.33 Ghz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012t –Limit 8 moves. Sedat-
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control: 4+0 and 2+2.

GLOBAL RANKING OF MY TESTS AT 01 JANUARY 2015

1.- Stockfish 221214 Marco Z 3209 (1440 games)
2.- Stockfish Development 091114 3201 (1440 games)
3.- Stock-fish-6 6st 3200 (1440 games)
4.- Stockfish Orca 281214 3197 (480 games)
5.- Stockfish Rockwood 190514 3183 (480 games)
6.- Stockfish 5 3172 (960 games)
7.- Komodo 8 3153 (8480 games)
8.- DON 271114 3139 (480 games)
9.- Houdini 4 3136 (10240 games)
10.- Komodo 7 3107 (1760 games)
11.- Gull 3 XP 3103 (10240 games)
12.- Houdini 3 3072
13.- Amitis 16102013 3052
14.- Fire 4 3047
15.- Stockfish 4 3019
16.- Critter 1.6a 3004
17.- Deep Rybka 4.1 2915

Some comments:

1.- You will observe that I have decided to decrease all my ratings by 100 Elo points. I am sure that this estimate is closer to reality and more consistent with other rankings (more ‘professional’ than mine).

2.- The best Stockfish compiles are usually between 10 and 12 Elo points better than the Development versions. They provide a good indicator of the strength of SF Development some weeks later.

3.- Some of the engines have been tested intensively, with over 8,000 games. In these cases, the error bars have been reduced to only +/- 5 approximately.
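
Point 3 is consistent with error bars shrinking roughly as 1/sqrt(N). The sketch below is a rough check under stated assumptions: the reference point of +/- 23 after 480 games is the figure quoted elsewhere in this thread, it is assumed to be in Elo points, and the draw rate is assumed to be similar across runs. `scaled_error_bar` is a hypothetical helper.

```python
import math

def scaled_error_bar(ref_bar, ref_games, games):
    """Scale an error bar from ref_games to games, assuming it shrinks as 1/sqrt(N)."""
    return ref_bar * math.sqrt(ref_games / games)

# Reference point from this thread: +/- 23 after 480 games (assumed to be Elo points).
for n in (1440, 8480, 10240):
    print(n, round(scaled_error_bar(23, 480, n), 1))
```

This gives roughly +/- 13 at 1440 games and about +/- 5 for the engines with 8,480 and 10,240 games.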

Since my ranking is a ‘pure amateur’ one, I will appreciate any comment or suggestion to improve it. Thanks in advance and my wish for an EXCELLENT 2015 to all my computer chess friends.

Regards from Barcelona,

Tom.
Nice, don't miss out on 03012015MZ!
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH 020115 DEVELOPMENT : 480 Games

Timestamp: 1420227182 Bench: 8224782

i7 980 3.33 Ghz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2014t – Limit 8 moves. Sedat –
No tablebases. No RTB used.
Hash 512
Relative Speed: 29.54
Knodes per second: 14.177

Time Control = 4+0

Stockfish 020115 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 24.0 - 16.0 +13/=22/-5 60.00%
Stockfish 020115 64 POPCNT_ - Komodo 8 64-bit_6_NOB 23.5 - 16.5 +11/=25/-4 58.75%
Stockfish 020115 64 POPCNT_ - Gull 3 x64 XP 25.0 - 15.0 +12/=26/-2 62.50%

120 Games:
http://www.mediafire.com/download/7dbua ... 0Games.pgn

Time Control= 2+2

Stockfish 020115 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 23.5 - 16.5 +10/=27/-3 58.75%
Stockfish 020115 64 POPCNT_ - Komodo 8 64-bit_6_NOB 22.5 - 17.5 +9/=27/-4 56.25%
Stockfish 020115 64 POPCNT_ - Gull 3 x64 XP 25.0 - 15.0 +14/=22/-4 62.50%

120 Games:
http://www.mediafire.com/download/itaq7 ... es_bis.pgn

Score using 6 cores: 143.5 – 96.5= 59.79%

i7 975 3.33 Ghz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012t –Limit 8 moves. Sedat-
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 020115 64 POPCNT_ - Houdini 4 Pro x64-A_OK 22.5 - 17.5 +11/=23/-6 56.25%
Stockfish 020115 64 POPCNT_ - Komodo 8 64-bit_4_NOB 25.0 - 15.0 +14/=22/-4 62.50%
Stockfish 020115 64 POPCNT_ - Gull 3 x64 XP 25.0 - 15.0 +11/=28/-1 62.50%

120 Games:
http://www.mediafire.com/download/aun1h ... 0games.pgn

Time Control= 2+2

Stockfish 020115 64 POPCNT_ - Houdini 4 Pro x64-A_OK 24.5 - 15.5 +11/=27/-2 61.25%
Stockfish 020115 64 POPCNT_ - Komodo 8 64-bit_4_NOB 20.0 - 20.0 +7/=26/-7 50.00%
Stockfish 020115 64 POPCNT_ - Gull 3 x64 XP 24.5 - 15.5 +12/=25/-3 61.25%

120 Games:
http://www.mediafire.com/download/c6lw6 ... 0games.pgn

Score using 4 cores: 141.5 – 98.5= 58.96%

Segmenting by Time Control:

Fixed TC = 145.0 – 95.0= 60.42%
Incremental TC = 140.0 – 100.0 = 58.33%

GLOBAL SCORE: 285.0 – 195.0 = 59.37%

Against : Houdini 4.0 St. Ct0 (3136) =59.06% ; Komodo 8 (3153) = 56.87%, Gull 3 XP (3103) = 62.19%

Average Estimated Elo Opponents = 3131
Estimated Elo Performance= 3197


Regards,

Tom.
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH 030115 IPMAN: 480 Games

i7 980 3.33 Ghz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2014t – Limit 8 moves. Sedat –
No tablebases. No RTB used.
Hash 512
Relative Speed: 29.54
Knodes per second: 14.177

Time Control = 4+0

SF 030115IPx 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 24.0 - 16.0 +9/=30/-1 60.00%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_6_NOB 22.0 - 18.0 +9/=26/-5 55.00%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 24.5 - 15.5 +10/=29/-1 61.25%

Time Control= 2+2

SF 030115IPx 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 24.5 - 15.5 +15/=19/-6 61.25%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_6_NOB 23.5 - 16.5 +14/=19/-7 58.75%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 25.5 - 14.5 +13/=25/-2 63.75%

Score using 6 cores: 144.0 – 96.0 = 60.00%

240 games=
http://www.mediafire.com/download/wc9f7 ... 0Games.pgn

i7 975 3.33 Ghz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012t –Limit 8 moves. Sedat-
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control: 4+0

SF 030115IPx 64 POPCNT_ - Houdini 4 Pro x64-A_OK 25.5 - 14.5 +13/=25/-2 63.75%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_4_NOB 25.5 - 14.5 +16/=19/-5 63.75%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 27.5 - 12.5 +16/=23/-1 68.75%

Time Control= 2+2

SF 030115IPx 64 POPCNT_ - Houdini 4 Pro x64-A_OK 24.5 - 15.5 +12/=25/-3 61.25%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_4_NOB 23.5 - 16.5 +10/=27/-3 58.75%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 27.5 - 12.5 +18/=19/-3 68.75%

Score using 4 cores: 154.0 – 86.0 = 64.17%

240 games=
http://www.mediafire.com/download/a11w8 ... 0games.pgn

Segmenting by Time Control:

Fixed TC = 149.0 – 91.0= 62.08%
Incremental TC = 149.0 – 91.0= 62.08%

Global Score: 298.0 – 182.0 = 62.08%

Against : Houdini 4.0 St. Ct0 (3136) = 61.56% ; Komodo 8 (3153) = 59.06%, Gull 3 XP (3103) = 65.63%

Average Estimated Elo Opponents = 3131
Estimated Elo Performance= 3216


This is a clear new RECORD in my tests. As I usually do with the best scorers, I will extend my tests until 1440 games to check if this impressive score is confirmed so that I will be able to include this SF compile at the top of my ranking.

Great work StockFish Team! And congratulations to my friend Ipman for this exceptional SF compile. Let’s see how it behaves from game 481 to 1440.

Regards,

Tom.

… please note that the version I use for my testing is POPCNT. There is also an AVX version of this SF compile available, a bit faster, which I cannot load on my old and somewhat exhausted computers.
Modern Times
Posts: 3755
Joined: Thu Jun 07, 2012 11:02 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Modern Times »

Bram, the final results of my 6CPU matches were:

KOMODO 1339 (Drawscore 0) vs STOCKFISH 221214

189.0 - 211.0 ( +55, =268, -77 ) 47.25%


KOMODO 1339 (Drawscore Default) vs STOCKFISH 221214

191.0 - 209.0 ( +50, =282, -68 ) 47.75%

So the overall results are pretty much the same, except a slightly higher draw rate with default. The results are well within the margin of error for 400 games.
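
The margin-of-error remark can be checked with a short Python sketch. It assumes independent games and a normal approximation (z = 1.96 for roughly 95%); neither assumption comes from the post, and `score_margin` is a hypothetical helper.

```python
import math

def score_margin(wins, draws, losses, z=1.96):
    """Return (mean score, ~95% margin) per game, treating games as independent."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    return mean, z * math.sqrt(var / n)

matches = (("Drawscore 0", (55, 268, 77)), ("Drawscore Default", (50, 282, 68)))
for label, wdl in matches:
    mean, margin = score_margin(*wdl)
    print(f"{label}: {100 * mean:.2f}% +/- {100 * margin:.2f}%")
```

Both margins come out at a little under 3 percentage points, so a 0.5-point gap between 47.25% and 47.75% is far too small to separate the two settings.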

Apologies to Tom for posting in his thread.
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by beram »

Modern Times wrote:Bram, the final results of my 6CPU matches were:

KOMODO 1339 (Drawscore 0) vs STOCKFISH 221214

189.0 - 211.0 ( +55, =268, -77 ) 47.25%


KOMODO 1339 (Drawscore Default) vs STOCKFISH 221214

191.0 - 209.0 ( +50, =282, -68 ) 47.75%

So the overall results are pretty much the same, except a slightly higher draw rate with default. The results are well within the margin of error for 400 games.

Apologies to Tom for posting in his thread.
Well thx for testing Ray,
If this result were statistically significant (small enough error bar), then it would mean that Komodo 1339 does better with the default drawscore against the latest SF dev.
Though I must also say I am surprised by the small difference of just 2.5% over 800 games. It means SF is just 17-18 ELO ahead of Komodo under these 6CPU conditions.
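
The 17-18 ELO figure follows from pooling the two matches. This is a minimal Python sketch, assuming the usual logistic Elo conversion (an assumption; the post does not name a formula):

```python
import math

# Komodo's points and games in the two 400-game matches quoted above.
komodo_points = 189.0 + 191.0
games = 400 + 400

komodo_share = komodo_points / games      # 0.475
sf_share = 1 - komodo_share               # 0.525
elo_gap = 400 * math.log10(sf_share / komodo_share)
print(f"SF scores {100 * sf_share:.1f}%, about {elo_gap:.1f} Elo ahead")
```

The pooled score is 52.5% for SF, i.e. 2.5 points above an even result, which converts to about 17.4 Elo.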

Tom got similar results (with smaller error bars) for Houdini 4 default and Houdini 4 with contempt=0.
He discovered recently, after > 5000 games each, that they were just equally strong, which is funny
(http://www.talkchess.com/forum/viewtopi ... 79&t=47512)
It also means that people who believe miracles can happen by just tuning Komodo's drawscore down by 1 or 3 points against SF are just trash-talking

grts Bram
-------------
Stockfish rules ! (by small margin)
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

Modern Times wrote:Bram, the final results of my 6CPU matches were:

KOMODO 1339 (Drawscore 0) vs STOCKFISH 221214

189.0 - 211.0 ( +55, =268, -77 ) 47.25%


KOMODO 1339 (Drawscore Default) vs STOCKFISH 221214

191.0 - 209.0 ( +50, =282, -68 ) 47.75%

So the overall results are pretty much the same, except a slightly higher draw rate with default. The results are well within the margin of error for 400 games.

Apologies to Tom for posting in his thread.
Hi, Ray.

This is obviously an open thread, not mine at all. It is a pleasure for me to read your useful and well-executed tests here. :wink:

Kind regards from Barcelona.

Tom
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH 030115 IPMAN Test 2 and 3: 960 Games

i7 980 3.33 Ghz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2014t – Limit 8 moves. Sedat –
No tablebases. No RTB used.
Hash 512
Relative Speed: 29.54
Knodes per second: 14.177

Time Control = 4+0

SF 030115IPx 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 45.0 - 35.0 +21/=48/-11 56.25%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_6_NOB 45.5 - 34.5 +20/=51/-9 56.88%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 53.0 - 27.0 +31/=44/-5 66.25%

Time Control= 2+2

SF 030115IPx 64 POPCNT_ - Houdini 4 x64_st_X6_CT0 49.5 - 30.5 +28/=43/-9 61.88%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_6_NOB 45.0 - 35.0 +17/=56/-7 56.25%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 53.0 - 27.0 +32/=42/-6 66.25%

Score using 6 cores: 291.0 - 189.0 = 60.63%

480 games=
http://www.mediafire.com/download/bwwok ... est2_3.pgn

i7 975 3.33 Ghz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012t –Limit 8 moves. Sedat-
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control: 4+0
SF 030115IPx 64 POPCNT_ - Houdini 4 Pro x64-A_OK 48.0 - 32.0 +21/=54/-5 60.00%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_4_NOB 43.0 - 37.0 +21/=44/-15 53.75%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 55.5 - 24.5 +38/=35/-7 69.38%

Time Control= 2+2

SF 030115IPx 64 POPCNT_ - Houdini 4 Pro x64-A_OK 47.0 - 33.0 +25/=44/-11 58.75%
SF 030115IPx 64 POPCNT_ - Komodo 8 64-bit_4_NOB 48.5 - 31.5 +24/=49/-7 60.63%
SF 030115IPx 64 POPCNT_ - Gull 3 x64 XP 54.5 - 25.5 +33/=43/-4 68.13%


Score using 4 cores: 296.5 – 183.5 = 61.77%

480 games=
http://www.mediafire.com/download/u4bqw ... 0games.pgn

Segmenting by Time Control:

Fixed TC = 290.0 – 190.0 = 60.42%
Incremental TC = 297.5 – 182.5 = 61.98%

Global Score: = 587.5 – 372.5 = 61.20%

Against : Houdini 4.0 St. Ct0 (3136) = 59.22% ; Komodo 8 (3153) = 56.88%, Gull 3 XP (3103) = 67.50%

Average Estimated Elo Opponents = 3131
Estimated Elo Performance= 3209


………………………………….

SUMMARY OF TESTS 1, 2 AND 3 FOR STOCKFISH 030115 IPMAN AFTER 1440 GAMES.

GLOBAL SCORE: 885.5 – 554.5= 61.49%

Against : Houdini 4.0 St. Ct0 (3136) = 60.00% ; Komodo 8 (3153) = 57.60%, Gull 3 XP (3103) = 66.88%

Average Estimated Elo Opponents = 3131
Estimated Elo Performance= 3211


Error Bars +/- 13

We have a new leader in my tests. Although the margin ended up smaller than it was after the first 480 games, this is the best SF I have ever tested. For reference, it is:

- 2 ELO points stronger than Stockfish 221214 MZ (my second best)
- 11 ELO points stronger than Stock-Fish SF6
- 14 ELO points stronger than SF 020115 Development
- 39 ELO points stronger than Stockfish 5
- 58 ELO points stronger than Komodo 8
- 75 ELO points stronger than Houdini 4 and
- 108 ELO points stronger than Gull 3 XP.

Regards from Barcelona.

Tom.