Testing Stockfish 11-03-13. 480 Games.

Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

pohl4711 wrote:
Tomcass wrote: The Elo increase I found between SF 101213 and SF 231213 (18 points) is exactly the improvement the SF team expected in the test framework. Surprising but exciting, isn't it? :wink:
That's not correct. The +18 Elo regression test in the SF test framework is the result of Stockfish 231213 against Stockfish DD. All regression tests are run between the last official release (at the moment Stockfish DD) and the new development version.
And as you can see in my LS rating list, Stockfish 101213 (I write 131210...) is +7 Elo stronger than Stockfish DD.
The result of Stockfish 131223 in its LS test run will go online on Tuesday. At the moment (4000 games) it is only +3 Elo stronger than Stockfish 131210 and +10 stronger than Stockfish DD. But there are still 6000 games to play.

Stefan
Stefan, obviously you know this issue much better than I do. This is the reference I used to reach my conclusions (wrong, as you say):

24-12-13 gl master diff
ELO: 18.62 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 7761 L: 5619 D: 26620
40000 @ 60+0.05 th 1 Regression tests after king safety tweaks
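
For anyone who wants to check the figures in that line, the Elo value and the LOS can be recomputed from the W/L/D counts alone. The sketch below assumes the usual logistic Elo model and the normal-approximation LOS formula commonly used in engine testing. The function names are illustrative and are not taken from the Stockfish test framework.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Elo difference implied by a W/L/D record under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

def los(wins, losses):
    """Likelihood of superiority (normal approximation).

    Draws do not enter the formula: only decisive games carry
    information about which side is stronger.
    """
    z = (wins - losses) / math.sqrt(2.0 * (wins + losses))
    return 0.5 * (1.0 + math.erf(z))

# W/L/D counts copied from the regression-test line quoted above.
elo = elo_from_wld(wins=7761, losses=5619, draws=26620)
print(f"ELO: {elo:.2f}  LOS: {los(7761, 5619):.1%}")
```

With these counts the result agrees with the quoted 18.62 Elo and 100.0% LOS. Whatever the number says, it measures 231213 against the release it was tested against, not against another development build.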

Thanks for clarifying it.

Regards,

Tom.
pohl4711
Posts: 2819
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Testing Stockfish 11-03-13. 480 Games.

Post by pohl4711 »

ouachita wrote:
Tomcass wrote: The Elo increase I found between SF 101213 and SF 231213 (18 points) is exactly the improvement the SF team expected in the test framework. Surprising but exciting, isn't it? :wink:
pohl4711 wrote: That's not correct.
Stefan,
There are numerous engine testers using differing test bases. Thus, the results will differ. Isn't it reasonable to say that there's no right or wrong, or correct or incorrect, but that each test result stands on its own, and each reviewer has to draw his/her own conclusions on each test result and the tests as a whole?
I only wanted to make clear that the Stockfish regression test with the +18 Elo gain was not a match of 101213 against 231213, but a match of DD against 231213. Nothing else (quality of testing or anything like that)...

Stefan
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH DEVELOPMENT 261213 = 480 GAMES.

Bench: 7665034 Timestamp: 1388056103

i7 980 3.33 GHz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 512
Relative Speed: 28.66
Knodes per second: 13.759

Time Control= 4+0

Stockfish 261213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 22.5 - 17.5 +10/=25/-5 56.25%
Stockfish 261213 64 SSE4.2x - Komodo 6 64-bitNOBx6 23.0 - 17.0 +10/=26/-4 57.50%
Stockfish 261213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 28.0 - 12.0 +16/=24/-0 70.00%

Time Control= 2+2

Stockfish 261213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 21.5 - 18.5 +9/=25/-6 53.75%
Stockfish 261213 64 SSE4.2x - Komodo 6 64-bitNOBx6 22.5 - 17.5 +12/=21/-7 56.25%
Stockfish 261213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 26.5 - 13.5 +15/=23/-2 66.25%

240 Games = http://www.mediafire.com/view/hiwwli1um ... 0games.pgn
Score using 6 Cores= 144.0 – 96.0 = 60.00%

i7 975 3.33 GHz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 261213 64 SSE4.2x - Houdini 4 x64xCT0 16.5 - 23.5 +6/=21/-13 41.25%
Stockfish 261213 64 SSE4.2x - Komodo 6 64-bitx4_NOB 25.0 - 15.0 +15/=20/-5 62.50%
Stockfish 261213 64 SSE4.2x - Critter 1.6a 64-bitnob_4 27.5 - 12.5 +18/=19/-3 68.75%

Time Control = 2+2

Stockfish 261213 64 SSE4.2x - Houdini 4 x64xCT0 18.0 - 22.0 +6/=24/-10 45.00%
Stockfish 261213 64 SSE4.2x - Komodo 6 64-bitx4_NOB 25.5 - 14.5 +16/=19/-5 63.75%
Stockfish 261213 64 SSE4.2x - Critter 1.6a 64-bitnob_4 27.0 - 13.0 +16/=22/-2 67.50%

240 Games= http://www.mediafire.com/view/oj01jeb79 ... 0games.pgn
Score using 4 Cores= 139.5 – 100.5 = 58.12%

Segmenting by Time Control:

Fixed TC = 142.5 – 97.5 = 59.37%
Incremental TC = 141.0 – 99.0 = 58.75%

Global Score= 283.5 – 196.5 = 59.06%

Against : Houdini 4.0 St. Ct0 (3233) = 49.06% ; Komodo 6 (3162) = 60.00% ; Critter 1.6a (3093) = 68.12%

Average Estimated Elo Opponents = 3163
Estimated Elo Performance= 3226
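
The summary figures above can be rebuilt from the twelve match lines. The post does not say how the Estimated Elo Performance is calculated, so the sketch below rests on an assumption: mean opponent rating plus the logistic Elo difference implied by the overall score. The opponent ratings are the ones given in the "Against" line, and the names `MATCHES`, `RATINGS` and `performance` are made up for this example.

```python
import math

# Opponent ratings as listed in the "Against" line of the post.
RATINGS = {"Houdini 4": 3233, "Komodo 6": 3162, "Critter 1.6a": 3093}

# (opponent, wins, draws, losses) for each 40-game match of SF 261213.
MATCHES = [
    # 6 cores, 4+0
    ("Houdini 4", 10, 25, 5), ("Komodo 6", 10, 26, 4), ("Critter 1.6a", 16, 24, 0),
    # 6 cores, 2+2
    ("Houdini 4", 9, 25, 6), ("Komodo 6", 12, 21, 7), ("Critter 1.6a", 15, 23, 2),
    # 4 cores, 4+0
    ("Houdini 4", 6, 21, 13), ("Komodo 6", 15, 20, 5), ("Critter 1.6a", 18, 19, 3),
    # 4 cores, 2+2
    ("Houdini 4", 6, 24, 10), ("Komodo 6", 16, 19, 5), ("Critter 1.6a", 16, 22, 2),
]

def performance(matches, ratings):
    """Return (points, games, average opponent rating, performance rating)."""
    points = sum(w + 0.5 * d for _, w, d, _ in matches)
    games = sum(w + d + l for _, w, d, l in matches)
    # Weight each opponent's rating by the number of games played against it.
    avg_opp = sum(ratings[name] * (w + d + l) for name, w, d, l in matches) / games
    score = points / games
    perf = avg_opp + 400.0 * math.log10(score / (1.0 - score))
    return points, games, avg_opp, perf

points, games, avg_opp, perf = performance(MATCHES, RATINGS)
print(f"Global score: {points} / {games} = {points / games:.2%}")
print(f"Average opponent: {avg_opp:.0f}  Performance: {perf:.0f}")
```

Under that assumption the sketch lands on the same 283.5 points, 3163 average opponent rating and 3226 performance as the post. The actual method may still differ in its details.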


A bit below the best-performing development version, 231213 (3232).

Regards,

Tom.
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Testing Stockfish 11-03-13. 480 Games.

Post by ouachita »

The unpredictability of the weather comes to mind with these test results from Страна:

SF Dec 25th 2013

Final
-----

Stockfish 091213SL 64 SSE4.2 8.0
Stockfish 111113SL 64 SSE4.2 7.0

Semi-final
----------

Stockfish 071213 64 SSE4.2SB 8.5
Stockfish 091213SL 64 SSE4.2 9.5

Stockfish 111113SL 64 SSE4.2 9.0
Stockwood 061213 64 SSE4.2 8.0

Quarter-final
-------------

Stockfish 071213 64 SSE4.2SB 8.0
Stockfish 151213IP 64 SSE4.2 7.0

Stockfish 151213 64 SSE4.2 SB 7.0
Stockfish 091213SL 64 SSE4.2 8.0

Stockfish 111113SL 64 SSE4.2 9.0
Stockfish 251213 64 SSE4.2 SB 8.0

Stockfish 191213 64 SSE4.2 SB 6.5
Stockwood 061213 64 SSE4.2 8.5

Round 1
-------

Stockfish IpmanSE SSE4.2 6.0
Stockfish 071213 64 SSE4.2SB 9.0

Stockfish 151213IP 64 SSE4.2 9.0
Stockfish 231213 64 SSE4.2 6.0

Stockfish 251213 64 SSE4.2 6.0
Stockfish 151213 64 SSE4.2 SB 9.0

Stockfish 091213SL 64 SSE4.2 8.5
Stockfish DD 64 SSE4.2 6.5

Stockfish 4 64 SSE4.2 6.0
Stockfish 111113SL 64 SSE4.2 9.0

Stockfish 151213 64 SSE4.2 6.0
Stockfish 251213 64 SSE4.2 SB 9.0

Stockfish 191213 64 SSE4.2 SB 8.0
Stockfish 191213 64 SSE4.2 7.0

Stockfish 071213 64 SSE4.2 5.5
Stockwood 061213 64 SSE4.2 9.5
SIM, PhD, MBA, PE
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH DEVELOPMENT 291213 = 480 GAMES.

Bench: 7335588 Timestamp: 1388309619

i7 980 3.33 GHz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 512
Relative Speed: 28.66
Knodes per second: 13.759

Time Control= 4+0

Stockfish 291213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 20.0 - 20.0 +7/=26/-7 50.00%
Stockfish 291213 64 SSE4.2x - Komodo 6 64-bitNOBx6 22.0 - 18.0 +11/=22/-7 55.00%
Stockfish 291213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 28.0 - 12.0 +18/=20/-2 70.00%

Time Control= 2+2

Stockfish 291213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 18.5 - 21.5 +6/=25/-9 46.25%
Stockfish 291213 64 SSE4.2x - Komodo 6 64-bitNOBx6 24.0 - 16.0 +16/=16/-8 60.00%
Stockfish 291213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 26.0 - 14.0 +14/=24/-2 65.00%


240 Games = http://www.mediafire.com/view/725ythv1x ... 0games.pgn
Score using 6 Cores= 138.5 – 101.5 = 57.71%

i7 975 3.33 GHz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 291213 64 SSE4.2x - Houdini 4 x64xCT0 19.0 - 21.0 +6/=26/-8 47.50%
Stockfish 291213 64 SSE4.2x - Komodo 6 64-bitx4 27.0 - 13.0 +17/=20/-3 67.50%
Stockfish 291213 64 SSE4.2x - Critter 1.6a 64-bitnob 23.5 - 16.5 +16/=15/-9 58.75%

Time Control = 2+2

Stockfish 291213 64 SSE4.2x - Houdini 4 x64xCT0 18.0 - 22.0 +8/=20/-12 45.00%
Stockfish 291213 64 SSE4.2x - Komodo 6 64-bitx4 28.5 - 11.5 +18/=21/-1 71.25%
Stockfish 291213 64 SSE4.2x - Critter 1.6a 64-bitnob 24.0 - 16.0 +12/=24/-4 60.00%

240 Games = http://www.mediafire.com/view/5r9s9kvnv ... 0games.pgn
Score using 4 cores = 140.0 – 100.0 = 58.33%

Segmenting by Time Control:

Fixed TC = 139.5 – 100.5 = 58.12%
Incremental TC = 139.0 – 101.0 = 57.92%

Global Score= 278.5 – 201.5 = 58.02%

Against : Houdini 4.0 St. Ct0 (3233) = 47.19% ; Komodo 6 (3162) = 63.44% ; Critter 1.6a (3093) = 63.44%

Average Estimated Elo Opponents = 3163
Estimated Elo Performance= 3219


Not such a brilliant result this time.

Happy New Year to all members of this forum. :D

Regards,

Tom.
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH DEVELOPMENT 311213 = 480 GAMES.

Bench: 7516178 Timestamp: 1388500072

i7 980 3.33 GHz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 512
Relative Speed: 28.66
Knodes per second: 13.759

Time Control= 4+0

Stockfish 311213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 19.0 - 21.0 +9/=20/-11 47.50%
Stockfish 311213 64 SSE4.2x - Komodo 6 64-bitNOBx6 23.0 - 17.0 +12/=22/-6 57.50%
Stockfish 311213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 29.0 - 11.0 +19/=20/-1 72.50%

Time Control= 2+2

Stockfish 311213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 17.5 - 22.5 +2/=31/-7 43.75%
Stockfish 311213 64 SSE4.2x - Komodo 6 64-bitNOBx6 24.0 - 16.0 +14/=20/-6 60.00%
Stockfish 311213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 22.5 - 17.5 +10/=25/-5 56.25%

240 Games = http://www.mediafire.com/view/t47s1yuv6 ... 0games.pgn
Score using 6 Cores= 135.0 – 105.0 = 56.25%

i7 975 3.33 GHz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 311213 64 SSE4.2x - Houdini 4 x64xCT0 20.0 - 20.0 +8/=24/-8 50.00%
Stockfish 311213 64 SSE4.2x - Komodo 6 64-bitx4 22.0 - 18.0 +11/=22/-7 55.00%
Stockfish 311213 64 SSE4.2x - Critter 1.6a 64-bitnob 27.0 - 13.0 +18/=18/-4 67.50%

Time Control= 2+2

Stockfish 311213 64 SSE4.2x - Houdini 4 x64xCT0 18.0 - 22.0 +5/=26/-9 45.00%
Stockfish 311213 64 SSE4.2x - Komodo 6 64-bitx4 22.0 - 18.0 +11/=22/-7 55.00%
Stockfish 311213 64 SSE4.2x - Critter 1.6a 64-bitnob 26.5 - 13.5 +17/=19/-4 66.25%

240 Games = http://www.mediafire.com/view/zt0bqtmbv ... 0games.pgn
Score using 4 Cores= 135.5 – 104.5 = 56.46%

Segmenting by Time Control:

Fixed TC = 140.0 – 100.0 = 58.33%
Incremental TC = 130.5 – 109.5 = 54.37%

Global Score= 270.5 – 209.5 = 56.35%

Against : Houdini 4.0 St. Ct0 (3233) = 46.56% ; Komodo 6 (3162) = 56.87% ; Critter 1.6a (3093) = 65.63%

Average Estimated Elo Opponents = 3163
Estimated Elo Performance= 3207


Perhaps SF was very unlucky in this test, but the score is 25 Estimated Elo Points (EEP) below the best development version (SF 231213, 3232 EEP) and even 5 EEP below Stockfish DD (3212 EEP). I have started retesting SF 231213 to reduce statistical error.
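
To put a number on the luck involved, here is a rough 95% error margin for a single 480-game run. The W/D/L totals (136 wins, 269 draws, 75 losses) are summed from the twelve SF 311213 match lines above. The sketch assumes independent games and a normal approximation, and it ignores any uncertainty in the opponents' ratings. The function name is illustrative.

```python
import math

def elo_margin_95(wins, draws, losses):
    """Approximate 95% half-width, in Elo, of a result given as W/D/L totals."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    half_width = 1.96 * math.sqrt(variance / games)

    def to_elo(s):
        return 400.0 * math.log10(s / (1.0 - s))

    # Convert both ends of the score interval to Elo and take half the spread.
    return (to_elo(score + half_width) - to_elo(score - half_width)) / 2.0

margin = elo_margin_95(wins=136, draws=269, losses=75)
print(f"480 games: roughly +/- {margin:.0f} Elo at 95%")
```

On these assumptions a single 480-game run carries a margin of roughly 21 Elo either way, and the difference between two such runs is wider still (about 1.4 times that). A 25-point gap between two versions is therefore not clearly outside the noise, which is a good reason for the retest.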

Regards,

Tom
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Testing Stockfish 11-03-13. 480 Games.

Post by ouachita »

Tomcass wrote: Time Control= 2+2
Stockfish 311213 64 SSE4.2x - Houdini 4 x64xCT0 18.0 - 22.0 +5/=26/-9 45.00%
SF311213 v. H4B, Blitz 3m+1s, one core


1 Houdini 4 Pro x64B +31 +27/=55/-18 54.50% 54.5/100
2 Stockfish 311213 64 SSE4.2 -31 +18/=55/-27 45.50% 45.5/100

100 games

"Don't shoot the messenger" (Shakespeare in Henry IV, Part 2)
SIM, PhD, MBA, PE
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH DEVELOPMENT 231213 (SECOND LEG) = 480 GAMES. (481 to 960)

Timestamp: 1387828530
Bench: 6835416

i7 980 3.33 GHz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 512
Relative Speed: 28.66
Knodes per second: 13.759

Time Control= 4+0

Stockfish 231213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 21.5 - 18.5 +12/=19/-9 53.75%
Stockfish 231213 64 SSE4.2x - Komodo 6 64-bitNOBx6 25.0 - 15.0 +16/=18/-6 62.50%
Stockfish 231213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 27.5 - 12.5 +16/=23/-1 68.75%

Time Control= 2+2

Stockfish 231213 64 SSE4.2x - Houdini 4 x64_st_X6_CT0 20.5 - 19.5 +11/=19/-10 51.25%
Stockfish 231213 64 SSE4.2x - Komodo 6 64-bitNOBx6 24.5 - 15.5 +15/=19/-6 61.25%
Stockfish 231213 64 SSE4.2x - Critter 1.6a 64-bitX6_NOB 26.0 - 14.0 +17/=18/-5 65.00%

240 Games = http://www.mediafire.com/view/f2ggfokec ... es_bis.pgn
Score using 6 Cores= 145.0 – 95.0 = 60.42%

i7 975 3.33 GHz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 231213 64 SSE4.2x - Houdini 4 x64xCT0 19.0 - 21.0 +9/=20/-11 47.50%
Stockfish 231213 64 SSE4.2x - Komodo 6 64-bitx4_ 24.5 - 15.5 +14/=21/-5 61.25%
Stockfish 231213 64 SSE4.2x - Critter 1.6a 64-bitnob 25.5 - 14.5 +14/=23/-3 63.75%

Time Control= 2+2

Stockfish 231213 64 SSE4.2x - Houdini 4 x64xCT0 18.5 - 21.5 +6/=25/-9 46.25%
Stockfish 231213 64 SSE4.2x - Komodo 6 64-bitx4_ 24.0 - 16.0 +14/=20/-6 60.00%
Stockfish 231213 64 SSE4.2x - Critter 1.6a 64-bitnob 29.0 - 11.0 +21/=16/-3 72.50%

240 Games = http://www.mediafire.com/view/8ik16a8j8 ... es_bis.pgn

Score Using 4 cores = 140.5 – 99.5 = 58.54%

Segmenting by Time Control:

Fixed TC = 143.0 – 97.0 = 59.58%
Incremental TC = 142.5 – 97.5 = 59.38%

Global Score= 285.5 – 194.5 = 59.48%

Against : Houdini 4.0 St. Ct0 (3233) = 49.69% ; Komodo 6 (3162) = 61.25% ; Critter 1.6a (3093) = 67.50%

Average Estimated Elo Opponents = 3163
Estimated Elo Performance= 3229

After 960 games I can confirm that this development version SF 231213 is the strongest one I have tested so far.

First 480 Games= Global Score= 287.5 – 192.5 = 59.90%
Games 481 to 960 (this test) = Global Score= 285.5 – 194.5 = 59.48%

Overall performance= 573.0 – 387.0 = 59.69%

Estimated Elo Performance after 960 Games = 3231


Roughly at the level of the best-scoring Houdini 4.0 on my computers (Standard B, Contempt 0: 3233). This version has performed 24 Estimated Elo Points better than SF 311213 (3207).

Regards,

Tom.
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Testing Stockfish 11-03-13. 480 Games.

Post by ouachita »

Tomcass wrote: This version has performed 24 Estimated Elo Points better than SF 311213 (3207).
This version has been very consistent for you.
SIM, PhD, MBA, PE
Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Re: Testing Stockfish 11-03-13. 480 Games.

Post by Tomcass »

TESTING STOCKFISH IPMAN COMPILE 050114 IP = 480 GAMES.

i7 980 3.33 GHz.
6 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Large Pages allowed.
Hash 512
Relative Speed: 28.66
Knodes per second: 13.759

Time Control= 4+0

Stockfish 050114IP 64 SSE4.2L - Houdini 4 x64_st_X6_CT0 24.0 - 16.0 +11/=26/-3 60.00%
Stockfish 050114IP 64 SSE4.2L - Komodo TCECr 64-bitx6_ 26.0 - 14.0 +15/=22/-3 65.00%
Stockfish 050114IP 64 SSE4.2L - Critter 1.6a 64-bitX6_NOB 26.0 - 14.0 +17/=18/-5 65.00%

Time Control= 2+2

Stockfish 050114IP 64 SSE4.2L - Houdini 4 x64_st_X6_CT0 21.0 - 19.0 +9/=24/-7 52.50%
Stockfish 050114IP 64 SSE4.2L - Komodo TCECr 64-bitx6 21.5 - 18.5 +8/=27/-5 53.75%
Stockfish 050114IP 64 SSE4.2L - Critter 1.6a 64-bitX6_NOB 27.5 - 12.5 +17/=21/-2 68.75%

240 Games = http://www.mediafire.com/view/ssax2eudb ... 0games.pgn
Score using 6 Cores= 146.0 – 94.0 = 60.83%

i7 975 3.33 GHz.
4 real cores
Ponder: Off.
GUI: Fritz 12
Book: Perfect 2012c
No tablebases. No RTB used.
Large Pages allowed
Hash 256
Relative Speed: 20.62
Knodes per second: 9.899

Time Control = 4+0

Stockfish 050114IP 64 SSE4.2x - Houdini 4 x64xCT0 21.5 - 18.5 +10/=23/-7 53.75%
Stockfish 050114IP 64 SSE4.2x - Komodo TCECr 64-bitx4 24.5 - 15.5 +17/=15/-8 61.25%
Stockfish 050114IP 64 SSE4.2x - Critter 1.6a 64-bitnob_4 27.0 - 13.0 +19/=16/-5 67.50%

Time Control= 2+2

Stockfish 050114IP 64 SSE4.2x - Houdini 4 x64xCT0 24.0 - 16.0 +13/=22/-5 60.00%
Stockfish 050114IP 64 SSE4.2x - Komodo TCECr 64-bitx4 22.0 - 18.0 +11/=22/-7 55.00%
Stockfish 050114IP 64 SSE4.2x - Critter 1.6a 64-bitnob_4 26.5 - 13.5 +16/=21/-3 66.25%

240 Games= http://www.mediafire.com/view/3jj22wolg ... 0Games.pgn
Score using 4 Cores= 145.5 – 94.5 = 60.62%

Segmenting by Time Control:

Fixed TC = 149.0 – 91.0 = 62.08%
Incremental TC = 142.5 – 97.5 = 59.37%

Global Score= 291.5 – 188.5 = 60.73%

Against : Houdini 4.0 St. Ct0 (3233) = 56.56% ; Komodo TCECr (3178) = 58.75% ; Critter 1.6a (3093) = 66.87%

Average Estimated Elo Opponents = 3168
Estimated Elo Performance= 3243


A new all-time best score in my tests.

This Ipman compile, which allows the use of Large Pages, beats the previous leader, Houdini 4.0 St. Contempt 0, by 10 Estimated Elo Points. BRILLIANT!

Queued for the next 480-game tests:

Stockfish Rockwood 040114 and
Latest development Stockfish.

Regards,

Tom.