Finally, a stronger Stockfish

Gusev · Post by **Gusev** » Sun Feb 24, 2013 2:20 am

The 02/21/2013 development version of Stockfish running at TCEC and available at http://www.abrok.eu/stockfish/ beat the strongest official release (CCRL Elo 3167) at a short time control (+10 -7 =33):
| # | Engine | Score | 1 | 2 | S-B |
|---|---|---|---|---|---|
| 1 | Stockfish_13022107_x64_modern_sse42 | 26.5/50 | · | 1========11=====1==1===1=0=0===1=1010===0=====10=0 | 622.75 |
| 2 | Stockfish-222-sse42-ja-intel | 23.5/50 | 0========00=====0==0===0=1=1===0=0101===1=====01=1 | · | 622.75 |

50 games played / Tournament is finished
Name of the tournament: RobboTournament36
Site/ Country: DMITRI-ASUS, United States
Level: Tournament Game in 3 Minutes
Hardware: Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz with 5.9 GB Memory
Operating system: Windows 7 Professional Service Pack 1 (Build 7601) 64 bit

The CCRL Elo can be preliminarily estimated at 3182, which would put it in competition with Critter 1.6a. As I understand it, the source code for the development version is not available, so I don't know whether this version counts as an "open-source" engine. I take it Stockfish 2.2.2 is still the strongest open-source SF? Not for long, I suppose.

carldaman · Post by **carldaman** » Sun Feb 24, 2013 3:33 am

Gusev wrote:The 02/21/2013 development version of Stockfish running at TCEC and available at http://www.abrok.eu/stockfish/

FYI, I tried downloading it from the above link, but the download was blocked by Norton Internet Security as a supposed virus/threat.

Regards,
CL

Gusev · Post by **Gusev** » Sun Feb 24, 2013 5:09 am

Carl,

Virus Bulletin (http://www.virusbtn.com/index) regularly publishes rankings of anti-virus software. Norton Internet Security is not the best product on the market.

Dmitri

Kyodai · Post by **Kyodai** » Sun Feb 24, 2013 5:09 am

So this is the recommended version if you settle for just one fish - and want the strongest one?!

lucasart · Post by **lucasart** » Sun Feb 24, 2013 6:18 am

The score you obtained is 10-7-33. Statistically, to a good approximation, draws can be removed and you can just look at wins and losses. So you flip a coin 17 times and get 10 heads and 7 tails. Nothing can be concluded from such a small sample...
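The coin-flip argument can be quantified with an exact binomial tail. This is a minimal Python sketch, assuming draws are simply discarded and the two engines are exactly equal, so each decisive game is a fair coin flip; `prob_at_least` is a hypothetical helper name:

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'heads' in n flips."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 17 decisive games (10 wins, 7 losses); the 33 draws are ignored.
p_tail = prob_at_least(10, 17)
print(f"P(10+ wins out of 17 | equal engines) = {p_tail:.4f}")
```

Under those assumptions a 10-7 split or better happens about 31% of the time (41226/131072) even between identical engines, so on its own it says very little.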

As for which SF is the strongest, let's not jump to hasty conclusions. It was widely claimed that SF 2.3.1 was a clear regression compared to SF 2.2.2, only because it had an "unlucky draw" on CCRL. I just had a look at CCRL, and the likelihood that SF 2.2.2 is better than SF 2.3.1 is only 62% (the difference is well within the error bars).
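A 62% likelihood of superiority (LOS) can be translated into standard errors. The sketch below assumes the rating difference is normally distributed, so LOS = Φ(Δ/σ), with Δ the rating difference and σ its standard error; it uses only Python's standard library and no actual CCRL figures:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Under a normal approximation, LOS = Phi(delta / sigma).
# Inverting it shows how many standard errors a 62% LOS corresponds to.
z = std_normal.inv_cdf(0.62)
print(f"62% LOS -> difference of about {z:.2f} standard errors")

# A two-sided 95% error bar reaches about 1.96 standard errors either way.
z95 = std_normal.inv_cdf(0.975)
print(f"95% error bar: +/-{z95:.2f} standard errors")
```

A 62% LOS corresponds to a gap of only about 0.3 σ, far inside the roughly ±1.96 σ of a 95% error bar.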

Whether SF 2.2.2 is stronger than SF 2.3.1 at long time controls is still speculation. What we know is that at fast time controls SF 2.3.1 was a small but measurable improvement over SF 2.2.2. They're probably roughly equal at long time controls.

The current GitHub version is now slightly stronger than SF 2.3.1, but again that's based on self-play at hyper-fast time control, so the Elo difference should probably be at least halved if we want to forecast rating-list results.

For the end user, I think it really doesn't matter whether you choose SF 2.2.2, SF 2.3.1, or SF "latest". It's probably better to use recent versions to benefit from the latest bugfixes.

Kyodai · Post by **Kyodai** » Sun Feb 24, 2013 7:11 am

ok - thanks!

geots · Post by **geots** » Sun Feb 24, 2013 8:44 am

Kyodai wrote:So this is the recommended version if you settle for just one fish - and want the strongest one?!

Not on your life! Listen to Lucas. Even though the test was run by someone I like and respect, it is not enough games to tell you anything. Stick with 2.3.1 popcount x64 as your only version. You will NEVER EVER gain anything but a migraine from testing parameter changes. The magic bullet does not exist.

gts

Ajedrecista · Post by **Ajedrecista** » Sun Feb 24, 2013 2:19 pm

Hello Dmitri and Lucas:

lucasart wrote:The score you obtained is 10-7-33. Statistically, to a good approximation, draws can be removed and you can just look at wins and losses. So you flip a coin 17 times and get 10 heads and 7 tails. Nothing can be concluded from such a small sample...

As for which SF is the strongest, let's not jump to hasty conclusions. It was widely claimed that SF 2.3.1 was a clear regression compared to SF 2.2.2, only because it had an "unlucky draw" on CCRL. I just had a look at CCRL, and the likelihood that SF 2.2.2 is better than SF 2.3.1 is only 62% (the difference is well within the error bars).

Whether SF 2.2.2 is stronger than SF 2.3.1 at long time controls is still speculation. What we know is that at fast time controls SF 2.3.1 was a small but measurable improvement over SF 2.2.2. They're probably roughly equal at long time controls.

The current GitHub version is now slightly stronger than SF 2.3.1, but again that's based on self-play at hyper-fast time control, so the Elo difference should probably be at least halved if we want to forecast rating-list results.

For the end user, I think it really doesn't matter whether you choose SF 2.2.2, SF 2.3.1, or SF "latest". It's probably better to use recent versions to benefit from the latest bugfixes.

I could not agree more with Lucas. Sorry, Dmitri, but usually nothing can be concluded from a 50-game match between engines of similar strength (as opposed to engines hundreds of Elo apart). +10 -7 =33 means around +21 ± 57 Elo with 95% confidence; the likelihood of superiority is around 76%, which is not conclusive (there is still at least a 24% chance that the assumption is wrong).
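The figures in the previous paragraph can be approximated with a few lines of Python. This is a sketch, not necessarily the exact method behind the quoted numbers: it assumes the logistic Elo model, a normal approximation for the 95% interval on the score, and the common LOS formula 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), which uses decisive games only; `elo_from_score` is a hypothetical helper:

```python
import math

def elo_from_score(s: float) -> float:
    """Logistic Elo model: rating difference that yields expected score s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)

wins, losses, draws = 10, 7, 33
n = wins + losses + draws

# Per-game outcome x is 1 (win), 0.5 (draw) or 0 (loss).
w, d = wins / n, draws / n
score = w + 0.5 * d                      # mean of x: 0.53
var = w * 1.0 + d * 0.25 - score ** 2    # E[x^2] - E[x]^2 (losses add 0 to E[x^2])
std_err = math.sqrt(var / n)             # standard error of the mean score

# 95% interval on the score (normal approximation), mapped onto the Elo scale.
lo_s = score - 1.96 * std_err
hi_s = score + 1.96 * std_err
elo = elo_from_score(score)
print(f"Elo {elo:+.1f}, 95% CI {elo_from_score(lo_s):+.1f} to {elo_from_score(hi_s):+.1f}")

# Likelihood of superiority from decisive games only (draws ignored).
los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
print(f"LOS {los:.1%}")
```

With these assumptions the script gives about +21 Elo, a 95% interval of roughly -35 to +78 Elo (about -56/+57 around the estimate, since the Elo curve is not linear), and an LOS of about 76.7%.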

Using trinomial distributions (probabilities of wins, draws and losses), assuming equal strength for Stockfish_13022107_x64_modern_sse42 and Stockfish-222-sse42-ja-intel, and setting the draw probability to 64% (a typical draw ratio in the 8000-game matches posted on the SF GitHub site):

Code:

Probabilities_in_a_trinomial_distribution, ® 2013.

--------------------------------------------------------------------
Probabilities of all possible scores in a match between two engines.
--------------------------------------------------------------------

Write down the number of games of the match (from 2 up to 150):

50

Write down the engines rating difference (between -800 Elo and 800 Elo).
Elo(first player) - Elo(second player):

0

Write down the probability of a draw (%) between 0.0001 % and 99.9999 %

64

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

End of the calculations. Approximated time spent in calculations:  159 ms.

The results will be saved into Probabilities.txt file, at the same path of this programme.

The results have been successfully saved into two files:

     Probabilities.txt
     Summary_of_probabilities.txt

Approximated total elapsed time:   414 ms.

Thanks for using Probabilities_in_a_trinomial_distribution. Press Enter to exit.

Code:

+ 10 = 33 -  7
P ~  1.6818 %

[...]

+  7 = 33 - 10
P ~  1.6818 %

Code:

Probabilities for a match of  50 games (rounded up to 0.0001%):
 
Rating difference (rounded up to 0.01 Elo):    0.00 Elo.
 
Probability of a win  = W ~ 18.0000 %
Probability of a draw = D ~ 64.0000 %
Probability of a lose = L ~ 18.0000 %
 
-------------------------------------
 
Points:    Probabilities (%):
 
  0.0           0.0000
  0.5           0.0000
  1.0           0.0000
  1.5           0.0000
  2.0           0.0000
  2.5           0.0000
  3.0           0.0000
  3.5           0.0000
  4.0           0.0000
  4.5           0.0000
  5.0           0.0000
  5.5           0.0000
  6.0           0.0000
  6.5           0.0000
  7.0           0.0000
  7.5           0.0000
  8.0           0.0000
  8.5           0.0000
  9.0           0.0000
  9.5           0.0000
 10.0           0.0000
 10.5           0.0000
 11.0           0.0000
 11.5           0.0000
 12.0           0.0000
 12.5           0.0000
 13.0           0.0000
 13.5           0.0000
 14.0           0.0000
 14.5           0.0000
 15.0           0.0001
 15.5           0.0004
 16.0           0.0011
 16.5           0.0030
 17.0           0.0075
 17.5           0.0179
 18.0           0.0402
 18.5           0.0855
 19.0           0.1717
 19.5           0.3258
 20.0           0.5847
 20.5           0.9918
 21.0           1.5909
 21.5           2.4133
 22.0           3.4622
 22.5           4.6982
 23.0           6.0306
 23.5           7.3228
 24.0           8.4118
 24.5           9.1413
 25.0           9.3983
 25.5           9.1413
 26.0           8.4118
 26.5           7.3228
 27.0           6.0306
 27.5           4.6982
 28.0           3.4622
 28.5           2.4133
 29.0           1.5909
 29.5           0.9918
 30.0           0.5847
 30.5           0.3258
 31.0           0.1717
 31.5           0.0855
 32.0           0.0402
 32.5           0.0179
 33.0           0.0075
 33.5           0.0030
 34.0           0.0011
 34.5           0.0004
 35.0           0.0001
 35.5           0.0000
 36.0           0.0000
 36.5           0.0000
 37.0           0.0000
 37.5           0.0000
 38.0           0.0000
 38.5           0.0000
 39.0           0.0000
 39.5           0.0000
 40.0           0.0000
 40.5           0.0000
 41.0           0.0000
 41.5           0.0000
 42.0           0.0000
 42.5           0.0000
 43.0           0.0000
 43.5           0.0000
 44.0           0.0000
 44.5           0.0000
 45.0           0.0000
 45.5           0.0000
 46.0           0.0000
 46.5           0.0000
 47.0           0.0000
 47.5           0.0000
 48.0           0.0000
 48.5           0.0000
 49.0           0.0000
 49.5           0.0000
 50.0           0.0000
 
--------------------------------------------------------------
 
                           SUMMARY:
 
 Probability that the first player wins the match ~  45.3009 %
                      Probability of a tied match ~   9.3983 %
Probability that the second player wins the match ~  45.3009 %

For a 50-game match, under the assumptions stated above, the probability of finishing in the interval [23.5, 26.5] points (a score between 47% and 53%, i.e. a difference of at most ~21 Elo, not counting error bars) is roughly 2*(7.3228% + 8.4118% + 9.1413%) + 9.3983% ~ 59.1501%.
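These figures can be reproduced by summing trinomial probabilities over every win/draw/loss split. The following is a minimal Python sketch of that calculation, not the actual Probabilities_in_a_trinomial_distribution program; it assumes independent games with fixed 18%/64%/18% win/draw/loss probabilities, as in the post, and `score_distribution` is a hypothetical helper name:

```python
from math import comb

def score_distribution(n: int, p_win: float, p_draw: float) -> dict:
    """Exact distribution of match points over n independent games.

    Keys are half-points (2 * points) so every score is an integer key.
    Each win/draw/loss split is weighted by its trinomial probability.
    """
    p_loss = 1.0 - p_win - p_draw
    dist = {}
    for wins in range(n + 1):
        for draws in range(n - wins + 1):
            losses = n - wins - draws
            coef = comb(n, wins) * comb(n - wins, draws)   # n! / (wins! draws! losses!)
            prob = coef * p_win ** wins * p_draw ** draws * p_loss ** losses
            half_points = 2 * wins + draws
            dist[half_points] = dist.get(half_points, 0.0) + prob
    return dist

n = 50
p_win, p_draw = 0.18, 0.64       # equal engines with 64% draws, as assumed in the post
dist = score_distribution(n, p_win, p_draw)

# Probability of the exact result +10 =33 -7.
p_exact = comb(n, 10) * comb(n - 10, 33) * p_win ** 10 * p_draw ** 33 * (1 - p_win - p_draw) ** 7
p_tied = dist[n]                                             # exactly 25.0 points
p_first_wins = sum(p for hp, p in dist.items() if hp > n)    # more than 25 points
p_window = sum(dist[hp] for hp in range(47, 54))             # 23.5 .. 26.5 points

print(f"P(+10 =33 -7) = {p_exact:.4%}")
print(f"P(tie) = {p_tied:.4%}, P(first player wins) = {p_first_wins:.4%}")
print(f"P(23.5 <= points <= 26.5) = {p_window:.4%}")
```

Up to rounding, this should match the 1.6818%, 9.3983%, 45.3009% and 59.1501% figures above.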

Regards from Spain.

Ajedrecista.

Jouni · Post by **Jouni** » Sun Feb 24, 2013 2:44 pm

In my 200-game test, the latest SF scored 49% against 2.3.1.

We simply have to wait for version 2.4 (they change the version number when it's better, I hope).

Gusev · Post by **Gusev** » Sun Feb 24, 2013 7:25 pm

George is right, I did not have time to test this version thoroughly. It will show its true strength at TCEC, though. This is a development version, not an official release. I merely expressed hope that such a new release is not far away.
