The 02/21/2013 development version of Stockfish running at TCEC and available at http://www.abrok.eu/stockfish/ beat the strongest official release (CCRL Elo 3167) at a short time control (+10 -7 =33):
Engine                                  Score    vs 1                                                vs 2                                                S-B
1: Stockfish_13022107_x64_modern_sse42  26.5/50  ··················································  1========11=====1==1===1=0=0===1=1010===0=====10=0  622.75
2: Stockfish-222-sse42-ja-intel         23.5/50  0========00=====0==0===0=1=1===0=0101===1=====01=1  ··················································  622.75
50 games played / Tournament is finished
Name of the tournament: RobboTournament36
Site/Country: DMITRI-ASUS, United States
Level: Tournament Game in 3 Minutes
Hardware: Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz with 5.9 GB Memory
Operating system: Windows 7 Professional Service Pack 1 (Build 7601) 64 bit
The CCRL Elo can be preliminarily estimated at 3182, putting it in potential competition with Critter 1.6a. It is my understanding that the source for the development version is not available, so I don't know whether this version can count as an "open-source" engine. I take it Stockfish 2.2.2 is still the strongest open-source SF? Not for long, I suppose.
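For context, a quick way to turn a match score into an Elo estimate is the logistic model. The sketch below is my own illustration (Gusev's exact method for the 3182 figure is not stated); it reproduces the roughly +21 Elo implied by 26.5/50:
Code: Select all
from math import log10

def elo_diff(score: float) -> float:
    """Elo difference implied by a match score fraction (logistic model)."""
    return -400.0 * log10(1.0 / score - 1.0)

s = 26.5 / 50                      # +10 -7 =33 out of 50 games
print(f"{elo_diff(s):+.1f} Elo")   # about +20.9
print(f"performance vs a 3167-rated opponent: ~{3167 + elo_diff(s):.0f}")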
Finally, a stronger Stockfish
Moderator: Ras
-
- Posts: 2287
- Joined: Sat Jun 02, 2012 2:13 am
Re: Finally, a stronger Stockfish
Gusev wrote:The 02/21/2013 development version of Stockfish running at TCEC and available at http://www.abrok.eu/stockfish/
FYI, I tried downloading it from the above link, but was blocked by Norton Internet Security, on account of it being a supposed virus/threat.
Regards,
CL
-
- Posts: 1476
- Joined: Mon Jan 28, 2013 2:51 pm
Re: Finally, a stronger Stockfish
Carl,
Virus Bulletin http://www.virusbtn.com/index regularly publishes rankings of anti-virus software. Norton Internet Security is not the best product on the market.
Dmitri
-
- Posts: 325
- Joined: Wed Apr 25, 2012 3:39 pm
Re: Finally, a stronger Stockfish
So this is the recommended version if you settle for just one fish - and want the strongest one?!
-
- Posts: 3241
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Finally, a stronger Stockfish
The score you obtained is 10-7-33. In terms of statistical information (as a good approximation) draws can be removed and you can just look at wins and losses. So you flip a coin 17 times, and you get 10 heads and 7 tails. Nothing to conclude from such a small sample...
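A minimal sketch of this coin-flip argument, using only Python's standard library (the two-sided test is my framing of Lucas's point, not his code):
Code: Select all
from math import comb

n, wins = 17, 10   # 17 decisive games; null hypothesis: a fair coin

# probability of at least 10 heads in 17 fair flips
p_upper = sum(comb(n, k) for k in range(wins, n + 1)) / 2**n
print(f"P(>= {wins} of {n}) = {p_upper:.3f}")    # ~0.315
print(f"two-sided p-value = {2 * p_upper:.3f}")  # ~0.63 -- no evidence of a difference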
As for which SF is the strongest, let's not jump to hasty conclusions. It was widely claimed that SF 2.3.1 was a clear regression compared to SF 2.2.2, only because it had an "unlucky draw" on CCRL. I just had a look at CCRL, and the likelihood that SF 2.2.2 is better than SF 2.3.1 is only 62% (the difference is well within the error bar).
Whether SF 2.2.2 is stronger than SF 2.3.1 at long time control is still speculation. What we know is that at fast time control SF 2.3.1 was a small but measurable improvement over SF 2.2.2. They're probably +/- equal at long time control.
The current GitHub version is now slightly stronger than SF 2.3.1, but again that's based on self-play at hyper-fast time control, so the Elo difference could be at least halved if we want to forecast rating-list results.
For the end user, I think it really doesn't matter whether you choose SF 2.2.2, SF 2.3.1, or SF "latest". It's probably better to use recent versions to benefit from the latest bugfixes.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 325
- Joined: Wed Apr 25, 2012 3:39 pm
Re: Finally, a stronger Stockfish
ok - thanks!
-
- Posts: 4790
- Joined: Sat Mar 11, 2006 12:42 am
Re: Finally, a stronger Stockfish
Kyodai wrote:So this is the recommended version if you settle for just one fish - and want the strongest one?!
Not on your life! Listen to Lucas. Even though the test was run by someone I like and respect, it is not enough games to tell you anything. Stick with 2.3.1 popcount x64 as your only version. You will NEVER EVER gain anything but a migraine testing parameter changes. The magic bullet does not exist.
gts
-
- Posts: 2128
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Finally, a stronger Stockfish.
Hello Dmitri and Lucas:
lucasart wrote:The score you obtained is 10-7-33. In terms of statistical information (as a good approximation) draws can be removed and you can just look at wins and losses. So you flip a coin 17 times, and you get 10 heads and 7 tails. Nothing to conclude from such a small sample...
As for which SF is the strongest, let's not jump to hasty conclusions. It was widely claimed that SF 2.3.1 was a clear regression compared to SF 2.2.2, only because it had an "unlucky draw" on CCRL. I just had a look at CCRL, and the likelihood that SF 2.2.2 is better than SF 2.3.1 is only 62% (the difference is well within the error bar).
Whether SF 2.2.2 is stronger than SF 2.3.1 at long time control is still speculation. What we know is that at fast time control SF 2.3.1 was a small but measurable improvement over SF 2.2.2. They're probably +/- equal at long time control.
The current GitHub version is now slightly stronger than SF 2.3.1, but again that's based on self-play at hyper-fast time control, so the Elo difference could be at least halved if we want to forecast rating-list results.
For the end user, I think it really doesn't matter whether you choose SF 2.2.2, SF 2.3.1, or SF "latest". It's probably better to use recent versions to benefit from the latest bugfixes.
I cannot agree more with Lucas. Sorry for Dmitri, but usually nothing can be assured from a 50-game match between engines of similar strength (I mean: not much stronger, not much weaker... anything short of hundreds of Elo of difference). +10 -7 =33 means around +21 ± 57 Elo with 95% confidence; the likelihood of superiority is around 76%, which is not conclusive (you have no less than a 24% chance of being wrong in your assumption).
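The "likelihood of superiority" figure quoted here can be reproduced from wins and losses alone with a normal approximation. This is a hedged sketch using one common formula, not necessarily the one Ajedrecista's program uses:
Code: Select all
from math import erf, sqrt

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority under a normal approximation.
    Draws are ignored: they carry little information about the difference."""
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))

print(f"{los(10, 7):.1%}")   # ~76.6%, matching the "around 76%" above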
Using trinomial distributions (per-game probabilities of a win, a draw and a loss), assuming equality between Stockfish_13022107_x64_modern_sse42 and Stockfish-222-sse42-ja-intel, and setting the probability of a draw to 64% (a typical draw ratio in the 8000-game matches posted on the SF GitHub site):
Code: Select all
Probabilities_in_a_trinomial_distribution, ® 2013.
--------------------------------------------------------------------
Probabilities of all possible scores in a match between two engines.
--------------------------------------------------------------------
Write down the number of games of the match (from 2 up to 150):
50
Write down the engines rating difference (between -800 Elo and 800 Elo).
Elo(first player) - Elo(second player):
0
Write down the probability of a draw (%) between 0.0001 % and 99.9999 %
64
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
End of the calculations. Approximated time spent in calculations: 159 ms.
The results will be saved into Probabilities.txt file, at the same path of this programme.
The results have been successfully saved into two files:
Probabilities.txt
Summary_of_probabilities.txt
Approximated total elapsed time: 414 ms.
Thanks for using Probabilities_in_a_trinomial_distribution. Press Enter to exit.
Code: Select all
+ 10 = 33 - 7
P ~ 1.6818 %
[...]
+ 7 = 33 - 10
P ~ 1.6818 %
Code: Select all
Probabilities for a match of 50 games (rounded up to 0.0001%):
Rating difference (rounded up to 0.01 Elo): 0.00 Elo.
Probability of a win = W ~ 18.0000 %
Probability of a draw = D ~ 64.0000 %
Probability of a loss = L ~ 18.0000 %
-------------------------------------
Points: Probabilities (%):
0.0 0.0000
[...]
14.5 0.0000
15.0 0.0001
15.5 0.0004
16.0 0.0011
16.5 0.0030
17.0 0.0075
17.5 0.0179
18.0 0.0402
18.5 0.0855
19.0 0.1717
19.5 0.3258
20.0 0.5847
20.5 0.9918
21.0 1.5909
21.5 2.4133
22.0 3.4622
22.5 4.6982
23.0 6.0306
23.5 7.3228
24.0 8.4118
24.5 9.1413
25.0 9.3983
25.5 9.1413
26.0 8.4118
26.5 7.3228
27.0 6.0306
27.5 4.6982
28.0 3.4622
28.5 2.4133
29.0 1.5909
29.5 0.9918
30.0 0.5847
30.5 0.3258
31.0 0.1717
31.5 0.0855
32.0 0.0402
32.5 0.0179
33.0 0.0075
33.5 0.0030
34.0 0.0011
34.5 0.0004
35.0 0.0001
35.5 0.0000
[...]
50.0 0.0000
--------------------------------------------------------------
SUMMARY:
Probability that the first player wins the match ~ 45.3009 %
Probability of a tied match ~ 9.3983 %
Probability that the second player wins the match ~ 45.3009 %
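For anyone who wants to check the table, here is a minimal Python sketch under the same assumptions (18% win, 64% draw, 18% loss per game); this is my reconstruction, not the Probabilities_in_a_trinomial_distribution program itself:
Code: Select all
from math import comb

def p_score(w: int, d: int, l: int, pw: float = 0.18, pd: float = 0.64) -> float:
    """P(exactly w wins, d draws, l losses) in w+d+l independent games."""
    n = w + d + l
    return comb(n, w) * comb(n - w, d) * pw**w * pd**d * (1.0 - pw - pd)**l

print(f"{p_score(10, 33, 7):.4%}")   # ~1.6818%, matching the output above

# total probability of a final score inside [23.5, 26.5] points out of 50
mass = sum(p_score(w, d, 50 - w - d)
           for w in range(51) for d in range(51 - w)
           if 23.5 <= w + 0.5 * d <= 26.5)
print(f"{mass:.2%}")                 # ~59.15%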
For a 50-game match, under the assumptions stated before, the probability of finishing in the interval [23.5, 26.5] points (which corresponds to a score in the range [47%, 53%], i.e. at most ~21 Elo of difference, not counting error bars) is more or less 2*(7.3228% + 8.4118% + 9.1413%) + 9.3983% ~ 59.1501%.
Regards from Spain.
Ajedrecista.
-
- Posts: 3661
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Finally, a stronger Stockfish.
In my 200-game test, the latest SF scored 49% against 2.3.1.
We simply have to wait for version 2.4 (they change the version number when it's better, I hope).
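For scale, 49% over 200 games also proves very little. A rough error-bar sketch, assuming the same 64% draw ratio as in Ajedrecista's post (my assumption, not Jouni's data):
Code: Select all
from math import log10, sqrt

games, score, draw_ratio = 200, 0.49, 0.64   # draw ratio assumed, not reported

# per-game variance of the score (outcomes 1, 0.5, 0)
w = score - 0.5 * draw_ratio                 # implied win rate
var = w + 0.25 * draw_ratio - score**2
half_width = 1.96 * sqrt(var / games)        # 95% confidence, normal approximation

def elo(s: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * log10(1.0 / s - 1.0)

print(f"Elo: {elo(score):+.0f}, 95% CI [{elo(score - half_width):+.0f}, "
      f"{elo(score + half_width):+.0f}]")    # about -7 [-36, +22]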

Jouni
-
- Posts: 1476
- Joined: Mon Jan 28, 2013 2:51 pm
Re: Finally, a stronger Stockfish
George is right, I did not have time to test this version thoroughly. It will show its true strength at TCEC, though. This is a development version, not an official release. I merely expressed hope that such a new release is not far away.