Can someone enlighten me as to how stable SF Dev is compared to the official versions? My intention is to run it in IDeA projects for at least a month, 24/7, on many cores.
What are the differences between the official releases and the Dev versions? I've heard the official releases get more testing, but SF Dev usually has 10,000 games or more played, so it must be quite stable just to achieve that.
Reliability of Stockfish Dev
Moderator: Ras
-
Werewolf
- Posts: 2089
- Joined: Thu Sep 18, 2008 10:24 pm
- Full name: Carl Bicknell
-
zullil
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Reliability of Stockfish Dev
Werewolf wrote: Can someone enlighten me as to how stable SF Dev is compared to the official versions?
Extremely stable, in general.
-
APassionForCriminalJustic
- Posts: 417
- Joined: Sat May 24, 2014 9:16 am
Re: Reliability of Stockfish Dev
Werewolf wrote: Can someone enlighten me as to how stable SF Dev is compared to the official versions? My intention is to run it in IDeA projects for at least a month, 24/7, on many cores. What are the differences between the official releases and the Dev versions? I've heard the official releases get more testing, but SF Dev usually has 10,000 games or more played, so it must be quite stable just to achieve that.
Stable and simply the strongest SF on the planet.
-
Eelco de Groot
- Posts: 4724
- Joined: Sun Mar 12, 2006 2:40 am
- Full name: Eelco de Groot
Re: Reliability of Stockfish Dev
A good illustration may be the TCEC tournament: the Stockfish developers always send in the very latest development version. The only extra step they might take is a very fast compile, especially if Martin gets new hardware; then they might build on a similar system to gain some extra speed. But the code is just the latest development code.
The official release gets some extra tests, but I think that is mainly to make sure nothing is broken. Otherwise you might find out a few days after the official release about a bug that a very simple test could have caught, but that only shows up on an obscure system or in a feature that is not in everyday use, and so had no impact on strength testing. It is just a nuisance if you make an official version and three days later have to patch it when all the testing has already begun.
One other thing: I always thought IDeA was mainly run with several single-threaded engines on different positions, and in that case you would not profit from Joona's recent SMP improvements. For that case, Stefan Pohl has not yet detected much progress relative to the fast compile of official SF 6 (at fast time controls, but pretty reliable). But if IDeA is run with multithreaded engines, as you write, then I would certainly use the Dev version even if the compile is a bit slower than SF 6.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
-
Isaac
- Posts: 265
- Joined: Sat Feb 22, 2014 8:37 pm
Re: Reliability of Stockfish Dev
As a rule of thumb it should be perfectly reliable and stable.
In the past (in 2012) there were unstable dev versions that couldn't complete a single game on my machine (SF would crash), but they fixed that problem by reverting a patch a few days later.
Another, more recent "unreliable" period was, I think, when the transition to C++11 occurred. On some machines (I think the ones running Windows) there was a big drop in Elo.
So if you plan to use SF for over a month, you can go blindly with the dev version and you'd have a better than 95% chance of getting a reliable version. If you want more security, just ask here whether the current SF dev is stable/reliable.
-
Werewolf
- Posts: 2089
- Joined: Thu Sep 18, 2008 10:24 pm
- Full name: Carl Bicknell
Re: Reliability of Stockfish Dev
Many thanks for all your replies.
IDeA is indeed (usually) single-threaded, but it looks at lots of positions simultaneously.
Could someone estimate the Elo gain of the latest Dev over SF6 on a single core, at LTC?
20 Elo?
(I'll be switching from Komodo 8, so even 20 Elo is more than it sounds.)
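For a rough sense of what a 20 Elo gain means in game results, here is a minimal sketch assuming the standard logistic Elo model. The function name `expected_score` is just for illustration, and the 20 is the guess from the question above, not a measured value.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for the side that is
    `elo_diff` Elo stronger, under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # 20 is the guessed gain from the question, not a test result.
    for diff in (0, 20, 50):
        print(f"{diff:+4d} Elo -> expected score {expected_score(diff):.3f}")
```

Under this model a 20 Elo edge corresponds to an expected score of a little under 53%, so the per-game difference is small even though it matters over many games.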
-
Eelco de Groot
- Posts: 4724
- Joined: Sun Mar 12, 2006 2:40 am
- Full name: Eelco de Groot
Re: Reliability of Stockfish Dev
Hello Carl! Well, maybe you would find that kind of increase in a new regression test; I don't know if anything is scheduled right now. The latest regression test numbers at 60s + 0.05s were discussed for instance here, and that was with the March 29th dev version. You can see on the commit page https://github.com/official-stockfish/S ... its/master that there have been several functional patches since then.
But Stefan Pohl has not seen anything like this so far against other engines. I'm curious what he will see with the latest version. I like the code at the moment. My own modified Stockfish just ran a 'stress test' for 20 hours with no crashes, so I think that is a stable version too. But in mine I'm seeing rather many fail highs followed by fail lows at great depth. That usually means something is not quite tuned right yet: the search windows are not keeping the search inside them. I'm trying to think what can be done about that, apart from just throwing out the fail highs and lows, as Dann is doing.
It looks like this at high depths:
[d]r3k2r/1q2bp2/2b1p2p/pp2p1pP/4N1P1/PP1Q1P2/2P5/1KBR3R w q -
Engine: Sf20150412_002 MOD MP (32 bit Q6700, 4 threads, 512 MB)
by Tord Romstad, Marco Costalba and Joona
.
.
.
36/62 52:37 -0.63 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rd3 Rc8 30.Rf1 Rf8 31.Bc3 b4
32.axb4 axb4 33.Bb2 Rc5 34.Kc1 Rc6
35.c4 bxc3 36.Rxc3 Kd7 37.Rxc6 Kxc6
38.Kc2 Ra8 39.Rc1 (9.042.411.366) 2863
37/63 62:27 -0.69-- 26.Bb2 Bxe4 (10.831.066.123) 2890
37/63 63:53 -0.63++ 26.Bb2 (11.083.001.012) 2890
37/63 64:43 -0.69-- 26.Bb2 Bxe4 (11.225.236.455) 2890
37/63 66:46 -0.61++ 26.Bb2 (11.600.445.271) 2895
37/64 72:11 -0.72-- 26.Bb2 Bxe4 (12.504.101.410) 2886
37/65 84:44 -0.56++ 26.Bb2 (14.932.768.153) 2936
37/65 125:03 -0.77 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qf1 Rxd1+
29.Rxd1 f6 30.Qd3 Qd7 31.Qe2 Qb7
32.a4 Bxe4 33.Qxe4 Qxe4 34.fxe4 bxa4
35.bxa4 Rf8 36.Kc1 Bc5 37.Bc3 Ke7
38.Kd2 f5 39.Bxe5 (21.898.481.930) 2918
38/65 133:33 -0.71++ 26.Rhe1 (23.157.647.903) 2889
38/65 146:04 -0.65++ 26.Bb2 (25.131.894.289) 2867
38/65 148:13 -0.71-- 26.Bb2 Bxe4 (25.473.172.401) 2864
38/68 153:21 -0.63++ 26.Bb2 (26.399.308.782) 2868
38/68 157:45 -0.65 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rhf1 Rd8 30.Rxd8+ Kxd8 31.Bc3 b4
32.axb4 axb4 33.Bd2 Kd7 34.Kc1 Ra8
35.Kd1 Ra1+ 36.Ke2 Ra7 37.Be3 Ra2
38.Kd3 Ke8 39.Kd2 (27.100.749.494) 2863
39/68 191:58 -0.71-- 26.Bb2 Bxe4 (31.892.119.192) 2768
39/68 198:51 -0.65++ 26.Bb2 (33.085.746.914) 2773
39/68 209:04 -0.71-- 26.Bb2 Bxe4 (34.879.923.372) 2780
39/68 213:11 -0.63++ 26.Bb2 (35.589.812.943) 2782
39/69 226:32 -0.74-- 26.Bb2 Bxe4 (37.892.646.150) 2787
39/69 240:46 -0.58++ 26.Bb2 (40.542.791.531) 2806
39/69 275:43 -0.73 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 Rxd1+
29.Rxd1 f6 30.c4 b4 31.axb4 axb4
32.Qd3 Rg8 33.Kc2 Kf7 34.Nf2 Qb7
35.Ne4 Qc8 36.Nd2 Rg7 37.Ne4 Qc7
38.Qe3 Kg8 39.Ra1 (46.936.452.775) 2837
40/69 289:10 -0.67++ 26.Rhe1 (49.309.741.383) 2842
40/69 304:55 -0.73-- 26.Rhe1 Qc7 (52.131.732.383) 2849
40/69 310:56 -0.67++ 26.Rhe1 (53.202.878.435) 2851
40/69 330:57 -0.75-- 26.Rhe1 Qc7 (56.778.211.986) 2859
40/73 344:46 -0.64++ 26.Rhe1 (59.304.478.254) 2866
40/73 390:34 -0.68 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rhf1 Rc8 30.Rd3 a4 31.Bc3 Rf8
32.Kb2 axb3 33.cxb3 Rc7 34.Rf2 Ra7
35.b4 Rd7 36.Rxd7 Kxd7 37.Kb3 Ra8
38.Be1 Rc8 39.Bc3 (67.520.794.800) 2881
41/73 442:59 -0.75-- 26.Bb2 Bxe4 (76.959.897.750) 2895
41/73 454:48 -0.68++ 26.Bb2 (79.172.140.561) 2901
41/73 471:07 -0.75-- 26.Bb2 Bxe4 (82.325.360.801) 2912
41/73 479:30 -0.67++ 26.Bb2 (83.992.815.856) 2919
41/73 562:47 -0.68 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 f6
29.c4 b4 30.axb4 Rxd1+ 31.Rxd1 axb4
32.Rd2 Kf7 33.Rd1 Kg7 34.Qc2 Rd8
35.Rxd8 Qxd8 36.Qe2 Kf8 37.Kc2 Qd7
38.Qe3 Kg7 39.Qe2 (99.680.708.891) 2952
42/73 626:01 -0.64 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 f6
29.c4 b4 30.axb4 axb4 31.Rd3 Rxd3
32.Qxd3 Qd8 33.Qe2 Kf7 34.Rf1 Kg8
35.Qe3 Rh7 36.Kc1 Qc7 37.Rd1 Rf7
38.Ng3 Kh8 39.Ne4 (111.324.032.370) 2963
43/73 720:32 -0.70-- 26.Rhe1 Qc7 (128.469.958.448) 2971
43/73 750:37 -0.64++ 26.Rhe1 (134.049.438.810) 2976
43/73 770:36 -0.70-- 26.Rhe1 Qc7 (137.627.500.339) 2976
43/73 797:43 -0.62++ 26.Rhe1 (142.565.201.556) 2978
43/73 928:12 -0.73 26.Rhe1 Qc7 27.c4 Rd8 28.Qe2 Rxd1
29.Rxd1 b4 30.axb4 axb4 31.Qd3 Rg8
32.Kb2 Kf8 33.Kc2 Kg7 34.Bb2 Ra8
35.Qe3 Kg8 36.Ra1 Rxa1 37.Bxa1 Qa5
38.Bb2 f6 39.Qd3 (164.876.372.552) 2960
44/76 1098:19 -0.67++ 26.Rhe1 (197.471.955.869) 2996
44/76 1192:06 -0.73-- 26.Rhe1 Qc7 (214.448.589.316) 2998
44/76 1265:51 -0.67++ 26.Rhe1 (228.062.365.025) 3002
best move: Rh1-e1 time: 1272:24.253 min n/s: 3.002.753 nodes: 229.208.215.176
It's not too bad, but I still think the engine is losing too much time in the re-searches. It's not all lost, because with every re-search the move ordering gets a bit better, and the search windows as a whole are still widened a little each time; otherwise you would sometimes see cases where an iteration is never resolved... That would be bad...
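The pattern in the log (a score marked `--`, presumably a fail low, then `++`, presumably a fail high, then a resolved score) is what you get from searching a narrow window around the previous score and widening it on each failure. Below is a minimal generic sketch of that loop, not Stockfish's actual code: `aspiration_search`, `make_toy_search`, `INF` and the growth rule for `delta` are all assumptions for illustration, and the toy search simply clamps a known score to the window instead of running a real alpha-beta search.

```python
from typing import Callable, Tuple

INF = 10_000  # hypothetical "infinite" score bound, in centipawns


def aspiration_search(
    search: Callable[[int, int], int],
    prev_score: int,
    delta: int = 16,
) -> Tuple[int, int]:
    """Return (score, number_of_re_searches).

    `search(alpha, beta)` is assumed to return a value <= alpha on a
    fail low and >= beta on a fail high.
    """
    alpha = max(prev_score - delta, -INF)
    beta = min(prev_score + delta, INF)
    re_searches = 0
    while True:
        score = search(alpha, beta)
        if score <= alpha and alpha > -INF:
            # Fail low: the real score lies below the window, lower alpha.
            alpha = max(score - delta, -INF)
        elif score >= beta and beta < INF:
            # Fail high: the real score lies above the window, raise beta.
            beta = min(score + delta, INF)
        else:
            # Score is inside the window (or the window is fully open).
            return score, re_searches
        # Grow the step on every failure so the iteration always resolves.
        delta += delta // 2
        re_searches += 1


def make_toy_search(true_score: int) -> Callable[[int, int], int]:
    """Stand-in for a real search: clamps a known score to the window."""
    def search(alpha: int, beta: int) -> int:
        return max(alpha, min(beta, true_score))
    return search


if __name__ == "__main__":
    # Previous iteration said -63 cp; the position is really worth -73 cp.
    print(aspiration_search(make_toy_search(-73), prev_score=-63, delta=5))
    print(aspiration_search(make_toy_search(-73), prev_score=-63, delta=16))
```

In this toy setup a narrow starting `delta` costs extra re-searches while a wider one resolves at once. A real engine trades that against the faster cutoffs a narrow window gives, which is the tuning question raised above.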
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan