Reliability of Stockfish Dev

Werewolf
Posts: 2089
Joined: Thu Sep 18, 2008 10:24 pm
Full name: Carl Bicknell

Reliability of Stockfish Dev

Post by Werewolf »

Can someone enlighten me as to how stable SF Dev is compared to the official versions? I intend to run it in IDeA projects 24/7 for at least a month, on many cores.

What are the differences between the official releases and the Dev versions? I've heard the official releases get more testing, but SF Dev usually has 10,000 games or more played, so it must be quite stable just to achieve that.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Reliability of Stockfish Dev

Post by zullil »

Werewolf wrote: Can someone enlighten me as to how stable SF Dev is compared to official versions?
Extremely stable, in general.
APassionForCriminalJustic
Posts: 417
Joined: Sat May 24, 2014 9:16 am

Re: Reliability of Stockfish Dev

Post by APassionForCriminalJustic »

Werewolf wrote: Can someone enlighten me as to how stable SF Dev is compared to official versions? My intention is to run it in IDeA projects for at least a month 24/7, on many cores.

What are the differences between the official releases and the Dev versions? I've heard the official releases get more testing, but SF Dev usually has 10,000 games or more played, so it must be quite stable just to achieve that.
Stable and simply the strongest SF on the planet.
Eelco de Groot
Posts: 4724
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Reliability of Stockfish Dev

Post by Eelco de Groot »

Maybe a good illustration is that for the TCEC tournament, for instance, the Stockfish developers always sent in the very latest development version. The only thing they might do is make a very fast compile of it. Especially if Martin gets new hardware, they might want to make a compile for that on a similar system to get some extra speed. But the code is just the latest development code.

The official release gets some extra tests, but I think that is mainly to make sure nothing is broken. Otherwise you might find out a few days after the official release about a bug that could have been caught with a very simple test, but that only shows up on an obscure system or in something not in everyday use, and so had no impact on strength testing etc. It is just a nuisance if you make an official version and three days later you have to patch it when all the testing has already begun.

One other thing, maybe, is that I always thought IDeA was mainly done with several single-thread engines working on different positions, and in that case you would not profit from Joona's recent SMP improvements. For single-thread play, Stefan Pohl has not yet detected much progress relative to the fast compile of official SF 6 (his tests use fast time controls, but are pretty reliable). But if IDeA is done with multithreaded engines, as you write, then I would certainly use the Dev version even if the compile is a bit slower than the SF 6 one.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Reliability of Stockfish Dev

Post by Isaac »

As a rule of thumb it should be perfectly reliable and stable.
In the past (in 2012) there were unstable dev versions that couldn't complete a single game on my machine (SF would crash), but they fixed that problem by reverting a patch a few days later.
Another, more recent "unreliable" period was, I think, when the transition to C++11 occurred. On some machines there was a big drop in Elo (I think the ones running Windows).
So if you plan to use SF for over a month, you can either go blindly with the dev version, with an over 95% chance of getting a reliable version, or, if you want more security, just ask here whether the current SF dev is stable/reliable.
Werewolf
Posts: 2089
Joined: Thu Sep 18, 2008 10:24 pm
Full name: Carl Bicknell

Re: Reliability of Stockfish Dev

Post by Werewolf »

Many thanks for all your replies.

IDeA is indeed single threaded (usually) but looking at lots of positions simultaneously.

Could someone estimate the Elo gain of the latest Dev over SF6 on a single core, at LTC?
20 Elo?

(I'll be switching from Komodo 8, so even 20 Elo is more than it sounds)
Eelco de Groot
Posts: 4724
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Reliability of Stockfish Dev

Post by Eelco de Groot »

Hello Carl! Well, maybe you would find that kind of increase in a new regression test; I don't know if anything is scheduled right now. The latest regression test numbers at 60s + 0.05s were discussed, for instance, here, and that was with the March 29th dev version. You can see on the commit page https://github.com/official-stockfish/S ... its/master that there have been several functional patches since then.

But Stefan Pohl has not seen anything like this so far, against other engines. I'm curious what he will see with the latest version. I like the code at the moment. My own modified Stockfish just ran a 'stress test' with no crashes for 20 hours, so I think that is a stable version also. But in mine I'm seeing a few too many Fail Highs followed by Fail Lows etc. at great depth. That usually means that something is not quite tuned right yet: the search windows are not keeping the search inside them. I'm trying to think what can be done about that, apart from just throwing out the Fail Highs and Lows, as Dann is doing :)
It looks like this at high depths:

[d]r3k2r/1q2bp2/2b1p2p/pp2p1pP/4N1P1/PP1Q1P2/2P5/1KBR3R w q -

Engine: Sf20150412_002 MOD MP (32 bit Q6700, 4 threads, 512 MB)
by Tord Romstad, Marco Costalba and Joona
.
.
.

36/62 52:37 -0.63 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rd3 Rc8 30.Rf1 Rf8 31.Bc3 b4
32.axb4 axb4 33.Bb2 Rc5 34.Kc1 Rc6
35.c4 bxc3 36.Rxc3 Kd7 37.Rxc6 Kxc6
38.Kc2 Ra8 39.Rc1 (9.042.411.366) 2863

37/63 62:27 -0.69-- 26.Bb2 Bxe4 (10.831.066.123) 2890

37/63 63:53 -0.63++ 26.Bb2 (11.083.001.012) 2890

37/63 64:43 -0.69-- 26.Bb2 Bxe4 (11.225.236.455) 2890

37/63 66:46 -0.61++ 26.Bb2 (11.600.445.271) 2895

37/64 72:11 -0.72-- 26.Bb2 Bxe4 (12.504.101.410) 2886

37/65 84:44 -0.56++ 26.Bb2 (14.932.768.153) 2936

37/65 125:03 -0.77 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qf1 Rxd1+
29.Rxd1 f6 30.Qd3 Qd7 31.Qe2 Qb7
32.a4 Bxe4 33.Qxe4 Qxe4 34.fxe4 bxa4
35.bxa4 Rf8 36.Kc1 Bc5 37.Bc3 Ke7
38.Kd2 f5 39.Bxe5 (21.898.481.930) 2918

38/65 133:33 -0.71++ 26.Rhe1 (23.157.647.903) 2889

38/65 146:04 -0.65++ 26.Bb2 (25.131.894.289) 2867

38/65 148:13 -0.71-- 26.Bb2 Bxe4 (25.473.172.401) 2864

38/68 153:21 -0.63++ 26.Bb2 (26.399.308.782) 2868

38/68 157:45 -0.65 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rhf1 Rd8 30.Rxd8+ Kxd8 31.Bc3 b4
32.axb4 axb4 33.Bd2 Kd7 34.Kc1 Ra8
35.Kd1 Ra1+ 36.Ke2 Ra7 37.Be3 Ra2
38.Kd3 Ke8 39.Kd2 (27.100.749.494) 2863

39/68 191:58 -0.71-- 26.Bb2 Bxe4 (31.892.119.192) 2768

39/68 198:51 -0.65++ 26.Bb2 (33.085.746.914) 2773

39/68 209:04 -0.71-- 26.Bb2 Bxe4 (34.879.923.372) 2780

39/68 213:11 -0.63++ 26.Bb2 (35.589.812.943) 2782

39/69 226:32 -0.74-- 26.Bb2 Bxe4 (37.892.646.150) 2787

39/69 240:46 -0.58++ 26.Bb2 (40.542.791.531) 2806

39/69 275:43 -0.73 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 Rxd1+
29.Rxd1 f6 30.c4 b4 31.axb4 axb4
32.Qd3 Rg8 33.Kc2 Kf7 34.Nf2 Qb7
35.Ne4 Qc8 36.Nd2 Rg7 37.Ne4 Qc7
38.Qe3 Kg8 39.Ra1 (46.936.452.775) 2837

40/69 289:10 -0.67++ 26.Rhe1 (49.309.741.383) 2842

40/69 304:55 -0.73-- 26.Rhe1 Qc7 (52.131.732.383) 2849

40/69 310:56 -0.67++ 26.Rhe1 (53.202.878.435) 2851

40/69 330:57 -0.75-- 26.Rhe1 Qc7 (56.778.211.986) 2859

40/73 344:46 -0.64++ 26.Rhe1 (59.304.478.254) 2866

40/73 390:34 -0.68 26.Bb2 Bxe4 27.Qxe4 Qxe4 28.fxe4 f6
29.Rhf1 Rc8 30.Rd3 a4 31.Bc3 Rf8
32.Kb2 axb3 33.cxb3 Rc7 34.Rf2 Ra7
35.b4 Rd7 36.Rxd7 Kxd7 37.Kb3 Ra8
38.Be1 Rc8 39.Bc3 (67.520.794.800) 2881

41/73 442:59 -0.75-- 26.Bb2 Bxe4 (76.959.897.750) 2895

41/73 454:48 -0.68++ 26.Bb2 (79.172.140.561) 2901

41/73 471:07 -0.75-- 26.Bb2 Bxe4 (82.325.360.801) 2912

41/73 479:30 -0.67++ 26.Bb2 (83.992.815.856) 2919

41/73 562:47 -0.68 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 f6
29.c4 b4 30.axb4 Rxd1+ 31.Rxd1 axb4
32.Rd2 Kf7 33.Rd1 Kg7 34.Qc2 Rd8
35.Rxd8 Qxd8 36.Qe2 Kf8 37.Kc2 Qd7
38.Qe3 Kg7 39.Qe2 (99.680.708.891) 2952

42/73 626:01 -0.64 26.Rhe1 Qc7 27.Bb2 Rd8 28.Qe2 f6
29.c4 b4 30.axb4 axb4 31.Rd3 Rxd3
32.Qxd3 Qd8 33.Qe2 Kf7 34.Rf1 Kg8
35.Qe3 Rh7 36.Kc1 Qc7 37.Rd1 Rf7
38.Ng3 Kh8 39.Ne4 (111.324.032.370) 2963

43/73 720:32 -0.70-- 26.Rhe1 Qc7 (128.469.958.448) 2971

43/73 750:37 -0.64++ 26.Rhe1 (134.049.438.810) 2976

43/73 770:36 -0.70-- 26.Rhe1 Qc7 (137.627.500.339) 2976

43/73 797:43 -0.62++ 26.Rhe1 (142.565.201.556) 2978

43/73 928:12 -0.73 26.Rhe1 Qc7 27.c4 Rd8 28.Qe2 Rxd1
29.Rxd1 b4 30.axb4 axb4 31.Qd3 Rg8
32.Kb2 Kf8 33.Kc2 Kg7 34.Bb2 Ra8
35.Qe3 Kg8 36.Ra1 Rxa1 37.Bxa1 Qa5
38.Bb2 f6 39.Qd3 (164.876.372.552) 2960

44/76 1098:19 -0.67++ 26.Rhe1 (197.471.955.869) 2996

44/76 1192:06 -0.73-- 26.Rhe1 Qc7 (214.448.589.316) 2998

44/76 1265:51 -0.67++ 26.Rhe1 (228.062.365.025) 3002

best move: Rh1-e1 time: 1272:24.253 min n/s: 3.002.753 nodes: 229.208.215.176

It's not too bad, but I still think the engine is losing too much time in the re-searches. It's not all lost, because with every re-search the move ordering gets a bit better, and the search windows are still widened a little overall; otherwise you would sometimes see cases where an iteration is never resolved... That would be bad...
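
For anyone who wants to see the fail high / fail low re-search pattern in miniature, here is a rough Python sketch of an aspiration-window loop. It is not Stockfish's actual code: the names aspiration_search and make_toy_search, the starting delta of 16 centipawns, the widening rule (delta grows by half after every fail) and the toy fail-hard search are all assumptions made up for illustration.

```python
INF = 32000  # assumed "infinite" score bound, in centipawns


def make_toy_search(true_score):
    """Hypothetical stand-in for a real search with a fixed hidden score."""
    def search(alpha, beta):
        # Fail-hard behaviour: the result is clamped to the window [alpha, beta].
        return max(alpha, min(beta, true_score))
    return search


def aspiration_search(search, prev_score, delta=16):
    """Search a narrow window around prev_score, widening it after each fail.

    Returns (score, number_of_re_searches).
    """
    alpha = max(prev_score - delta, -INF)
    beta = min(prev_score + delta, INF)
    re_searches = 0
    while True:
        score = search(alpha, beta)
        if score <= alpha and alpha > -INF:
            # Fail low ("--"): pull beta in and push alpha further down.
            beta = (alpha + beta) // 2
            alpha = max(score - delta, -INF)
        elif score >= beta and beta < INF:
            # Fail high ("++"): push beta further up.
            beta = min(score + delta, INF)
        else:
            # Score is strictly inside the window (or the window is fully open).
            return score, re_searches
        re_searches += 1
        delta += delta // 2  # widen a bit more on every re-search


if __name__ == "__main__":
    # The previous iteration said 0, but the hidden score is really +100.
    print(aspiration_search(make_toy_search(100), prev_score=0))
```

Because delta keeps growing, the window eventually swallows the true score and the iteration gets resolved. In a real engine the score itself moves between re-searches, which is how you can get the alternating ++ and -- lines in the output above.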