strange evaluation behavior with Stockfish:

FBX
Posts: 4
Joined: Wed Nov 27, 2013 4:27 pm

strange evaluation behavior with Stockfish:

Post by FBX »

Turn endgame databases off and watch Stockfish's eval skyrocket:

8/8/7k/6R1/5P1p/5K1b/8/8 b - -

Now of course having endgame TBs would make this a moot point, but I happened upon this behavior well earlier in a game I was analyzing. Stockfish kept claiming white is winning, while Komodo felt it was going to be a draw. So I played move after move with Stockfish's eval climbing higher and higher, until I ultimately reached this 6-piece position after several trades. I then tested several different versions of Stockfish, including official releases 7 and 8, and they all exhibited this same behavior.
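For anyone who wants to check the "6-piece" claim without setting up a board, here is a minimal Python sketch that counts the men in the FEN above. The helper names `count_pieces` and `rank_widths` are made up for illustration and do not come from any chess library.

```python
def count_pieces(fen):
    """Count the pieces in the piece-placement field of a FEN string."""
    placement = fen.split()[0]
    # Letters are pieces, digits are runs of empty squares, '/' separates ranks.
    return sum(1 for ch in placement if ch.isalpha())


def rank_widths(fen):
    """Return the number of squares described by each rank (should all be 8)."""
    widths = []
    for rank in fen.split()[0].split("/"):
        widths.append(sum(int(ch) if ch.isdigit() else 1 for ch in rank))
    return widths


fen = "8/8/7k/6R1/5P1p/5K1b/8/8 b - -"
print(count_pieces(fen))   # number of men on the board, kings included
print(rank_widths(fen))    # sanity check that the FEN is well formed
```

With six men on the board, the position falls inside what 6-piece endgame tablebases cover, which is why probing would make the search problem moot.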

Edit: Seems to be some sort of memory issue from switching between engines. If I load the program and start with Stockfish, it doesn't lose its mind. If I switch to another engine and then switch back, it goes nuts again (this is in Aquarium btw)
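One way to take the GUI out of the picture is to feed the engine the position directly over UCI from a console. The Python sketch below only builds the command list; it does not launch any engine. The helper names `complete_fen` and `uci_script` are hypothetical, the appended move counters "0 1" are an assumption (the FEN above omits them), and the 120-second `movetime` is an arbitrary choice. It also assumes the engine probes no tablebases when `SyzygyPath` is left unset.

```python
def complete_fen(fen):
    """Pad a short FEN with default fields so it has all six (assumed defaults)."""
    fields = fen.split()
    defaults = ["w", "-", "-", "0", "1"]  # side, castling, en passant, halfmove, fullmove
    return " ".join(fields + defaults[len(fields) - 1:])


def uci_script(fen, threads=1, movetime_ms=120_000):
    """Build the UCI commands for a fresh single-position analysis."""
    return [
        "uci",
        f"setoption name Threads value {threads}",
        # SyzygyPath is deliberately not set: assumed to mean tablebases stay off.
        "ucinewgame",  # asks the engine to start from a clean state
        "isready",
        f"position fen {complete_fen(fen)}",
        f"go movetime {movetime_ms}",
    ]


for cmd in uci_script("8/8/7k/6R1/5P1p/5K1b/8/8 b - -"):
    print(cmd)
```

Pasting the printed lines into the engine's console, once right after start-up and once after other analysis, would show whether the behavior depends on what was searched before.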
ICCF IM and United States cc champion.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: strange evaluation behavior with Stockfish:

Post by yurikvelo »

FBX wrote:
Stockfish kept claiming white is winning
eval was +6.5 for 3 consecutive moves?
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: strange evaluation behavior with Stockfish:

Post by yurikvelo »

Komodo-9.3 @ D=39
+1.45  1...Be6 2.Kg2 h3+ 3.Kh2 Bc8 4.Re5 Kg6 5.Ra5 Kf6

Komodo-10.1 @ D=39
+1.46  1...Be6 2.Re5 Bd7 3.Re7 Bf5 4.Rc7 Kg6 5.Rb7 h3

Stockfish_16111011 @ D=30...53
+1.24  1...Be6 2.Kg2 Bd7 3.Kh2 Be6 4.Re5 Bd7 5.Rd5 Bg4

Gull 3 @ D=32
+1.58  1...Bd7 2.Kg2 Be6 3.Kh1 Bc8 4.Rg8 Bf5 5.Kh2 h3

Houdini 4 @ D=35
+1.14  1...Be6 2.Re5 Bd7 3.Re7 Bf5 4.Re8 Kg6 5.Rg8+ Kf7

Deep Fritz 12 @ D=32
+2.40  1...Be6 2.Ke4 h3 3.Kf3 Bd7 4.Kg3 Be6 5.Kf2 Bd7
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: strange evaluation behavior with Stockfish:

Post by BeyondCritics »

8/8/7k/6R1/5P1p/5K1b/8/8 b - -

With Lomonosov tablebases (http://tb7.chessok.com/probe) this is a draw after 1...Be6 or 1...Bd7, but not after 1...Bc8 (2.Rg8, followed by Rh8+ and Rxh4).
Frankly, I have never seen that type of fortress before.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: strange evaluation behavior with Stockfish:

Post by BeyondCritics »

yurikvelo wrote:
Stockfish kept claiming white is winning
eval was +6.5 for 3 consecutive moves?
Every issue is worth reporting, even if it cannot be reproduced easily. It might be a bug...
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: strange evaluation behavior with Stockfish:

Post by Joerg Oster »

FBX wrote:Turn endgame databases off and watch Stockfish's eval skyrocket:

8/8/7k/6R1/5P1p/5K1b/8/8 b - -

Now of course having endgame TBs would make this a moot point, but I happened upon this behavior well earlier in a game I was analyzing. Stockfish kept claiming white is winning, while Komodo felt it was going to be a draw. So I played move after move with Stockfish's eval climbing higher and higher, until I ultimately reached this 6-piece position after several trades. I then tested several different versions of Stockfish, including official releases 7 and 8, and they all exhibited this same behavior.

Edit: Seems to be some sort of memory issue from switching between engines. If I load the program and start with Stockfish, it doesn't lose its mind. If I switch to another engine and then switch back, it goes nuts again (this is in Aquarium btw)
No, no memory issue.
Simply too much pruning where it should be avoided.
This is closely related to this issue: https://github.com/official-stockfish/S ... issues/760

Current Stockfish with 1 thread:

info depth 51 seldepth 83 multipv 1 score cp -4067 upperbound nodes 307716443 nps 2831711 hashfull 330 tbhits 0 time 108668 pv h3d7 g5g8
info depth 51 currmove h3d7 currmovenumber 1
info depth 51 seldepth 83 multipv 1 score cp -4067 nodes 339672891 nps 2830560 hashfull 352 tbhits 0 time 120002 pv h3d7 g5g8
bestmove h3d7 ponder g5g8
Modified Stockfish with 1 Thread:

info depth 42 seldepth 55 multipv 1 score cp -138 nodes 432646939 nps 3605361 hashfull 994 tbhits 0 time 120001 pv h3d7 f3e4 h4h3 e4f3 d7e8 f3g3 e8d7 g5d5 d7c8 d5d8 c8f5 d8g8 f5d7 g8g5 d7e6 g3h2 e6d7 g5d5 d7c8 h2g3 h6g6 d5d6 g6h5 d6b6 c8d7 b6b7 d7c8 b7g7 h5h6 g7c7 c8f5 c7a7 h6h5 a7a1 h5g6 a1a6 g6g7 a6b6 g7f7 b6c6 f5e6 c6d6 f7e7 d6b6 e7f7 b6a6 f7f6 g3h2 f6f7 a6c6
bestmove h3d7 ponder f3e4
The same result can be achieved with more threads, because Lazy SMP naturally widens the search.
But this only hides the underlying problem.

Stockfish with 4 Threads:

info depth 55 seldepth 59 multipv 1 score cp -132 nodes 822447597 nps 13707003 hashfull 259 tbhits 0 time 60002 pv h3d7 g5g8 h6h7 g8g1 h4h3 g1g5 h7h6 f3g3 d7e6 g5c5 h6g6 c5c6 g6f6 c6a6 f6f7 a6b6 f7f6 b6d6 f6e7 d6d1 e7f6 d1e1 e6f5 e1e5 f5e6 g3h2 e6d7 e5h5 f6g7 h2g3 g7g6 h5g5 g6h6 g5c5 h6g6 c5d5 d7c8 g3h2 g6f6 d5d6 f6f7 d6h6 f7g7 h6b6 c8f5 b6d6 f5g4 d6a6 g4f5 h2g3 g7f7 a6h6 f7g7 h6b6 g7f7 b6b5 f7f6 b5c5
bestmove h3d7 ponder g5g8
Unfortunately, these kinds of fixes have zero chance of passing in the framework.
http://tests.stockfishchess.org/tests/v ... 038170f968 :(
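
To compare the outputs above side by side, a small parser for the UCI `info` lines helps. This is a rough Python sketch, not part of Stockfish or any library; `parse_info` and `white_pawns` are made-up names. It relies on the UCI convention that `score cp` is given in centipawns from the engine's point of view, which here is Black, the side to move. The second sample line has its pv truncated for brevity.

```python
def parse_info(line):
    """Pull depth, score, bound flag and pv out of a UCI 'info' line."""
    tokens = line.split()
    info = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("depth", "seldepth", "nodes", "time") and i + 1 < len(tokens):
            info[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens):
            info["score_kind"] = tokens[i + 1]   # "cp" or "mate"
            info["score"] = int(tokens[i + 2])
            i += 3
        elif tok in ("lowerbound", "upperbound"):
            info["bound"] = tok                  # score is only a bound, not exact
            i += 1
        elif tok == "pv":
            info["pv"] = tokens[i + 1:]          # the rest of the line is the pv
            break
        else:
            i += 1                               # skip fields we do not need
    return info


def white_pawns(info, black_to_move=True):
    """Convert a centipawn score to pawns from White's point of view."""
    if info.get("score_kind") != "cp":
        return None
    cp = info["score"]
    return (-cp if black_to_move else cp) / 100


current = ("info depth 51 seldepth 83 multipv 1 score cp -4067 upperbound "
           "nodes 307716443 nps 2831711 hashfull 330 tbhits 0 time 108668 pv h3d7 g5g8")
modified = ("info depth 42 seldepth 55 multipv 1 score cp -138 nodes 432646939 "
            "nps 3605361 hashfull 994 tbhits 0 time 120001 pv h3d7 f3e4 h4h3")

for name, line in (("current", current), ("modified", modified)):
    info = parse_info(line)
    print(name, info["depth"], white_pawns(info), info.get("bound", "exact"))
```

Read this way, the unmodified single-thread run sits at roughly +40 for White, while the modified build reports a bit over +1, in line with the other engines' numbers earlier in the thread.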
Jörg Oster
FBX
Posts: 4
Joined: Wed Nov 27, 2013 4:27 pm

Re: strange evaluation behavior with Stockfish:

Post by FBX »

BeyondCritics wrote:
8/8/7k/6R1/5P1p/5K1b/8/8 b - -

With Lomonosov tablebases (http://tb7.chessok.com/probe) this is a draw after 1...Be6 or 1...Bd7, but not after 1...Bc8 (2.Rg8, followed by Rh8+ and Rxh4).
Frankly, I have never seen that type of fortress before.
It's about to happen in one of my CC games on ICCF. That's how I discovered this strange behavior during analysis, many moves before this point. I've since been able to repeat the bug even after reloading the program. It seems to be entirely random whether Stockfish goes nuts or realizes the exchange sac doesn't work. Likely a pruning issue, as has already been mentioned.
yurikvelo wrote:
Stockfish kept claiming white is winning
eval was +6.5 for 3 consecutive moves?
Worse than that. About 20 moves before this position, the eval rose from +1 to +2 to +3, on and on until it was over +13 by the time I slid forward in the line to near this point in the endgame.

Anyway, thank God I got a 2nd opinion from Komodo, as it doesn't have this random behavior when evaluating this ending. I was thinking the game is lost, but now I know my fortress idea is actually going to work.
ICCF IM and United States cc champion.