stockfish poor endgame knowledge

Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

stockfish poor endgame knowledge

Post by Uri Blass »

1) It has no stalemate detection in the evaluation function, so it can evaluate a drawn position as a big advantage.
This means that no evaluation guarantees a sure win, and even a rule that always prunes moves scoring more than 80 pawns below the best move is not safe, because Stockfish may then prefer to be a pawn down rather than play a stalemate combination that it has simply pruned.

Stockfish can see stalemate combinations thanks to search, but this means that Stockfish cannot be very aggressive in pruning by evaluation, and I believe that it could be more aggressive with a better evaluation.

New game - Stockfish 1.9 JA
[D]k6r/P1Q5/PP6/8/8/6K1/8/8 b - - 0 1

Analysis by Stockfish 1.9 JA:

1...Rh8-g8+
+- (11.59) Depth: 1 00:00:00
1...Rh8-g8+ 2.Kg3-f4 Rg8-f8+ 3.Kf4-e5
+- (12.32) Depth: 2 00:00:00
1...Rh8-g8+ 2.Kg3-f4 Rg8-f8+ 3.Kf4-e5 Rf8-e8+ 4.Ke5-d5
+- (12.72) Depth: 3 00:00:00
1...Rh8-g8+ 2.Kg3-f4 Rg8-g4+ 3.Kf4-f5 Rg4-g5+ 4.Kf5-f6 Rg5-g6+ 5.Kf6xg6
+- (93.00) Depth: 4 00:00:00
1...Rh8-g8+ 2.Kg3-f4 Rg8-g4+ 3.Kf4-f5 Rg4-g5+ 4.Kf5-e4 Rg5-g4+ 5.Ke4-e3 Rg4-g3+ 6.Qc7xg3
+- (93.10) Depth: 5 00:00:00
1...Rh8-h3+ 2.Kg3-f2 Rh3-f3+ 3.Kf2-g1 Rf3-f1+ 4.Kg1-g2 Rf1-g1+ 5.Kg2xg1
+- (92.95) Depth: 5 00:00:00

My opinion is that it is better to include stalemate detection inside the evaluation, even if it has no direct advantage and makes the program 1% slower in nodes per second, because later it may help to improve the program through more aggressive pruning by evaluation. In most cases it is easy to see that the position is not stalemate (it is enough to have 2 non-blocked pawns when it is clear that one of them can move).
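A minimal sketch of such a cheap pre-check, written in Python for readability rather than speed. It is not Stockfish code. The function names and the FEN-based board dictionary are hypothetical, and the exact test is one assumed way to make "it is clear that one of them can move" concrete: a pawn with an empty square in front of it that shares neither a rank nor a diagonal with its own king cannot be pinned off its push, so the side to move either has a legal move or is in check, and neither case is stalemate. A False result only means "unknown", and the caller has to fall back to real move generation.

```python
def parse_fen_board(fen):
    """Return {(file, rank): piece} with zero-based coordinates, a1 = (0, 0)."""
    board = {}
    rows = fen.split()[0].split("/")
    for row_index, row in enumerate(rows):
        rank = 7 - row_index          # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)       # a digit is a run of empty squares
            else:
                board[(file, rank)] = ch
                file += 1
    return board


def surely_not_stalemate(fen):
    """Cheap one-sided test: True means the position cannot be stalemate."""
    board = parse_fen_board(fen)
    white_to_move = fen.split()[1] == "w"
    pawn, king, step = ("P", "K", 1) if white_to_move else ("p", "k", -1)
    king_file, king_rank = next(sq for sq, pc in board.items() if pc == king)
    for (file, rank), piece in board.items():
        if piece != pawn:
            continue
        if (file, rank + step) in board:
            continue                  # pawn is blocked
        same_rank = rank == king_rank
        same_diagonal = abs(file - king_file) == abs(rank - king_rank)
        if not same_rank and not same_diagonal:
            return True               # this push cannot expose the king
    return False                      # unknown: do the full test
```

For the rook position above the function returns False, because Black has no pawns at all, which is exactly the kind of position where the expensive test is still needed.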

2) Endgame knowledge can be improved. Stockfish evaluates KNN vs KP as a big advantage for the knights, although in part of the cases it is a draw.
KNB vs KN is also evaluated as a draw, in spite of the fact that in a significant part of the cases it is a win for the stronger side.

In the following position Stockfish needs depth 47 to see that Black is losing, and it is clearly slower than other engines.

New game - Stockfish 1.9 JA
[D]7k/8/8/8/8/8/n2B2K1/7N b - - 0 1

For some reason Stockfish 1.8 is faster and needs depth 46.
The wrong 0.00 evaluation is the same, so it seems that the Stockfish team increased pruning in some cases and did not only reduce pruning in the search.

3) I believe that Stockfish also does not have a function that tells it when not to use null move pruning in the endgame. When every move except king moves loses material, it may be better to avoid null move pruning. It may be possible to have a zugzwang detection function, and if it is too expensive, to use it not in every node but only in nodes where the remaining depth is big enough.
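The gating idea in point 3 can be sketched roughly as follows. This is an illustration in Python, not code from Stockfish or any other engine. The function name, the parameters, and the depth threshold of 6 are all assumptions made for the example. The expensive zugzwang test is passed in as a callable so that it only runs at nodes with enough remaining depth.

```python
def allow_null_move(depth, in_check, non_pawn_pieces,
                    zugzwang_risk=None, min_check_depth=6):
    """Decide whether to try a null move at this node.

    depth            remaining search depth in plies
    in_check         True if the side to move is in check
    non_pawn_pieces  number of knights, bishops, rooks and queens
                     of the side to move
    zugzwang_risk    optional callable returning True when the position
                     looks like zugzwang (hypothetical, engine-specific)
    min_check_depth  only call zugzwang_risk at this depth or deeper
                     (arbitrary value chosen for the sketch)
    """
    # Passing while in check is illegal, and king-and-pawn endings
    # are the classic zugzwang territory.
    if in_check or non_pawn_pieces == 0:
        return False
    # Run the costly detector only where the subtree is large enough
    # for a wrong cutoff to matter.
    if zugzwang_risk is not None and depth >= min_check_depth:
        if zugzwang_risk():
            return False
    return True
```

The detector itself (for example "every move except king moves loses material") is left out on purpose, since how to compute it cheaply is the open question in the post.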
Mangar
Posts: 65
Joined: Thu Jul 08, 2010 9:16 am

Re: stockfish poor endgame knowledge

Post by Mangar »

Hi Uri,

We have a static mate-in-one detection in Spike. It is not part of the evaluation and is called in some other places. I never found a way to get any advantage from a static (without search) stalemate detection. Stalemate occurs too seldom, so the test is a waste of processor time.

Greetings Volker
Mangar Spike Chess
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish poor endgame knowledge

Post by Uri Blass »

My guess is that the price of stalemate detection should not be high (maybe the program is going to be 1% slower in nodes per second, because in most positions it is very cheap to detect that there is no stalemate since at least one pawn can move), and I can see a possible future advantage from pruning.

About endgame knowledge: I find that Stockfish 1.9 does not detect unstoppable pawns, even in cases where it is obvious that they are winning because they are more advanced than all of the opponent's pawns.

Here is an example. For comparison, Stockfish 1.8 had wrong knowledge about this and could evaluate pawns that are not unstoppable as unstoppable, so maybe the change had no effect on playing strength, but I believe that it is possible to increase playing strength with productive knowledge about unstoppable passed pawns.

New game - Stockfish 1.9 JA
[D]k7/3ppp2/8/8/7P/8/8/7K w - - 0 1

Analysis by Stockfish 1.9 JA:

1.h4-h5
µ (-0.76) Depth: 1 00:00:00
1.h4-h5 Ka8-b7
-+ (-1.45) Depth: 2 00:00:00
1.h4-h5 Ka8-b7 2.Kh1-g2
µ (-0.84) Depth: 3 00:00:00
1.h4-h5 d7-d5 2.h5-h6 d5-d4 3.h6-h7 Ka8-b7 4.h7-h8Q d4-d3 5.Qh8-b2+ Kb7-c6
+- (6.62) Depth: 4 00:00:00
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: stockfish poor endgame knowledge

Post by Ferdy »

Uri Blass wrote: 1) It has no stalemate detection in the evaluation function, so it can evaluate a drawn position as a big advantage.
This means that no evaluation guarantees a sure win, and even a rule that always prunes moves scoring more than 80 pawns below the best move is not safe, because Stockfish may then prefer to be a pawn down rather than play a stalemate combination that it has simply pruned.

Stockfish can see stalemate combinations thanks to search, but this means that Stockfish cannot be very aggressive in pruning by evaluation, and I believe that it could be more aggressive with a better evaluation.
Pruning by evaluation is interesting but very difficult to balance. The evaluation used in qsearch cannot be used to prune the search unless you add some big margins and some special functions to detect forks, pins and similar tactics. I am planning to use two different evaluations, one for qsearch and another for pruning the search.
My opinion is that it is better to include stalemate detection inside the evaluation, even if it has no direct advantage and makes the program 1% slower in nodes per second, because later it may help to improve the program through more aggressive pruning by evaluation. In most cases it is easy to see that the position is not stalemate (it is enough to have 2 non-blocked pawns when it is clear that one of them can move).
2) Endgame knowledge can be improved. Stockfish evaluates KNN vs KP as a big advantage for the knights, although in part of the cases it is a draw.
This is rare. It is also better to have no pawn at all if the pawn is blocked on a rank far from the promotion square. Reducing the score would probably solve this problem.

KNB vs KN is also evaluated as a draw, in spite of the fact that in a significant part of the cases it is a win for the stronger side.
KB would put up more resistance against KBN.

In the following position Stockfish needs depth 47 to see that Black is losing, and it is clearly slower than other engines.

New game - Stockfish 1.9 JA
[D]7k/8/8/8/8/8/n2B2K1/7N b - - 0 1

For some reason Stockfish 1.8 is faster and needs depth 46.
The wrong 0.00 evaluation is the same, so it seems that the Stockfish team increased pruning in some cases and did not only reduce pruning in the search.

3) I believe that Stockfish also does not have a function that tells it when not to use null move pruning in the endgame. When every move except king moves loses material, it may be better to avoid null move pruning. It may be possible to have a zugzwang detection function, and if it is too expensive, to use it not in every node but only in nodes where the remaining depth is big enough.
Stockfish has a lot of potential for improvement :D
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob »

Uri Blass wrote: My guess is that the price of stalemate detection should not be high (maybe the program is going to be 1% slower in nodes per second, because in most positions it is very cheap to detect that there is no stalemate since at least one pawn can move), and I can see a possible future advantage from pruning.
Not good enough. Your pawn moves, I take it, and we are back to your move. Stalemate is non-trivial except for the cases where one side has nothing but a king. I do have some endgame code that uses that idea, but that's the only place other than in the search itself (not in the eval).

About endgame knowledge: I find that Stockfish 1.9 does not detect unstoppable pawns, even in cases where it is obvious that they are winning because they are more advanced than all of the opponent's pawns.

Here is an example. For comparison, Stockfish 1.8 had wrong knowledge about this and could evaluate pawns that are not unstoppable as unstoppable, so maybe the change had no effect on playing strength, but I believe that it is possible to increase playing strength with productive knowledge about unstoppable passed pawns.

New game - Stockfish 1.9 JA
[D]k7/3ppp2/8/8/7P/8/8/7K w - - 0 1

Analysis by Stockfish 1.9 JA:

1.h4-h5
µ (-0.76) Depth: 1 00:00:00
1.h4-h5 Ka8-b7
-+ (-1.45) Depth: 2 00:00:00
1.h4-h5 Ka8-b7 2.Kh1-g2
µ (-0.84) Depth: 3 00:00:00
1.h4-h5 d7-d5 2.h5-h6 d5-d4 3.h6-h7 Ka8-b7 4.h7-h8Q d4-d3 5.Qh8-b2+ Kb7-c6
+- (6.62) Depth: 4 00:00:00
Can't explain their scores there. White is two pawns down. Crafty gives a score of +3.0 (for white) due to the white pawn promoting first (the pawn outside the square of the king is +5, but black has two extra pawns so +3).
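For readers who do not know the rule behind "outside the square of the king", here is a small sketch of it in Python. It is not Crafty's code. The function names are made up for the example, and the test ignores every other piece on the board (blockers, checks, a defending king in front of the pawn), so it only covers the bare pawn-versus-king race.

```python
def square_to_coords(square):
    """Convert a square such as 'h4' to zero-based (file, rank); a1 is (0, 0)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def pawn_outruns_king(pawn_sq, enemy_king_sq, pawn_is_white, pawn_side_to_move):
    """True if the enemy king alone cannot catch the pawn before or on promotion."""
    pawn_file, pawn_rank = square_to_coords(pawn_sq)
    king_file, king_rank = square_to_coords(enemy_king_sq)
    promo_rank = 7 if pawn_is_white else 0
    start_rank = 1 if pawn_is_white else 6

    pawn_moves = abs(promo_rank - pawn_rank)
    if pawn_rank == start_rank:
        pawn_moves -= 1               # the initial double step saves one move

    # King distance to the promotion square (kings move one step in any direction).
    king_moves = max(abs(king_file - pawn_file), abs(king_rank - promo_rank))
    if not pawn_side_to_move:
        king_moves -= 1               # the defender moves first and gains a tempo

    # The king catches the pawn if it can reach the promotion square
    # no later than the move right after the pawn arrives there.
    return king_moves > pawn_moves


# Position from this thread: White pawn h4, Black king a8, White to move.
print(pawn_outruns_king("h4", "a8", pawn_is_white=True, pawn_side_to_move=True))
```

In the thread position the pawn needs 4 moves and the king needs 7, so the pawn is outside the square. With the figures given above, a +5 bonus minus two pawns of material gives the +3.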
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish poor endgame knowledge

Post by mcostalba »

bob wrote: Can't explain their scores there.
Does it help you to explain them, knowing that there is no unstoppable pawn detection in 1.9?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob »

bob wrote: Can't explain their scores there.
mcostalba wrote: Does it help you to explain them, knowing that there is no unstoppable pawn detection in 1.9?
Yes, although I am not sure why you would remove that. I tested with and without it when I did the 23.0 rewrite, and with it (for Crafty) is definitely better...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish poor endgame knowledge

Post by mcostalba »

bob wrote: Can't explain their scores there.
mcostalba wrote: Does it help you to explain them, knowing that there is no unstoppable pawn detection in 1.9?
bob wrote: Yes, although I am not sure why you would remove that. I tested with and without it when I did the 23.0 rewrite, and with it (for Crafty) is definitely better...
What do you mean by "definitely better"? We weren't able to see any difference, at least not within the 3-4 Elo points that are our current resolution.
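To make the notion of "resolution" concrete, here is a rough sketch of how an Elo difference and its error margin can be estimated from a match result. It uses the standard logistic Elo formula and a normal approximation for the 95% interval. It is not the Stockfish team's actual test procedure, the function names are made up, and the game counts used below are invented inputs, not measured results.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, margin) where margin is roughly a 95% half-width in Elo.

    Assumes the score stays strictly inside (0, 1) after adding and
    subtracting z standard errors, which holds for long, balanced matches.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return elo_from_score(score), (high - low) / 2.0


# Invented example: the same 30/40/30 split at two match lengths.
for wins, draws, losses in [(300, 400, 300), (12000, 16000, 12000)]:
    elo, margin = elo_with_margin(wins, draws, losses)
    print(wins + draws + losses, "games:", round(elo, 1), "+/-", round(margin, 1))
```

The margin shrinks only with the square root of the number of games, which is why a feature worth a few Elo needs tens of thousands of games before a test can see it.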
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish poor endgame knowledge

Post by Uri Blass »

bob wrote: Can't explain their scores there.
mcostalba wrote: Does it help you to explain them, knowing that there is no unstoppable pawn detection in 1.9?
bob wrote: Yes, although I am not sure why you would remove that. I tested with and without it when I did the 23.0 rewrite, and with it (for Crafty) is definitely better...
mcostalba wrote: What do you mean by "definitely better"? We weren't able to see any difference, at least not within the 3-4 Elo points that are our current resolution.
If the comparison is against your previous code, then note that Stockfish 1.8 treated passed pawns as unstoppable even when the opponent had pawns that promote earlier.

I think that it is better to be sure that the opponent cannot promote earlier or at the same time before you give a big bonus.
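A minimal sketch of that condition, in Python and with hypothetical names. It counts plies in a pure pawn race and ignores kings, blockers and checks. "At the same time" is interpreted here as the opponent promoting on the very next ply, which is an assumption of the sketch.

```python
def moves_to_promote(square, is_white):
    """Pawn moves needed to promote from a square such as 'h4' on an open file."""
    rank = int(square[1])             # 1..8 as written in algebraic notation
    if is_white:
        moves = 8 - rank
        return moves - 1 if rank == 2 else moves   # double step from rank 2
    moves = rank - 1
    return moves - 1 if rank == 7 else moves       # double step from rank 7


def big_passer_bonus_is_safe(my_pawn, enemy_pawns, i_am_white, i_move_first):
    """True if no enemy pawn promotes earlier than, or right after, my pawn."""
    my_moves = moves_to_promote(my_pawn, i_am_white)
    # Plies are counted from 1, with the side to move playing ply 1.
    my_ply = 2 * my_moves - 1 if i_move_first else 2 * my_moves
    for square in enemy_pawns:
        their_moves = moves_to_promote(square, not i_am_white)
        their_ply = 2 * their_moves if i_move_first else 2 * their_moves - 1
        if their_ply <= my_ply + 1:
            return False              # opponent queens first or immediately after
    return True


# Position from this thread: White h4 against Black d7, e7, f7, White to move.
print(big_passer_bonus_is_safe("h4", ["d7", "e7", "f7"],
                               i_am_white=True, i_move_first=True))
```

In the example position the h-pawn queens on ply 7 and the fastest black pawn only on ply 10, so the bonus would be given. With a black pawn already on a4 it would not.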
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish poor endgame knowledge

Post by mcostalba »

Uri Blass wrote: I think that it is better to be sure that the opponent cannot promote earlier or at the same time before you give a big bonus.
I think that if a feature proves good after deep testing it can stay in; otherwise it is removed.

It is a very simple rule, but I understand it is difficult to accept, especially for chess players who know which feature is important from a "human" point of view and intuitively extrapolate that the feature is also important for chess engines. Sometimes that is true and sometimes it is not, but only testing can prove it.

So the bottom line is that what you or I "think" is good is totally irrelevant; only testing with real games properly weighs the feature.