Position crafty and stockfish both badly mis-evaluate

Discussion of chess software programming and technical issues.

jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Position crafty and stockfish both badly mis-evaluate

Post by jwes »

I tried this position with Crafty-23.2 and Stockfish 1.8 and they both give scores > 8 after analyzing for more than 1 minute, while the position is a tablebase draw. These seem like remarkably optimistic evaluations with no win in sight.
[D]2k5/8/Pp1K4/8/7B/8/P7/8 w - - 0 1 bm a4; id "Fine 149 draw";
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Position crafty and stockfish both badly mis-evaluate

Post by bob »

jwes wrote:I tried this position with Crafty-23.2 and Stockfish 1.8 and they both give scores > 8 after analyzing for more than 1 minute, while the position is a tablebase draw. These seem like remarkably optimistic evaluations with no win in sight.
[D]2k5/8/Pp1K4/8/7B/8/P7/8 w - - 0 1 bm a4; id "Fine 149 draw";
Has to be a bug in Crafty. It should say White can't win here, since rook pawn(s) plus the wrong bishop can't promote with the defending king in front of the pawn. I'll look into it, as Crafty should get this right.
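
A minimal sketch of the kind of "wrong bishop" recognizer described here, not Crafty's actual code. It assumes a bitboard layout with a1 = bit 0 and h8 = bit 63, and that the caller has already verified White has only king, bishop(s) and pawns. The function name and parameters are hypothetical. It deliberately reports "unknown" as soon as Black has any pawn, because a black b pawn (as in the posted position) can let White change files by capturing it.

```c
#include <stdint.h>

typedef uint64_t Bitboard;

#define FILE_A        0x0101010101010101ULL
#define FILE_H        0x8080808080808080ULL
#define LIGHT_SQUARES 0x55AA55AA55AA55AAULL  /* a8 is a light square */
#define DARK_SQUARES  0xAA55AA55AA55AA55ULL  /* h8 is a dark square  */
#define SQ(s)         (1ULL << (s))

enum { A7 = 48, B7 = 49, G7 = 54, H7 = 55, A8 = 56, B8 = 57, G8 = 62, H8 = 63 };

/* Returns 1 if White (K + B + pawns, checked by the caller) cannot win
 * because every pawn is on one rook file, no bishop controls the
 * promotion corner, and the black king already sits in that corner.
 * Returns 0 for "unknown", never for "White wins". */
int wrong_bishop_rook_pawn_draw(Bitboard wpawns, Bitboard wbishops,
                                Bitboard bpawns, int bking_sq)
{
    Bitboard bking = SQ(bking_sq);

    /* No pawns to stop, or black pawns present: outside this bare pattern. */
    if (wpawns == 0 || bpawns != 0)
        return 0;

    if ((wpawns & ~FILE_A) == 0) {
        /* All pawns on the a file: only a light-squared bishop covers a8. */
        Bitboard corner = SQ(A7) | SQ(A8) | SQ(B7) | SQ(B8);
        return (wbishops & LIGHT_SQUARES) == 0 && (bking & corner) != 0;
    }
    if ((wpawns & ~FILE_H) == 0) {
        /* All pawns on the h file: only a dark-squared bishop covers h8. */
        Bitboard corner = SQ(G7) | SQ(G8) | SQ(H7) | SQ(H8);
        return (wbishops & DARK_SQUARES) == 0 && (bking & corner) != 0;
    }
    return 0;
}
```

Returning "unknown" rather than "draw" in the doubtful cases keeps the recognizer safe to use in the evaluation: the search still has to sort out positions where the black king is not yet in the corner or where Black has pawns of his own.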
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Position crafty and stockfish both badly mis-evaluate

Post by BubbaTough »

This position is tricky because in some positions the b pawn can be forced to advance, letting White convert his second a pawn into a b pawn by capturing it, which will foil most evals if it happens near the leaves of the tree. If you changed the b pawn into a pawn on any other file, I expect more programs would understand this position better.

Care must be taken when trying to fix things in these positions. This particular position is drawn, but there are similar positions which are not.

-Sam
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Position crafty and stockfish both badly mis-evaluate

Post by jwes »

BubbaTough wrote:This position is tricky because in some positions the b pawn can be forced to advance, letting White convert his second a pawn into a b pawn by capturing it, which will foil most evals if it happens near the leaves of the tree. If you changed the b pawn into a pawn on any other file, I expect more programs would understand this position better.

Care must be taken when trying to fix things in these positions. This particular position is drawn, but there are similar positions which are not.

-Sam
I was thinking that too, but I can't construct a position with B and 2 a pawns vs b pawn where that works.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Position crafty and stockfish both badly mis-evaluate

Post by michiguel »

BubbaTough wrote:This position is tricky because in some positions the b pawn can be forced to advance, letting White convert his second a pawn into a b pawn by capturing it, which will foil most evals if it happens near the leaves of the tree. If you changed the b pawn into a pawn on any other file, I expect more programs would understand this position better.

Care must be taken when trying to fix things in these positions. This particular position is drawn, but there are similar positions which are not.

-Sam
Gaviota was supposed to know all this but this position is very tricky for other reasons.
First of all, White can force Black to take the bishop, and if the program does not recognize KPPKP with two rook pawns as a draw, it will keep giving a high positive score.
Second, if there is no stalemate detection in quies, the search can cleverly steer the PV so that the last quies move is always the capture of the b pawn, which leaves a stalemate. That position is then evaluated as winning (since the pawn is now on the "b" file).
Third, if the futility margin in quies() is not big enough, it makes the whole thing worse, particularly through an evil interaction with the hashtables. I needed to correct all three of these things, and now it works:

Gaviota 0.76.6-modified.
No tablebases

Code:

setboard 2k5/8/Pp1K4/8/7B/8/P7/8 w - - 0 1 bm a4; id "Fine 149 draw";
d
+-----------------+
| . . k . . . . . |
| . . . . . . . . |
| P p . K . . . . |
| . . . . . . . . |    Castling: 
| . . . . . . . B |    ep: -
| . . . . . . . . |
| P . . . . . . . |
| . . . . . . . . | [White]
+-----------------+

tbuse off
analyze
********* Starts iterative deepening, thread = 0
set timer to infinite
        25   1:      0.0    +1.46  1.Kc6
       124   2       0.0      :-(  
       215   2:      0.0    +0.19  1.Kc6 Kb8
       598   3:      0.0    +0.18  1.Kc6 Kb8 2.Bg3+ Ka7
      2096   4:      0.0    +0.19  1.Kc6 Kb8 2.Kb5 Ka7
      6849   5:      0.0    +0.19  1.Kc6 Kb8 2.Bg3+ Ka8 3.Kb5 Ka7
      9286   6       0.0    +0.19  1.Kc6 Kb8 2.Kb5 Ka8 3.Bf2 Ka7
     22413   6:      0.1    +0.19  1.Kc6 Kb8 2.Kb5 Ka8 3.Bf2 Ka7
     28938   7       0.1    +0.19  1.Kc6 Kb8 2.Bg3+ Ka7 3.Kb5 Ka8 4.a4 Ka7
     50131   7       0.2    +0.19  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.Bb8 b5
     62389   7:      0.2    +0.19  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.Bb8 b5
     80678   8       0.3    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.Bb8 b5
                                   5.a7 b4
    145606   8:      0.4    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.Bb8 b5
                                   5.a7 b4
    175886   9       0.5    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.a4 Ka7
                                   5.Bb8+ Ka8 6.a7 b5
    305102   9:      0.7    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.a4 Ka7
                                   5.Bb8+ Ka8 6.a7 b5
    381384  10       0.8    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Bf2
                                   Ka7 5.a4 Kb8 6.a7+ Ka8
    638300  10:      1.2    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Bf2
                                   Ka7 5.a4 Kb8 6.a7+ Ka8
    748613  11       1.4    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Bf2
                                   Ka7 5.Kb5 Ka8 6.a4 Kb8
   1194167  11:      2.0    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Bf2
                                   Ka7 5.Kb5 Ka8 6.a4 Kb8
   1505765  12       2.6    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Kc7
                                   Ka7 5.Kc8 Ka8 6.Bb8 b5 7.a7 b4
   2259774  12:      3.6    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Kc7
                                   Ka7 5.Kc8 Ka8 6.Bb8 b5 7.a7 b4
   2894063  13       5.0    +0.20  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc6 Ka8 4.Kc7
                                   Ka7 5.Kc8 Ka8 6.a4 Ka7 7.Bb8+ Ka8 8.a7
                                   b5
   3035733  13       5.1    +0.20  1.Kc6 Kb8 2.a4 Ka7 3.Kb5 Ka8 4.Bf2 Kb8
                                   5.a7+ Kb7 6.Bd4 Ka8 7.Ka6 b5
   4195838  13:      6.8    +0.20  1.Kc6 Kb8 2.a4 Ka7 3.Kb5 Ka8 4.Bf2 Kb8
                                   5.a7+ Kb7 6.Bd4 Ka8 7.Ka6 b5
   5935338  14       9.9    +0.20  1.Kc6 Kb8 2.Bg3+ Ka7 3.Kb5 Ka8 4.a4 Ka7
                                   5.Bf4 Ka8 6.Be3 Kb8 7.Kc6 Ka8 8.a7 b5
   8741050  14:     14.3    +0.20  1.Kc6 Kb8 2.Bg3+ Ka7 3.Kb5 Ka8 4.a4 Ka7
                                   5.Bf4 Ka8 6.Be3 Kb8 7.Kc6 Ka8 8.a7 b5
with tablebases

Code:

analyze
********* Starts iterative deepening, thread = 0
set timer to infinite
        25   1:      0.0    +1.46  1.Kc6
       120   2       0.0      :-(  
       211   2:      0.0    +0.19  1.Kc6 Kb8
       559   3:      0.0    +0.18  1.Kc6 Kb8 2.Bg3+ Ka7
      1363   4:      0.0    +0.19  1.Kc6 Kb8 2.Kb5 Ka7
      4083   5:      0.0    +0.19  1.Kc6 Kb8 2.Bg3+ Ka8 3.Kb5 Ka7
      5038   6       0.0    +0.19  1.Kc6 Kb8 2.Kb5 Ka8 3.Bf2 Ka7
      9017   6:      0.0    +0.19  1.Kc6 Kb8 2.Kb5 Ka8 3.Bf2 Ka7
     11984   7       0.1    +0.19  1.Kc6 Kb8 2.Bg3+ Ka7 3.Kb5 Ka8 4.Bc7
                                   Ka7
     14472   7       0.1    +0.19  1.Bg3 Kb8 2.Kd7+ Ka7 3.Kc8 Ka8 4.Bb8 b5
     19252   7       0.1    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.a7 Kb7 4.Bd4 Ka8
     22091   7:      0.1    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.a7 Kb7 4.Bd4 Ka8
     25991   8       0.2    +0.19  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka7
     29635   8       0.2    +0.19  1.Kc6 Kb8 2.Bf2 Ka8 3.Kb5 Kb8 4.Bd4 Ka7
     41888   8:      0.2    +0.19  1.Kc6 Kb8 2.Bf2 Ka8 3.Kb5 Kb8 4.Bd4 Ka7
     49934   9       0.2    +0.20  1.Kc6 Kb8 2.Bg3+ Ka7 3.Kb5 Ka8 4.Bc7
                                   Ka7 5.a4 Ka8
     57226   9       0.3    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Bd4 b5
     80054   9:      0.3    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Bd4 b5
     96398  10       0.4    +0.20  1.a4 Kb8 2.Kc6 Ka7 3.Kb5 Ka8 4.Bg3 Ka7
                                   5.Bc7 Ka8
    159552  10:      0.5    +0.20  1.a4 Kb8 2.Kc6 Ka7 3.Kb5 Ka8 4.Bg3 Ka7
                                   5.Bc7 Ka8
    181370  11       0.5    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Kb5 Kb7 6.Bd4 Ka8
    259037  11:      0.7    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Kb5 Kb7 6.Bd4 Ka8
    331581  12       0.9    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8
    621119  12:      1.5    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8
    682369  13       1.6    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Kb5 Kb7 6.Bd4 Ka8 7.Ka6 b5
    864100  13:      2.0    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.a7+ Ka8
                                   5.Kb5 Kb7 6.Bd4 Ka8 7.Ka6 b5
   1262935  14       2.8    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8 7.Bd4 Kb8
   2723616  14:      5.8    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8 7.Bd4 Kb8
   2820170  15       6.0    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8 7.Kc6 Kb8 8.a7+ Ka8
   4174275  15:      9.4    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8 7.Kc6 Kb8 8.a7+ Ka8
   4304496  16       9.7    +0.20  1.a4 Kb8 2.Bf2 Ka8 3.Kc6 Kb8 4.Kb5 Ka8
                                   5.Bg1 Kb8 6.Be3 Ka8 7.Bf4 Ka7 8.Bc7 Ka8
   5028243  16      11.0    +0.20  1.a3 Kb8 2.Kd7 Ka7 3.Kc8 Ka8 4.Bg3 b5
                                   5.Be5 Ka7 6.Kc7 Ka8 7.Kc6 Ka7 8.Bd4+
                                   Kb8 9.a7+ Ka8
   5805792  16:     12.9    +0.20  1.a3 Kb8 2.Kd7 Ka7 3.Kc8 Ka8 4.Bg3 b5
                                   5.Be5 Ka7 6.Kc7 Ka8 7.Kc6 Ka7 8.Bd4+
                                   Kb8 9.a7+ Ka8
   6107334  17      13.5    +0.20  1.a3 Kb8 2.Kd7 Ka7 3.Kc8 Ka8 4.Bg3 b5
                                   5.Be5 Ka7 6.Kc7 Ka8 7.Kd6 Kb8 8.Kc6+
                                   Ka7 9.Bd4+ Kb8 10.a7+ Ka8
   7571773  17:     17.3    +0.20  1.a3 Kb8 2.Kd7 Ka7 3.Kc8 Ka8 4.Bg3 b5
                                   5.Be5 Ka7 6.Kc7 Ka8 7.Kd6 Kb8 8.Kc6+
                                   Ka7 9.Bd4+ Kb8 10.a7+ Ka8
   8282772  18      18.8    +0.20  1.a3 Kb8 2.Kd7 Ka7 3.Kc8 Ka8 4.Bg3 b5
                                   5.Be5 Ka7 6.Kc7 Ka8 7.Kc6 Ka7 8.Bd4+
                                   Kb8 9.a7+ Ka8 10.Kb6 b4

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Position crafty and stockfish both badly mis-evaluate

Post by michiguel »

jwes wrote:
BubbaTough wrote:This position is tricky because in some positions the b pawn can be forced to advance, letting White convert his second a pawn into a b pawn by capturing it, which will foil most evals if it happens near the leaves of the tree. If you changed the b pawn into a pawn on any other file, I expect more programs would understand this position better.

Care must be taken when trying to fix things in these positions. This particular position is drawn, but there are similar positions which are not.

-Sam
I was thinking that too, but I can't construct a position with B and 2 a pawns vs b pawn where that works.
[D]8/2K3B1/k7/Pp6/8/P7/8/8 w - - 0 1

Bd4 wins: if Kxa5 then Kb7, and if b4 then axb4.

Miguel
Michel
Posts: 2273
Joined: Mon Sep 29, 2008 1:50 am

Re: Position crafty and stockfish both badly mis-evaluate

Post by Michel »

Gaviota was supposed to know all this but this position is very tricky for other reasons.
First of all, White can force Black to take the bishop, and if the program does not recognize KPPKP with two rook pawns as a draw, it will keep giving a high positive score.
Second, if there is no stalemate detection in quies, the search can cleverly steer the PV so that the last quies move is always the capture of the b pawn, which leaves a stalemate. That position is then evaluated as winning (since the pawn is now on the "b" file).
Third, if the futility margin in quies() is not big enough, it makes the whole thing worse, particularly through an evil interaction with the hashtables. I needed to correct all three of these things, and now it works:
Thanks!!!!

GnuChess was exhibiting the same problems as the other engines.
So I was going to start a debugging session but now you have
explained it all!

EDIT: Now how do you check efficiently for stalemate during quiescence
search :-( ?
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Position crafty and stockfish both badly mis-evaluate

Post by michiguel »

Michel wrote:
Gaviota was supposed to know all this but this position is very tricky for other reasons.
First of all, White can force Black to take the bishop, and if the program does not recognize KPPKP with two rook pawns as a draw, it will keep giving a high positive score.
Second, if there is no stalemate detection in quies, the search can cleverly steer the PV so that the last quies move is always the capture of the b pawn, which leaves a stalemate. That position is then evaluated as winning (since the pawn is now on the "b" file).
Third, if the futility margin in quies() is not big enough, it makes the whole thing worse, particularly through an evil interaction with the hashtables. I needed to correct all three of these things, and now it works:
Thanks!!!!

GnuChess was exhibiting the same problems as the other engines.
So I was going to start a debugging session but now you have
explained it all!

EDIT: Now how do you check efficiently for stalemate during quiescence
search :-( ?
I did not say I do this efficiently :-)

What I do now is very crude, but I plan to improve it. I cannot prove stalemate efficiently, but it is easy to prove "no stalemate" efficiently "most of the time". For instance, with bitboard variables I do:

Code:

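if ((king_moves[kingsquare] & ~opponent_attacks & ~mypieces) == 0) {
  /* no safe king move available, so stalemate is still possible */
  stalemate = full_check_for_stalemate(); /* expensive, but I rarely need to do this */
} else {
  /* the king has a legal move, so this cannot be stalemate */
  stalemate = FALSE;
}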
I really need to check this, but I could also introduce other easy "bail-outs" with bitboard operations. Any non-blocked pawn that is not on the same diagonal, rank, or file as the king means there is no stalemate (it cannot be pinned). The same goes for any piece with mobility > 0.
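
To make the two bail-outs above concrete, here is a rough sketch, not Gaviota's code. The names lines_through and cheap_no_stalemate and all parameters are made up for illustration. It assumes a1 = bit 0 and h8 = bit 63, that the caller has already established the side to move is not in check, and that king_targets holds the king's step squares (what king_moves[kingsquare] gives above). The "any piece with mobility > 0" shortcut is left out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t Bitboard;

/* All squares sharing a rank, file, or diagonal with sq (a1 = 0, h8 = 63).
 * A real engine would read this from a precomputed table. */
static Bitboard lines_through(int sq)
{
    Bitboard mask = 0;
    int f = sq % 8, r = sq / 8;
    for (int s = 0; s < 64; s++) {
        int df = abs(s % 8 - f), dr = abs(s / 8 - r);
        if (df == 0 || dr == 0 || df == dr)
            mask |= 1ULL << s;
    }
    return mask;
}

/* Returns 1 if the position is provably NOT stalemate, 0 if a full
 * (expensive) stalemate check is still needed. */
int cheap_no_stalemate(Bitboard king_targets, Bitboard opponent_attacks,
                       Bitboard my_pieces, Bitboard my_pawns,
                       Bitboard occupied, int king_sq, int white_to_move)
{
    /* Bail-out 1: the king has a square that is neither attacked
     * nor occupied by one of its own pieces. */
    if (king_targets & ~opponent_attacks & ~my_pieces)
        return 1;

    /* Bail-out 2: a pawn that is off every line through the king cannot
     * be pinned, so if the square in front of it is empty it can move. */
    Bitboard free_pawns = my_pawns & ~lines_through(king_sq);
    Bitboard pushes = white_to_move ? (free_pawns << 8) : (free_pawns >> 8);
    if (pushes & ~occupied)
        return 1;

    return 0;   /* unknown: fall back to the full test */
}
```

In quies() this would be called only when stand-pat is about to return a winning score, and full_check_for_stalemate() would run only when it returns 0, so the expensive path stays rare.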

A couple of years ago it took me several days of debugging to understand a similar (but more complex) position posted by Uri. The interaction with the hashtable is really nasty. I thought I had eliminated most problems with "fail hard" in quies(), but obviously I had not.

Miguel
rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Position crafty and stockfish both badly mis-evaluate

Post by rjgibert »

Your example does not "count." He implicitly meant a position with white to move where white does not have an immediate axb capture and the black king is on a7, a8, b7 or b8. Other examples are not interesting.

A 2nd black pawn on b5 would sort of count. But with only one black b-pawn, it does not seem possible.
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: Position crafty and stockfish both badly mis-evaluate

Post by rvida »

Critter evaluates this position as "almost draw"

Code:


2k5/8/Pp1K4/8/7B/8/P7/8 w - -

Engine: Critter 0.80 32-bit (128 MB)
by Richard Vida

24/70  0:27   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (32.469.029) 1186 

25/70  0:32   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (38.599.235) 1204 

26/70  0:38   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (46.922.983) 1224 

27/70  0:46   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (58.018.378) 1247 

28/70  0:56   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (70.335.746) 1255 

29/70  1:09   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Bf2 (88.024.778) 1265 

30/70  1:29   +0.05    1.Kc6 Kb8 2.Bg3+ Ka7 3.Kc7 Ka8 4.Kd7 Ka7 
                       5.Kc8 Ka8 6.Bb8 b5 7.a7 b4 8.Kd7 Kb7 
                       9.Kd6 Ka8 10.Kc5 Kb7 11.Kb5 Ka8 
                       12.Be5 Kxa7 13.Bd4+ Kb7 14.Kxb4 (113.074.690) 1259 

best move: Kd6-c6 time: 1:43.235 min  n/s: 1.267.947  nodes: 130.677.248