Stockfish can't evaluate the KP vs K endgame?


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish can't evaluate the KP vs K endgame?

Post by bob »

michiguel wrote:
JVMerlino wrote:
michiguel wrote:
mar wrote:
Zhu.Jianzhao wrote:I tried Stockfish 2.3.1 JA just now and got the same problem: it returns 0.00 after I entered Kf6.
Yes, Miguel is right - it's a repetition problem.

position fen 6k1/8/6K1/6P1/8/8/8/8 w - - 0 1 moves g6f6
However, SF would never play Kf6 in the first position anyway. Not a bug then.
It does not matter whether the engine plays Kf6 or not. If it is used to analyze a game, it will give wrong results for positions with a repetition in its path.

Miguel
How do you know you are analyzing? You can't. If the engine is used with another tool to annotate the game, I cannot see how that could be done. IMHO, this is not a bug, but a flawed optimization.

Miguel
So this is an analysis bug only, IMO. Most engines give a draw score in the search after one repetition to (barely) improve speed, right? I wonder if Bob would do a cluster run to see whether scoring one repetition as a draw gives any noticeable Elo improvement compared to requiring two reps?

Anyway, my engine does the same thing. Seems like the easiest way to resolve this is to require two reps when in analysis mode, and only one rep when playing a game (on the assumption that the speed increase has value). "Bug" fixed.

jm
I bet this is a bug that every program has. Because in a real game they will never repeat unless they want to. Humans often repeat the second time just to gain time on their clocks by reaching the next time control. The program assumes that was forced and the next repetition ends the game...
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish can't evaluate the KP vs K endgame?

Post by carldaman »

Uri Blass wrote:Nothing to do with evaluation of pawn endgames.

It is obviously about how the first repetition is evaluated. Many engines score the first repetition as a draw because they are designed to play, not to analyze correctly, and the programmers did not want to add code that probably does not help playing strength.

It is a disadvantage of the program, but I do not think that it is a bug, because a bug is behaviour that differs from the intention of the programmer.
Why not have both 'game play' and 'analysis' versions of the program, (as does Ivanhoe)?

Regards,
CL
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish can't evaluate the KP vs K endgame?

Post by Michel »

Yes, it is trivial. I do the same in Gaviota, and Bob has been teaching about it for two decades. The SF team seems to think that the search is faster if this detection is "accelerated". I think the price to pay in terms of analysis is too high, IMHO.


Ok sorry!

The discussion seemed to be all about how to detect if the engine is analyzing,
which IMHO had nothing to do with the original problem in the thread!

I think that making the distinction between moves in the search and in the game history for draw scoring has zero cost in actual game play since we may assume that the engine will not intentionally make a bad move.
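
A minimal sketch of that distinction, not taken from any engine's source. Positions are plain string keys (a real engine would compare hash keys), the function name and argument layout are made up for illustration, and the root position is assumed to be the last entry of the game history rather than part of the search path.

```python
def is_draw_split_rule(game_history, search_path, current):
    """Score `current` as a repetition draw using the split rule.

    game_history: keys of positions that really occurred in the game,
                  ending with the root position (assumption).
    search_path:  keys of positions visited in this search line after the root.
    current:      key of the node being scored.
    """
    # Repeated inside the search line: one earlier occurrence is enough,
    # because the side that repeated chose to do so.
    if current in search_path:
        return True
    # Otherwise only a genuine threefold counts: the position must already
    # have occurred twice in the game itself.
    return game_history.count(current) >= 2


# The position from this thread: the user enters Kf6, then the search
# walks back to the starting position via ...Kh8, Kg6, ...Kg8.
P0 = "6k1/8/6K1/6P1/8/8/8/8 w"
P1 = "6k1/8/5K2/6P1/8/8/8/8 b"
game_history = [P0, P1]
search_path = ["7k/8/5K2/6P1/8/8/8/8 w", "7k/8/6K1/6P1/8/8/8/8 b"]

single_rep_draw = P0 in game_history or P0 in search_path
split_rule_draw = is_draw_split_rule(game_history, search_path, P0)
print(single_rep_draw, split_rule_draw)  # True False
```

With the single-repetition rule the return to the starting position is scored 0.00; with the split rule it is searched normally, so the win can still be found.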
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Stockfish can't evaluate the KP vs K endgame?

Post by lucasart »

Michel wrote:GNU Chess uses the single repetition rule only for repetitions occurring within the search. For repetitions involving the game history it uses the threefold repetition rule. So it analyzes this position correctly.
This is interesting. I was just reading that thread, and realizing that DiscoCheck would have the same problem as SF -- if only it had a KPK bitbase.

And the good news is that it can be implemented without any measurable cost in the general case.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
JVMerlino
Posts: 1413
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Stockfish can't evaluate the KP vs K endgame?

Post by JVMerlino »

bob wrote:
Zhu.Jianzhao wrote:[d] 6k1/8/6K1/6P1/8/8/8/8 w - - 0 1

New game
6k1/8/6K1/6P1/8/8/8/8 w - - 0 1

Analysis by Stockfish 20-02-13 64bit:

1.Kh6 Kf7 2.g6+ Kf8 3.Kh7 Ke7 4.g7 Kd6 5.g8Q Kc5 6.Kg6 Kd4 7.Kf6 Kd3 8.Ke5 Kc3 9.Qd5 Kc2 10.Kd4 Kb2 11.Qc4 Ka1 12.Kc3 Kb1 13.Qb5+ Ka2 14.Qb2# 
+-  (#14)    Depth: 46/32   00:00:36  168mN
(User,  23.02.2013)

I continued to input the move, after 1.Kf6 ...

[d] 6k1/8/5K2/6P1/8/8/8/8 b - - 0 1

New game
6k1/8/5K2/6P1/8/8/8/8 b - - 0 1

Analysis by Stockfish 20-02-13 64bit:

1...Kh7 2.Kf7 Kh8 3.Kf8 Kh7 4.Kf7 
=  (0.00)    Depth: 100/7   00:00:02  1047kN
(User,  23.02.2013)

following 1...Kh7 2.Kf7, the evaluation still returns 0.00 :(

[d] 8/5K1k/8/6P1/8/8/8/8 b - - 0 2

New game
8/5K1k/8/6P1/8/8/8/8 b - - 0 1

Analysis by Stockfish 20-02-13 64bit:

2...Kh8 3.Kf8 Kh7 4.Kf7 
=  (0.00)    Depth: 100/5   00:00:00  4kN
(User,  23.02.2013)



after I input 2...Kh8 3.Kg6, again the evaluation is 0.00, but we can see the mate from the analysis.

[d] 7k/8/6K1/6P1/8/8/8/8 b - - 0 3

New game
7k/8/6K1/6P1/8/8/8/8 b - - 0 1

Analysis by Stockfish 20-02-13 64bit:

3...Kg8 4.Kh6 Kf7 5.g6+ Kf8 6.Kh7 Ke7 7.g7 Kd6 8.g8Q Kc5 9.Kg6 Kd4 10.Kf6 Kd3 11.Ke5 Kc3 12.Qd5 Kc2 13.Kd4 Kb2 14.Qc4 Ka1 15.Kc3 Kb1 16.Qb5+ Ka2 17.Qb2# 
=  (0.00)    Depth: 100/2   00:00:00  6kN
(User,  23.02.2013)

I have tried other engines such as Hiarcs, Komodo, and Fritz; only Houdini 3 doesn't have this problem.
There are plenty that have no problems with this. Crafty finds a mate in 18 after a couple of seconds, and a mate in 15 by 5 seconds, no EGTBs used at all...

e23-> 1.59 13.84 1. Kh6 Kf7 2. g6+ Ke6 3. g7 Kd6 4.
g8=Q Kc5 5. Qb3 Kd4 6. Qb4+ Kd3 7.
Kg6 Ke2 8. Qc4+ Kf2 9. Qd3 Kg1 10.
Qe4 Kf2 11. Kh6 Kf1 12. Kg5 Kf2 13.
Kf4 Kg1 14. Qf3 Kh2 15. Qg3+ Kh1
e24 1.76 ++ 1. Kh6! (>+14.09)
e24 1.78 ++ 1. Kh6! (>+14.25)
e24 1.78 ++ 1. Kh6! (>+14.57)
e24 1.78 ++ 1. Kh6! (>+15.21)
e24 1.78 ++ 1. Kh6! (>+16.49)
e24 1.79 ++ 1. Kh6! (>+19.05)
e24 1.79 ++ 1. Kh6! (>+24.17)
e24 3.57 Mat18 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb8 Kd4 7. Kh6
Kc4 8. Qb6 Kc3 9. Qb5 Kd4 10. Kg5 Ke4
11. Qc6+ Kd3 12. Kf4 Kd4 13. Qb5 Kc3
14. Ke4 Kc2 15. Qb4 Kd1 16. Qb2 Ke1
17. Ke3 Kf1 18. Qf2#
e24-> 4.11 Mat18 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb8 Kd4 7. Kh6
Kc4 8. Qb6 Kc3 9. Qb5 Kd4 10. Kg5 Ke4
11. Qc6+ Kd3 12. Kf4 Kd4 13. Qb5 Kc3
14. Ke4 Kc2 15. Qb4 Kd1 16. Qb2 Ke1
17. Ke3 Kf1 18. Qf2#
e25 4.67 Mat18 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb8 Kc4 7. Qb6
Kd3 8. Kh6 Kd2 9. Qb3 Ke2 10. Kg5 Kd2
11. Kf4 Kc1 12. Qb5 Kc2 13. Qb4 Kd3
14. Qc5 Kd2 15. Qc4 Ke1 16. Qc2 Kf1
17. Kf3 Ke1 18. Qe2#
e25-> 5.01 Mat18 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb8 Kc4 7. Qb6
Kd3 8. Kh6 Kd2 9. Qb3 Ke2 10. Kg5 Kd2
11. Kf4 Kc1 12. Qb5 Kc2 13. Qb4 Kd3
14. Qc5 Kd2 15. Qc4 Ke1 16. Qc2 Kf1
17. Kf3 Ke1 18. Qe2#
e26 5.74 Mat15 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb3 Kc6 7. Qc4+
Kb6 8. Qb4+ Kc7 9. Kg6 Kc6 10. Kf6
Kc7 11. Qb5 Kc8 12. Qb6 Kd7 13. Qc5
Kd8 14. Ke6 Ke8 15. Qc8#
e26-> 5.93 Mat15 1. Kh6 Kf7 2. g6+ Kf8 3. Kh7 Ke7 4.
g7 Kd6 5. g8=Q Kc5 6. Qb3 Kc6 7. Qc4+
Kb6 8. Qb4+ Kc7 9. Kg6 Kc6 10. Kf6
Kc7 11. Qb5 Kc8 12. Qb6 Kd7 13. Qc5
Kd8 14. Ke6 Ke8 15. Qc8#
Unless I'm mistaken, I think you might have missed the initial point of the thread, with all of the other stuff that has been discussed. The point is that in the initial position the move Kf6 was forced and that SF THEN scored the position as draw, because the only way to win is to go back to the initial position, which most engines score as a draw because they have seen one repetition.

And I just confirmed this same behavior in Crafty 23.5. Setting the position, using "analyze" and then forcing Kf6 produces a -0.01 score up through depth 60, which it reaches in 1.42 seconds. Changing Crafty so that, if in analysis mode, it requires two reps to declare draw, will fix the problem.

jm
JVMerlino
Posts: 1413
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Stockfish can't evaluate the KP vs K endgame?

Post by JVMerlino »

JVMerlino wrote: Unless I'm mistaken, I think you might have missed the initial point of the thread, with all of the other stuff that has been discussed. The point is that in the initial position the move Kf6 was forced and that SF THEN scored the position as draw, because the only way to win is to go back to the initial position, which most engines score as a draw because they have seen one repetition.

And I just confirmed this same behavior in Crafty 23.5. Setting the position, using "analyze" and then forcing Kf6 produces a -0.01 score up through depth 60, which it reaches in 1.42 seconds. Changing Crafty so that, if in analysis mode, it requires two reps to declare draw, will fix the problem.

jm
Ok, I think I have to retract the above because if you make that change then the behavior of the engine between analysis and "normal search" modes can be drastically different. Fine 70 is an obvious example. Since I use analysis mode for various test positions, making this change to the code would defeat the purpose of analysis mode.

So the other solution presented in this thread by Michel Van den Bergh is indeed correct.

Sorry for the confusion,
jm
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish can't evaluate the KP vs K endgame?

Post by bob »

JVMerlino wrote:
Unless I'm mistaken, I think you might have missed the initial point of the thread, with all of the other stuff that has been discussed. The point is that in the initial position the move Kf6 was forced and that SF THEN scored the position as draw, because the only way to win is to go back to the initial position, which most engines score as a draw because they have seen one repetition.

And I just confirmed this same behavior in Crafty 23.5. Setting the position, using "analyze" and then forcing Kf6 produces a -0.01 score up through depth 60, which it reaches in 1.42 seconds. Changing Crafty so that, if in analysis mode, it requires two reps to declare draw, will fix the problem.

jm
I simply pointed out that in the original position, MOST programs get it right, with or without endgame databases. But if you read what I continued, ALL programs will get it wrong when the human intentionally repeated a position for the second time, with no intention of repeating for a third allowing a repetition draw. A program won't understand that "concept". And it is not something broken, as you have two choices (I have tried both extensively).

1. 2 reps is treated as a draw. Best option by far, but fails in this unusual type of position.

2. 3 reps in the game, or two in the tree (counting the game) is a draw, but not one in the game and one in the tree (which helps in this position). This works, but it makes it harder to see draws and also makes the trees larger. I did this for a year or so, but went back to the more rational 2 reps == draw, period.

I suppose you COULD add (3) which is 3 reps == draw, period, never two. Anyone that tests that will discover it is bad. Very bad.
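
The three options can be put side by side in a small sketch. This is one reading of the list above, not Crafty's code: "2 reps" is taken to mean the second occurrence of a position, the policy names are invented here, and positions are simple string keys.

```python
from enum import Enum


class Policy(Enum):
    SECOND_OCCURRENCE = 1  # option 1: any earlier occurrence is a draw
    SPLIT = 2              # option 2: game history needs a real threefold
    THIRD_OCCURRENCE = 3   # option 3: always wait for the third occurrence


def scores_as_draw(policy, game, tree, current):
    """game: positions played so far (root last); tree: search line after the root."""
    in_game = game.count(current)
    in_tree = tree.count(current)
    if policy is Policy.SECOND_OCCURRENCE:
        return in_game + in_tree >= 1
    if policy is Policy.SPLIT:
        # One earlier occurrence in the tree, or two in the game;
        # a single game occurrence alone is not enough.
        return in_tree >= 1 or in_game >= 2
    return in_game + in_tree >= 2


game = ["A", "B"]   # "B" is the root
tree = ["C", "D"]

# Search returns to "A", seen once in the game only.
back_to_game = {p.name: scores_as_draw(p, game, tree, "A") for p in Policy}
# Search returns to "C", seen once earlier in the same line.
back_in_tree = {p.name: scores_as_draw(p, game, tree, "C") for p in Policy}
print(back_to_game)
print(back_in_tree)
```

Only the first policy scores the return to "A" as a draw, which is the case from this thread. Only the third refuses to score the in-tree repeat of "C" as a draw, which is why it sees draws so late.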
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish can't evaluate the KP vs K endgame?

Post by bob »

JVMerlino wrote:
JVMerlino wrote: Unless I'm mistaken, I think you might have missed the initial point of the thread, with all of the other stuff that has been discussed. The point is that in the initial position the move Kf6 was forced and that SF THEN scored the position as draw, because the only way to win is to go back to the initial position, which most engines score as a draw because they have seen one repetition.

And I just confirmed this same behavior in Crafty 23.5. Setting the position, using "analyze" and then forcing Kf6 produces a -0.01 score up through depth 60, which it reaches in 1.42 seconds. Changing Crafty so that, if in analysis mode, it requires two reps to declare draw, will fix the problem.

jm
Ok, I think I have to retract the above because if you make that change then the behavior of the engine between analysis and "normal search" modes can be drastically different. Fine 70 is an obvious example. Since I use analysis mode for various test positions, making this change to the code would defeat the purpose of analysis mode.

So the other solution presented in this thread by Michel Van den Bergh is indeed correct.

Sorry for the confusion,
jm
BTW, didn't you mean "change crafty so that analysis requires THREE reps for a draw"?? It already does the 2-reps == draw everywhere in the search, analysis mode or not.
JVMerlino
Posts: 1413
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Stockfish can't evaluate the KP vs K endgame?

Post by JVMerlino »

bob wrote:
BTW, didn't you mean "change crafty so that analysis requires THREE reps for a draw"?? It already does the 2-reps == draw everywhere in the search, analysis mode or not.
Indeed -- again, apologies for the confusion.

jm
Uri Blass
Posts: 11204
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish can't evaluate the KP vs K endgame?

Post by Uri Blass »

bob wrote:
BTW, didn't you mean "change crafty so that analysis requires THREE reps for a draw"?? It already does the 2-reps == draw everywhere in the search, analysis mode or not.
I think that it already evaluates first repetition as a draw.
The word repetition means that the position happened at least twice.
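
The "two reps" versus "three reps" mix-up above appears to come from counting occurrences in one post and repeats in another. A tiny sketch of the bookkeeping, with a made-up helper name and string keys standing in for positions:

```python
from collections import Counter


def repetitions(positions):
    """Map each position key to how often it was repeated (occurrences minus one)."""
    return {key: n - 1 for key, n in Counter(positions).items()}


line = ["A", "B", "A", "B", "A"]
reps = repetitions(line)
print(reps)  # {'A': 2, 'B': 1}
```

Here "A" occurs three times, so it has been repeated twice. A rule that draws on the first repetition fires at the second occurrence, and the threefold rule fires at the second repetition.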