stockfish poor endgame knowledge

Discussion of chess software programming and technical issues.

Moderator: Ras

Eelco de Groot
Posts: 4671
Joined: Sun Mar 12, 2006 2:40 am
Full name: Eelco de Groot

Re: stockfish poor endgame knowledge

Post by Eelco de Groot

mcostalba wrote:
Uri Blass wrote: I think that it is better to be sure that the opponent cannot promote earlier or at the same time before you give a big bonus.
I think that if a feature proves good after deep testing it can stay in; otherwise it is removed.

It is a very simple rule, but I understand it is difficult to accept, especially for chess players who know which features are important from a "human" point of view and intuitively extrapolate that those features are also important for chess engines. Sometimes that is true and sometimes it is not, but only testing can prove it.

So the bottom line is that what you or I "think" is good is totally irrelevant; only testing with real games properly weighs the feature.
Endgames with unstoppable passed pawns are pretty basic stuff though. But as Joona already pointed out in a thread about a month ago or thereabouts, the implementation was incomplete, and that is probably what you detected, Marco. So it is good to have that information. You probably just need a better rule, and I believe Joona said it was on his to-do list. Well, I'm not taking the unstoppable passed pawns evaluation out yet. Maybe with just a few changes it will already work and even be measurable! I don't have anything ready at the moment though.

Regards,
Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish poor endgame knowledge

Post by mcostalba

Eelco de Groot wrote:
Endgames with unstoppable passed pawns are pretty basic stuff though. But as Joona already pointed out in a thread about a month ago or thereabouts, the implementation was incomplete, and that is probably what you detected, Marco. So it is good to have that information. You probably just need a better rule, and I believe Joona said it was on his to-do list. Well, I'm not taking the unstoppable passed pawns evaluation out yet. Maybe with just a few changes it will already work and even be measurable! I don't have anything ready at the moment though.
I don't know why the unstoppable pawn evaluation does not seem to give an ELO increase.

Perhaps it is because our implementation was not perfect, but apart from the fact that the implementation in SF was already very good (and no worse than _any_ other out there whose sources are available), I am quite doubtful that it is possible to build a 100% bulletproof unstoppable pawn evaluation that works in all cases.

I am more inclined to see unstoppable pawn evaluation as a special case of the more general, and in this case, yes, very critical, passed pawn evaluation.

As you can see from Uri's example, at depth 1 the score is -0.76 instead of the -2.00 it would be considering only material, because Black is 2 pawns up. This is because White's pawn is passed and on a more advanced rank than Black's pawns.

My point is that a proper passed pawn evaluation _could_ substitute for a missing unstoppable pawn evaluation in many cases; perhaps this is why we didn't see any ELO drop when removing it... just a theory, of course.
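To make the idea of a passed pawn term "standing in" for unstoppable-pawn knowledge concrete, here is a minimal sketch of a rank-scaled passed pawn bonus. It assumes squares as (file, rank) tuples and uses a hypothetical bonus table with illustrative values, not Stockfish's. Note what it does not do: it gives no extra credit for a pawn the king cannot catch, and it never looks at the opponent's counterplay.

```python
# Squares are (file, rank) tuples: file 0..7 = a..h, rank 0..7 = 1..8.
# Hypothetical bonus per relative rank (index 0 = own back rank), in centipawns.
# Illustrative numbers only, not taken from any engine.
PASSED_BONUS = [0, 5, 10, 20, 40, 70, 120, 0]


def is_passed(file, rank, white, enemy_pawns):
    """True if no enemy pawn on the same or an adjacent file is ahead of this pawn."""
    for ef, er in enemy_pawns:
        ahead = er > rank if white else er < rank
        if abs(ef - file) <= 1 and ahead:
            return False
    return True


def passed_pawn_score(white_pawns, black_pawns):
    """Sum of passed pawn bonuses from White's point of view, in centipawns."""
    score = 0
    for f, r in white_pawns:
        if is_passed(f, r, True, black_pawns):
            score += PASSED_BONUS[r]
    for f, r in black_pawns:
        if is_passed(f, r, False, white_pawns):
            score -= PASSED_BONUS[7 - r]  # mirror Black's rank
    return score
```

For the pawns in the diagram Uri posts next (White a3, h2; Black g5, h4), only a3 is passed, so `passed_pawn_score([(0, 2), (7, 1)], [(6, 4), (7, 3)])` returns 10 with this table.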
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish poor endgame knowledge

Post by Uri Blass

mcostalba wrote:
I don't know why the unstoppable pawn evaluation does not seem to give an ELO increase.

Perhaps it is because our implementation was not perfect, but apart from the fact that the implementation in SF was already very good (and no worse than _any_ other out there whose sources are available), I am quite doubtful that it is possible to build a 100% bulletproof unstoppable pawn evaluation that works in all cases.

I am more inclined to see unstoppable pawn evaluation as a special case of the more general, and in this case, yes, very critical, passed pawn evaluation.

As you can see from Uri's example, at depth 1 the score is -0.76 instead of the -2.00 it would be considering only material, because Black is 2 pawns up. This is because White's pawn is passed and on a more advanced rank than Black's pawns.

My point is that a proper passed pawn evaluation _could_ substitute for a missing unstoppable pawn evaluation in many cases; perhaps this is why we didn't see any ELO drop when removing it... just a theory, of course.
I disagree that the Stockfish evaluation was very good.
Here is an example where Stockfish wrongly detects an unstoppable passed pawn (see the bottom of this post).

I doubt that every program with available source that detects unstoppable passed pawns would flag a3 as unstoppable.

I also do not agree with the idea of removing knowledge that is obvious to chess players just because your tests find that it changes nothing. In my opinion you should remove knowledge only if you find that it is counterproductive, or if it is not clear to chess players that it is productive (otherwise you may reject good changes that each give 1 elo, and the program may end up 10 elo weaker because you rejected 10 changes that each give 1 elo).

As a chess player, it is not clear to me that the Stockfish 1.8 code was productive, not because of your tests but because it is obviously counterproductive in a significant fraction of cases.

[d]7k/8/8/6p1/7p/P7/7P/K7 w - - 0 1

Analysis by Stockfish 1.8 JA:

1.a3-a4
+- (5.65) Depth: 1 00:00:00
1.a3-a4 Kh8-g7
+- (5.09) Depth: 2 00:00:00
1.a3-a4 Kh8-g7 2.a4-a5 Kg7-f6
+- (5.93) Depth: 3 00:00:00
1.a3-a4 g5-g4 2.Ka1-b2 Kh8-g7
+- (5.45) Depth: 4 00:00:01
1.a3-a4 g5-g4 2.a4-a5 g4-g3 3.h2xg3 h4xg3
-+ (-6.22) Depth: 5 00:00:01
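The a3 pawn in this diagram passes the classic rule of the square, so a detector built only on that rule would call it unstoppable. A minimal sketch of such a check, assuming (file, rank) tuples and ignoring every other piece and pawn on the board (which is exactly the blind spot this position exposes); the function names are hypothetical:

```python
# Squares are (file, rank) tuples: file 0..7 = a..h, rank 0..7 = 1..8.


def chebyshev(a, b):
    """King distance between two squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def pawn_moves_to_promote(pawn, white):
    """Moves a pawn needs to promote, counting the initial double step."""
    rel = pawn[1] if white else 7 - pawn[1]  # relative rank, 0 = own back rank
    moves = 7 - rel
    if rel == 1:  # still on its starting rank: the double step saves a move
        moves -= 1
    return moves


def outside_square(pawn, white, defending_king, pawn_side_to_move):
    """Rule of the square: True if the defending king cannot catch the pawn.

    Assumption: only the pawn and the defending king are considered; no other
    pieces, blockers or enemy counterplay are looked at.
    """
    promo = (pawn[0], 7 if white else 0)
    pawn_moves = pawn_moves_to_promote(pawn, white)
    king_moves = chebyshev(defending_king, promo)
    if not pawn_side_to_move:
        king_moves -= 1  # the defender gets the first tempo
    return king_moves > pawn_moves
```

Here `outside_square((0, 2), True, (7, 7), True)` is True: the a-pawn needs 5 moves and the king on h8 needs 7 to reach a8. Yet the depth-5 line 1.a4 g4 2.a5 g3 3.hxg3 hxg3 shows Black's kingside pawns breaking through, which this test never examines.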
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob

mcostalba wrote:
bob wrote:
mcostalba wrote:
bob wrote: Can't explain their scores there.
Does it help you explain them, knowing that there is no unstoppable pawn detection in 1.9?
Yes, although I am not sure why you would remove that. I tested with and without when I did the 23.0 rewrite, and "with" is definitely better (for Crafty)...
What do you mean by "definitely better"? We weren't able to see any difference, say within 3-4 ELO points, which is our current resolution.
I don't remember the Elo gain, but it was larger than the margin of error. I'll try to look, but we have played hundreds of millions of games since the 22.0 rewrite...

I doubt it was "huge". 3-4 might be in the right range... more later...
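A 3-4 Elo resolution translates into a game count. A rough back-of-the-envelope sketch, assuming the logistic Elo model, a score near 50% with equal wins and losses, independent games, a draw fraction d, and a 95% normal-approximation interval on one match result (assumptions for illustration, not a description of how either team actually tests):

```latex
D(s) = -400\,\log_{10}\!\left(\frac{1}{s}-1\right),\qquad
\left.\frac{dD}{ds}\right|_{s=1/2} = \frac{400}{\ln 10 \cdot \tfrac14} \approx 694.9

\sigma^2_{\text{game}} = w + \tfrac{d}{4} - s^2 = \frac{1-d}{4}
\qquad (s = \tfrac12,\ w = l)

E_{95} \approx 1.96 \cdot 694.9 \cdot \sqrt{\frac{1-d}{4N}} \approx 681\sqrt{\frac{1-d}{N}}
\quad\Longrightarrow\quad
N \approx \frac{4.64\times 10^{5}\,(1-d)}{E_{95}^{2}}
```

With no draws this gives roughly 29,000 games for a ±4 Elo interval and roughly 52,000 for ±3 Elo; draws reduce the count by the factor (1-d).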
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob

mcostalba wrote:
Uri Blass wrote: I think that it is better to be sure that the opponent cannot promote earlier or at the same time before you give a big bonus.
I think that if a feature proves good after deep testing it can stay in; otherwise it is removed.

It is a very simple rule, but I understand it is difficult to accept, especially for chess players who know which features are important from a "human" point of view and intuitively extrapolate that those features are also important for chess engines. Sometimes that is true and sometimes it is not, but only testing can prove it.

So the bottom line is that what you or I "think" is good is totally irrelevant; only testing with real games properly weighs the feature.
I completely agree. It seems to be counter-intuitive to many, but if you play enough games to drive the error margin low enough to make the measurement statistically meaningful, then ignoring the results doesn't make much sense.
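A minimal sketch of what "driving the error margin low enough" means in practice: an Elo estimate and an approximate 95% margin from a win/draw/loss record. It assumes the logistic Elo model, independent games and a normal approximation on the mean score; it is not necessarily the procedure either team uses.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (Elo estimate, approximate half-width of the z-sigma interval).

    Uses a normal approximation on the mean per-game score and maps both
    interval ends through the logistic Elo curve.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game score variance with win = 1, draw = 0.5, loss = 0.
    var = max((wins + 0.25 * draws) / n - score ** 2, 0.0)
    se = math.sqrt(var / n)
    low, high = score - z * se, score + z * se
    if not (0.0 < low and high < 1.0):
        raise ValueError("too few games or too lopsided a score for this approximation")
    return elo_from_score(score), (elo_from_score(high) - elo_from_score(low)) / 2.0
```

For example, `elo_with_margin(100, 0, 100)` gives an estimate of 0 with a margin of about ±48 Elo, far too wide to see a 3-4 Elo effect; draws narrow the margin for the same number of games.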
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob

There are two useful special cases in a race. If one pawn promotes 2 moves sooner than the opponent's, it is easy to deal with. But there are 2 special cases:

(1) one pawn queens 1 move before the other, and with check. If the enemy king is too far away, then the check can lead to winning the pawn. If not, it is still going to promote.

(2) one pawn queens one move before the other and simultaneously attacks the promoting square of the other pawn. If the enemy king is far enough away, this also wins.

In short, the two special cases are quite complicated. We did both in Cray Blitz. I am not certain about current Crafty. We did both at one time; I assume I did not remove the code, but I am not certain without looking.
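The two tight cases can be flagged mechanically once both pawns are known to be out of reach of the enemy king. A minimal sketch under loud assumptions: squares are (file, rank) tuples, only the two racing pawns and the two kings are considered, queen lines ignore blockers, and the ply arithmetic ("free move" versus a one-ply lead) is this sketch's own reading of "moves sooner". Whether the check in case (1) actually wins the pawn depends on king distances and is not modelled. All names are hypothetical, not Crafty or Cray Blitz code.

```python
# Squares are (file, rank) tuples: file 0..7 = a..h, rank 0..7 = 1..8.


def king_dist(a, b):
    """Number of king moves between two squares (Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def pawn_moves(pawn, white):
    """Moves a pawn needs to promote, counting the double step from its start rank."""
    rel = pawn[1] if white else 7 - pawn[1]
    return 7 - rel - (1 if rel == 1 else 0)


def queen_attacks(frm, to):
    """True if a queen on frm lines up with to on a rank, file or diagonal.
    Assumption: nothing stands between the two squares (blockers are ignored)."""
    df, dr = to[0] - frm[0], to[1] - frm[1]
    return frm != to and (df == 0 or dr == 0 or abs(df) == abs(dr))


def race(wpawn, bpawn, wking, bking, white_to_move):
    """Classify a race between two passed pawns that neither king can catch.

    Returns (leader, notes). 'free move' means the new queen gets at least one
    move before the other pawn promotes; otherwise the notes flag the two tight
    cases, or 'both queen' if neither applies.
    """
    wm, bm = pawn_moves(wpawn, True), pawn_moves(bpawn, False)
    # Half-move (ply) on which each pawn promotes, counting the first ply as 1.
    w_ply = 2 * wm - 1 if white_to_move else 2 * wm
    b_ply = 2 * bm if white_to_move else 2 * bm - 1
    if w_ply < b_ply:  # plies alternate, so the two can never be equal
        leader, gap = "white", b_ply - w_ply
        queen_sq, enemy_king, enemy_promo = (wpawn[0], 7), bking, (bpawn[0], 0)
    else:
        leader, gap = "black", w_ply - b_ply
        queen_sq, enemy_king, enemy_promo = (bpawn[0], 0), wking, (wpawn[0], 7)

    if gap > 1:
        return leader, ["free move"]

    notes = []
    if queen_attacks(queen_sq, enemy_king):
        notes.append("queens with check")  # case (1)
    if queen_attacks(queen_sq, enemy_promo) and king_dist(enemy_king, enemy_promo) > 1:
        notes.append("attacks promotion square")  # case (2): new enemy queen is lost
    return leader, notes or ["both queen"]
```

For example, with White pawn a6, Black pawn h3, kings on a1 and e3 and White to move, `race((0, 5), (7, 2), (0, 0), (4, 2), True)` returns `("white", ["attacks promotion square"])`: after a7 h2 a8=Q, the new queen covers h1 along the long diagonal.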
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish poor endgame knowledge

Post by Uri Blass

bob wrote: There are two useful special cases in a race. If one pawn promotes 2 moves sooner than the opponent's, it is easy to deal with. But there are 2 special cases:

(1) one pawn queens 1 move before the other, and with check. If the enemy king is too far away, then the check can lead to winning the pawn. If not, it is still going to promote.

(2) one pawn queens one move before the other and simultaneously attacks the promoting square of the other pawn. If the enemy king is far enough away, this also wins.

In short, the two special cases are quite complicated. We did both in Cray Blitz. I am not certain about current Crafty. We did both at one time; I assume I did not remove the code, but I am not certain without looking.
There is also the case where one side has an unstoppable passed pawn but the opponent promotes first in spite of not having a passed pawn, because the opponent has a pawn majority with more advanced pawns.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish poor endgame knowledge

Post by mcostalba

Uri Blass wrote: There is also the case where one side has an unstoppable passed pawn but the opponent promotes first in spite of not having a passed pawn, because the opponent has a pawn majority with more advanced pawns.
Yes, this is far more common, and it is the main reason why we, not handling this case correctly, took the drastic approach of removing everything and seeing what happened... and it turned out that the whole unstoppable pawn stuff (a lot of code, BTW) was performing below our error bar of about 3-4 ELO. Because it was a lot of code, we chose to remove it; perhaps if it had been a couple of lines it would still be there. But we think that a big chunk of code _needs_ a good reason to stay in, and the bigger and more complex the chunk, the larger the performance increase the feature has to bring: we can accept 2 lines of code giving 2 ELO, but we cannot accept 200 lines of code giving 4 ELO.

There is a bonus in the simplicity and readability of the code, especially in the long term.
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish poor endgame knowledge

Post by Uri Blass

I have a different opinion and I think that 200 lines of code that give 4 ELO are good.

If you refuse to add 200 lines to get 4 elo, you may also refuse to add 20,000 lines to get 200 elo (I do not say 400 elo because of diminishing returns), and it is not good to have something that is 200 elo weaker than what you could have only because you refuse to add 20,000 lines of code.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: stockfish poor endgame knowledge

Post by bob

Uri Blass wrote:
There is also the case where one side has an unstoppable passed pawn but the opponent promotes first in spite of not having a passed pawn, because the opponent has a pawn majority with more advanced pawns.
I first do the "majority calculation" to see how many moves it takes to promote, and use that number. Usually this loses, unless you have something like wp@c2, bp@b4, c4, so that you play b3 and promote in 2 more moves (if White ignores it, you play b2-b1=Q; if White trades, you both lose a move and you still promote in 2 moves...).