Fail soft/Fail hard

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Fail soft/Fail hard

Post by michiguel »

How many of you are using one or the other? I never paid too much attention to this. When I first started, I chose fail soft because I thought it was OK and forgot about it. I recently found a serious problem with it in quies(). It took me several days of debugging. There was nothing wrong with the code, but the approach itself is flawed.

In fail soft, if you return a value v that is lower than alpha, you are saying that the score in that position is <= v. However, you do not know that, because there may be another move, which you did not consider, that breaks that barrier v. It may still be <= alpha, but it could at least be > v. In very simple endgames, when positional factors can be huge (K + wrong B + rook pawn vs. K is a draw despite the material advantage), the problem showed up. A wrong score was propagated through the hashtable. It is easier for this to be revealed in simple endgames because the number of alternative moves is lower, and the score from a single position can easily "poison" the score of another position, and another, and another... I saw the analysis of a drawn KBPkp show a mate score after I introduced knowledge about the endgame. WTF?!?!?!

Tracking this bug (or flaw) was an epic endeavor.

Of course, now I fail hard and the problem is gone in quiescence. I think I will fail hard in the rest of the search too. Will I lose much?

Was I the only one bitten by this type of behavior?

Miguel
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Fail soft/Fail hard

Post by Zach Wegner »

I'm not sure exactly what you are describing as the problem, but fail soft is correct, and should never give incorrect bounds if used properly.
In fail soft, if you return a value v that is lower than alpha, you are saying that the score in that position is <= v. However, you do not know that, because there may be another move, which you did not consider, that breaks that barrier v.
But you DO know that it's the lowest, because you keep track of the upper bounds returned by all the moves.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Fail soft/Fail hard

Post by michiguel »

Zach Wegner wrote:I'm not sure exactly what you are describing as the problem, but fail soft is correct, and should never give incorrect bounds if used properly.
Properly is the magic word :-) My point is that the only way to do it properly (or safely) is to search all the sibling moves (which does not happen in quies()) or not to use the hashtable. The only question is how unsafe it is.
In fail soft, if you return a value v that is lower than alpha, you are saying that the score in that position is <= v. However, you do not know that, because there may be another move, which you did not consider, that breaks that barrier v.
But you DO know that it's the lowest, because you keep track of the upper bounds returned by all the moves.
In many situations you end up searching fewer moves. Whenever that happens, it is a risk. There may be a move that brings the score up to alpha that you did not consider. It is an educated gamble. However, it is worse when you are saying that no other move can bring the score up to the "best score" you saw (and this best could be very low!).

For instance, you have two possible captures when alpha is 0 (draw). You examine the first one, and you are checkmated in two moves. That is the best score so far (-MATE02). For the next capture, you evaluate that it is impossible to bring the score close to alpha, and you prune it. Then you return the best score, -MATE02 (fail soft). No problem, because that means a fail low anyway and a fail high at the previous ply. But if this is stored in the hashtable and this position is reached again by a different path, in a different situation with different bounds, it could seriously screw up the search at that branch, because (-MATE02 or lower) is a wrong score. This may propagate with serious consequences. As I said, this is rare and I only saw it in a couple of endgames, but I wonder whether it contaminates the score in other positions without being obvious.

Please note that the simplified example I gave is not exactly what I had. I just wanted to make a point.

Miguel
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Fail soft/Fail hard

Post by Zach Wegner »

michiguel wrote:
Zach Wegner wrote:I'm not sure exactly what you are describing as the problem, but fail soft is correct, and should never give incorrect bounds if used properly.
Properly is the magic word :-) My point is that the only way to do it properly (or safely) is to
search all the sibling moves (which does not happen in quies()) or not to use the hashtable. The only question is how unsafe it is.
Well, the bound is proper for the qsearch, which is what it is used for.
In many situations you end up searching fewer moves. Whenever that happens, it is a risk. There may be a move that brings the score up to alpha that you did not consider. It is an educated gamble. However, it is worse when you are saying that no other move can bring the score up to the "best score" you saw (and this best could be very low!).
The only time this would make a difference is when you are doing hard forward pruning in the main search. If this starts to be a problem, you can try basing pruning decisions on best_score instead of alpha (though this will probably make your search more unstable).
For instance, you have two possible captures when alpha is 0 (draw). You examine the first one, and you are checkmated in two moves. That is the best score so far (-MATE02). For the next capture, you evaluate that it is impossible to bring the score close to alpha, and you prune it. Then you return the best score, -MATE02 (fail soft).
Not quite. If the second capture could not hold onto alpha, but doesn't result in a direct checkmate, then it would score higher than the first move and the bound would increase. But anyway, even if both moves lead to being mated, you would return eval() from the stand-pat score. (You are initializing best_score with the stand-pat score, right?)
No problem, because that means a fail low anyway and a fail high at the previous ply. But if this is stored in the hashtable and this position is reached again by a different path, in a different situation with different bounds, it could seriously screw up the search at that branch, because (-MATE02 or lower) is a wrong score. This may propagate with serious consequences. As I said, this is rare and I only saw it in a couple of endgames, but I wonder whether it contaminates the score in other positions without being obvious.
Well, here's what I do, just because fail soft _does_ interact with hashing, null move, qsearch, etc. in complicated ways, and because I want to reduce search instability: whenever I get a fail high or fail low, I store alpha or beta in the hash table instead of the returned score. I haven't measured whether it makes a difference or not, but it's basically just a preventative measure against instability. It would still be correct to store best_score, though.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Fail soft/Fail hard

Post by bob »

michiguel wrote:How many of you are using one or the other? I never paid too much attention to this. When I first started, I chose fail soft because I thought it was OK and forgot about it. I recently found a serious problem with it in quies(). It took me several days of debugging. There was nothing wrong with the code, but the approach itself is flawed.

In fail soft, if you return a value v that is lower than alpha, you are saying that the score in that position is <= v. However, you do not know that, because there may be another move, which you did not consider, that breaks that barrier v. It may still be <= alpha, but it could at least be > v. In very simple endgames, when positional factors can be huge (K + wrong B + rook pawn vs. K is a draw despite the material advantage), the problem showed up. A wrong score was propagated through the hashtable. It is easier for this to be revealed in simple endgames because the number of alternative moves is lower, and the score from a single position can easily "poison" the score of another position, and another, and another... I saw the analysis of a drawn KBPkp show a mate score after I introduced knowledge about the endgame. WTF?!?!?!

Tracking this bug (or flaw) was an epic endeavor.

Of course, now I fail hard and the problem is gone in quiescence. I think I will fail hard in the rest of the search too. Will I lose much?

Was I the only one bitten by this type of behavior?

Miguel
You simply have to be careful. A value < alpha does not mean anything other than "there is a move that produces a score < alpha, namely V." This V should be a valid upper bound on the search, since other moves might further improve this V (from the opponent's perspective).

I played with it when trying mtd(f), because the value V gives you a better starting point than simply slightly lowering alpha or raising beta. But even then it is just a wild guess...
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Fail soft/Fail hard

Post by michiguel »

Zach Wegner wrote:
michiguel wrote:
Zach Wegner wrote:I'm not sure exactly what you are describing as the problem, but fail soft is correct, and should never give incorrect bounds if used properly.
Properly is the magic word :-) My point is that the only way to do it properly (or safely) is to
search all the sibling moves (which does not happen in quies()) or not to use the hashtable. The only question is how unsafe it is.
Well, the bound is proper for the qsearch, which is what it is used for.
but it propagates to the main search.
In many situations you end up searching fewer moves. Whenever that happens, it is a risk. There may be a move that brings the score up to alpha that you did not consider. It is an educated gamble. However, it is worse when you are saying that no other move can bring the score up to the "best score" you saw (and this best could be very low!).
The only time this would make a difference is when you are doing hard forward pruning in the main search. If this starts to be a problem, you can try basing pruning decisions on best_score instead of alpha (though this will probably make your search more unstable).
quies_search() is hard pruning by definition. What I am trying to say is that I saw this problem. I just wonder how serious it may be.
For instance, you have two possible captures when alpha is 0 (draw). You examine the first one, and you are checkmated in two moves. That is the best score so far (-MATE02). For the next capture, you evaluate that it is impossible to bring the score close to alpha, and you prune it. Then you return the best score, -MATE02 (fail soft).
Not quite. If the second capture could not hold onto alpha, but doesn't result in a direct checkmate, then it would score higher than the first move and the bound would increase. But anyway, even if both moves lead to being mated, you would return eval() from the stand-pat score. (You are initializing best_score with the stand-pat score, right?)
As I said, it is just an example to make a point. Yes, you are right, the stand-pat score may be higher (let's say -300 cp), but it does not change the concept. You return -300, which may still be a very incorrect score. The same reasoning applies.

Miguel
No problem, because that means a fail low anyway and a fail high at the previous ply. But if this is stored in the hashtable and this position is reached again by a different path, in a different situation with different bounds, it could seriously screw up the search at that branch, because (-MATE02 or lower) is a wrong score. This may propagate with serious consequences. As I said, this is rare and I only saw it in a couple of endgames, but I wonder whether it contaminates the score in other positions without being obvious.
Well, here's what I do, just because fail soft _does_ interact with hashing, null move, qsearch, etc. in complicated ways, and because I want to reduce search instability: whenever I get a fail high or fail low, I store alpha or beta in the hash table instead of the returned score. I haven't measured whether it makes a difference or not, but it's basically just a preventative measure against instability. It would still be correct to store best_score, though.
Harald Johnsen

Re: Fail soft/Fail hard

Post by Harald Johnsen »

As I said, it is just an example to make a point. Yes, you are right, the stand-pat score may be higher (let's say -300 cp), but it does not change the concept. You return -300, which may still be a very incorrect score. The same reasoning applies.
Who cares? The eval is always incorrect; every search at depth n is invalidated by a search at depth n+1. Your example is nothing special: your endgame position was evaluated incorrectly because of qs pruning, but it will be evaluated less incorrectly when you do your next search, because then you won't be in qs but in the main search.

HJ.
User avatar
hgm
Posts: 27821
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fail soft/Fail hard

Post by hgm »

michiguel wrote:For instance, you have two possible captures when alpha is 0 (draw). You examine the first one, and you are checkmated in two moves. That is the best score so far (-MATE02). For the next capture, you evaluate that it is impossible to bring the score close to alpha, and you prune it. Then you return the best score, -MATE02 (fail soft).
This is an incorrect implementation of futility pruning. You prune moves that have curEval + pieceValue[victim] + MARGIN <= alpha, because you are confident that a search would return a score < curEval + pieceValue[victim] + MARGIN, so that score <= alpha.

So the move should be treated like it returned a score curEval + pieceValue[victim] + MARGIN, which represents an upper bound, like any score below alpha would. This means you have to up bestScore (the soft-fail upper bound that will be returned and put in the hash when we are done with the node) to this value, just like you would have done when you had done the search, and it would have returned this score.

That in QS some of the non-captures might not have been futile (curEval <= alpha, so you did not get a stand-pat cutoff, but curEval + MARGIN > alpha), but were pruned anyway because this is QS, is no problem. The returned (hashed) score of a QS search is only supposed to be valid for QS (like a stand-pat score). It is normal for scores (and score bounds) to change on deeper search. This is why you store the depth in the hash table, and if you need a d=1 search, you won't accept a QS score.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Fail soft/Fail hard

Post by michiguel »

hgm wrote:
michiguel wrote:For instance, you have two possible captures when alpha is 0 (draw). You examine the first one, and you are checkmated in two moves. That is the best score so far (-MATE02). For the next capture, you evaluate that it is impossible to bring the score close to alpha, and you prune it. Then you return the best score, -MATE02 (fail soft).
This is an incorrect implementation of futility pruning. You prune moves that have curEval + pieceValue[victim] + MARGIN <= alpha, because you are confident that a search would return a score < curEval + pieceValue[victim] + MARGIN, so that score <= alpha.

So the move should be treated like it returned a score curEval + pieceValue[victim] + MARGIN, which represents an upper bound, like any score below alpha would. This means you have to up bestScore (the soft-fail upper bound that will be returned and put in the hash when we are done with the node) to this value, just like you would have done when you had done the search, and it would have returned this score.
Yes, what you say is correct, and I believe this is certainly a way to minimize the problem. I do not believe it will eliminate it. Please do not take my made-up example literally; it was given just to illustrate the problem, which was more subtle. The origin of my post is to ask whether it is worth dealing with all this just to fail soft rather than hard. Is there a really significant gain?
That in QS some of the non-captures might not have been futile (curEval <= alpha, so you did not get a stand-pat cutoff, but curEval + MARGIN > alpha), but were pruned anyway because this is QS, is no problem. The returned (hashed) score of a QS search is only supposed to be valid for QS (like a stand-pat score). It is normal for scores (and score bounds) to change on deeper search. This is why you store the depth in the hash table, and if you need a d=1 search, you won't accept a QS score.
That is not the problem. In fact, I do not store hash entries in quies(), and I do not even probe there. The score gets propagated to the regular search, at least to d=0. The problem is that the search later manages to reach this position at d=0 again (with delaying tactics), because it sees a good score there as long as it is d=0. When that happens, the score at d=1 will depend on the alternative moves available to avoid that position. In a very elementary endgame, sometimes there are not many options, and one side is forced to play a suboptimal move. That creates a wrong score in the table at d=1. In the next search, the engine will take advantage of that position and use it as a goal. What I saw is that a bad score in the table got propagated iteration after iteration. The position was being reached by different paths. This is very rare except in simple endgames, where there are many transpositions. I saw this, and the problem position was deep. It took me a while to track it down.

The wrong score at the beginning does not need to be big. I started to suspect something was wrong when I was testing KBP(wrong color) vs kp. I put in knowledge to get a drawish score, but I was seeing something like +50 cp with very weird PVs. Either failing hard or increasing the pruning margin solved the problem altogether.

I should try to reproduce the problem to show it here, but that was not the point of my post. The point was to ask: do you use fail soft or fail hard? Is fail soft much more efficient than the other in your experience? If people see a decrease in nodes searched of only 1%, I will use fail hard, make the code much simpler, and forget about it.

Miguel
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Fail soft/Fail hard

Post by Gerd Isenberg »

michiguel wrote: I should try to reproduce the problem to show it here, but that was not the point of my post. The point was to ask: do you use fail soft or fail hard? Is fail soft much more efficient than the other in your experience? If people see a decrease in nodes searched of only 1%, I will use fail hard, make the code much simpler, and forget about it.

Miguel
I always used fail soft - without taking much care about it. I never really understood what the advantage of fail-hard might be and never tried it seriously. My futility and lazy-eval margins are probably big enough. I don't adjust any bounds based on hash hits, and I actually store/probe q-nodes. I don't prune based on hash hits at PV nodes, and I don't use aspiration windows, but -oo/+oo at the root.

Gerd