Good Example of Horizon effect in Eval

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Good Example of Horizon effect in Eval

Post by bob »

hgm wrote:Evaluation discontinuities are hard on the search, but if they are a reality of life, the search will have to deal with them (through a smarter extension policy).

If your opponent's King is so exposed that your best estimate is that (considering the Pawn structure) he will lose two Pawns because of double attacks (check + Pawn) by the Queen, then the two-Pawn advantage that evaporates on trading Queens before cashing in on the opponent's bad King safety is a real effect. Your Queen is worth 1100cP, and the opponent's Queen only 900cP in this situation.
One problem: this assumes your king safety scores are "dead on". 99% of the scores in current programs are anywhere but "dead on". I've noticed and reported many times over the years that it is often far more important to score some feature at all than to get its actual numeric value exactly right, within some reasonable range. All that matters is that the scores are somehow "in the proper scale relationship" with the others.

This means that in the above case, you might be OK if your scores are dead on, and if your extensions work perfectly. But I don't personally try to extend on all the non-tactical issues I find. Faced with giving up a 1/2-pawn advantage here or a 1/3-pawn advantage over there, most programs choose to give up the 1/3, even though the 1/2-pawn advantage was going to disappear before long anyway, so the 1/3 giveaway is just a horizon effect. And trying to stave off that kind of horizon effect is extremely difficult; we barely control the material/tactical type of horizon effects.


This is no different from having a Rook against the opponent's Bishop. There it would also be unwise to 'trade' the Rook for the Bishop. If your search cannot handle the 200cP loss that will materialize at the instant of the trade, there is no way you could fix it by fiddling with the King safety. If the opponent can force you to lose the exchange, your search will have to be able to handle it, or you will see classical horizon effect. Sacrificing Pawns to no end, or wrecking King safety, might be manifestations of that.

Note that it really is impossible to eliminate discontinuities, because the game of chess itself is not a continuous system. Pawns and Pieces have discrete values. And any term that is based upon those values will have some discontinuity associated with it. But the larger the discontinuity, the more mischief the search can work...


The castling example (from which, unfortunately, Joker suffers a lot, I must confess) seems more an example of mis-evaluation: the discontinuity is not real there in the first place. But that doesn't prove that discontinuities are bad: a continuous mis-evaluation can be just as bad. In this case the possession of castling rights is apparently under-estimated, so that the engine thinks it has an unrealistically large advantage when the opponent has not castled yet but is still allowed to do so. It should be solved by a more realistic evaluation of the rights.
I'd bet most of these discontinuity issues are the result of mis-evaluations. Unfortunately, at least for me, eliminating them has proven to be very difficult...
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Good Example of Horizon effect in Eval

Post by hgm »

I agree: in 95% of the cases, the evaluation of a feature (e.g. a backward Pawn) can only serve to determine whether the program will create it or avoid it. Only in ~5% of the cases is it confronted with a position where it really has the option of 'trading' a weakness of one kind for a gain of another kind. So just roughly scoring the features solves 95% of the cases.

Horizon effect is the Achilles heel of the search + leaf evaluation scheme. But I think it is better to approach that problem directly than to hide it by fudging the evaluation. So I am thinking more in the direction of coupling evaluation directly to search decisions, rather than indirectly via the score.

E.g. trapped pieces are obviously a bad feature, and penalizing them in the evaluation helps engines to avoid them where there is no need to have them. But whatever the magnitude of the penalty, you will never get satisfactory behavior when it has to weigh the disadvantage of the trapping against other material advantages. The classical examples are BxP on a7, followed by ...b6, and NxR on a8. If you can safely retreat the B or N, the move wins the game; if you can't, it loses. But no matter how you evaluate it, if you don't have the depth to resolve the issue, it remains a blind gamble.

So when your evaluation recognizes such a pattern, rather than just adding some arbitrary amount to the score, it really should ignore it in the numerical value of the score (mostly; a small mobility penalty would be OK), but attach some qualifier to the score: "but with trapped piece". Basically this amounts to multidimensional scoring. Minimax should then be somewhat more clever in the "if(score > alpha)" part. Comparison of the numerical part of the score only makes sense if the attached qualifiers are equal (so either both scores have the trapped piece, or neither does). If some moves resolve the trapping one way or the other, but others don't, it would have to determine the maximum of each class of moves separately.
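A minimal sketch of this qualifier scheme, with hypothetical names and example centipawn values: each score carries a flag for the unresolved trapped piece, and the maximum of each qualifier class is kept separately instead of doing a single `score > alpha` comparison.

```python
from dataclasses import dataclass

@dataclass
class QScore:
    value: int      # numerical part of the score, in centipawns
    trapped: bool   # qualifier: "but with trapped piece" still on the board

def best_per_class(scores):
    """Keep the maximum of each qualifier class separately, as described
    above: numeric comparison is only meaningful within one class."""
    best = {}
    for s in scores:
        if s.trapped not in best or s.value > best[s.trapped].value:
            best[s.trapped] = s
    return best

# Moves that resolve the trapping vs. moves that leave it unresolved:
moves = [QScore(-250, False),   # give up the piece now
         QScore(-30,  True),    # keep the piece, trapping unresolved
         QScore(-300, False)]   # a worse way to resolve it
per_class = best_per_class(moves)
```

The decision between `per_class[False]` and `per_class[True]` is exactly the "tough decision" discussed next.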

And then in the end it will have to face the tough decision whether to go for the best move that resolves the trapping, or for the best move that doesn't. As you can put an upper limit on the value of the trapping (it loses you at most a Bishop for a Pawn, or a full Knight, respectively), the difference in the numerical part might already be large enough to make the decision. If all moves that resolve the trapping within the search depth cost you the piece, obviously maintaining the trapping without sacrificing other material would be preferable. But if you can only hang on to the piece by sacrificing a Pawn, while leaving it trapped, the worst-case ending of that branch would score below resolving it at the expense of the piece now. The engine will then realize that it cannot make a sensible choice without further search, and can then re-search the unresolved branch with an extension. Or opt for passing the uncertainty back towards the root first (by picking the unresolved score as 'best'), in the hope that the current branch will be cut off by alpha-beta by a later move, so that it will never be needed to resolve it. If that doesn't work out, the search will eventually be sent back up into the tree from a lower level, as the root will never be satisfied with unresolved scores.
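The upper-limit argument can be sketched as a small decision rule. The bound of 230cP (a Bishop for a Pawn, with conventional 330/100 piece values) and the function name are assumptions for illustration; only when the trapping's worst case could flip the comparison is an extension needed.

```python
# Assumed upper bound on what the trapping can cost: a Bishop for a Pawn.
MAX_TRAP_LOSS = 330 - 100   # centipawns, conventional piece values

def needs_extension(best_resolved, best_unresolved):
    """Decide between the best resolving move and the best move that
    leaves the piece trapped, using only the numerical scores."""
    worst_case = best_unresolved - MAX_TRAP_LOSS  # the piece is lost after all
    if worst_case >= best_resolved:
        return False   # keeping the piece is better even in the worst case
    if best_unresolved <= best_resolved:
        return False   # resolving now is at least as good even in the best case
    return True        # outcome hinges on the trapping: re-search with extension
```

With `best_resolved = -250` and `best_unresolved = -30`, the worst case (-260) dips below -250, so the choice cannot be made statically and the unresolved branch must be extended or passed up.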
Uri Blass
Posts: 10310
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Good Example of Horizon effect in Eval

Post by Uri Blass »

hgm wrote:I agree: in 95% of the cases, the evaluation of a feature (e.g. a backward Pawn) can only serve to determine whether the program will create it or avoid it. Only in ~5% of the cases is it confronted with a position where it really has the option of 'trading' a weakness of one kind for a gain of another kind. So just roughly scoring the features solves 95% of the cases.

Horizon effect is the Achilles heel of the search + leaf evaluation scheme. But I think it is better to approach that problem directly than to hide it by fudging the evaluation. So I am thinking more in the direction of coupling evaluation directly to search decisions, rather than indirectly via the score.

E.g. trapped pieces are obviously a bad feature, and penalizing them in the evaluation helps engines to avoid them where there is no need to have them. But whatever the magnitude of the penalty, you will never get satisfactory behavior when it has to weigh the disadvantage of the trapping against other material advantages. The classical examples are BxP on a7, followed by ...b6, and NxR on a8. If you can safely retreat the B or N, the move wins the game; if you can't, it loses. But no matter how you evaluate it, if you don't have the depth to resolve the issue, it remains a blind gamble.

So when your evaluation recognizes such a pattern, rather than just adding some arbitrary amount to the score, it really should ignore it in the numerical value of the score (mostly; a small mobility penalty would be OK), but attach some qualifier to the score: "but with trapped piece". Basically this amounts to multidimensional scoring. Minimax should then be somewhat more clever in the "if(score > alpha)" part. Comparison of the numerical part of the score only makes sense if the attached qualifiers are equal (so either both scores have the trapped piece, or neither does). If some moves resolve the trapping one way or the other, but others don't, it would have to determine the maximum of each class of moves separately.

And then in the end it will have to face the tough decision whether to go for the best move that resolves the trapping, or for the best move that doesn't. As you can put an upper limit on the value of the trapping (it loses you at most a Bishop for a Pawn, or a full Knight, respectively), the difference in the numerical part might already be large enough to make the decision. If all moves that resolve the trapping within the search depth cost you the piece, obviously maintaining the trapping without sacrificing other material would be preferable. But if you can only hang on to the piece by sacrificing a Pawn, while leaving it trapped, the worst-case ending of that branch would score below resolving it at the expense of the piece now. The engine will then realize that it cannot make a sensible choice without further search, and can then re-search the unresolved branch with an extension. Or opt for passing the uncertainty back towards the root first (by picking the unresolved score as 'best'), in the hope that the current branch will be cut off by alpha-beta by a later move, so that it will never be needed to resolve it. If that doesn't work out, the search will eventually be sent back up into the tree from a lower level, as the root will never be satisfied with unresolved scores.
A pattern like Bxa7 followed by ...b6, trapping the bishop, may occur many times inside the search tree and not just in the root position.

You need a static evaluation for it (because if you decide to do a search every time it happens, you may spend a lot of time and reach a smaller depth).
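The static term Uri has in mind can be as small as a pattern match. A sketch, with an assumed board representation (a dict from square to piece code; real engines would use bitboards) and an assumed penalty value:

```python
def trapped_bishop_penalty(board):
    """Textbook static pattern: a white bishop that captured on a7 and is
    boxed in by a black pawn on b6. Mirror patterns (h7, and a2/h2 for
    black) are omitted for brevity. The -150 cP value is illustrative;
    real engines tune it."""
    penalty = 0
    if board.get("a7") == "wB" and board.get("b6") == "bP":
        penalty -= 150
    return penalty

# The trap fires only when both pieces of the pattern are present:
print(trapped_bishop_penalty({"a7": "wB", "b6": "bP"}))  # -150
print(trapped_bishop_penalty({"a7": "wB"}))              # 0
```

Because this is a constant-time lookup, it can run at every leaf without the depth cost that re-searching the pattern would incur.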

Uri
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Good Example of Horizon effect in Eval

Post by Zach Wegner »

hgm wrote:I agree: in 95% of the cases, the evaluation of a feature (e.g. a backward Pawn) can only serve to determine whether the program will create it or avoid it. Only in ~5% of the cases is it confronted with a position where it really has the option of 'trading' a weakness of one kind for a gain of another kind. So just roughly scoring the features solves 95% of the cases.

Horizon effect is the Achilles heel of the search + leaf evaluation scheme. But I think it is better to approach that problem directly than to hide it by fudging the evaluation. So I am thinking more in the direction of coupling evaluation directly to search decisions, rather than indirectly via the score.

E.g. trapped pieces are obviously a bad feature, and penalizing them in the evaluation helps engines to avoid them where there is no need to have them. But whatever the magnitude of the penalty, you will never get satisfactory behavior when it has to weigh the disadvantage of the trapping against other material advantages. The classical examples are BxP on a7, followed by ...b6, and NxR on a8. If you can safely retreat the B or N, the move wins the game; if you can't, it loses. But no matter how you evaluate it, if you don't have the depth to resolve the issue, it remains a blind gamble.

So when your evaluation recognizes such a pattern, rather than just adding some arbitrary amount to the score, it really should ignore it in the numerical value of the score (mostly; a small mobility penalty would be OK), but attach some qualifier to the score: "but with trapped piece". Basically this amounts to multidimensional scoring. Minimax should then be somewhat more clever in the "if(score > alpha)" part. Comparison of the numerical part of the score only makes sense if the attached qualifiers are equal (so either both scores have the trapped piece, or neither does). If some moves resolve the trapping one way or the other, but others don't, it would have to determine the maximum of each class of moves separately.

And then in the end it will have to face the tough decision whether to go for the best move that resolves the trapping, or for the best move that doesn't. As you can put an upper limit on the value of the trapping (it loses you at most a Bishop for a Pawn, or a full Knight, respectively), the difference in the numerical part might already be large enough to make the decision. If all moves that resolve the trapping within the search depth cost you the piece, obviously maintaining the trapping without sacrificing other material would be preferable. But if you can only hang on to the piece by sacrificing a Pawn, while leaving it trapped, the worst-case ending of that branch would score below resolving it at the expense of the piece now. The engine will then realize that it cannot make a sensible choice without further search, and can then re-search the unresolved branch with an extension. Or opt for passing the uncertainty back towards the root first (by picking the unresolved score as 'best'), in the hope that the current branch will be cut off by alpha-beta by a later move, so that it will never be needed to resolve it. If that doesn't work out, the search will eventually be sent back up into the tree from a lower level, as the root will never be satisfied with unresolved scores.
I agree with Uri about the Bxa7 case, which is very common and should be covered by eval (though I don't have that pattern ATM). But I think this problem in general should be left to qsearch. Setting a flag about a trapped piece is fine, but potentially having your branching factor double in these situations is no good. And the more flags you have in the search, the worse your branching factor gets. IMO the qsearch should just resolve the issue by not accepting a stand-pat, and then searching until there is no trapped piece, possibly trying some "quiet" moves by the piece.
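Zach's refuse-stand-pat idea can be sketched as a small negamax quiescence search. The `Node` class here is a toy stand-in for a real position and move generator (an assumption for illustration, not an engine API): while the evaluation still flags a trapped piece, the stand-pat cutoff is skipped and the forcing moves must be searched.

```python
class Node:
    """Toy position: a static score (side-to-move view), a trapped-piece
    flag from the evaluation, and the forcing replies (captures plus
    quiet retreats of the trapped piece)."""
    def __init__(self, score, trapped=False, children=()):
        self.score = score
        self.trapped = trapped
        self.children = children

def qsearch(node, alpha, beta):
    if not node.trapped:            # only allow stand-pat in quiet positions
        if node.score >= beta:
            return beta
        alpha = max(alpha, node.score)
    for child in node.children:     # negamax over the forcing moves
        score = -qsearch(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# A "trapped" node whose only continuation is +200 for the opponent backs
# up -200 instead of its optimistic static score of +100:
root = Node(100, trapped=True, children=(Node(200),))
```

Note the deliberate quirk that a trapped node with no moves to try falls through to `alpha`; a real engine would need to decide what a dead-end trapped position is worth.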
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Good Example of Horizon effect in Eval

Post by hgm »

Then we disagree. I don't think that static evaluation can efficiently substitute for search. That is why we do QS instead of a smarter evaluation that accounts for hanging pieces. That is why we do search in the first place, even though it would take a lot less time to just play the move that leads to the best static evaluation...
Uri Blass
Posts: 10310
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Good Example of Horizon effect in Eval

Post by Uri Blass »

hgm wrote:Then we disagree. I don't think that static evaluation can efficiently substitute for search. That is why we do QS instead of a smarter evaluation that accounts for hanging pieces. That is why we do search in the first place, even though it would take a lot less time to just play the move that leads to the best static evaluation...
Your second example is simply not relevant, because nobody suggests not searching when you see by evaluation that the bishop is trapped.


Talking about your first example: qsearch is a small search that leads to clear conclusions (whether the side to move wins material or not).
If you can do some small search that leads to a conclusion about whether the bishop is trapped or not, it may be good to do it, but the problem is that in this case you often do not get a conclusion from a small search.

You may only be able to see by a small search that the bishop cannot get out of a7, while also seeing that the opponent cannot capture it.

If you extend that small search into a bigger search, you may fail to search other important lines; and if you do not extend it, you need to have some evaluation for the situation.

It is possible that it is good to do a one-ply search instead of evaluating the leaf position, when that search may show that the bishop is captured, or that the bishop can play Ba7-b8 and stop being trapped (if there is a high probability that one of these cases happens). But these are only part of the cases, and you need an evaluation for the other cases, because you do not plan to do an infinite search.
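The one-ply probe described above can be sketched as follows. The move representation (hypothetical `(from, to, captured)` tuples for the side to move) and the function name are assumptions; only the two conclusive cases are recognized, and everything else falls back to static evaluation, exactly as Uri cautions.

```python
def probe_bishop_a7(moves):
    """One-ply probe for the trapped white bishop on a7. Returns 'lost'
    if some move simply captures it, 'escaped' if Ba7-b8 is available
    (assuming b8 is safe), and 'unresolved' otherwise, in which case the
    static trapped-bishop evaluation must be used instead."""
    for frm, to, captured in moves:
        if to == "a7" and captured == "wB":
            return "lost"
        if frm == "a7" and to == "b8":
            return "escaped"
    return "unresolved"
```

The 'unresolved' branch is the common one, which is why the probe can only supplement the evaluation term, not replace it.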

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Good Example of Horizon effect in Eval

Post by bob »

Uri Blass wrote:
hgm wrote:I agree: in 95% of the cases, the evaluation of a feature (e.g. a backward Pawn) can only serve to determine whether the program will create it or avoid it. Only in ~5% of the cases is it confronted with a position where it really has the option of 'trading' a weakness of one kind for a gain of another kind. So just roughly scoring the features solves 95% of the cases.

Horizon effect is the Achilles heel of the search + leaf evaluation scheme. But I think it is better to approach that problem directly than to hide it by fudging the evaluation. So I am thinking more in the direction of coupling evaluation directly to search decisions, rather than indirectly via the score.

E.g. trapped pieces are obviously a bad feature, and penalizing them in the evaluation helps engines to avoid them where there is no need to have them. But whatever the magnitude of the penalty, you will never get satisfactory behavior when it has to weigh the disadvantage of the trapping against other material advantages. The classical examples are BxP on a7, followed by ...b6, and NxR on a8. If you can safely retreat the B or N, the move wins the game; if you can't, it loses. But no matter how you evaluate it, if you don't have the depth to resolve the issue, it remains a blind gamble.

So when your evaluation recognizes such a pattern, rather than just adding some arbitrary amount to the score, it really should ignore it in the numerical value of the score (mostly; a small mobility penalty would be OK), but attach some qualifier to the score: "but with trapped piece". Basically this amounts to multidimensional scoring. Minimax should then be somewhat more clever in the "if(score > alpha)" part. Comparison of the numerical part of the score only makes sense if the attached qualifiers are equal (so either both scores have the trapped piece, or neither does). If some moves resolve the trapping one way or the other, but others don't, it would have to determine the maximum of each class of moves separately.

And then in the end it will have to face the tough decision whether to go for the best move that resolves the trapping, or for the best move that doesn't. As you can put an upper limit on the value of the trapping (it loses you at most a Bishop for a Pawn, or a full Knight, respectively), the difference in the numerical part might already be large enough to make the decision. If all moves that resolve the trapping within the search depth cost you the piece, obviously maintaining the trapping without sacrificing other material would be preferable. But if you can only hang on to the piece by sacrificing a Pawn, while leaving it trapped, the worst-case ending of that branch would score below resolving it at the expense of the piece now. The engine will then realize that it cannot make a sensible choice without further search, and can then re-search the unresolved branch with an extension. Or opt for passing the uncertainty back towards the root first (by picking the unresolved score as 'best'), in the hope that the current branch will be cut off by alpha-beta by a later move, so that it will never be needed to resolve it. If that doesn't work out, the search will eventually be sent back up into the tree from a lower level, as the root will never be satisfied with unresolved scores.
A pattern like Bxa7 followed by ...b6, trapping the bishop, may occur many times inside the search tree and not just in the root position.

You need a static evaluation for it (because if you decide to do a search every time it happens, you may spend a lot of time and reach a smaller depth).

Uri
The trapped bishop term appears to be pretty critical to me, for two reasons:

(1) Others already have it. If you don't, they will leave their a/h pawns undefended, knowing that a reasonable opponent would not capture them, giving you many opportunities to fall into the trap. Whereas if neither program has the term, the one that can capture will always do so, and as a result the one with the pawn will quickly defend it.

(2) I don't want my program to make these captures. Good humans used to exploit most programs this way, commercial or amateur. I fixed it, and suddenly my scores against commercial programs went through the roof, because they didn't have it and it became a central theme of almost every game...
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Good Example of Horizon effect in Eval

Post by hgm »

Uri Blass wrote:Talking about your first example: qsearch is a small search that leads to clear conclusions (whether the side to move wins material or not).
If you can do some small search that leads to a conclusion about whether the bishop is trapped or not, it may be good to do it, but the problem is that in this case you often do not get a conclusion from a small search.

You may only be able to see by a small search that the bishop cannot get out of a7, while also seeing that the opponent cannot capture it.

If you extend that small search into a bigger search, you may fail to search other important lines; and if you do not extend it, you need to have some evaluation for the situation.
Well, if this line were the PV in either of the two possible outcomes (Bishop is lost / Bishop can be liberated), it most likely decides the game immediately. So I don't see what line could be more important than that.

This is why I proposed to make it possible to propagate the uncertainty back towards the root. If after searching all moves you have remembered the best one that resolved the trapping, and the best one that left it on the board, and there is no obvious way to decide which you prefer, you could return the unresolved score (with the flag set that indicates it is unresolved). That way you could first look at some alternatives at the level below, without having spent any extra search effort yet, to make sure that the uncertain line can really be PV, and see how much better or worse the score of the current node would become if the uncertain line were to end the favorable way.

And if the difference is just a small positional score, you could take the average of the scores of the good and bad resolution (as 'bad' then means you will go for the alternative). If the difference is decisive (exceeding some relevance threshold), you either do an extended re-search, passing a flag that unresolved scores are unacceptable this time, or, if this flag was not yet set, you pass the unresolved score back towards the root. That way you have the guarantee that the extended unresolved branch will be searched with the narrowest possible window.

As an estimate of how much it will cost you to resolve the evaluation feature, you can take the difference between the score of the best move that resolves it and that of the best move that doesn't. This would be a good poor man's solution anyway: just handle it without search extension, by picking the move that resolves it, adding a bonus in the positional range (e.g. 20cP) for the fact that there exists an uncertain alternative, which might turn out to help on deeper search.
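The poor man's solution sketched, with 20cP taken from the example in the post and the function name assumed: back up the best resolving move's score, credited with a small bonus for the existence of the uncertain alternative.

```python
UNCERTAINTY_BONUS = 20   # centipawns; the example figure from the post

def poor_mans_backup(best_resolving, best_unresolved):
    """No extensions, no flags: always pick the move that resolves the
    trapping, but credit its score with a small positional bonus when an
    uncertain (possibly better) alternative exists."""
    if best_unresolved is None:      # nothing uncertain: plain score
        return best_resolving
    return best_resolving + UNCERTAINTY_BONUS
```

So a resolving score of -250cP alongside an unresolved -30cP alternative backs up as -230cP: the engine cashes in the certain line, while the bonus reflects that deeper search might have rescued the piece.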