Draw positions misevaluated by SF


Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Draw positions misevaluated by SF

Post by Lyudmil Tsvetkov »

Again, just to repeat what is very important: look at all 6 positions the IM sent; a general rule is that in all 6 positions all pawns have enemy pawns on the same or adjacent files.
I would initially exclude 4 vs 3 from scaling, though, as the uncertainty increases.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Draw positions misevaluated by SF

Post by lucasart »

Erik Kislik sent me an invitation on Skype, asking me if I was involved with the Stockfish project. I thought: wtf is he? I don't skype with strangers ;)

Now I understand. Anyway, I prefer to keep my distance from end-users, and communicate by email. Skype is too intrusive; I use it only for family and friends.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Draw positions misevaluated by SF

Post by Uri Blass »

mwyoung wrote:
Uri Blass wrote:
mwyoung wrote:
mcostalba wrote: 1. If Stockfish can't show 0.00 in the above positions, what scores should it show?

2. If one version of Komodo gives extremely low scores in endgames like the ones above but is within 20 or 30 elo points of the current version, can you send it to me instead of merely throwing it out? A special endgame engine would be very practical for human players and might end up finding a lot of very useful ideas for opening analysis and when analyzing one's own games
The more I read this post the funnier it becomes. Yes, all programs misevaluate positions, and not only in the endgame. There are only 3 evaluations that are truly accurate in chess: in any position, 1-0, 0-1, or =.

Anything else is a misevaluation, but does a .32 to 1.2 evaluation by an engine mean the engine is claiming a win in that position? I would say no.

It would be nice if we could get a perfect evaluation in a 12 man endgame or beyond. But that is beyond the engine's abilities in many cases, as the engine would need to search to a drawn position in the positions given by the IM. This cannot be done in all cases with just a static evaluation with 100% certainty. There are always exceptions.

A computer evaluation is a guide for the engine to hopefully find a winning path, or to avoid a losing one. It is not an absolute evaluation of a given position, unless the search can reach the end of the game.

What the IM is asking for is a 12+ man tablebase, and that is Ronald de Man's department. :)
No.

The IM only wants the engine to evaluate as 0.00 the positions that humans know to be drawn.

There are many positions where humans do not know whether it is a win for white or a draw, and in these cases he has no objection to seeing 0.32 or 1.2.
That can be problematic. "Positions that the humans know that they are drawn". The IM is giving positions with 12 men. The problem is that humans may think something is a draw when it is not. There were 5, 6, and 7 man positions that humans thought were drawn, but tablebases showed they were wins.

Good luck with that... humans thinking that something is true is not proof. If humans cannot get every 5, 6, or 7 man position correct, why would I think humans can get every 10, 11 or 12 man position correct?

I said "know" and not "think".

I agree that there is a problem when humans claim that they know and they are wrong, but there are positions where humans can be 100% sure that it is a draw and the computer does not know it.

I am not sure about all the positions that the IM gave, but I can easily give a different position where I am sure about the result, even with more than 12 pieces on the board.

[D]4k3/p1p1p1p1/PpPpPpP1/1P1P1P2/8/8/8/1B2K3 w - - 0 1

In this diagram the chess rules say that it is a draw because there is no way to give mate by a sequence of chess moves.

If you want to see an example from a human game, then look at
Petrosian vs. Hazai from
http://en.wikipedia.org/wiki/Fortress_(chess)


It is possible that some strong humans know, and not only think they know, the result of more positions than I do.

A test of whether some IM really knows may be to have him look at many chess games and tell us the positions that he knows are draws.

Later, ask top engines to play these positions against themselves.
If top engines always get a draw in 1000 positions that they did not evaluate as a draw, then I think that you can trust his evaluation.

The evaluation should of course take the weaknesses of engines into account: the IM should never say "draw" if he thinks that the engine can blunder. In those specific cases he should only say "draw if I play it against engines", and later play games against engines to show that he can hold the draw.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Draw positions misevaluated by SF

Post by BeyondCritics »

[d] 8/8/2bkp3/2p2p2/2P1n3/3K4/R3P3/4B3 b - - 0 1
The master has shown us a position where the exchange for a pawn is somehow washed out by some side conditions. Please don't take this position as typical; that will backfire.
Just think about what will happen if white manages to exchange his bad bishop for the black knight. In this case, black faces a very difficult fight for the draw, if a draw is possible at all.
These positions are difficult to evaluate. To get down to the truth you have to conduct deep analysis.

Ideally the current position should be evaluated very close to a draw, but it is not dead for white.

[d] 8/2n5/1kp5/pp6/1nP5/1PK4B/3R4/8 w - - 0 1
I can only repeat: Normally if you have a pawn for the exchange, prepare for defeat. This is an exception to the rule.

[d] 8/4b1k1/5pn1/4p1p1/4P3/1BB2P2/6P1/6K1 w - - 0 1
In this position, the master has set up a position with a big, stable positional advantage for white: the bishop pair and glaring weak squares for black on g4 and f5. Seemingly he expected that the engine would give big bonuses for that, but the funny thing is that stockfish does not seem to appreciate its possibilities. So white could play 1.Kh2!? with the idea Kh3-g4-(h5). A weak player could lose against this plan, but of course this is no match for stockfish.
The position should be drawn between top engines, and ideally stockfish should show that.
I got the impression that stockfish does not take square weaknesses and the bishop pair sufficiently into account, but this could prove wrong.

Compare:
[d]8/3b2k1/5pn1/4p1p1/4P3/1BB2P2/6P1/6K1 w - - 0 1
In the position above, black should have zero difficulties; the static eval should be lower for white than before. Still, the black pawns can be "criticized" due to their inflexibility.

[d]8/3bn1k1/5pp1/4p3/4P3/1BB2P2/6P1/6K1 w - - 0 1
No pawn weaknesses, so it is dead now.


[d]8/p3b1k1/5pn1/4p1p1/P3P3/1BB2P2/6P1/6K1 w - - 0 1
Now white should really expect a slight advantage due to the pair of bishops, but without search you cannot say much about how big it really is. The eval should be higher.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Draw positions misevaluated by SF

Post by Lyudmil Tsvetkov »

Again, it does not matter if the pawns are on one of the wings or in the center. On the wings it is safer, as there are fewer opportunities for penetration, but central pawns also almost always guarantee a draw.

[d]6k1/3r4/3pp3/8/8/2PPP3/1R6/6K1 w - - 0 1
Easy draw. It would be wrong for an engine to think white has 1 pawn advantage here, maybe 50cps. You understand, 1 pawn advantage means the engine says white is winning, but it is not.

[d]6k1/8/3ppn2/8/8/2PPPN2/8/6K1 w - - 0 1
Same here, white is not winning, so that more than 50cps white edge would be wrong.

[d]5bk1/8/3ppn2/8/8/2PPPN2/8/4B1K1 w - - 0 1
2 pieces change nothing, still draw.

[d]5bk1/8/4pn2/8/8/3PPN2/8/4B1K1 w - - 0 1
Even safer draw with fewer pawns.

So that, as a rule, I would scale down all endings with equal material and 3 vs 2, 2 vs 1 and 1 vs 0 pawns, when the pawn span is small, i.e. each pawn has an enemy pawn on the same file or on an adjacent file.
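
The "small pawn span" condition in this rule can be written down as a simple test on the pawn files. Below is a minimal sketch in plain Python, not Stockfish code; the function names `pawn_files` and `all_pawns_opposed` are made up for illustration, and the second FEN in the usage note is an invented counterexample, not a position from this thread.

```python
def pawn_files(fen):
    """Return (white_files, black_files): file indices 0..7 (a..h) of every pawn."""
    white, black = [], []
    board = fen.split()[0]              # piece placement is the first FEN field
    for rank in board.split("/"):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)     # a digit is a run of empty squares
            else:
                if ch == "P":
                    white.append(file_idx)
                elif ch == "p":
                    black.append(file_idx)
                file_idx += 1
    return white, black


def all_pawns_opposed(fen):
    """True if every pawn has an enemy pawn on the same or an adjacent file."""
    white, black = pawn_files(fen)

    def covered(own, enemy):
        return all(any(abs(f - e) <= 1 for e in enemy) for f in own)

    return covered(white, black) and covered(black, white)
```

For the first diagram of this post, `all_pawns_opposed("6k1/3r4/3pp3/8/8/2PPP3/1R6/6K1 w - - 0 1")` is true, while a made-up position with an unopposed a-pawn such as `6k1/6p1/8/8/8/8/P5P1/6K1 w - - 0 1` fails the test. Read literally, a 1 vs 0 ending can never pass it, since the lone pawn has no enemy pawn at all; how to treat that case is left open here.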

Programmers are laughing at that, but when a chess player reaches such a position and the engine says 'white has big advantage, +150cps', the chess player says: well, stupid.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Draw positions misevaluated by SF

Post by BeyondCritics »

Lyudmil Tsvetkov wrote:
mcostalba wrote:I have received this interesting email from an international master from the US named Erik Kislik. I asked him for permission to repost it on talkchess to share with a broader audience. He kindly agreed.

------------------------------------------------------------------------

I will send you 6 positions that I think are almost certainly objectively drawn, but are misevaluated by Stockfish (and other top engines). In this sense, this is practically the only area in which strong humans have an edge over computers currently (recognizing which positions are drawn and which ones are not). I will retest all of the positions here on June 1st Stockfish and mention the results below.

Position 1: Move: 1. ...Bb7 Score: .80 Depth: 40

Reason why I believe it's drawn: A strong grandmaster (Davor Palo - 2562 FIDE) proposed this position to me and told me that it's drawn. I agree with him, in view of the fact that White's pawns are weak and will remain weak forever. His king is limited in mobility and his bishop has nothing in particular to win, being dominated by the knight. Positions like this occur fairly often in analysis of one's games or openings, where the PV line ends up just being the computer repeating moves and still giving a high score. Black's minor pieces are also ideally placed and if he doesn't lose material, he won't lose (since he will not get checkmated). The lack of king safety worry (lack of mate) also plays a role here.

Position 2: Move: 1. Bg4 Score: 1.20 Depth: 38

Reason why I believe it's drawn: With only one pawn left on the board after the inevitable exchange of b5 for c4, we reach a position where again, assuming Black doesn't lose material by force, he won't be mated and he'll draw comfortably. I suspect that Black's knights form a fortress within which the Black king is unapproachable. I invented this position because this is the worst possible configuration of minor pieces for black, and the best possible configuration of rook and minor piece for White, yet even here I believe this is drawn. In the main line of Stockfish's PV here, it has White exchanging his bishop for a black knight on a6, and thereby evaluates the resulting rook vs. knight endgame as +1. If it were actually winning, don't you think it would give a score well above +1 for such a pure ending?

Position 3: Move: 1. Kg2 Score: .32 Depth: 46

Reason why I believe it's drawn: In view of the symmetrical 3 vs. 3 position on the kingside and the limited material left on board it is extremely unlikely Black will lose the base of his pawn chain (f6), and extremely likely he will hold the position comfortably. The White king also has no way to "come in" and do anything. I would have figured a score more like +.1 would make more sense here, since I certainly don't expect White's winning chances to be any greater than 10%.

Position 4: Move: 1. Rg5 Score: .64 Depth: 34

Reason why I believe it's drawn: In view of the fact that White has no specific plan or idea besides playing Rg5 and b4-b5 (which is met by ...Nb8, capturing on b5 twice and ...Nc6 with equality), White has nothing in particular to do, so the score is rather vague and meaningless and in the PV White mostly just moves around randomly and hopes for something to happen. With solid symmetrical positions like this for Black with no obvious weaknesses or reasons you should suffer, most strong human players would just assume it's an easy draw, regardless of the engine score.

Position 5: Move: 1. Ne4 Score: .87 Depth: 40

Reason why I believe it's drawn: Even with a pawn and the move, White has no obvious reason why he should have any winning chances. His only mobile pawn is on a3, and when it advances, it will just lead to more pawn exchanges. Eventually if he exchanges 2 more sets of pawns, giving up the bishop will just lead to a rook + knight vs. rook easily drawn ending. Black doesn't have to do a whole lot here besides shuffle his rook and make sure he doesn't blunder a pawn.

Position 6: Move: 1. cxd5 Score: .99 Depth: 36

Reason why I believe it's drawn: Here we have another position where none of Black's weaknesses can be attacked, both of his minor pieces appear to be perfectly reasonably placed and White's h3 bishop is out of play with nothing in particular to do. Most likely one or two pairs of pawns will be exchanged in the near future and we'll reach a more pure endgame which will be even closer to a clear draw that can be proven. Perhaps rook vs. minor piece endgames (with a few pawns) which are drawn are not understood well by Stockfish. In any case, I think most strong human players would be convinced that this position is drawn for Black due to the semi-fortress and the solidity of his pieces, combined with the fact that he's not getting mated, the h3 bishop has no obvious targets, the White king cannot come in, and White's pawns may easily become weak if he tries to activate his king somehow.

I believe that if you run some simulations with decent depth, all of these positions will be drawn in practically every game, if not every game. Even if they don't draw in exactly every game, it still proves my point that the realistic winning chances are greatly exaggerated by the scores.

Note that in all of these positions, there were few pawns on the board, and those pawns were almost always easily defensible and almost always symmetrical with Black's structure (although positions 1, 2, and 6 are not exactly symmetrical).

Larry Kaufman has stated that attempts to lower scores in endings such as the ones above have not tested well; the best one produced only a -1 elo result. So there are a few relevant questions here for Stockfish:

1. If Stockfish can't show 0.00 in the above positions, what scores should it show?

2. If one version of Komodo gives extremely low scores in endgames like the ones above but is within 20 or 30 elo points of the current version, can you send it to me instead of merely throwing it out? A special endgame engine would be very practical for human players and might end up finding a lot of very useful ideas for opening analysis and when analyzing one's own games
Hi Marco.

I told you that already, but even more so from now on, most people using SF will be chess players, so you should get accustomed to paying more attention to their concerns. 90% of SF users are chess players of various levels.

Thanks to Kim for posting the diagrams. I think Mark is right that the engine has a job a bit different from what humans expect, but also Uri is right in claiming that drawn positions should be evaluated as drawn. In the case of the 6 positions posted, many are irrelevant as there is too much material, besides Kim's analysis shows SF more or less gets it right. What you can realistically do is scale down eval in a couple of cases with material close to insufficient for a win:

1. scale down by some multiplier all positions with 3 vs 2 pawns only on a,b,c or f,g,h files, as in positions 2 and 5. Almost all such positions are practically draws, so the engine should not show +1.5/2 scores and more

2. scale down by some multiplier all positions with fewer pawns on only a,b,c or f,g,h files, when one side has at most 1 pawn more; for example, 2 vs 2, 2 vs 1; of course, nonpawn material should not be very different, equal or just an exchange more

3. scale down all positions with just 3 vs 3 pawns, 2 vs 2 pawns or 1 vs 1 pawn, when all pawns are symmetrical in terms of files, i.e. all pawns are opposed; such pawns might be in the center, as in one of the diagrams above

I think Shane had a nice scaling patch; you should just elaborate on it so that it scores and behaves more successfully.

The positions that the IM sent, however, are not a tragedy, and Kim's evals show that; I would not be bothered too much by SF evals there. I would be bothered, however, by SF evals in the 3 diagrams below:

[d]5rk1/6p1/7p/8/8/4R2P/5PP1/6K1 w - - 0 1
SF does not have a right to show here more than 50cps white edge. Scores of more than 150cps are simply ridiculous to chess players, as all such endgames are drawn.

[d]5rk1/6p1/5q1p/8/6Q1/4R2P/5PP1/6K1 w - - 0 1
Nothing changes if you add queens. Simply 3 vs 2 pawns on f,g,h is always drawn. Please note that games like this frequently occur in TCEC, for example SF-Hannibal from an earlier round, and it is funny when SF gives +2pawns for white. This has also big repercussions on strength, as SF has chosen this drawn endgame instead of deviating into a won position with a lower score

[d]6k1/6p1/5n1p/8/8/4N2P/5PP1/6K1 w - - 0 1
Same, fully drawn. SF has no right to show more than 50cps here.
No this is not the same. Winning chances are markedly higher here. Just remove the g-pawn on both sides and suddenly white wins.
You should have mentioned here the "Botvinnik rule".
And one position from a recent game of mine against SF5:

[d]8/8/1B6/7p/4k1p1/3r2P1/5PK1/8 b - - 0 1
Again, this is easy draw, but SF as Louis showed gives more than 2 pawns black edge. Again, the key is material very low, close to insufficient, especially pawn material, and also, please note that, all pawns have enemy pawns on the same or adjacent files.

[d]8/8/8/5Rp1/3qk1P1/7P/6K1/8 b - - 0 1
I think you also remember this one from a recent TCEC game against Komodo. It is simple draw, but SF gives more than 4 pawns black edge. Again, key is that pawn material is very low, just 2 vs 1 pawns, and all pawns have enemy pawns on the same or adjacent files, i.e. no passers, candidates, etc. that could change things.

So if you ask me what to realistically do, I would say:

scale down all endgames with 3 vs 2 pawns, 2 vs 1 pawns, 2 vs 2 pawns and 1 vs 1 pawn by a multiplier of 0.5, when

1. non-pawn material is equal, or just B vs N
2. all pawns have enemy pawns on the same or adjacent files, i.e. no passers and candidates, small pawn span

3. each side has 2 or less pieces nonpawn material; this is important, because otherwise it gets too complicated; however, practically I think just 1 and 2 are sufficient
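
The three conditions above can be put together into one scaling function. This is only a sketch in plain Python under stated assumptions, not the Stockfish implementation: `parse`, `minor_as_one` and `scale_factor` are hypothetical names, "equal, or just B vs N" is modelled by treating bishops and knights as interchangeable, and kings are ignored when counting non-pawn pieces.

```python
from collections import Counter

# Pawn counts named in the proposal, written as (more pawns, fewer pawns).
SCALABLE_PAWN_COUNTS = {(3, 2), (2, 1), (2, 2), (1, 1)}


def parse(fen):
    """Return (white_pawn_files, black_pawn_files, white_pieces, black_pieces).
    Pieces are Counters of non-pawn, non-king letters in upper case."""
    wp, bp, wpc, bpc = [], [], Counter(), Counter()
    for rank in fen.split()[0].split("/"):
        f = 0
        for ch in rank:
            if ch.isdigit():
                f += int(ch)            # run of empty squares
                continue
            if ch == "P":
                wp.append(f)
            elif ch == "p":
                bp.append(f)
            elif ch.upper() != "K":
                (wpc if ch.isupper() else bpc)[ch.upper()] += 1
            f += 1
    return wp, bp, wpc, bpc


def minor_as_one(pieces):
    """Assumption: B and N count as the same piece, so B vs N is 'equal'."""
    merged = Counter()
    for letter, n in pieces.items():
        merged["M" if letter in "BN" else letter] += n
    return merged


def opposed(own, enemy):
    """Every own pawn has an enemy pawn on the same or an adjacent file."""
    return all(any(abs(f - e) <= 1 for e in enemy) for f in own)


def scale_factor(fen):
    """Return 0.5 if the proposed conditions hold, otherwise 1.0."""
    wp, bp, wpc, bpc = parse(fen)
    counts = (max(len(wp), len(bp)), min(len(wp), len(bp)))
    if counts not in SCALABLE_PAWN_COUNTS:
        return 1.0
    if minor_as_one(wpc) != minor_as_one(bpc):            # condition 1
        return 1.0
    if not (opposed(wp, bp) and opposed(bp, wp)):         # condition 2
        return 1.0
    if sum(wpc.values()) > 2 or sum(bpc.values()) > 2:    # condition 3
        return 1.0
    return 0.5
```

With this sketch the rook ending `5rk1/6p1/7p/8/8/4R2P/5PP1/6K1 w - - 0 1` gets a factor of 0.5, so a raw score of 150cps would be reported as 75cps. The R vs B position `8/8/1B6/7p/4k1p1/3r2P1/5PK1/8 b - - 0 1` is left unscaled because the non-pawn material is not equal.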

I think this covers all the most frequent f,g,h and a,b,c cases, but also cases with few remaining pawns in the center. I think the probability that this rule fails somewhere is less than 5%, so it is pretty safe to apply.

You could also, if you wish so, scale down by a smaller multiplier, say 0.2, all endgames with the above conditions about pawns also when material is not equal, but say one side has more material, like an exchange more or Q vs R, with fortress probabilities, but I would not vouch how safe it is to do that in general.

So my advice would be to just scale by 0.5 with equal nonpawn material. This will increase SF strength, but most importantly, make it a much more positional engine that chess players will love.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Draw positions misevaluated by SF

Post by Lyudmil Tsvetkov »

One last condition, very important: with 3 vs 2 the weaker side should not have doubled or isolated pawns.

[d]5rk1/6p1/6p1/8/8/4R2P/5PP1/6K1 w - - 0 1

[d]5rk1/5p1p/8/8/8/4R2P/5PP1/6K1 w - - 0 1

Actually, when you have just 2 pawns and they are doubled, they are also isolated by definition. So that only not having isolated pawns is important, but only with 3 vs 2. With 2 vs 1, the pawn of the weaker side is isolated by default.

So that again, no easy way around, you should simply specify in order to be correct.

Finally, I would do the following:

scale down by 0.5 all endings where

1. nonpawn material is equal
2. it is 1 vs 0, 2 vs 1 or 3 vs 2 pawns; also of course, 1 vs 1, 2 vs 2 and 3 vs 3 pawns
3. all pawns have at least one enemy pawn on the same or on an adjacent file; or even better, programmers like that, all pawns are within 3 consecutive files, i.e. on a,b,c, c,d,e or f,g,h.
4. with 3 vs 2 only, the weaker side does not have isolated pawns
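
One possible way to keep this simple is to work only with the list of pawn files per side. The sketch below is plain Python and not engine code; `parse`, `has_isolated_pawn` and `final_scale` are hypothetical names, condition 3 is read in its "within 3 consecutive files" form, and any window of three neighbouring files is accepted, which is an assumption.

```python
def parse(fen):
    """Return (white_pawn_files, black_pawn_files, white_pieces, black_pieces).
    Pieces are sorted strings of non-pawn, non-king letters in upper case."""
    wp, bp, wpc, bpc = [], [], [], []
    for rank in fen.split()[0].split("/"):
        f = 0
        for ch in rank:
            if ch.isdigit():
                f += int(ch)            # run of empty squares
                continue
            if ch == "P":
                wp.append(f)
            elif ch == "p":
                bp.append(f)
            elif ch.upper() != "K":
                (wpc if ch.isupper() else bpc).append(ch.upper())
            f += 1
    return wp, bp, "".join(sorted(wpc)), "".join(sorted(bpc))


def has_isolated_pawn(files):
    """A pawn is isolated if no friendly pawn stands on an adjacent file.
    Doubled pawns with empty neighbouring files count as isolated too."""
    return any(f - 1 not in files and f + 1 not in files for f in files)


def final_scale(fen):
    """Return 0.5 if the four conditions hold, otherwise 1.0."""
    wp, bp, wpc, bpc = parse(fen)
    if wpc != bpc:                                        # 1. equal nonpawn material
        return 1.0
    strong, weak = (wp, bp) if len(wp) >= len(bp) else (bp, wp)
    if not (1 <= len(strong) <= 3 and len(strong) - len(weak) <= 1):
        return 1.0                                        # 2. 1v0, 2v1, 3v2, 1v1, 2v2, 3v3
    files = wp + bp
    if max(files) - min(files) > 2:                       # 3. within 3 consecutive files
        return 1.0
    if (len(strong), len(weak)) == (3, 2) and has_isolated_pawn(weak):
        return 1.0                                        # 4. only checked for 3 vs 2
    return 0.5
```

Under these assumptions the two diagrams of this post, with doubled g-pawns and with split f- and h-pawns, both return 1.0, while the healthy structure `5rk1/6p1/7p/8/8/4R2P/5PP1/6K1 w - - 0 1` returns 0.5.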

I think this also has major implications in terms of strength, but how to do it as simple as possible?
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Draw positions misevaluated by SF

Post by Lyudmil Tsvetkov »

BeyondCritics wrote:[d] 8/8/2bkp3/2p2p2/2P1n3/3K4/R3P3/4B3 b - - 0 1
The master has shown us a position where the exchange for a pawn is somehow washed out by some side conditions. Please don't take this position as typical; that will backfire.
Just think about what will happen if white manages to exchange his bad bishop for the black knight. In this case, black faces a very difficult fight for the draw, if a draw is possible at all.
These positions are difficult to evaluate. To get down to the truth you have to conduct deep analysis.
[d] 8/4b1k1/5pn1/4p1p1/4P3/1BB2P2/6P1/6K1 w - - 0 1
The key to this is the small pawn span. Pawn weaknesses, apart from isolated pawns, really do not matter. Backward pawns are unimportant. The defending side has enough resources simply because nonpawn material would be equal or almost equal and pawn material would be very close to insufficient. A bigger pawn span and an available passer already change things dramatically.

Black draws easily by playing Bc5, Ne7, Kh6 and all penetration squares, f5,h5, are solidly guarded. But showing 50cps advantage is not a problem, the problem is when you show more than a pawn and fail to convert what you claim to be an easy win.

[d]8/p3b1k1/5pn1/4p1p1/P3P3/1BB2P2/6P1/6K1 w - - 0 1

This should be easily won for white. Again, the pawn span is decisive. That is why it is not possible to scale with few pawns, but just with few pawns when the pawn span is small, all pawns within 3 consecutive files. Adding a 4th file, although consecutive, already adds big uncertainties.
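
The pawn span used in this argument is easy to compute from a FEN string. The snippet below is a small Python sketch; the name `pawn_span` and the convention of counting files from the leftmost to the rightmost pawn of either colour are assumptions made for illustration.

```python
def pawn_span(fen):
    """Number of files from the leftmost to the rightmost pawn of either
    colour (0 if there are no pawns on the board)."""
    files = set()
    for rank in fen.split()[0].split("/"):
        f = 0
        for ch in rank:
            if ch.isdigit():
                f += int(ch)        # run of empty squares
            else:
                if ch in "Pp":
                    files.add(f)
                f += 1
    return max(files) - min(files) + 1 if files else 0
```

For the first diagram of this post the span is 3 (files e to g), so the position would qualify for scaling. Adding the a-pawns in the second diagram raises the span to 7, and under the proposed rule the position would not be scaled.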

The suggestion is to develop the idea of insufficient material a bit. For example, R vs R or B vs N is a draw, but it is almost the same in practice when you have 3 or fewer pawns per side and all pawns are within a very small pawn span: there are simply insufficient resources to make use of any existing advantage.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Draw positions misevaluated by SF

Post by BeyondCritics »

Lyudmil Tsvetkov wrote: [d]8/p3b1k1/5pn1/4p1p1/P3P3/1BB2P2/6P1/6K1 w - - 0 1

This should be easily won for white.
Really? I think I would play Bc5, Ne7-c8-d6 and then just move the king to the centre, waiting for the easy win of white, maybe forever...
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Draw positions misevaluated by SF

Post by Lyudmil Tsvetkov »

BeyondCritics wrote:
Lyudmil Tsvetkov wrote: [d]8/p3b1k1/5pn1/4p1p1/P3P3/1BB2P2/6P1/6K1 w - - 0 1

This should be easily won for white.
Really? I think I would play Bc5, Ne7-c8-d6 and then just move the king to the centre, waiting for the easy win of white, maybe forever...
Excellent plan, but the problem is it is white's turn to move now and after Ng6-e7 white plays Be6, controlling the c8 square, so that the knight can never go there.