I am interested in whether Topalov could have won the endgame in game 10 of his recent match versus Anand, where he had two bishops against Anand's bishop and knight.
Specifically, consider the position after Anand's 34th move:
[d]8/p1n2k1p/1p1b2p1/3P4/6P1/3B1K1P/P7/2B5 w - - 7 35
Here, computers seem to suggest 35 h4! with an edge of about 0.8 pawns (Stockfish, depth 34). Instead, Topalov played 35 Ke4 and the game was drawn.
But an edge of "0.8" doesn't really tell us whether White can actually win. Maybe Black can set up a fortress.
(1) Is there any computer analysis that can tell us whether White can win in the given position, and
(2) What's the best program for answering this kind of question?
Did Topalov have a win in game 10 against Anand?
Moderator: Ras
-
LaurenceChen
- Posts: 101
- Joined: Fri Jun 09, 2006 4:58 am
Re: Did Topalov have a win in game 10 against Anand?
Just looking at the position, it is easy to see it is a draw with best play from both sides. The engine is wrong here: even with +0.8, it doesn't translate to a win. It means that White has the edge, but it doesn't mean it is a win for White.
-
yanquis1972
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Did Topalov have a win in game 10 against Anand?
interesting, when you say 'computers' do you mean multiple engines? zappa also gives a +.8 score:
New game
8/p1n2k1p/1p1b2p1/3P4/6P1/3B1K1P/P7/2B5 w - - 0 1
Analysis by Zappa Mexico II:
1.a4 Ne8 2.g5 Ng7 3.Bd2 Nh5 4.Be3 Ke7
+/- (0.73) Depth: 10/20 00:00:01 328kN
1.g5 Bh2 2.Ba3 Ne8 3.h4 Nd6 4.Bb2 Kg8
+/- (0.76) Depth: 10/20 00:00:02 737kN
1.g5 Bh2 2.Ba3 Ne8 3.h4 Nd6 4.Bb2
+/- (0.76) Depth: 10/20 00:00:02 888kN
1.g5 Ke7 2.Ke4 Ne8 3.Bb2 Bh2 4.Bc3 Nd6+
+/- (0.72) Depth: 11/20 00:00:02 1095kN
1.Be3 Ne8 2.g5 Bh2 3.Bd4 Nd6 4.Bc3 Ke7
+/- (0.75) Depth: 11/22 00:00:02 1822kN
1.Be3 Ne8 2.g5 Bh2 3.Bd4 Nd6 4.Bc3 Ke7
+/- (0.75) Depth: 11/22 00:00:03 2370kN
1.Be3 Ne8 2.g5 Ng7 3.Bd2 Nh5 4.Ke4 Ke7 5.Bc3
+/= (0.62) Depth: 12/23 00:00:03 3107kN
1.Bd2 Ne8 2.g5 Ng7 3.a4 Nh5 4.Ba6 Ng3 5.a5
+/= (0.69) Depth: 12/25 00:00:05 5197kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Bc3 Nd6+ 5.Kf3 Kd7
+/- (0.78) Depth: 12/25 00:00:05 5467kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Bc3 Nd6+ 5.Kf3 Kd7
+/- (0.78) Depth: 12/25 00:00:05 5667kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Bc3 Nd6+ 5.Kf3 Nf7 6.Bf6+
+/- (0.77) Depth: 13/25 00:00:05 6065kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Bc3 Nd6+ 5.Kf3 Nf7 6.Bf6+
+/- (0.77) Depth: 13/25 00:00:06 6858kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Ke3 Kd6 5.Bc4 b5 6.Bxb5
+/- (0.74) Depth: 14/26 00:00:07 8081kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Ke3 Kd6 5.Bc4 b5 6.Bxb5
+/- (0.74) Depth: 14/26 00:00:07 9538kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Ba3+ Nd6+ 5.Kf3 Be5 6.h4 Kd7 7.h5
+/= (0.68) Depth: 15/30 00:00:10 13242kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Ba3+ Nd6+ 5.Kf3 Be5 6.h4 Kd7 7.h5
+/= (0.68) Depth: 15/30 00:00:15 21022kN
1.Ke4 Ne8 2.g5 Ke7 3.Bb2 Bh2 4.Ba3+ Nd6+ 5.Kf3 Be5 6.h4 a5 7.h5 Kf7 8.hxg6+ hxg6
+/= (0.67) Depth: 16/33 00:00:19 27059kN
1.a4 Ne8 2.g5 Ng7 3.Bd2 Nh5 4.Bf1 Ng3 5.Bc4 Nf5 6.Bc3 Ke7 7.Bd3 Ng3
+/= (0.69) Depth: 16/33 00:00:34 42712kN
1.a4 Ne8 2.g5 Ng7 3.Bd2 Nh5 4.Bf1 Ng3 5.Bc4 Nf5 6.Bc3 Ke7 7.Bd3 Ng3
+/= (0.69) Depth: 16/33 00:00:34 49023kN
1.a4 Ne8 2.g5 Ng7 3.Bd2 Nf5 4.Bc3 Bc7 5.Ke4 Ng3+ 6.Ke3 Nf5+ 7.Bxf5 gxf5 8.Bb4 Be5
+/- (0.71) Depth: 17/33 00:00:46 65922kN
1.a4 Ne8 2.g5 Ng7 3.Bd2 Nf5 4.Bc3 Bc7 5.Ke4 Ng3+ 6.Ke3 Nf5+ 7.Bxf5 gxf5 8.Bb4 Be5
+/- (0.71) Depth: 17/33 00:00:53 76770kN
1.a4 Ne8 2.g5 Ng7 3.Bd2 Ke7 4.a5 Nf5 5.axb6 axb6 6.Bc3 Bc5 7.Be5 Nd6 8.Bf6+ Kd7 9.h4
+/- (0.74) Depth: 18/35 00:01:42 148mN
1.h4 Ne8 2.h5 gxh5 3.gxh5 Nf6 4.h6 Be5 5.Bg5 Bd6 6.a4 a6 7.Bf5 b5 8.axb5 axb5 9.Bxf6 Kxf6 10.Bxh7
+/- (0.79) Depth: 18/35 00:02:55 270mN
1.h4 Ne8 2.h5 gxh5 3.gxh5 Nf6 4.h6 Be5 5.Bg5 Bd6 6.a4 a6 7.Bf5 b5 8.axb5 axb5 9.Bxf6 Kxf6 10.Bxh7
+/- (0.79) Depth: 18/35 00:02:56 271mN
1.h4 Ne8 2.h5 gxh5 3.gxh5 Nf6 4.h6 Ke7 5.Bg5 Kf7 6.Bf5 Be5 7.Bc1 Ke7 8.Ba3+ Bd6 9.Bb2 Ng8 10.Bc1 Nf6 11.a3
+/- (0.79) Depth: 19/45 00:03:37 337mN, tb=3
1.h4 Ne8 2.h5 gxh5 3.gxh5 Nf6 4.h6 Ke7 5.Bg5 Kf7 6.Bf5 Be5 7.Bc1 Ke7 8.Ba3+ Bd6 9.Bb2 Ng8 10.Bc1 Nf6 11.a3
+/- (0.79) Depth: 19/45 00:03:42 342mN, tb=3
1.h4 Ne8 2.h5 Nf6 3.hxg6+ hxg6 4.g5 Nd7 5.Bf4 Ne5+ 6.Ke4 Nxd3 7.Bxd6 Nc5+ 8.Kd4 Nd7 9.a4 Ke8 10.Bb4 Kf7 11.Ba3 a6 12.Bb4 b5 13.a5 Ke8
+/- (0.81) Depth: 20/55 00:07:31 706mN, tb=5
1.h4 Ne8 2.h5 Nf6 3.hxg6+ hxg6 4.g5 Nd7 5.Bf4 Ne5+ 6.Ke4 Nxd3 7.Bxd6 Nc5+ 8.Kd4 Nd7 9.a4 Ke8 10.Bb4 Kf7 11.Ba3 a6 12.Bb4 b5 13.a5 Ke8
+/- (0.81) Depth: 20/55 00:07:43 716mN, tb=5
as for which engines would give the most accurate evaluation, my guesses would be shredder, and zappa on big hardware with lots and lots of hash.
-
metax
- Posts: 344
- Joined: Wed Sep 23, 2009 5:56 pm
- Location: Germany
Re: Did Topalov have a win in game 10 against Anand?
A Stockfish eval of +0.8 is a pretty small edge, maybe like a +0.3 evaluation of Rybka. 
-
Tord Romstad
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Did Topalov have a win in game 10 against Anand?
yanquis1972 wrote: as for which engines would give the most accurate evaluation, my guesses would be shredder, and zappa on big hardware with lots and lots of hash.
There is no such thing as a "most accurate evaluation function". Trying to compare the accuracy of different programs' evaluation functions, or comparing the scores returned for a particular position (like the one discussed in this thread), indicates a misunderstanding of the purpose of the evaluation function.
The evaluation function is a tool to make the program select good moves, when used along with an efficient search. In order to work well, the evaluation function should satisfy the following two criteria, of which the first is far more important than the second:
- It should be good at determining which of two similar positions is better.
- It should be good at determining which side has the advantage.
The word "similar" is emphasized in the first criterion, because it's extremely important: most of the positions a program encounters during a search are somewhat similar, and in order to select a good move, the program has to be able to decide which positions in the tree are more favorable. Being able to decide which of two unrelated positions (like the position discussed in the current thread and an early opening position from the Benko gambit) is better is not at all important, because the program will never have to choose between two such entirely different positions.
A score of +0.8 therefore doesn't just mean different things when displayed by two different programs; it doesn't even mean the same thing when displayed by a single program in two unrelated positions. It only means that the program thinks the side with the plus score has the advantage. This brings us to the second of my two criteria: the program should be able to judge which side is better. This is important only because it needs to know whether it should force a draw, if given the chance.
If three programs A, B and C return scores of +0.3, +0.5 and +1.0 from the given position after a deep search, this by itself tells us nothing about the quality or accuracy of their evaluation functions for the given type of position. They all think White is better, and will try to improve the position while avoiding a draw when playing White, and to improve the position while forcing a draw if possible when playing Black. They won't resign with either color. It is possible that one of the programs evaluates the position better than the others, but in order to find out, you'll have to run test matches against a variety of opponents from the given position, or (if you are a sufficiently strong player) analyse interactively with the computer, examine the moves and lines it suggests, and see how it responds to reasonable alternatives.
By now, some readers probably think I'm forgetting that chess programs are not only used to play games, but also for analysis. I'm not forgetting this, but I'm telling you that you shouldn't draw any conclusions just from the score returned by the search (unless there is a forced mate). You should always use the program interactively, running it in analysis mode while trying out the moves it suggests and various alternatives, and use your own judgment in addition to the program's evaluations. This will give you a much better understanding of the position, and is also much more fun.
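To see why a raw number is only meaningful on an engine-specific scale, here is a toy C++ sketch. The logistic mapping and the constant K are assumptions chosen purely for illustration; they are not taken from Stockfish, Rybka or any other real engine.
#include <cmath>
#include <cstdio>
// Toy model: convert a centipawn score into an expected game outcome
// (0.0 = loss, 0.5 = draw, 1.0 = win) with a logistic curve. The slope
// constant K is hypothetical and engine-specific, which is exactly why a
// "+0.8" from one program is not comparable to a "+0.8" from another.
double expected_score(double centipawns, double K) {
    return 1.0 / (1.0 + std::pow(10.0, -centipawns / K));
}
int main() {
    // The same +80 centipawns reads very differently under two assumed scales.
    std::printf("K = 200: %.2f\n", expected_score(80.0, 200.0));  // ~0.72
    std::printf("K = 400: %.2f\n", expected_score(80.0, 400.0));  // ~0.61
}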
-
kranium
- Posts: 2130
- Joined: Thu May 29, 2008 10:43 am
Re: Did Topalov have a win in game 10 against Anand?
Tord Romstad wrote: There is no such thing as a "most accurate evaluation function". Trying to compare the accuracy of different programs' evaluation functions, or comparing the scores returned for a particular position (like the one discussed in this thread), indicates a misunderstanding of the purpose of the evaluation function. [...]
yes..
and if (hypothetically speaking) empirical values for each and every known chess position actually did exist
(for ex: in some future 'year 3000' gargantuan database...which IMO will occur only if technology survives the coming environmental holocaust? :shock:),
or if these empirical values could be generated upon demand by some future 'super AI clustered entity' or 'big brother' (government, boy that's scary! The Matrix comes to mind)...
or, if capitalism survives...you could log in online and pay Chessbase to see if your eval was accurate or not?
in these scenarios it would be possible to measure any program's resulting position evaluation estimate against this known metric...
otherwise it's all pretty relative, as Tord points out.
of course, in the meantime, one can only speak hypothetically of a "most accurate evaluation", meaning compared to future values which don't yet actually exist...
-
kranium
- Posts: 2130
- Joined: Thu May 29, 2008 10:43 am
Re: Did Topalov have a win in game 10 against Anand?
wait..
i take that back!
not year 3000
with the speed and progress of computer tech, i predict chess will be 'solved', just like DNA sequencing for all life forms on earth (talk about 'clones'?!), etc. before 2200?
(i.e. in the next 190 years - 2, maybe 3 generations!) is that too optimistic?
no! even sooner i think!
Last edited by kranium on Sun May 16, 2010 3:42 pm, edited 1 time in total.
-
yanquis1972
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Did Topalov have a win in game 10 against Anand?
i feel like i have a lot to say about this, but perhaps it is too early for me to phrase it well...nevertheless you mention mate positions; there are also drawn positions...i would regard the most accurate evaluation as the one pointing nearest draw, if the position is truly drawn. but regardless of whether or not such a thing exists, the main point for me -- & perhaps a more accurate, if very understated, phrasing -- would be that there do exist more 'convenient' evaluations; ie ones that are more consistent & easier to interpret.
i'm a big stockfish fan but to be perfectly blunt this is why i'm hoping in the long run komodo et al outpace you guys, provided you don't or can't change the way your engine outputs its evaluations...it's quite messy & requires more work than should be necessary. you talk about interacting with a position, but i think engines that provide immediate & 'easy to interpret' *cough*accurate*cough* evaluations allow for more of this. i have only just begun using stockfish as a major analysis partner but i find i often wait longer than average to get a grasp of what it actually thinks of the position, and/or switch to mpv to get a better understanding of how the evaluation relates to other moves...
re your example, the engine that evaluates a best move as +1 and the engine that evaluates the same move as best as +.3 are not part & parcel...one's funky and one isn't...one is typically (in my view it should be always) interpreted by the program's human chess partner as +- and one as +/= or so. if you don't want to call this accuracy, fine, but it strikes me as a heavily pedantic argument. it certainly matters. i do realize in the endgame this importance is minimized to a pretty significant degree, & opening & EG evals aren't really equivalent within a program...i would prefer an engine that reads a static +1.00 in a drawn endgame position to one that evaluates quite a bit lower but is in flux...the latter may objectively be closer to the truth of the position, but the former is easier to interpret, & that is a big deal for me & i would assume most users...
anyway this is basically a loving rant from me to you in hopes that you guys will get your evals better tuned for analysis. and no there is no convincing me its all relative and subjective and theres no such thing as better.
-
kranium
- Posts: 2130
- Joined: Thu May 29, 2008 10:43 am
Re: Did Topalov have a win in game 10 against Anand?
yanquis1972 wrote: re your example, the engine that evaluates a best move as +1 and the engine that evaluates the same move as best as +.3 are not part & parcel...one's funky and one isn't...
i believe Stockfish scales the eval results according to game phase and other criteria, in an effort to obtain more granularity
(of course, i don't know the true motivation, here only Tord can explain for sure...i do know he's been using it for many years)
thus eval results (numbers) are definitely 'magnified' in comparison to most other engines...
yes here it is:
enum ScaleFactor {
    SCALE_FACTOR_ZERO   = 0,
    SCALE_FACTOR_NORMAL = 64,
    SCALE_FACTOR_MAX    = 128,
    SCALE_FACTOR_NONE   = 255
};

// Scales a score v by f/64: SCALE_FACTOR_NORMAL (64) leaves it unchanged,
// smaller factors pull it toward the draw score.
inline Value apply_scale_factor(Value v, ScaleFactor f) {
    return Value((v * f) / int(SCALE_FACTOR_NORMAL));
}
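For illustration, here is a self-contained toy in the same spirit (not actual Stockfish code; the factor values below are made up) showing how such a scale factor compresses a score toward the draw value:
#include <cstdio>
enum ScaleFactor {
    SCALE_FACTOR_ZERO   = 0,
    SCALE_FACTOR_NORMAL = 64,
    SCALE_FACTOR_MAX    = 128
};
// v * f / 64: a factor of 64 leaves the score unchanged, smaller factors
// shrink it toward 0 (the draw score), and 0 turns it into a dead draw.
inline int apply_scale_factor(int v, ScaleFactor f) {
    return (v * f) / int(SCALE_FACTOR_NORMAL);
}
int main() {
    int raw = 80;  // a nominal +0.80 in centipawns
    std::printf("normal:  %d\n", apply_scale_factor(raw, SCALE_FACTOR_NORMAL));  // 80
    std::printf("drawish: %d\n", apply_scale_factor(raw, ScaleFactor(16)));      // 20
    std::printf("draw:    %d\n", apply_scale_factor(raw, SCALE_FACTOR_ZERO));    // 0
}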
-
Tord Romstad
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Did Topalov have a win in game 10 against Anand?
Hi John,
I wasn't writing about Stockfish, but about how to interpret engine evaluations in general, but since you ask about Stockfish, I'll write a few words.
yanquis1972 wrote: i feel like i have a lot to say about this, but perhaps it is too early for me to phrase it well...nevertheless you mention mate positions; there are also drawn positions...i would regard the most accurate evaluation as the one pointing nearest draw, if the position is truly drawn.
That would be "accurate" in a theoretical sense, but would make the program weak and passive in practice. Not all theoretically drawn positions are equal. There is a very wide variety in how difficult theoretically drawn positions are to defend, and in order to maximize the chances of winning against a non-perfect opponent, the program needs a wide variety of scores to evaluate drawn positions.
yanquis1972 wrote: but regardless of whether or not such a thing exists, the main point for me -- & perhaps a more accurate, if very understated, phrasing -- would be that there do exist more 'convenient' evaluations; ie ones that are more consistent & easier to interpret.
i'm a big stockfish fan but to be perfectly blunt this is why i'm hoping in the long run komodo et al outpace you guys, provided you don't or can't change the way your engine outputs its evaluations...it's quite messy & requires more work than should be necessary. you talk about interacting with a position, but i think engines that provide immediate & 'easy to interpret' *cough*accurate*cough* evaluations allow for more of this. i have only just begun using stockfish as a major analysis partner but i find i often wait longer than average to get a grasp of what it actually thinks of the position, and/or switch to mpv to get a better understanding of how the evaluation relates to other moves...
But this (in Stockfish) is not about evaluation at all, but about search. To be more precise, it's about search inconsistencies caused by tiny aspiration windows and massive forward pruning in non-PV nodes. In layman's terms, Stockfish's search is optimized for maximizing the chances of selecting the best move rather than for finding a precise score for the best move. It's a bit lazy and sloppy; when one move appears to be at least as good as all the others, Stockfish prefers to spend its thinking time searching a little deeper rather than on computing a more precise score.
It is possible that we will introduce a few new UCI parameters for configuring forward pruning and aspiration windows, for users who want to sacrifice strength and speed for stability.
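For readers who haven't seen the idea before, here is a generic, self-contained sketch of an aspiration-window loop. It is not Stockfish's actual search (the window sizes, the stub search and all names are invented for illustration); it only shows why a score obtained from a narrow window can need several re-searches before it is exact, and why displayed evaluations can jump around in the meantime.
#include <cstdio>
// Stub standing in for a real alpha-beta search: it "knows" the true value
// and fails high/low against the window the way a fail-hard search would.
int search_stub(int alpha, int beta, int /*depth*/) {
    const int trueScore = 78;              // pretend this is the exact score
    if (trueScore <= alpha) return alpha;  // fail low
    if (trueScore >= beta)  return beta;   // fail high
    return trueScore;                      // inside the window: exact
}
int main() {
    int score = 0;
    for (int depth = 1; depth <= 5; ++depth) {
        int delta = 16;                          // narrow window around the last score
        int alpha = score - delta, beta = score + delta;
        for (;;) {
            score = search_stub(alpha, beta, depth);
            if (score <= alpha)      { alpha -= delta; delta *= 2; }  // widen down, re-search
            else if (score >= beta)  { beta  += delta; delta *= 2; }  // widen up, re-search
            else break;                          // score is trustworthy, move on
        }
        std::printf("depth %d: %d\n", depth, score);
    }
}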