Evaluation of moves from World Championship Games

Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Evaluation of moves from World Championship Games

Post by Ferdy »

I tried analyzing the games from World Championship (WCh) matches, similar to what Guid and Bratko did using Crafty at fixed depth. I think they did another analysis later with a stronger engine.

I used Stockfish 13 dev 2020-12-14 with a movetime of 5 s per position on 4 cores of an i7-2600K. Analysis starts at move 12, as in Guid and Bratko, but I only stop the analysis when the scores of the game move and the engine move are above 500 cp or below -500 cp. Guid and Bratko stopped the analysis when the score was above 200 cp or below -200 cp.
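
The analysis window described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `analysis_window`, the tuple layout, and the reading that analysis stops once both scores are beyond the cutoff on the same side are hypothetical and not taken from the actual script.

```python
def analysis_window(moves, start_move=12, cutoff_cp=500):
    """Return the moves that fall inside the analysis window.

    moves: list of (move_no, game_move_cp, engine_move_cp) in game order.
    Assumption: analysis stops for good once both scores exceed the cutoff
    on the same side (both winning or both losing).
    """
    kept = []
    for move_no, game_cp, engine_cp in moves:
        if move_no < start_move:
            continue  # skip moves before the starting move
        both_above = game_cp > cutoff_cp and engine_cp > cutoff_cp
        both_below = game_cp < -cutoff_cp and engine_cp < -cutoff_cp
        if both_above or both_below:
            break  # position is considered decided, stop analysing
        kept.append((move_no, game_cp, engine_cp))
    return kept


# Made-up scores, only to show the effect of the two cutoffs.
sample = [(11, 5, 5), (12, 10, 20), (13, 250, 300), (14, 600, 650), (15, 0, 0)]
wide = analysis_window(sample)                   # 500 cp cutoff used in this post
narrow = analysis_window(sample, cutoff_cp=200)  # 200 cp cutoff of Guid and Bratko
print(len(wide), len(narrow))
```

With the wider 500 cp cutoff more moves from lopsided positions stay in the sample than with the 200 cp cutoff.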

Not all matches have been analyzed so far, but here is the current plot.

Image

Carlsen 2013 is deadly against Anand. But Anand was not really that bad either.

Event: WCh 2013
Date: 2013.11.22

Code: Select all

A. Average error in centipawn (low is better)
+-----------+-----------+---------+-----------------+
|   GameNum |   Carlsen |   Anand |   Carlsen_score |
|-----------+-----------+---------+-----------------|
|         1 |      -1.8 |    10.4 |             0.5 |
|         2 |       3.3 |     2.9 |             0.5 |
|         3 |       6.1 |     5.4 |             0.5 |
|         4 |       5   |     8.1 |             0.5 |
|         5 |       0.9 |    12.3 |             1   |
|         6 |       1.5 |     9.9 |             1   |
|         7 |       1.6 |     2.6 |             0.5 |
|         8 |       0.2 |     0.8 |             0.5 |
|         9 |      11.1 |    16.6 |             1   |
|        10 |       9.7 |     9.9 |             0.5 |
+-----------+-----------+---------+-----------------+

Code: Select all

B. Overall average error in centipawn (low is better)
+-----------+---------+
|   Carlsen |   Anand |
|-----------+---------|
|       3.8 |     7.9 |
+-----------+---------+
Stronger players tend not to commit errors early in the game. Sections C, D and E show this.

Code: Select all

C. Move Number with 50 to 100 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+-----------+---------+-----------------+
|   GameNum | Carlsen   | Anand   |   Carlsen_score |
|-----------+-----------+---------+-----------------|
|         1 | None      | None    |             0.5 |
|         2 | None      | None    |             0.5 |
|         3 | 26        | None    |             0.5 |
|         4 | None      | 18      |             0.5 |
|         5 | None      | 45      |             1   |
|         6 | None      | None    |             1   |
|         7 | None      | None    |             0.5 |
|         8 | None      | None    |             0.5 |
|         9 | 22        | 18      |             1   |
|        10 | None      | 26      |             0.5 |
+-----------+-----------+---------+-----------------+
In game 5, which Carlsen won, Anand's mistake in the 50 to 100 cp range from a playable position (within +/-50 cp) came only at move 45!
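
A lookup like the one behind table C could be sketched as below. The assumptions are loud ones: "playable" is taken to mean that the engine's best score before the move lies within +/-50 cp, the table entry is taken to be the first qualifying move, and the name `first_error_move` and the tuple layout are hypothetical.

```python
def first_error_move(moves, low_cp, high_cp, playable_cp=50):
    """Return the first move number whose error lies in [low_cp, high_cp]
    and that was played from a playable position, or None if there is none.

    moves: iterable of (move_no, engine_best_cp, error_cp) for one player.
    Assumption: a position is playable when abs(engine_best_cp) <= playable_cp.
    """
    for move_no, engine_best_cp, error_cp in moves:
        playable = abs(engine_best_cp) <= playable_cp
        if playable and low_cp <= error_cp <= high_cp:
            return move_no
    return None


# Made-up data for one player in one game.
moves = [(12, 10, 0), (18, 30, 75), (20, 120, 80), (26, -20, 150)]
print(first_error_move(moves, 50, 100))   # table C range
print(first_error_move(moves, 101, 300))  # table D range
print(first_error_move(moves, 301, 500))  # table E range
```

Returning `None` when nothing qualifies matches the `None` cells in the tables.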

Higher error values.

Code: Select all

D. Move Number with 101 to 300 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+-----------+---------+-----------------+
|   GameNum | Carlsen   | Anand   |   Carlsen_score |
|-----------+-----------+---------+-----------------|
|         1 | None      | None    |             0.5 |
|         2 | None      | None    |             0.5 |
|         3 | None      | None    |             0.5 |
|         4 | None      | 28      |             0.5 |
|         5 | None      | None    |             1   |
|         6 | None      | None    |             1   |
|         7 | None      | None    |             0.5 |
|         8 | None      | None    |             0.5 |
|         9 | 18        | None    |             1   |
|        10 | None      | None    |             0.5 |
+-----------+-----------+---------+-----------------+

Code: Select all

E. Move Number with 301 to 500 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+-----------+---------+-----------------+
|   GameNum | Carlsen   | Anand   |   Carlsen_score |
|-----------+-----------+---------+-----------------|
|         1 | None      | None    |             0.5 |
|         2 | None      | None    |             0.5 |
|         3 | None      | None    |             0.5 |
|         4 | None      | None    |             0.5 |
|         5 | None      | None    |             1   |
|         6 | None      | 60      |             1   |
|         7 | None      | None    |             0.5 |
|         8 | None      | None    |             0.5 |
|         9 | None      | None    |             1   |
|        10 | None      | None    |             0.5 |
+-----------+-----------+---------+-----------------+
Rating estimates with Stockfish set at 2900. The estimates are based on the evaluation of the position after the player's move, compared with the evaluation after Stockfish's move.

The ratings according to Stockfish for the Carlsen - Anand match are not comparable to the ratings for the Capablanca - Lasker match. They only give an idea of the rating difference between the two players within a match.

Code: Select all

F. Rating according to Stockfish 13 dev NNUE
+-----------+-----------+---------+
|   GameNum |   Carlsen |   Anand |
|-----------+-----------+---------|
|         1 |      2902 |    2879 |
|         2 |      2880 |    2887 |
|         3 |      2792 |    2769 |
|         4 |      2783 |    2713 |
|         5 |      2887 |    2679 |
|         6 |      2837 |    2671 |
|         7 |      2885 |    2853 |
|         8 |      2898 |    2892 |
|         9 |      2802 |    2785 |
|        10 |      2677 |    2652 |
+-----------+-----------+---------+
Here is the Capablanca-Lasker match as a sample, showing how their ratings differ.

Code: Select all

F. Rating according to Stockfish 13 dev NNUE
+-----------+--------------+----------+
|   GameNum |   Capablanca |   Lasker |
|-----------+--------------+----------|
|         1 |         2744 |     2688 |
|         2 |         2715 |     2724 |
|         3 |         2802 |     2748 |
|         4 |         2852 |     2880 |
|         5 |         2555 |     2652 |
|         6 |         2846 |     2850 |
|         7 |         2843 |     2850 |
|         8 |         2868 |     2864 |
|         9 |         2836 |     2858 |
|        10 |         2629 |     2489 |
|        11 |         2706 |     2567 |
|        12 |         2607 |     2555 |
|        13 |         2859 |     2863 |
|        14 |         2668 |     2548 |
+-----------+--------------+----------+
How did Lasker do in that match? Check the tables below.

Event: World Championship 12th
Date: 1921.??.??

Code: Select all

A. Average error in centipawn (low is better)
+-----------+--------------+----------+--------------------+
|   GameNum |   Capablanca |   Lasker |   Capablanca_score |
|-----------+--------------+----------+--------------------|
|         1 |          8.6 |     12.5 |                0.5 |
|         2 |         14.4 |     10.3 |                0.5 |
|         3 |          4   |      6.9 |                0.5 |
|         4 |          4.5 |      2.5 |                0.5 |
|         5 |         22.8 |     17.9 |                1   |
|         6 |          3.2 |      1.9 |                0.5 |
|         7 |          7.7 |      5.9 |                0.5 |
|         8 |          3.7 |      4.3 |                0.5 |
|         9 |         14.3 |      2.7 |                0.5 |
|        10 |         16.1 |     22.6 |                1   |
|        11 |         14.3 |     22   |                1   |
|        12 |         32   |     37.2 |                0.5 |
|        13 |          6.7 |      2.7 |                0.5 |
|        14 |         15.7 |     25.3 |                1   |
+-----------+--------------+----------+--------------------+

B. Overall average error in centipawn (low is better)
+--------------+----------+
|   Capablanca |   Lasker |
|--------------+----------|
|           12 |     12.5 |
+--------------+----------+

Code: Select all

C. Move Number with 50 to 100 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+--------------+----------+--------------------+
|   GameNum | Capablanca   | Lasker   |   Capablanca_score |
|-----------+--------------+----------+--------------------|
|         1 | None         | 13       |                0.5 |
|         2 | None         | 34       |                0.5 |
|         3 | None         | None     |                0.5 |
|         4 | None         | None     |                0.5 |
|         5 | None         | 15       |                1   |
|         6 | None         | None     |                0.5 |
|         7 | None         | None     |                0.5 |
|         8 | None         | None     |                0.5 |
|         9 | 20           | None     |                0.5 |
|        10 | None         | 17       |                1   |
|        11 | None         | None     |                1   |
|        12 | 12           | 14       |                0.5 |
|        13 | None         | None     |                0.5 |
|        14 | 12           | None     |                1   |
+-----------+--------------+----------+--------------------+

D. Move Number with 101 to 300 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+--------------+----------+--------------------+
|   GameNum | Capablanca   | Lasker   |   Capablanca_score |
|-----------+--------------+----------+--------------------|
|         1 | None         | 41       |                0.5 |
|         2 | None         | None     |                0.5 |
|         3 | None         | None     |                0.5 |
|         4 | None         | None     |                0.5 |
|         5 | None         | 23       |                1   |
|         6 | None         | None     |                0.5 |
|         7 | None         | None     |                0.5 |
|         8 | None         | None     |                0.5 |
|         9 | None         | None     |                0.5 |
|        10 | None         | None     |                1   |
|        11 | None         | None     |                1   |
|        12 | None         | 20       |                0.5 |
|        13 | None         | None     |                0.5 |
|        14 | None         | 29       |                1   |
+-----------+--------------+----------+--------------------+

E. Move Number with 301 to 500 cp error from a playable position (-50/50) cp, (high or None is better)
+-----------+--------------+----------+--------------------+
|   GameNum | Capablanca   | Lasker   |   Capablanca_score |
|-----------+--------------+----------+--------------------|
|         1 | None         | None     |                0.5 |
|         2 | None         | None     |                0.5 |
|         3 | None         | None     |                0.5 |
|         4 | None         | None     |                0.5 |
|         5 | None         | None     |                1   |
|         6 | None         | None     |                0.5 |
|         7 | None         | None     |                0.5 |
|         8 | None         | None     |                0.5 |
|         9 | None         | None     |                0.5 |
|        10 | None         | None     |                1   |
|        11 | None         | None     |                1   |
|        12 | None         | None     |                0.5 |
|        13 | None         | None     |                0.5 |
|        14 | None         | None     |                1   |
+-----------+--------------+----------+--------------------+
A happy and prosperous new year to all!

The analysis is saved in a CSV file.
KLc
Posts: 140
Joined: Wed Jun 03, 2020 6:46 am
Full name: Kurt Lanc

Re: Evaluation of moves from World Championship Games

Post by KLc »

Very nice. I think it would be even nicer to see the average error of both contenders.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Evaluation of moves from World Championship Games

Post by Dann Corbit »

I think it would also be interesting to see the results using LC0.
What I would be most keen about is to see if:
1. The shape and ranking of the bar chart stays the same.
2. If positional players like Botvinnik moved up higher (since LC0 is especially good at quiet moves).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ferdy

Re: Evaluation of moves from World Championship Games

Post by Ferdy »

KLc wrote: Thu Dec 31, 2020 7:10 pm Very nice. I think it would be even nicer to see the average error of both contenders.
I uploaded some CSV files showing the average errors of both players in the match.
Ferdy

Re: Evaluation of moves from World Championship Games

Post by Ferdy »

Dann Corbit wrote: Thu Dec 31, 2020 10:39 pm I think it would also be interesting to see the results using LC0.
What I would be most keen about is to see if:
1. The shape and ranking of the bar chart stays the same.
2. If positional players like Botvinnik moved up higher (since LC0 is especially good at quiet moves).
I have limited computer time at the moment. I will just upload a script to analyze the games, and save the analysis so anyone can analyze it with any engine.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Evaluation of moves from World Championship Games

Post by lkaufman »

Ferdy wrote: Thu Dec 31, 2020 5:39 pm

Code: Select all

A. Average error in centipawn (low is better)
+-----------+-----------+---------+-----------------+
|   GameNum |   Carlsen |   Anand |   Carlsen_score |
|-----------+-----------+---------+-----------------|
|         1 |      -1.8 |    10.4 |             0.5 |
|         2 |       3.3 |     2.9 |             0.5 |
|         3 |       6.1 |     5.4 |             0.5 |
|         4 |       5   |     8.1 |             0.5 |
|         5 |       0.9 |    12.3 |             1   |
|         6 |       1.5 |     9.9 |             1   |
|         7 |       1.6 |     2.6 |             0.5 |
|         8 |       0.2 |     0.8 |             0.5 |
|         9 |      11.1 |    16.6 |             1   |
|        10 |       9.7 |     9.9 |             0.5 |
+-----------+-----------+---------+-----------------+

The errors are presumably absolute values, since it wouldn't make sense to average differences in score otherwise. So how does Carlsen have a negative 1.8 error rate in game 1? Maybe I'm misunderstanding what you are doing here. In general, your results make sense, they show the winning player in most matches as having the lower error on average (Alekhine-Capablanca shows nearly a tie), and it shows the error margins recently to be much lower than long ago. Perhaps starting at move 15 or so would be more fair to minimize memorized opening theory.
Komodo rules!
Ferdy

Re: Evaluation of moves from World Championship Games

Post by Ferdy »

lkaufman wrote: Thu Dec 31, 2020 11:56 pm
The errors are presumably absolute values, since it wouldn't make sense to average differences in score otherwise. So how does Carlsen have a negative 1.8 error rate in game 1?
I calculate the error as

Code: Select all

error = engine_move_score - game_move_score
Usually engine_move_score is higher than game_move_score, so the error is positive. When the engine move and the game move are the same, the error is of course 0. In game 1, however, Carlsen played a move that scored higher than the move suggested by Stockfish at 5 s per position on 4 threads on my hardware.

Code: Select all

WCh 2013;2013.11.09;1;Carlsen;Anand;14;r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -;white;30;b3a3;b3b2;-13;-22;-9;29;1/2-1/2
Data is here.

Code: Select all

WCh 2013;2013.11.09;1;Carlsen;Anand;12;r2q1rk1/pp2ppbp/2p2np1/2P2b2/2nP4/2N2NP1/PB2PPBP/R2Q1RK1 w - -;white;30;b2c1;d1b3;0;0;0;34;1/2-1/2
WCh 2013;2013.11.09;1;Carlsen;Anand;13;r2q1rk1/pp2ppbp/2p3p1/2Pn1b2/2nP4/2N2NP1/P3PPBP/R1BQ1RK1 w - -;white;30;d1b3;d1b3;-1;-1;0;32;1/2-1/2
WCh 2013;2013.11.09;1;Carlsen;Anand;14;r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -;white;30;b3a3;b3b2;-13;-22;-9;29;1/2-1/2
WCh 2013;2013.11.09;1;Carlsen;Anand;15;r2q1rk1/pp2ppbp/2p3p1/2Pn1b2/2nP4/Q1N2NP1/P3PPBP/R1B2RK1 w - -;white;30;a3b3;a3b3;-17;-17;0;31;1/2-1/2
WCh 2013;2013.11.09;1;Carlsen;Anand;16;r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -;white;30;b3a3;b3a3;-3;-3;0;31;1/2-1/2
epd: r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -
Carlsen move: b3a3, score: -13 spov
Engine move: b3b2, score: -22 spov
error: -22 - (-13) or -9
depth: 29
result: 1/2-1/2
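
The record above can be read programmatically. Below is a minimal sketch, assuming the field order inferred from the breakdown just given. The field names are hypothetical, and the ninth field (always 30 in the rows shown) is not explained in the post, so it is kept as an unlabeled string.

```python
from typing import NamedTuple


class MoveRecord(NamedTuple):
    event: str
    date: str
    game_num: int
    white: str
    black: str
    move_no: int
    epd: str
    side: str
    unlabeled: str      # meaning not stated in the post
    game_move: str
    engine_move: str
    game_cp: int        # score of the game move, side-to-move point of view
    engine_cp: int      # score of the engine move, side-to-move point of view
    error_cp: int
    depth: int
    result: str


def parse_record(line: str) -> MoveRecord:
    """Split one semicolon-separated analysis row into named fields."""
    f = line.strip().split(";")
    if len(f) != 16:
        raise ValueError(f"expected 16 fields, got {len(f)}")
    return MoveRecord(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]), f[6], f[7],
                      f[8], f[9], f[10], int(f[11]), int(f[12]), int(f[13]),
                      int(f[14]), f[15])


def signed_error(rec: MoveRecord) -> int:
    """error = engine_move_score - game_move_score, or 0 when the moves match."""
    if rec.game_move == rec.engine_move:
        return 0
    return rec.engine_cp - rec.game_cp


ROW = ("WCh 2013;2013.11.09;1;Carlsen;Anand;14;"
       "r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -;"
       "white;30;b3a3;b3b2;-13;-22;-9;29;1/2-1/2")
rec = parse_record(ROW)
print(rec.game_move, rec.engine_move, signed_error(rec))
```

Recomputing the error from the two scores reproduces the stored value of -9 for this row.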

In Guid and Bratko, error = abs(engine_move_score - game_move_score).
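
To make the difference between the two definitions concrete, here is a small sketch that averages the error column of the five rows listed above both ways. The variable names are hypothetical, and only those five rows are used.

```python
# Error column (cp) of the five rows shown above, moves 12 to 16.
errors_cp = [0, 0, -9, 0, 0]

# Signed average: a human move that beats the engine move lowers the mean.
mean_signed = sum(errors_cp) / len(errors_cp)

# Absolute average in the style of Guid and Bratko: every deviation counts as error.
mean_abs = sum(abs(e) for e in errors_cp) / len(errors_cp)

print(mean_signed, mean_abs)
```

The signed mean of these five rows is -1.8, which equals Carlsen's game 1 entry in table A if these rows are all of his analysed moves in that game (an assumption). The absolute mean of the same rows is 1.8.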

My method incorporates not only the error of the human move but also the error of the engine, if it turns out that the human move is stronger than the engine's move. I did not use backward analysis, where the engine analyzes the game from the end and so already has an idea of the move and score. I analyze the positions forward, so that the human move is at least comparable to the engine's: both have to figure out the best move in the current position. Given a game position pos1 with move1, I evaluate move1 by making this move on the board and letting the engine analyze the resulting position; once done, I negate the score and store it as score1. In the same situation, I give pos1 to the engine and save its bestmove and bestscore. If move1 and bestmove are the same, the error is zero; otherwise error = bestscore - score1.
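
The forward procedure just described can be sketched with the engine abstracted away. This is an illustration under assumptions, not the actual script: `move_error`, `search_best` and `search_after` are hypothetical names, and the two callables stand in for engine searches run at a fixed movetime.

```python
from typing import Callable, Tuple

# Hypothetical engine interfaces, both returning centipawn scores:
#   search_best(pos)        -> (bestmove, bestscore) from the side to move in pos
#   search_after(pos, move) -> score of the position after move,
#                              from the opponent's (new side to move) point of view
SearchBest = Callable[[str], Tuple[str, int]]
SearchAfter = Callable[[str, str], int]


def move_error(pos: str, game_move: str,
               search_best: SearchBest, search_after: SearchAfter) -> int:
    """Signed error of game_move in pos: bestscore - score1, or 0 if moves match."""
    # Make the game move, let the engine analyse, then negate to get the
    # score from the mover's point of view.
    score1 = -search_after(pos, game_move)
    best_move, best_score = search_best(pos)
    if game_move == best_move:
        return 0
    return best_score - score1


# Stub "engine" reproducing the game 1, move 14 numbers quoted above.
POS = "r2q1rk1/pp2ppbp/2p3p1/n1Pn1b2/3P4/1QN2NP1/P3PPBP/R1B2RK1 w - -"
stub_best = lambda pos: ("b3b2", -22)
stub_after = lambda pos, move: 13   # opponent's view, so -13 for the mover

print(move_error(POS, "b3a3", stub_best, stub_after))
print(move_error(POS, "b3b2", stub_best, stub_after))
```

Because each position is searched independently, the engine's own choice can score lower than the game move, which is how a negative error arises.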
Maybe I'm misunderstanding what you are doing here. In general, your results make sense, they show the winning player in most matches as having the lower error on average (Alekhine-Capablanca shows nearly a tie), and it shows the error margins recently to be much lower than long ago. Perhaps starting at move 15 or so would be more fair to minimize memorized opening theory.
The starting move where calculation of errors begins can be any number close to 12 indeed, I just use the one used by Guid and Bratko. The analysis data is available in the repo. Here is the plot at move 15.

Image
jmartus
Posts: 256
Joined: Sun May 16, 2010 2:50 am

Re: Evaluation of moves from World Championship Games

Post by jmartus »

Kramnik in 2000 is the most surprising to me; he played impressively vs. Kasparov.
lkaufman

Re: Evaluation of moves from World Championship Games

Post by lkaufman »

Ferdy wrote: Fri Jan 01, 2021 1:13 am
The starting move where calculation of errors begins can be any number close to 12 indeed, I just use the one used by Guid and Bratko. The analysis data is available in the repo. Here is the plot at move 15.

Image
To me the interesting thing is that the error rate in the matches played before 1980 is nearly double the error rate played since 2010. Of course the standard of play has risen over the past century, but I'm a bit surprised that it is so dramatic as to halve the error rate. Of course error rate depends on the openings played, so it is possible that this is a factor, but I don't think that opening choices have become so much quieter than long ago, maybe a bit so. Finding a way to deal with this is key to properly compare players of different times.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Evaluation of moves from World Championship Games

Post by mwyoung »

lkaufman wrote: Fri Jan 01, 2021 3:16 am
To me the interesting thing is that the error rate in the matches played before 1980 is nearly double the error rate played since 2010. Of course the standard of play has risen over the past century, but I'm a bit surprised that it is so dramatic as to halve the error rate. Of course error rate depends on the openings played, so it is possible that this is a factor, but I don't think that opening choices have become so much quieter than long ago, maybe a bit so. Finding a way to deal with this is key to properly compare players of different times.
I agree. Opening choices are a big factor. And people are not "smarter" today. But knowledge does increase and is passed on to the next generation.

And WC matches are most likely not the best way to measure who is best.

Lots of drama, and in the Soviet era, not really on the up and up.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.