Here's an idea for an objective calculation of playing strength comparisons between players of long ago vs those of today:
First, get a huge collection of PGN games from the past hundred years that includes titled players. For each titled player, calculate the fraction of games played in which the player missed a forced mate in N, where N is from, say, one to five moves. Come up with a formula that assigns a demerit based on frequency and mate distance, and use this to rank all titled players. Find the midpoint of each titled player's career span and compare the time-weighted average of missed mate statistics.
I'll guess that this would give an answer to the question of how much better today's players are compared to those of the Old Days.
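As a rough sketch of what such a demerit formula might look like, here is one possibility in Python. The 1/N weighting, the function names, and the sample numbers are all assumptions made for illustration; the post itself does not specify a formula, and the missed-mate counts would have to come from an engine pass over the PGN collection.

```python
def missed_mate_demerit(games_played, missed_by_distance, weight=lambda n: 1.0 / n):
    """Demerit for one player.

    games_played: total games examined for the player.
    missed_by_distance: dict mapping mate distance N (1..5) to the number of
        games in which a forced mate in N was missed.
    weight: ASSUMED weighting; 1/N makes a missed mate in 1 count five times
        as much as a missed mate in 5. The post does not specify a formula.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return sum(weight(n) * count / games_played
               for n, count in missed_by_distance.items())


def rank_players(stats):
    """Return player names ordered from lowest demerit (best) to highest.

    stats: dict mapping name -> (games_played, missed_by_distance).
    """
    return sorted(stats, key=lambda name: missed_mate_demerit(*stats[name]))


def career_midpoint(first_year, last_year):
    """Midpoint of a career span, used to place a player on the time axis."""
    return (first_year + last_year) / 2


# Made-up numbers, for illustration only.
stats = {
    "Player A": (1000, {1: 2, 2: 4, 5: 10}),
    "Player B": (800, {1: 0, 3: 3}),
}
print(rank_players(stats))
print(career_midpoint(1905, 1935))
```

Plotting each player's demerit against the career midpoint would then show whether the missed-mate rate drifts over the decades.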
Calculating playing strength comparisons
Moderator: Ras
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Calculating playing strength comparisons
Bad idea. You would have very little statistical data, and there are complex issues with the K-factor in Elo's formula, a formula which is actually worse on empirical grounds than Sonas's linear one. The present FIDE rating could be either inflationary or deflationary, depending on the ratings of newly entering players (who usually, or at least sometimes, improve, taking too many or too few rating points from higher-rated players). The USCF rating was inflationary some time ago, but they tried to correct this. I tried a Monte Carlo analysis of the FIDE rating, but there are so many unknown factors that are not publicly available (for example, how newly rated players progress) that I am unable to determine whether the FIDE rating is inflationary or deflationary. Again, not a good idea in my view, as you would end up with a very subjective formula that is poor on statistical grounds.
sje wrote:
Here's an idea for an objective calculation of playing strength comparisons between players of long ago vs those of today:
First, get a huge collection of PGN games from the past hundred years that includes titled players. For each titled player, calculate the fraction of games played in which the player missed a forced mate in N where N is from, say, one to five moves. Come up with a formula that assigns demerit based on frequency and distance, and use this to rank all titled players. Find the midpoint of each titled player's career span and compare the time weighted average of missed mate statistics.
I'll guess that this would give an answer to the question of how much better today's players are compared to those of the Old Days.
Kai
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Calculating playing strength comparisons
Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.
The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Calculating playing strength comparisons
The correlation would indeed probably be negative on average, but you would have poor statistics and a subjective result (for example, one biased toward more combinative players).
sje wrote:
Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.
The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
Kai
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Calculating playing strength comparisons
You'd also need accurate timing information, as missing forced mates becomes more frequent in a time scramble, which is not that uncommon in GM games.
sje wrote:
Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.
The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Calculating playing strength comparisons
Yes; time controls were generally longer in the Old Days. Adjournments or lack thereof are another difference. However, I still believe that a missed mates metric would be useful.
bob wrote:
You'd also need accurate timing information as missing forced mates becomes more frequent in a time scramble, which is not that uncommon in GM games.
Re: Calculating playing strength comparisons
I don't know if anyone has ever done this, but what about checking the great games of the past on modern machines? Has anyone run the Immortal Game through Rybka? Or the Evergreen Game, Lasker-Capablanca (St. Petersburg 1914), Byrne-Fischer (the Game of the Century), etc.?
I recall that in 1999 John Nunn ran Carlsbad 1911 through Fritz 5 and came to the conclusion that the tournament average was 2100. It was controversial at the time.
-
- Posts: 28390
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Calculating playing strength comparisons
Why limit the testing to mates? They are rare, and humans do not minimize DTM. If a human converts to KRK to have a sure win in 16 moves, rather than going for a mate-in-10 in KQRKQ, he is to be considered smart for this, rather than having missed a mate-in-10.
I think a much better test would be to evaluate all their moves, and compare the score Rybka gives after the move to the score of the move Rybka considers best. This can probably be done at a very fast TC, as you will average out errors anyway over the large set of moves. Just make a histogram of the amount of score lost or gained in each move, subdivided into moves near equality and moves at a decisive advantage / disadvantage, and perhaps also subdivided by game stage. Then do a principal-component analysis of these histograms, and try linear regression of the rating of modern players on these components.
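The histogram step described above could be sketched roughly as follows in Python. This covers only the binning and the subdivision by position type; the subdivision by game stage, the principal-component analysis, and the regression are left out. The bin edges, the 1.5-pawn cutoff for a "decisive" position, and the function names are assumptions chosen for illustration, and the (best score, played score) pairs are presumed to come from an engine run done beforehand.

```python
from collections import defaultdict

# ASSUMED bin edges, in pawns, for the score lost by a move
# (engine's best score minus score after the played move).
BIN_EDGES = [0.0, 0.1, 0.3, 0.5, 1.0, 3.0]


def bucket(loss):
    """Index of the histogram bin for a score loss; the last bin is open-ended."""
    for i in range(len(BIN_EDGES) - 1, -1, -1):
        if loss >= BIN_EDGES[i]:
            return i
    return 0  # a "gain" (negative loss) is lumped into the first bin


def position_class(best_score, decisive=1.5):
    """Classify a position by the engine's best score (mover's point of view)."""
    if abs(best_score) < decisive:
        return "equal"
    return "winning" if best_score > 0 else "losing"


def loss_histograms(moves):
    """Build one normalized histogram per position class.

    moves: iterable of (best_score, played_score) pairs in pawns.
    Returns a dict mapping class name -> list of bin frequencies summing to 1.
    """
    counts = defaultdict(lambda: [0] * len(BIN_EDGES))
    for best, played in moves:
        counts[position_class(best)][bucket(best - played)] += 1
    return {cls: [n / sum(c) for n in c] for cls, c in counts.items()}


# Made-up evaluations, for illustration only.
sample = [(0.2, 0.2), (0.2, -0.4), (3.0, 2.95), (-2.0, -5.5)]
hist = loss_histograms(sample)
print(hist["equal"])
```

Concatenating the per-class histograms gives one fixed-length vector per player, which is the form a principal-component analysis and a regression on ratings would need.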
-
- Posts: 12792
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Calculating playing strength comparisons
I have done it. Unfortunately, the best move is to avoid the sacrifice.
Sure makes the game a lot less beautiful.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Calculating playing strength comparisons
Even your proposal is hard to quantify into a rating. I tried that for the Karpov-Kasparov matches: they are full of small "errors" simply because the engine doesn't recognize a plan, and after a plethora of small "errors" (especially in the endgames) from both sides, the engine sees that it was not so bad after all. I would try to find evident blunders (threshold 0.50 or so), which are not so uncommon (usually on the order of 1-2 per game in high-quality games), and try to quantify this into a rating.
hgm wrote:
Why limit the testing to mates? They are rare, and humans do not minimize DTM. If a human converts to KRK to have a sure win in 16 moves, rather than going for a mate-in-10 in KQRKQ, he is to be considered smart for this, rather than having missed a mate-in-10.
I think a much better test would be to evaluate all their moves, and compare the score Rybka gives after the move to the score of the move Rybka considers best. This can probably be done at a very fast TC, as you will average out errors anyway over the large set of moves. Just make a histogram of the amount of score lost or gained in each move, subdivided into moves near equality and moves at a decisive advantage / disadvantage, and perhaps also subdivided by game stage. Then do a principal-component analysis of these histograms, and try linear regression of the rating of modern players on these components.
Even these "evident" blunders are sometimes not fatal; for example, a blunder may be just a longer but more "human" (and plan-based) way of mating the opponent. What I saw is that there are more of them in the Capablanca-Alekhine games than in the Karpov-Kasparov ones. Then you have to fit this to Elo ratings, and I am not sure that the relationship is linear (it could be over small intervals). It seems easier to me just to check whether the FIDE Elo ratings are inflationary or not. The general opinion is that they are, but I am not convinced.
Kai
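The blunder count suggested above could be sketched like this in Python. The 0.50 threshold comes from the post; the function names and the sample losses are assumptions, and the per-move score losses (in pawns) are presumed to have been produced by an engine beforehand.

```python
def count_blunders(move_losses, threshold=0.50):
    """Number of moves in one game whose score loss reaches the threshold."""
    return sum(1 for loss in move_losses if loss >= threshold)


def blunders_per_game(games, threshold=0.50):
    """Average number of evident blunders per game.

    games: list of games, each a list of per-move score losses in pawns.
    """
    if not games:
        raise ValueError("need at least one game")
    return sum(count_blunders(g, threshold) for g in games) / len(games)


# Made-up losses, for illustration only.
games = [[0.0, 0.1, 0.7], [0.5, 0.2, 1.2, 0.05]]
print(blunders_per_game(games))
```

Mapping this rate onto an Elo scale is a separate step, and as noted above the relationship need not be linear.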