Calculating playing strength comparisons

Moderator: Ras

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Calculating playing strength comparisons

Post by sje »

Here's an idea for an objective calculation of playing strength comparisons between players of long ago vs those of today:

First, get a huge collection of PGN games from the past hundred years that includes titled players. For each titled player, calculate the fraction of games played in which the player missed a forced mate in N where N is from, say, one to five moves. Come up with a formula that assigns demerit based on frequency and distance, and use this to rank all titled players. Find the midpoint of each titled player's career span and compare the time weighted average of missed mate statistics.

I'll guess that this would give an answer to the question of how much better today's players are compared to those of the Old Days.
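
A minimal sketch of how such a demerit could be computed, assuming the missed-mate counts have already been extracted from the PGN collection by an engine. The function names, the 1/N weighting, and the reading of "time weighted" as "weighted by career length" are all illustrative assumptions, not part of the proposal itself.

```python
from collections import defaultdict


def mate_demerit(missed_by_distance, games_played, max_distance=5):
    """Demerit for one player.

    missed_by_distance maps mate distance N to the number of games in which
    the player missed a forced mate in N. The score is the sum over N of
    (fraction of games with a missed mate in N) / N. The 1/N weighting is an
    assumption: missing a mate in 1 counts more than missing a mate in 5.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    demerit = 0.0
    for n in range(1, max_distance + 1):
        frequency = missed_by_distance.get(n, 0) / games_played
        demerit += frequency / n
    return demerit


def era_averages(players):
    """Average demerit per decade, keyed by career midpoint.

    players is an iterable of (first_year, last_year, demerit) tuples.
    Each player is weighted by career length in years, which is only one
    possible reading of "time weighted" (an assumption).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for first_year, last_year, demerit in players:
        midpoint = (first_year + last_year) / 2
        decade = int(midpoint // 10) * 10
        weight = last_year - first_year + 1
        weighted_sum[decade] += weight * demerit
        weight_total[decade] += weight
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum}
```

Comparing the per-decade averages would then give a rough old-versus-new picture; any other monotone weighting of distance could be swapped in for 1/N.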
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Calculating playing strength comparisons

Post by Laskos »

sje wrote:Here's an idea for an objective calculation of playing strength comparisons between players of long ago vs those of today:

First, get a huge collection of PGN games from the past hundred years that includes titled players. For each titled player, calculate the fraction of games played in which the player missed a forced mate in N where N is from, say, one to five moves. Come up with a formula that assigns demerit based on frequency and distance, and use this to rank all titled players. Find the midpoint of each titled player's career span and compare the time weighted average of missed mate statistics.

I'll guess that this would give an answer to the question of how much better today's players are compared to those of the Old Days.
Bad idea. You would have very little statistical data, and there are complex issues with the K-factor in Elo's formula, which on empirical grounds actually performs worse than Sonas's linear formula. The present FIDE rating could be either inflationary or deflationary, depending on the ratings of newly entering players (who usually, or at least sometimes, improve, taking too many or too few rating points from higher-rated players). The USCF rating was inflationary some time ago, but they tried to correct this. I tried a Monte Carlo analysis of the FIDE rating, but there are so many unknown factors that are not publicly available (for example, how newly rated players progress) that I am unable to settle whether the FIDE rating is inflationary or deflationary. Again, not a good idea in my view, as you would end up with a very subjective formula that is poor on statistical grounds.

Kai
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Calculating playing strength comparisons

Post by sje »

Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.

The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Calculating playing strength comparisons

Post by Laskos »

sje wrote:Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.

The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
The correlation would probably indeed be negative on average, but you would have poor statistics and a subjective result (for example, one biased toward more combinative players).

Kai
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Calculating playing strength comparisons

Post by bob »

sje wrote:Well, my idea ignores any K coefficient along with the whole Elo formula and all FIDE ratings. It only looks at how frequently forced mates are missed.

The remaining question concerns the (negative) correlation between missing mates and playing strength. This could be tested in a number of ways, but first we would need the missed mates statistics.
You'd also need accurate timing information as missing forced mates becomes more frequent in a time scramble, which is not that uncommon in GM games.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Calculating playing strength comparisons

Post by sje »

bob wrote:You'd also need accurate timing information as missing forced mates becomes more frequent in a time scramble, which is not that uncommon in GM games.
Yes; time controls were generally longer in the Old Days. Adjournments or lack thereof are another difference. However, I still believe that a missed mates metric would be useful.
james uselton

Re: Calculating playing strength comparisons

Post by james uselton »

I don't know if anyone has ever done this, but what about checking the great games of the past on modern machines? Has anyone run the Immortal Game through Rybka? Or the Evergreen Game, Lasker-Capablanca (St. Petersburg 1914), Byrne-Fischer (the Game of the Century), etc.?

I recall that in 1999 John Nunn ran Carlsbad 1911 through Fritz 5 and came to the conclusion that the tournament average was 2100. It was controversial at the time.
hgm
Posts: 28388
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Calculating playing strength comparisons

Post by hgm »

Why limit the testing to mates? They are rare, and humans do not minimize DTM. If a human converts to KRK to have a sure win in 16 moves, rather than going for a mate-in-10 in KQRKQ, he is to be considered smart for this, rather than having missed a mate-in-10.

I think a much better test would be to evaluate all their moves, and compare the score Rybka gives after the move to the score of the move Rybka considers best. This can probably be done at very fast TC, as you will average out errors anyway over the large set of moves. Just make a histogram of the amount of score lost or gained in each move, subdivided into moves near equality and at a decisive advantage / disadvantage, and perhaps also subdivided by game stage. Then do a principal-component analysis of these histograms, and try linear regression of the rating of modern players on these components.
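
A rough sketch of the histogram step only, assuming each move has already been scored by an engine as a pair (score of the engine's best move, score after the played move), both in centipawns from the mover's point of view. The bin edges, the 150-centipawn cutoff for a "decisive" position, and all names are hypothetical choices for illustration; the principal-component analysis and the regression on ratings are not shown.

```python
from bisect import bisect_right

# Hypothetical bin edges in centipawns. Bin 0 collects moves that "gained"
# score (negative loss); the last bin collects losses of 300 or more.
BIN_EDGES = (0, 10, 25, 50, 100, 300)
CATEGORIES = ("equal", "winning", "losing")


def classify(best_cp, decisive_cp=150):
    """Label the position by the engine's best score (cutoff is an assumption)."""
    if best_cp >= decisive_cp:
        return "winning"
    if best_cp <= -decisive_cp:
        return "losing"
    return "equal"


def loss_histograms(moves):
    """Return one normalized histogram of score loss per position category.

    moves is an iterable of (best_cp, played_cp) pairs. Each histogram is a
    list of fractions that sums to 1, or all zeros if the category is empty.
    """
    counts = {c: [0] * (len(BIN_EDGES) + 1) for c in CATEGORIES}
    for best_cp, played_cp in moves:
        loss = best_cp - played_cp
        counts[classify(best_cp)][bisect_right(BIN_EDGES, loss)] += 1
    result = {}
    for category, bins in counts.items():
        total = sum(bins)
        result[category] = [b / total for b in bins] if total else [0.0] * len(bins)
    return result
```

Concatenating the three histograms for a player gives one feature vector per player, which is the kind of input a principal-component analysis would expect. A further split by game stage would just add another key to the category label.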
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Calculating playing strength comparisons

Post by Dann Corbit »

I have done it. Unfortunately, the best move is to avoid the sacrifice.
Sure makes the game a lot less beautiful.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Calculating playing strength comparisons

Post by Laskos »

hgm wrote:Why limit the testing to mates? They are rare, and humans do not minimize DTM. If a human converts to KRK to have a sure win in 16 moves, rather than going for a mate-in-10 in KQRKQ, he is to be considered smart for this, rather than having missed a mate-in-10.

I think a much better test would be to evaluate all their moves, and compare the score Rybka gives after the move to the score of the move Rybka considers best. This can probably be done at very fast TC, as you will average out errors anyway over the large set of moves. Just make a histogram of the amount of score lost or gained in each move, subdivided into moves near equality and at a decisive advantage / disadvantage, and perhaps also subdivided by game stage. Then do a principal-component analysis of these histograms, and try linear regression of the rating of modern players on these components.
Even your proposal is hard to quantify into a rating. I tried that for the Karpov-Kasparov matches: they are full of small "errors" just because the engine doesn't recognize a plan, and after a plethora of small "errors" (especially in the endgames) from both sides the engine sees that it was not so bad after all. I would try to find evident blunders (threshold 0.50 or so), which are not so uncommon (usually of the order of 1-2 per game in high-quality games), and try to quantify this into a rating.
Even these "evident" blunders are sometimes not fatal, for example just a longer but more "humane" (and according to a plan) way of mating the opponent. What I saw is that there are more of them in the Capablanca-Alekhine games than in the Karpov-Kasparov games. Then you have to fit it to Elo ratings, and I am not sure that it is a linear regression (it could be on small intervals). It seems easier to me just to check whether the FIDE Elo ratings are inflationary or not. The general opinion is that they are, but I am not convinced.
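
The blunder count could be sketched roughly as follows, assuming each game is given as a list of (best score, score after the played move) pairs in pawns from the mover's point of view. The function name and the input layout are hypothetical; only the 0.50 threshold comes from the text above, and the fit to Elo ratings is left open.

```python
def blunders_per_game(games, threshold=0.50):
    """Average number of moves per game that lose at least `threshold` pawns.

    games is a list of games; each game is a list of (best, played) score
    pairs in pawns, as reported by an engine (assumed input format).
    """
    if not games:
        raise ValueError("need at least one game")
    total = 0
    for game in games:
        total += sum(1 for best, played in game if best - played >= threshold)
    return total / len(games)
```

Running this separately on, say, the Capablanca-Alekhine games and the Karpov-Kasparov games would give two numbers to compare, with the caveat already noted that some flagged moves are not real mistakes.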

Kai