1. Determining WHICH moves to test
2. Attaching a probability of match for 1 position tested.
3. Using probabilities to test a game, or a tournament.
Point 1 is important as many moves simply play themselves and any reasonable player is going to match those moves. Including them hurts the test. My rule is that Stockfish likes the moves on depth 1-12 and it also matches Houdini's move as well as the players move, it is discarded.
The opening is problematic, the rule I use is not to match any moves before the 15th move - clearly not a perfect rule but given a PGN file of GM games it's difficult to know where the book (or prepared analysis) ends.
I have now processed 11,489 games from the week in chess data. I chose to ONLY test games played between Grandmasters. This comes out to data for 912 Grandmasters. The MEDIAN number of moves available for matching comes out to 136 moves.
I was able to determine that the average match rate for Grandmasters from my data is 45.71%. That is the chance that a GM will play the same move as Houdini (not including the moves I filtered out of course) is nearly 50%.
It's not good enough to use this value in the probability calculation because it's my feeling that certain players may have a style that is naturally more like Houdini's. To make a long story short, the data indicates that this is indeed the case. For a calculation like this one needs to estimate an upper bound on the move matching percentage of the most Houdini-like players. So I looked for the single player than had the highest match percentage. Note that this is a dicey proposition because players you can get reliable statistics from players with few samples. For example there are some players who match Houdini 100% of the time, but there were only 3 or 4 moves to sample.
The value I came up with was 60%. I admit that the value is a bit arbitrary but I believe I was being conservative. Of all the players with at least 250 moves to sample the number one matcher will match 54% percent of the time and there are 253 players meeting that criteria. If I use players who have tiny samples I can get arbitrarily high of course. If I go down to players with only 100 samples I get a value just under 60%.
So now, given a single PGN game or a set of games for a given player I can calculated (using the 60% figure) the probability that they would match Houdini's moves as often as they did if they were not cheating.
To make a long story short (I will elaborate on this later) the 60% does seem to be extremely conservative because I tested it against all the games I processed and only 3 player (out of 912) go below 1% probability.
Let's consider the tournament that was won by Ivanov. In that tournament Ivanov matched 35 out of 52 moves of Houdini 3 in my test. According to the probability calculation A GM would equal that "performance" 17% of the time. It doesn't seem like a very rare event.
If you look at the other players however they are all well over 90% and usually approaching 100% - in other words there match percentage is so low that one would expect to see AT LEAST this match percentage over 90% of the time.
The 60% value I am using in the probability calculation is probably way too conservative for the vast majority of players who would not match Houdini even 50% of the time on their own.
About the probability calculation.
If there is a 50% probability of matching a single move, one can use the binomial probability calculation to determine what the chances are of getting M matches out S samples (or moves.) For example if you have 20 moves and a player matched 15 times, you can calculate the odds of that happening given that each has exactly a 50/50 change of happening. For a test like the one I am proposing here, you want to know the probability that someone will match AT LEAST M times - and can be calculated by summing the probabilities of each possible match frequency greater than or equal to the target number. For example in the 15 out of 20 case you sum the binomial probability of 15 out of 20, 16 out of 20, 17 out of 20 and so on.
But for this test I cannot use the binomial distribution because the expected match rate I am using is not 50%. To avoid the tedious math I simply used a Monte Carlo simulation - which gives me a good estimate. My MC sim will take as input the number of samples and return a table like this:
Code: Select all
20 0.00003 0.003 (chance of matching AT LEAST 20 moves)
19 0.00048 0.052 (chance of matching AT LEAST 19 moves)
18 0.00305 0.357 (chance of matching AT LEAST 18 moves)
17 0.01228 1.585 (chance of matching AT LEAST 17 moves)
16 0.03494 5.079 (chance of matching AT LEAST 16 moves)
15 0.07463 12.542 (chance of matching AT LEAST 15 moves)
14 0.12437 24.979 (chance of matching AT LEAST 14 moves)
13 0.16586 41.565 (chance of matching AT LEAST 13 moves)
12 0.17968 59.533 (chance of matching AT LEAST 12 moves)
11 0.15988 75.521 (chance of matching AT LEAST 11 moves)
10 0.11720 87.241 (chance of matching AT LEAST 10 moves)
9 0.07095 94.336 (chance of matching AT LEAST 9 moves)
8 0.03552 97.889 (chance of matching AT LEAST 8 moves)
7 0.01464 99.353 (chance of matching AT LEAST 7 moves)
6 0.00486 99.839 (chance of matching AT LEAST 6 moves)
5 0.00130 99.969 (chance of matching AT LEAST 5 moves)
4 0.00026 99.995 (chance of matching AT LEAST 4 moves)
3 0.00004 100.000 (chance of matching AT LEAST 3 moves)
2 0.00000 100.000 (chance of matching AT LEAST 2 moves)
1 0.00000 100.000 (chance of matching AT LEAST 1 moves)
0 0.00000 100.000 (chance of matching AT LEAST 0 moves)
1. We don't know where the book ends and the matching should begin.
2. We don't know how to account for the nature of any particular game.
3. Are there any players who would naturally match Houdini more than 60% of the time?
4. Many others .....