Cheat testing

Don · Post by **Don** » Sun Jul 28, 2013 3:38 pm

I have put some work into my "cheat testing" tool and have done some analysis on some of its properties. I'll probably post a more technical description later but the basic idea is threefold:

1. Determining WHICH moves to test
2. Attaching a probability of match for 1 position tested.
3. Using probabilities to test a game, or a tournament.

Point 1 is important as many moves simply play themselves and any reasonable player is going to match those moves. Including them hurts the test. My rule is that if Stockfish likes the move on depths 1-12 and it also matches Houdini's move as well as the player's move, it is discarded.

The opening is problematic. The rule I use is not to match any moves before the 15th move. It is clearly not a perfect rule, but given a PGN file of GM games it's difficult to know where the book (or prepared analysis) ends.
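A minimal sketch of the two filtering rules above, in Python. The `Position` record and its field names are hypothetical (the actual tool's data layout is not described), and "Stockfish likes the move on depths 1-12" is read here as "Stockfish's best move at every depth from 1 to 12 is that move", which is only one possible interpretation.

```python
from dataclasses import dataclass

FIRST_TESTED_MOVE = 15          # moves before this are treated as book
OBVIOUS_DEPTHS = range(1, 13)   # Stockfish depths 1..12


@dataclass
class Position:
    # Hypothetical record; engine output is assumed to be collected elsewhere.
    move_number: int                     # full-move number in the game
    played: str                          # the player's move
    houdini: str                         # Houdini's choice
    stockfish_by_depth: dict[int, str]   # Stockfish best move at each depth


def is_testable(pos: Position) -> bool:
    """Return True if the position should count toward the match statistics."""
    if pos.move_number < FIRST_TESTED_MOVE:
        return False  # assumed to be opening book or preparation
    # A move "plays itself" when the player, Houdini and every shallow
    # Stockfish search all agree on it.
    obvious = pos.played == pos.houdini and all(
        pos.stockfish_by_depth.get(d) == pos.played for d in OBVIOUS_DEPTHS
    )
    return not obvious


def match_counts(positions) -> tuple[int, int]:
    """Return (matches, samples) over the positions that survive the filter."""
    tested = [p for p in positions if is_testable(p)]
    matches = sum(p.played == p.houdini for p in tested)
    return matches, len(tested)
```

The pair returned by `match_counts` is what the probability calculation later in this post would take as input.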

I have now processed 11,489 games from The Week in Chess (TWIC) data. I chose to ONLY test games played between Grandmasters. This comes out to data for 912 Grandmasters. The MEDIAN number of moves available for matching comes out to 136 moves.

I was able to determine that the average match rate for Grandmasters from my data is 45.71%. That is, the chance that a GM will play the same move as Houdini (not including the moves I filtered out of course) is a bit under 50%.

It's not good enough to use this value in the probability calculation because it's my feeling that certain players may have a style that is naturally more like Houdini's. To make a long story short, the data indicates that this is indeed the case. For a calculation like this one needs to estimate an upper bound on the move-matching percentage of the most Houdini-like players. So I looked for the single player that had the highest match percentage. Note that this is a dicey proposition because you cannot get reliable statistics from players with few samples. For example, there are some players who match Houdini 100% of the time, but there were only 3 or 4 moves to sample.

The value I came up with was 60%. I admit that the value is a bit arbitrary but I believe I was being conservative. Of all the players with at least 250 moves to sample, the number one matcher matches 54% of the time, and there are 253 players meeting that criterion. If I use players who have tiny samples I can get arbitrarily high of course. If I go down to players with only 100 samples I get a value just under 60%.
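The threshold search described above can be sketched as follows. This is only an illustration: the function name and the `stats` mapping of player to `(matches, samples)` are assumptions, and the numbers in the usage example are made up to mimic the figures quoted in the post, not taken from the real data.

```python
def highest_match_rate(stats, min_samples):
    """Return (player, rate) for the top matcher with at least min_samples moves.

    stats maps a player name to a (matches, samples) pair.
    Returns None if no player has enough samples.
    """
    eligible = {
        name: matches / samples
        for name, (matches, samples) in stats.items()
        if samples >= min_samples
    }
    if not eligible:
        return None
    best = max(eligible, key=eligible.get)
    return best, eligible[best]


# Made-up example data: a tiny sample can give a 100% match rate,
# which is why a minimum sample size is needed.
example = {"A": (3, 3), "B": (135, 250), "C": (59, 100)}
for threshold in (1, 100, 250):
    print(threshold, highest_match_rate(example, threshold))
```

Raising `min_samples` trades a more reliable estimate for a smaller pool of players, which is the tension the paragraph above describes.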

So now, given a single PGN game or a set of games for a given player, I can calculate (using the 60% figure) the probability that they would match Houdini's moves as often as they did if they were not cheating.

To make a long story short (I will elaborate on this later), the 60% does seem to be extremely conservative because I tested it against all the games I processed and only 3 players (out of 912) go below 1% probability.

Let's consider the tournament that was won by Ivanov. In that tournament Ivanov matched 35 out of 52 moves of Houdini 3 in my test. According to the probability calculation, a GM would equal that "performance" 17% of the time. It doesn't seem like a very rare event.

If you look at the other players, however, they are all well over 90% and usually approaching 100%. In other words, their match percentage is so low that one would expect to see AT LEAST this match percentage over 90% of the time.

The 60% value I am using in the probability calculation is probably way too conservative for the vast majority of players who would not match Houdini even 50% of the time on their own.

About the probability calculation.

If there is a 50% probability of matching a single move, one can use the binomial probability calculation to determine what the chances are of getting M matches out of S samples (or moves). For example, if you have 20 moves and a player matched 15 times, you can calculate the odds of that happening given that each has exactly a 50/50 chance of happening. For a test like the one I am proposing here, you want to know the probability that someone will match AT LEAST M times, which can be calculated by summing the probabilities of each possible match frequency greater than or equal to the target number. For example, in the 15 out of 20 case you sum the binomial probability of 15 out of 20, 16 out of 20, 17 out of 20 and so on.

But for this test I cannot use the binomial distribution because the expected match rate I am using is not 50%. To avoid the tedious math I simply used a Monte Carlo simulation - which gives me a good estimate. My MC sim will take as input the number of samples and return a table like this:

  20    0.00003        0.003 (chance of matching AT LEAST 20 moves)
  19    0.00048        0.052 (chance of matching AT LEAST 19 moves)
  18    0.00305        0.357 (chance of matching AT LEAST 18 moves)
  17    0.01228        1.585 (chance of matching AT LEAST 17 moves)
  16    0.03494        5.079 (chance of matching AT LEAST 16 moves)
  15    0.07463       12.542 (chance of matching AT LEAST 15 moves)
  14    0.12437       24.979 (chance of matching AT LEAST 14 moves)
  13    0.16586       41.565 (chance of matching AT LEAST 13 moves)
  12    0.17968       59.533 (chance of matching AT LEAST 12 moves)
  11    0.15988       75.521 (chance of matching AT LEAST 11 moves)
  10    0.11720       87.241 (chance of matching AT LEAST 10 moves)
   9    0.07095       94.336 (chance of matching AT LEAST  9 moves)
   8    0.03552       97.889 (chance of matching AT LEAST  8 moves)
   7    0.01464       99.353 (chance of matching AT LEAST  7 moves)
   6    0.00486       99.839 (chance of matching AT LEAST  6 moves)
   5    0.00130       99.969 (chance of matching AT LEAST  5 moves)
   4    0.00026       99.995 (chance of matching AT LEAST  4 moves)
   3    0.00004      100.000 (chance of matching AT LEAST  3 moves)
   2    0.00000      100.000 (chance of matching AT LEAST  2 moves)
   1    0.00000      100.000 (chance of matching AT LEAST  1 moves)
   0    0.00000      100.000 (chance of matching AT LEAST  0 moves)
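A simulation along these lines could produce a table in the same layout. This is a sketch, not the actual tool: the function name, the trial count and the seed are assumptions, and the 0.60 default is the per-move match rate chosen above. The first column is the number of matches, the second the simulated chance of exactly that many matches (as a fraction), and the third the chance of at least that many (as a percentage).

```python
import random


def simulate_tail_table(samples, p=0.60, trials=100_000, seed=None):
    """Estimate match-count probabilities by simulation.

    Returns rows (matches, P(exactly matches), percent chance of AT LEAST
    matches), ordered from the highest match count down to zero.
    """
    rng = random.Random(seed)
    counts = [0] * (samples + 1)
    for _ in range(trials):
        # Each tested move independently matches with probability p.
        matches = sum(rng.random() < p for _ in range(samples))
        counts[matches] += 1

    table = []
    running = 0
    for m in range(samples, -1, -1):
        running += counts[m]  # cumulative count of trials with >= m matches
        table.append((m, counts[m] / trials, 100.0 * running / trials))
    return table


for m, exact, at_least in simulate_tail_table(20, seed=1):
    print(f"{m:4d} {exact:10.5f} {at_least:12.3f} "
          f"(chance of matching AT LEAST {m:2d} moves)")
```

Because the estimate is random, the last digits will differ from run to run and from the table above; more trials tighten it.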

Are there problems and issues with this methodology? Of course there are! I can name a few:

1. We don't know where the book ends and the matching should begin.

2. We don't know how to account for the nature of any particular game.

3. Are there any players who would naturally match Houdini more than 60% of the time?

4. Many others .....

zullil · Post by **zullil** » Mon Jul 29, 2013 3:30 am

Don wrote: But for this test I cannot use the binomial distribution because the expected match rate I am using is not 50%. To avoid the tedious math I simply used a Monte Carlo simulation - which gives me a good estimate.

The binomial probability distribution doesn't require p=1/2.

Your Monte Carlo simulation seems fine. Here are the actual numbers for your example (n=20, p=.6), at least according to Mathematica:

20	0.00365616
19	0.0524049
18	0.361147
17	1.59612
16	5.0952
15	12.5599
14	25.0011
13	41.5893
12	59.5599
11	75.5337
10	87.2479
9	94.3474
8	97.8971
7	99.3534
6	99.8388
5	99.9683
4	99.9953
3	99.9995
2	100.
1	100.
0	100
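For anyone without Mathematica, the same exact figures can be computed directly from the binomial formula. A minimal Python sketch (the function names are just illustrative; `math.comb` needs Python 3.8 or later):

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k matches in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(n, k, p):
    """Probability of AT LEAST k matches in n independent tries."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))


# Columns: matches, exact probability (%), cumulative "at least" probability (%)
for k in range(20, -1, -1):
    print(f"{k:2d} {100 * binom_pmf(20, k, 0.6):9.4f} "
          f"{100 * binom_tail(20, k, 0.6):9.4f}")
```

The last column should line up with the figures above. The same function covers other cases from the first post, for example `binom_tail(52, 35, 0.6)` for 35 matches out of 52, which the simulation put at about 17%.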

mwyoung · Post by **mwyoung** » Mon Jul 29, 2013 8:37 am

Don, is there no way for you to program your app to look at a large database of games, like Megabase 2013, to find the novelty in the game being analyzed? Not a perfect solution, but it should be better than an artificial move number assumed for every game.

Modern Times · Post by **Modern Times** » Mon Jul 29, 2013 10:04 am

As long as it is used just to identify where further investigation is required, rather than as conclusive evidence by itself, then I think this is good.

zullil · Post by **zullil** » Mon Jul 29, 2013 10:59 am

Since p=0.6 and q=0.4 are close to 0.5, the binomial distribution should be fairly well approximated using a normal distribution with mean n*p and standard deviation sqrt(n*p*q), as long as n is about 20 or more.

For n=20, we'd have a mean of 12 matches and a standard deviation of sqrt(4.8) (about 2.2). So 17 matches would already be more than 2 sd's above mean ...

Not sure if this might help in any way.
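The normal approximation described above can be sketched in a few lines of Python. The function names are illustrative, and the optional continuity correction (shifting the cutoff by half a match) is a standard refinement that was not mentioned in the post.

```python
from math import erfc, sqrt


def z_score(n, k, p):
    """How many standard deviations k matches lie above the mean n*p."""
    return (k - n * p) / sqrt(n * p * (1 - p))


def normal_tail(n, k, p, continuity=True):
    """Normal approximation to the chance of AT LEAST k matches out of n."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    # The continuity correction treats "at least k" as "above k - 0.5".
    cutoff = k - 0.5 if continuity else k
    z = (cutoff - mean) / sd
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal


print(f"z for 17 of 20: {z_score(20, 17, 0.6):.2f}")
print(f"approx. P(at least 17 of 20): {100 * normal_tail(20, 17, 0.6):.2f}%")
```

With n this small the approximation is only rough in the tails, so it is better suited to a quick sanity check than to replacing the exact sum.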

Ajedrecista · Post by **Ajedrecista** » Mon Jul 29, 2013 11:15 am

Hello Louis:

zullil wrote:
Don wrote: But for this test I cannot use the binomial distribution because the expected match rate I am using is not 50%. To avoid the tedious math I simply used a Monte Carlo simulation - which gives me a good estimate.
The binomial probability distribution doesn't require p=1/2.

Your Monte Carlo simulation seems fine. Here are the actual numbers for your example (n=20, p=.6), at least according to Mathematica:
20	0.00365616
19	0.0524049
18	0.361147
17	1.59612
16	5.0952
15	12.5599
14	25.0011
13	41.5893
12	59.5599
11	75.5337
10	87.2479
9	94.3474
8	97.8971
7	99.3534
6	99.8388
5	99.9683
4	99.9953
3	99.9995
2	100.
1	100.
0	100

Obviously I can reproduce your numbers with a short programme written in Fortran:

Binomial distribution from 0 up to 20:

P = 60.0000 %

 n   Prob.(%)   Accum.(%)

20    0.0037      0.0037
19    0.0487      0.0524
18    0.3087      0.3611
17    1.2350      1.5961
16    3.4991      5.0952
15    7.4647     12.5599
14   12.4412     25.0011
13   16.5882     41.5893
12   17.9706     59.5599
11   15.9738     75.5337
10   11.7142     87.2479
 9    7.0995     94.3474
 8    3.5497     97.8971
 7    1.4563     99.3534
 6    0.4854     99.8388
 5    0.1294     99.9683
 4    0.0270     99.9953
 3    0.0042     99.9995
 2    0.0005    100.0000
 1    0.0000    100.0000
 0    0.0000    100.0000

But I think that the reason why Don chose a MC simulation instead of the direct binomial distribution is to estimate P (in his simulation, P ~ 60%, but could have been 57% or 62%, for example). He needed to first estimate P, which is unknown a priori and not evident. I can be wrong, of course.

------------

zullil wrote:Since p=0.6 and q=0.4 are close to 0.5, the binomial distribution should be fairly well approximated using a normal distribution with mean n*p and standard deviation sqrt(n*p*q), as long as n is about 20 or more.

For n=20, we'd have a mean of 12 matches and a standard deviation of sqrt(4.8) (about 2.2). So 17 matches would already be more than 2 sd's above mean ...

Not sure if this might help in any way.

IIRC, I was taught that these approximations are better for n > 30 or, to be on the safe side, n > 100. Anyway, the error with n = 20 should be small enough.

Regards from Spain.

Ajedrecista.

Don · Post by **Don** » Mon Jul 29, 2013 1:49 pm

mwyoung wrote:
Don, is there no way for you to program your app to look at a large database of games, like Megabase 2013, to find the novelty in the game being analyzed? Not a perfect solution, but it should be better than an artificial move number assumed for every game.

Yes, I have all the TWIC games and I have considered building a book that stops when the position is unique. The only reason I didn't do that is that I don't have a lot of time to put into this.
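A rough sketch of how such a book could work, in Python. It is a simplification under stated assumptions: games are plain lists of move strings, the function names are made up for illustration, and the "position" is approximated by the move sequence that reached it, so transpositions are not recognised. A real version would key on the board position and would need a more compact structure than one dictionary entry per prefix.

```python
from collections import Counter


def prefix_counts(games):
    """Count how many reference games start with each move sequence."""
    counts = Counter()
    for moves in games:
        for ply in range(1, len(moves) + 1):
            counts[tuple(moves[:ply])] += 1
    return counts


def first_novel_ply(moves, counts, min_games=1):
    """Return the 1-based ply of the first move that leaves the book.

    A move sequence counts as book while at least min_games reference games
    share it. Use min_games=2 if the game being tested is itself in the
    reference set. Returns None if the whole game stays in book.
    """
    for ply in range(1, len(moves) + 1):
        if counts[tuple(moves[:ply])] < min_games:
            return ply
    return None


reference = [
    ["e4", "e5", "Nf3", "Nc6"],
    ["e4", "e5", "Nf3", "Nf6"],
    ["d4", "d5"],
]
book = prefix_counts(reference)
print(first_novel_ply(["e4", "e5", "Nf3", "Nc6", "Bb5"], book))
```

Move matching would then start at the returned ply instead of at a fixed move number.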

mwyoung · Post by **mwyoung** » Mon Jul 29, 2013 9:43 pm

Modern Times wrote:As long as it is used just to identify where further investigation is required, rather than as conclusive evidence by itself, then I think this is good.

This should be a lesson learned for the many people who were so quick to use move matching as some kind of overwhelming proof of guilt. You can't just slap together a computer cheating testing protocol without any testing, research, or study, and then claim it is some kind of infallible proof.

Even if your names are Lilov and Houdart, two very smart people who were nonetheless wrong in their approach to their protocol.

What Don has shown more than anything is just how difficult it will be to make a reliable, working, and tested computer cheating protocol.

There are many problems to overcome, and much analysis and testing has to come first, before any kind of reliable computer cheating test filter could ever be used. And that is all it could be used for: a filter to decide if further investigation is needed against a player, not proof in itself.

Don · Post by **Don** » Mon Jul 29, 2013 10:38 pm

mwyoung wrote:
Modern Times wrote:As long as it is used just to identify where further investigation is required, rather than as conclusive evidence by itself, then I think this is good.
This should be a lesson learned for the many people who were so quick to use move matching as some kind of overwhelming proof of guilt. You can't just slap together a computer cheating testing protocol without any testing, research, or study, and then claim it is some kind of infallible proof.

I think you are probably over-reacting a bit because one or two people were claiming this, not "many" - but most people fell somewhere in between and were being reasonable. But even if you don't think they were being reasonable your post comes across as a condescending rebuke to those who dare not agree with you like it's your job to teach them a lesson.

Most people have opinions, fears and concerns and we should respect that without putting them down for it.

It's totally not clear to me how damning this evidence really is or isn't. My tool isn't damning at all of course, but I still don't know how good the test is. I may be doing it far from the ideal way. As you know I struggled with the issue of how to weight the moves - some moves are worth nothing for matching purposes. A chess Master or Grandmaster may be able to see something you and I cannot. An expert in any field can often immediately see things it would take you or me a great deal of time to comprehend, even if we could - so it's plausible that an objective human GM could compare the moves with Houdini and see much more. For instance, it is possible there are moves that are EXTREMELY unlikely any human would play. There may be a number of these low-probability moves that I cannot measure with my test. I tend not to trust human judgement and would prefer an objective test, but I have to admit that human judgement often trumps a cold clinical procedure. I could easily see a GM looking at the moves themselves with understanding, along with Houdini's choices, and being completely convinced, and I don't think you and I are qualified to tell them they are ignorant fools because we don't agree with them. Yes, we can speculate that they are not being objective or that they are too eager to hang him, but it would be only speculation.

Having said that, I never felt that move matching was a very big part of this, but it does seem to be what people want to focus on. If I were someone who wanted to exonerate Ivanov I would do everything I could to keep the focus on move matching because that is such a red herring. Move matching will always be dicey because a computer can be a huge benefit even if you only take its advice occasionally or use it as a filter to avoid blunders.

mwyoung wrote: Even if your names are Lilov and Houdart, two very smart people who were nonetheless wrong in their approach to their protocol.

What Don has shown more than anything is just how difficult it will be to make a reliable, working, and tested computer cheating protocol.

There are many problems to overcome, and much analysis and testing has to come first, before any kind of reliable computer cheating test filter could ever be used. And that is all it could be used for: a filter to decide if further investigation is needed against a player, not proof in itself.

It could conceivably be used as a proof - DNA matching is and so is fingerprint matching under the right contexts. It's possible for 2 people (who are not brothers) to pass the DNA match test or fingerprint match test (which actually depends somewhat on human judgement.) We use the word proof as if we were seeking a formal mathematical proof, such as whether P=NP but in any kind of civil context it just means evidence that is so convincing that it cannot be REASONABLY doubted. As it stands now it would not pass that test but it doesn't necessarily mean that it couldn't. It could happen if the test was improved significantly and shown to be reliable and then Ivanov played a couple of back to back tournaments getting high match rates - it could put this in the "beyond a reasonable doubt" territory.

mwyoung · Post by **mwyoung** » Tue Jul 30, 2013 2:33 am

It may be condescending or not, but it is accurate. I read many people's posts, not only here but in many other places as well. And they accepted the protocol without question because of who was backing it. What Lilov and Houdart claimed about their protocol's accuracy was 100% false. "No further evidence needed" - Indeed.

Save your outrage for them.
