Cheating suspicion at the Zadar Open in Croatia

Laskos · Post by **Laskos** » Sat May 11, 2013 12:13 pm

Don wrote: It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.

Thanks, that is what I was asking for. Pretty convincing to me. Basically, he plays at a level far above even his tournament performance of 2700 (which by itself is very high). It looks more like a 2800 super-GM having a good day (z = 1-2) and producing a 3000-3200 performance.
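Regan states his result both as odds ("almost a million-to-one") and as z-scores. Here is a rough sketch of how the two relate. It assumes the test statistic is approximately standard normal, which is an assumption for illustration only; Regan's model itself is not reproduced here. Under that assumption, a one-sided z-score converts to a tail probability and to odds against like this:

```python
import math

def one_sided_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z (normal approximation assumed)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def odds_against(z: float) -> float:
    """Odds against a result at least z standard deviations above the mean."""
    p = one_sided_tail(z)
    return (1.0 - p) / p

for z in (1.0, 2.0, 3.0, 4.0, 4.75, 5.0):
    print(f"z = {z:4.2f}  tail = {one_sided_tail(z):.3g}  "
          f"odds against = {odds_against(z):,.0f} to 1")
```

Under this normal approximation, odds of about a million to one correspond to z ≈ 4.75. By contrast, the "z = 1-2" good day Laskos mentions corresponds to one-sided tail probabilities of roughly 16% and 2%.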

Don · Post by **Don** » Sat May 11, 2013 12:14 pm

Laskos wrote:
michiguel wrote:
The other problem is that "condemning" (i.e. banning etc.) based on statistics set a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
DNA profiling is a statistical tool too. Each STR is polymorphic, but the number of alleles is very small. Each STR allele is shared by around 5 - 20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. The pattern of alleles can identify an individual very accurately. Similar to what these Buffalo guys are doing about Ivanov.

DNA profiling is used in criminal investigations, often as proofs.

Some people cannot understand that, though. They can only deal with one statistic at a time, so you get comments such as "any player might play the same move that Houdini plays", and that is about as far as their brain can stretch.
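Laskos's DNA analogy rests on combining many weak markers. The sketch below shows why individually common matches become collectively rare. It assumes the markers are statistically independent, which is a simplification: real forensic calculations also account for genotype frequencies and population structure. The frequencies are illustrative, picked from the 5-20% range mentioned above, and are not real STR data.

```python
import math

def combined_match_probability(frequencies: list[float]) -> float:
    """Probability that a random person matches every marker, assuming independence."""
    return math.prod(frequencies)

# Illustrative only: 13 markers, each shared by 10% of people.
p = combined_match_probability([0.10] * 13)
print(f"random match probability: {p:.1e}")  # about 1 in 10 trillion
```

Each marker alone matches one person in ten. All thirteen together match roughly one person in ten trillion, and that holds only as long as the independence assumption does.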

Another example is the Miller-Rabin primality test, which actually tests whether a value is composite. It is not an admissible proof but a statistical test, yet you can "prove" a given number prime to any arbitrary degree of certainty.
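Below is a compact sketch of the Miller-Rabin test Don mentions, written as a standard textbook version rather than code from the thread. Each random base that fails to expose a composite n leaves at most a 1/4 chance that n slipped through. After k independent rounds, the error bound therefore falls below 4^-k.

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test.

    False means n is definitely composite; True means n is prime
    with error probability at most 4**-rounds.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # modular exponentiation a**d mod n
        if x == 1 or x == n - 1:
            continue  # this base is consistent with n being prime
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is certainly composite
    return True
```

Python's three-argument `pow` performs the modular exponentiation efficiently. That is what makes the test practical even for numbers with hundreds of digits.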

Don · Post by **Don** » Sat May 11, 2013 2:44 pm

Adam Hair wrote:
Don wrote: It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.
Has anybody focused on the moves played by Ivanov's opponents? I have not spent much time on this, but I did check out the game against Kurajica. The majority of Kurajica's moves matched the moves selected by Houdini 2.0 when I stepped through the game. I wonder how his other opponents' move selections compare to Houdini 2.0's move selections.

I don't know if that was done, but I don't see why his opponents would have any specific significance here. Is that particularly significant? Also, I think all strong players match a significant number of moves with each other and with strong programs. You should know from the similarity tester data that the real differences, and the interesting statistics, are in a few percent of the moves. Even Jesse says his moves match one of the first two engine choices 75% of the time, which I think is pretty low, but he says he is a weak player.

The paper was interesting in more ways than one. It was not just straight move matching; a casual reading suggests that move error was also considered. I'm not sure that is very appropriate except to determine whether a player is playing over his head, but that is what aroused suspicion in the first place. What he accomplished is almost unheard of. It's not unusual for a 1300 player to play a few hundred Elo over his head due to rapid legitimate improvement, but it's insane to see this at such high levels.
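Don's distinction between matching moves and measuring move error can be made concrete. One simple error measure is the average drop in engine evaluation, in pawns, between the engine's best move and the move actually played. This is an assumption for illustration; Regan's model is more elaborate and may weight positions differently. The `AnalyzedMove` type and the evaluations below are hypothetical placeholders, not data from Ivanov's games.

```python
from dataclasses import dataclass

@dataclass
class AnalyzedMove:
    best_eval: float    # hypothetical engine eval (pawns, mover's view) after the engine's top move
    played_eval: float  # engine eval after the move actually played

def average_error(moves: list[AnalyzedMove]) -> float:
    """Mean evaluation loss per move; 0.0 means every move was rated as good as the engine's choice."""
    losses = [max(0.0, m.best_eval - m.played_eval) for m in moves]
    return sum(losses) / len(losses)

# Hypothetical game: two moves match the engine, one loses 0.30 pawns, one loses 1.10.
game = [AnalyzedMove(0.40, 0.40), AnalyzedMove(0.55, 0.55),
        AnalyzedMove(0.60, 0.30), AnalyzedMove(0.20, -0.90)]
print(f"average error: {average_error(game):.3f} pawns per move")
```

A low average error says the player's own moves were strong, regardless of how many points the opponents gave away.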

Another interesting point mentioned in the paper, and I think in part this is what he was trying to measure, is whether Ivanov was simply having a great tournament or was actually playing amazingly well. The difference is in the quality of the moves. I could have a great tournament with a lot of help from my blundering opponents, or I could have a great tournament because I am playing exceedingly well.

And finally, the author used several programs, but it is now strongly believed that Ivanov was using Houdini 2. I would love to see two things added to this study:

1. statistics based on single games.
2. statistics (like you suggest) of hundreds of strong players in general.

The idea (and my intuition) is that some games will contain many moves which are easy to match and others will not. So it will be good to understand where the bounds are.
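Don's first wish, statistics for single games, runs into small sample sizes, because one game contributes only a few dozen analysed moves. Below is a rough sketch. It assumes each move independently matches the engine at some baseline rate; the 50% baseline is hypothetical. Real positions violate the independence assumption, which is why bounds drawn from many reference games matter.

```python
import math

def match_z_score(matches: int, moves: int, baseline_rate: float) -> float:
    """Normal-approximation z-score of an observed match count against a binomial baseline."""
    expected = moves * baseline_rate
    std_dev = math.sqrt(moves * baseline_rate * (1.0 - baseline_rate))
    return (matches - expected) / std_dev

def match_rate_std_error(moves: int, rate: float) -> float:
    """Standard error of a match rate estimated from a single game of `moves` moves."""
    return math.sqrt(rate * (1.0 - rate) / moves)

# Hypothetical single game: 20 of 25 moves match, against an assumed 50% baseline.
print(f"z = {match_z_score(20, 25, 0.5):.2f}")                    # 3.00
print(f"std. error at 50%: {match_rate_std_error(25, 0.5):.2f}")  # 0.10
```

A standard error of about 10 percentage points from one game is why a single game says little on its own.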

My other intuition here is that it's more about playing style than strength. I think Ivanov crushing these strong players is convincing evidence all by itself, but I would be more comfortable with a totally independent metric for "testing" how he played that doesn't try to measure how well he played. We already KNOW he played well, so why construct a test that proves this and pretend it is additional evidence? It actually is additional evidence, and that was the intent, but it is not additional evidence that should be given much weight. I would have preferred a pure move-matching test.

It is also possible that he arranged the test in a way that proves his point. Many things were done to "increase" the relevance of the data, and I would like to know whether he tweaked and tuned them after the fact or made all these decisions before running the data. Even if he didn't, I know that, even subconsciously, the way you construct a test can push results in some direction without your being aware of it. So I would like to see a lot more comparison data from other players.

Please note that if you do run this against a million published games, you will probably find data implying that a few games with really high match rates are not remarkable, simply because it's likely that some other players have cheated too and have not been caught.

One question for the entire forum: many of the chess servers claim they can detect cheating. Does anyone know if this is a bluff to scare away cheaters? I have heard that many people have been kicked off the servers or warned for cheating. Does anyone have any information on this, and on whether and how the servers actually have a reliable way to detect it?

Don

Adam Hair · Post by **Adam Hair** » Sat May 11, 2013 3:52 pm

Regan compares Ivanov's move selection against a training set derived from games played between 2006 and 2009. I wonder whether more and more master-level players are now training with the top chess engines. If that is true, would their play be influenced by Houdini et al.? That would increase the matched-move percentage in general. This is why I think the opponents' moves are relevant.

michiguel · Post by **michiguel** » Sat May 11, 2013 4:34 pm

Laskos wrote:
michiguel wrote:
The other problem is that "condemning" (i.e. banning etc.) based on statistics set a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
DNA profiling is a statistical tool too. Each STR is polymorphic, but the number of alleles is very small. Each STR allele is shared by around 5 - 20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. The pattern of alleles can identify an individual very accurately. Similar to what these Buffalo guys are doing about Ivanov.

DNA profiling is used in criminal investigations, often as proofs.

Only after a criminal is caught based on some other evidence. It is not used by searching DNA on a database. The difference is huge.

Why do we have a suspicion about this guy? It is based only on statistics, or on the surprising results he got. In other words, any person on the planet in the last x years who had this type of rare result would have set off the alarm (which is almost the same as searching a database). Then the probabilities need to be calculated in a different way, and certain rare episodes do not seem "that" rare anymore. This is what the "prosecutor's fallacy" is about. By the way, some people in the past were convicted on apparently solid statistics and were later found innocent.
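Miguel's point about searching a database can be quantified. Suppose each of N innocent players independently has a probability p of producing a result this extreme; both numbers below are hypothetical. The chance that screening all of them flags at least one is then 1 - (1 - p)^N:

```python
import math

def prob_at_least_one_false_alarm(p: float, n_players: int) -> float:
    """P(at least one of n_players independent innocents exceeds the threshold) = 1 - (1 - p)**n."""
    # log1p/expm1 keep precision when p is tiny and n_players is large.
    return -math.expm1(n_players * math.log1p(-p))

for n in (1, 1_000, 100_000, 1_000_000):
    print(f"N = {n:>9,}: expected false alarms = {n * 1e-6:.3f}, "
          f"P(at least one) = {prob_at_least_one_false_alarm(1e-6, n):.3f}")
```

A one-in-a-million result is strong evidence against a single suspect identified in advance. However, something that rare is expected to turn up about once when a million players are screened, and that is the heart of the prosecutor's-fallacy objection.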

Yes, it sounds convincing to me, but that is not the point. The bar for convincing me and the bar for imposing a ban should be different.

Miguel

syzygy · Post by **syzygy** » Sat May 11, 2013 4:40 pm

michiguel wrote: Only after a criminal is caught based on some other evidence. It is not used by searching DNA on a database. The difference is huge.

I'm afraid there are some countries that do match DNA traces with a database in order to find a suspect. You are correct about the prosecutor's fallacy, but prosecutors (and judges) still fall for it...

noctiferus · Post by **noctiferus** » Sat May 11, 2013 6:35 pm

Hi Adam. I went through the game with Kurajica some time ago.
Here is a summary of my check. Is it similar to yours?

Hardware, OS and settings: i7 / Win7
GUI: Fritz 11; hash: 256 MB; engine: Houdini 2.0c; 1 thread; depth 18

Legend (counts per block of moves; 0 means it never happened):
I = the move played matched Houdini's first choice
II = the move played matched Houdini's second choice
- = no match, or the move played was Houdini's third choice or worse

| Player | Moves | I (1st choice) | II (2nd choice) | - (3rd or worse / no match) |
|---|---|---|---|---|
| Ivanov | 10-19 | 9 | 1 | 0 |
| Ivanov | 20-29 | 9 | 1 | 0 |
| Ivanov | 30-34 | 5 | 0 | 0 |
| Kurajica | 10-19 | 3 | 3 | 4 |
| Kurajica | 20-29 | 6 | 3 | 1 |
| Kurajica | 30-34 | 3 | 0 | 2 |
| **Ivanov** | **TOTAL** | **23** | **2** | **0** |
| **Kurajica** | **TOTAL** | **12** | **6** | **7** |
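noctiferus's totals can be turned into match rates. The counts come straight from the totals above: 25 analysed moves per player, covering moves 10-34. The script only does the arithmetic.

```python
# Totals from noctiferus's check (Houdini 2.0c, depth 18).
tallies = {
    # player: (matched 1st choice, matched 2nd choice, 3rd or worse / no match)
    "Ivanov": (23, 2, 0),
    "Kurajica": (12, 6, 7),
}

def match_rates(first: int, second: int, other: int) -> tuple[float, float]:
    """Return (top-1 match rate, top-2 match rate) as fractions of all analysed moves."""
    total = first + second + other
    return first / total, (first + second) / total

for player, counts in tallies.items():
    top1, top2 = match_rates(*counts)
    print(f"{player:9s} top-1: {top1:.0%}  top-2: {top2:.0%}")
```

This prints 92% and 100% for Ivanov against 48% and 72% for Kurajica. One game per player is a very small sample, so these figures illustrate the contrast rather than establish it.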

Don · Post by **Don** » Sat May 11, 2013 7:09 pm

michiguel wrote:
Laskos wrote:
michiguel wrote:
The other problem is that "condemning" (i.e. banning etc.) based on statistics set a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
DNA profiling is a statistical tool too. Each STR is polymorphic, but the number of alleles is very small. Each STR allele is shared by around 5 - 20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. The pattern of alleles can identify an individual very accurately. Similar to what these Buffalo guys are doing about Ivanov.

DNA profiling is used in criminal investigations, often as proofs.
Only after a criminal is caught based on some other evidence. It is not used by searching DNA on a database. The difference is huge.

Why do we have a suspicion about this guy? It is only based on stats or the surprising results he got.

The initial suspicion was raised by the exceedingly unlikely results he got. I don't see anything improper about that. It would be highly improper if it were considered proof, but there is NO crime that doesn't start with a suspicion, and the suspicion is almost always based on a motive, i.e., the crime benefits the alleged suspect. That is why the term "suspect" is used to describe a "person of interest" in any crime.

In a court of law the investigation proceeds in stages so that things are (in principle) not carried too far. There are protections in place that attempt to limit the damage caused by false accusations, such as a hearing, which often determines whether things should be taken to the next step. Note that the hearing does not establish innocence or guilt; otherwise it would be all that is needed.

In other words, any person in the planet in the last x years who would have had this type of rare results would have set the alarm (it is almost the same as searching from a database).

I'm not an advocate of seeking out crime without an actual accusation but when there is an accusation made then it makes sense to apply some initial test to determine if there is reason to proceed or whether it should be dropped.

Then the probabilities need to be calculated in a different way, and certain rare episodes do not seem "that" rare anymore. This is what the "prosecutor's fallacy" is a about, and by the way, some people in the past were condemned based on apparently solid statistics, who were later found innocent.

Yes, sounds convincing to me, but that is not the point. There should be different levels to convince me and to execute a ban.

I don't think this is a case of the prosecutor's fallacy, at least not based on the paper we just read.

A simple example of the prosecutor's fallacy, given on Wikipedia, goes like this. The perpetrator is known to have the same blood type as the accused, and 10% of the general population share that blood type. The fallacy is to conclude that the accused is therefore 90% likely to be guilty. By that reasoning, millions of people would be 90% likely to be guilty. It's a gross misunderstanding of conditional probability.

But blood-type evidence is still valid when the suspect was not found by a search BASED solely on blood type. If there is one strong suspect based on OTHER, independent evidence, the blood-type test can exonerate him or add significant additional confidence.
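Don's blood-type example is Bayes' theorem at work. The minimal sketch below assumes that the guilty person always matches and that an innocent person matches with the 10% population frequency from the example. The prior probabilities are illustrative.

```python
def posterior_guilt(prior: float, match_rate_if_innocent: float) -> float:
    """P(guilty | match) by Bayes' theorem, assuming P(match | guilty) = 1."""
    evidence = prior * 1.0 + (1.0 - prior) * match_rate_if_innocent
    return prior / evidence

# Suspect picked out of a city of one million by blood type alone (hypothetical prior).
print(f"{posterior_guilt(1e-6, 0.10):.6f}")   # about 0.00001, nowhere near 0.9
# Suspect already strongly implicated by independent evidence (hypothetical prior of 50%).
print(f"{posterior_guilt(0.50, 0.10):.3f}")   # 0.909
```

The same 10% match is nearly worthless against someone selected by the match itself, yet substantial against someone already suspected on independent grounds.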

This is EXACTLY how this transpired, if you consider the paper. The paper didn't "fish" for players to accuse in some database; it applied a single test to a person already strongly suspected because of a completely independent observation: his sudden, superhuman improvement.

What would be a fallacy is if the test failed and then another test was constructed, then another, then another, until finally one of these tests seemed to indicate his guilt. My only problem with the paper is that I have no way of knowing whether the author tuned his test before writing the paper. I would like to believe that the exact test was fully specified and not adjusted after the fact. We will never know, but if we really believe in innocent until proven guilty, we should give the paper's author the benefit of the doubt, right?

At any rate, people can run their own tests, the data is not hidden.


velmarin · Post by **velmarin** » Sat May 11, 2013 7:19 pm

Don wrote:
SzG wrote:
OK, so is Ippolit a reverse engineered Rybka?

No. Ippolit is not Rybka. It was heavily based on Rybka but it is a different program.

Interesting comment. I hope you remember it and stop attacking those who develop Ivanhoe or its derivatives.

You are a very important member of this community, and your previous criticism has done much damage.

It would be wise to set this right.
Thanks

carldaman · Post by **carldaman** » Sat May 11, 2013 7:20 pm

Don wrote: One question for the entire forum. Many of the chess servers claim they can detect cheating. Does anyone know if this is a bluff to scare away cheaters? I have heard that many people have been kicked off the servers or warned for cheating so does anyone have any information on this and if/how they actually have a reliable way to detect this?

Don

It's most certainly not a bluff. The best ones have very advanced detection techniques that include move correlation statistics for multiple games.

CL
