Cheating suspicion at the Zadar Open in Croatia

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by michiguel »

Don wrote:
Jesse Gersenson wrote:
Houdini wrote:No human being can consistently play the #1 or #2 choice of Houdini.
What percentage of moves were consistent?

I'm rated 1800 FIDE and, in a typical over-the-board game, 75% of my moves will match the first or second choice of a strong engine.
The match was on virtually every move. And the second-choice move was never a bad move, but a move Houdini might even have played on a slightly different setting. I think the point is that even if you take Houdini's move every time, you cannot check later and expect to get a 100% match, due to chaos theory: no engine plays exactly the same on different hardware, different conditions, etc.
I just watched the first half hour of Lilov's video. The guy is extremely biased. It started really badly when he said in the first game that "dxe5 is a bad move and a human player would not play it". That is theory (I have played it many times, in fact). Then, "Bg5 is suggested by the computer": again, basic theory. Almost every single time he said "what a weird move for a human", he was wrong. d5 in the Queen's Indian is typical, to block the Bb7. a5 in the Slav, again, typical.

When he drew, he blamed it on the computer's weaknesses or on time trouble (why would you have time trouble if you are cheating?); when he won, he blamed it on the brilliancy of the computer (not on the opponents screwing up badly).

So, I would personally bet the guy is cheating, but we need less biased proof to condemn him. Maybe it gets better later in the video; I will continue some other time.

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by michiguel »

Houdini wrote:
Modern Times wrote:Again, if you can't show HOW he is doing this, then there is no case to answer.
I disagree with this.
No human being can consistently play the #1 or #2 choice of Houdini. Detecting this pattern exposes the cheating, no further evidence is required.
I'm surprised that even on a computer chess forum this is not self evident.

Robert
Then the Houdini performance is really disappointing, or you need a better operator ;-)

Miguel
User avatar
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Houdini »

michiguel wrote:
Houdini wrote:
Modern Times wrote:Again, if you can't show HOW he is doing this, then there is no case to answer.
I disagree with this.
No human being can consistently play the #1 or #2 choice of Houdini. Detecting this pattern exposes the cheating, no further evidence is required.
I'm surprised that even on a computer chess forum this is not self evident.

Robert
Then the Houdini performance is really disappointing, or you need a better operator ;-)

Miguel
Your reply doesn't seem related to the post of mine that you quoted.
Please read more carefully next time.

Robert
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Don »

michiguel wrote:
Don wrote:
Jesse Gersenson wrote:
Houdini wrote:No human being can consistently play the #1 or #2 choice of Houdini.
What percentage of moves were consistent?

I'm rated 1800 FIDE and, in a typical over-the-board game, 75% of my moves will match the first or second choice of a strong engine.
The match was on virtually every move. And the second-choice move was never a bad move, but a move Houdini might even have played on a slightly different setting. I think the point is that even if you take Houdini's move every time, you cannot check later and expect to get a 100% match, due to chaos theory: no engine plays exactly the same on different hardware, different conditions, etc.
I just watched the first half hour of Lilov's video. The guy is extremely biased. It started really badly when he said in the first game that "dxe5 is a bad move and a human player would not play it". That is theory (I have played it many times, in fact). Then, "Bg5 is suggested by the computer": again, basic theory. Almost every single time he said "what a weird move for a human", he was wrong. d5 in the Queen's Indian is typical, to block the Bb7. a5 in the Slav, again, typical.

When he drew, he blamed it on the computer's weaknesses or on time trouble (why would you have time trouble if you are cheating?); when he won, he blamed it on the brilliancy of the computer (not on the opponents screwing up badly).

So, I would personally bet the guy is cheating, but we need less biased proof to condemn him. Maybe it gets better later in the video; I will continue some other time.

Miguel
The video is a form of expert testimony; it was not presented as a proof on paper.

But I don't want to get sucked into an argument. I have an idea that should be a good way to get some unbiased facts:

Take perhaps 100 recent grandmaster games and compare the moves the grandmaster played with Houdini's choices. I would suggest that we take ANY move Houdini produces starting at (say) the 10th ply and above and see how many matches the GM has with one of these moves. We have to get beyond the opening book, of course, so perhaps we could start at some arbitrary move number.

Then we have a baseline of sorts. How many moves can we expect a human to match? What would be a reasonable upper and lower bound? I think we can simply calculate the odds that any single move would match, and from that the overall probability that Ivanov could do this on his own.

I'll try to do this myself if I can figure out where to get the games in question. But it would be good if someone else duplicated my work.
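Here is a rough sketch, in Python, of the last step of that idea: given a per-move baseline match rate estimated from the GM control games, compute how likely a given number of matches would be by pure chance. The baseline rate and the move counts below are invented placeholders, not measurements from any actual games.

# Sketch of the probability step: with a per-move probability p that a strong
# human's move matches one of Houdini's top-2 choices (estimated from GM control
# games), how likely is it to match k or more moves out of n purely by chance?
from math import comb

def tail_probability(n_moves, n_matches, p_match):
    # P(X >= n_matches) for X ~ Binomial(n_moves, p_match)
    return sum(comb(n_moves, k) * p_match**k * (1 - p_match)**(n_moves - k)
               for k in range(n_matches, n_moves + 1))

baseline_p = 0.55   # hypothetical GM match rate against Houdini's top-2 choices
n = 200             # non-book moves examined across the suspect's games (placeholder)
k = 180             # moves that matched one of the top-2 choices (placeholder)

print("Chance of %d/%d matches at baseline %.2f: %.3e"
      % (k, n, baseline_p, tail_probability(n, k, baseline_p)))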
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Don »

It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Adam Hair »

Don wrote:It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.
Has anybody focused on the moves played by Ivanov's opponents? I have not spent much time on this, but I did check out the game against Kurajica. The majority of Kurajica's moves matched the moves selected by Houdini 2.0 when I stepped through the game. I wonder how his other opponents' move selections compare to Houdini 2.0's move selections.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by michiguel »

Houdini wrote:
michiguel wrote:
Houdini wrote:
Modern Times wrote:Again, if you can't show HOW he is doing this, then there is no case to answer.
I disagree with this.
No human being can consistently play the #1 or #2 choice of Houdini. Detecting this pattern exposes the cheating, no further evidence is required.
I'm surprised that even on a computer chess forum this is not self evident.

Robert
Then the Houdini performance is really disappointing, or you need a better operator ;-)

Miguel
Your reply doesn't seem related to the post of mine that you quoted.
Please read more carefully next time.

Robert
It was disappointing that Houdini, operated by Ivanov, did not win the tournament. So you need a better operator than Ivanov. That was the idea, but I will explain it better next time.

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by michiguel »

Don wrote:It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.
Not without some hesitation, though. This paragraph summarizes his biggest concerns when statistics are applied to cases like this:

"The interpretation becomes dicult when only a small number of items are tested. What caused us to select them for testing? If the cause comes from preliminary indications of the same kind which in this case means the mere fact of beating strong players as well doing a quick check of games with an engine|then the bias in selection can upset the interpretation of probabilities. This is why my policy [12] has been to require an independent factor that determines the selection, such as behavioral or physical evidence of cheating. (It should be footnoted that the same policy can allow excluding at least the round-8 game, whose prior circumstances were different.)"

He then explains why he believes this case should be an exception. Anyway, there is a dangerous issue known as the "prosecutor's fallacy" (http://en.wikipedia.org/wiki/Prosecutor%27s_fallacy), which is the reason why care should be exercised to the maximum.
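To make the fallacy concrete, here is a toy Bayes calculation with invented numbers; it only illustrates the base-rate issue and says nothing about this particular case.

# Toy Bayes calculation illustrating the prosecutor's fallacy (all numbers invented).
# A tiny P(evidence | innocent) is not the same thing as P(innocent | evidence):
# the prior matters, e.g. how many honest players were screened before one was flagged.

p_evidence_given_innocent = 1e-6      # "million-to-one" style match probability
p_evidence_given_cheater = 0.9        # a cheater would very likely show the pattern
prior_cheater = 1.0 / 10000           # assumed prevalence of cheaters among screened players

p_evidence = (p_evidence_given_cheater * prior_cheater
              + p_evidence_given_innocent * (1 - prior_cheater))
p_cheater_given_evidence = p_evidence_given_cheater * prior_cheater / p_evidence

print("P(cheater | evidence) = %.4f" % p_cheater_given_evidence)
# With these numbers the posterior is about 0.989, not 0.999999, and it drops
# further if many players or games are screened before one case is singled out.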

The other problem is that "condemning" (i.e. banning, etc.) based on statistics sets a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Don »

michiguel wrote:
Don wrote:It appears that the test I proposed has already been done in a scientifically rigorous way. I was not aware of that. The PDF report is here:

http://www.cse.buffalo.edu/~regan/chess ... report.pdf

To cut to the chase, here is the conclusion he arrived at:

Conclusions

The bottom line of the test is that the results are about as strong as
one can reasonably expect a statistical move-matching test, done
scientifically and neutrally and with respect for due process, to
produce. My model projects that for a 2300 player to achieve the high
computer correspondence shown in the nine tested games, the odds
against are almost a million-to-one. The control data and bases for
comparison, which are wholly factual, show several respects in which
the performance is exceptional even for a 2700-player, and virtually
unprecedented for an untitled player. The z-scores I am reporting are
higher than in any other instance I have formally tested, which is
what prompts me to raise the questions in the cover letter now.
Not without some hesitation, though. This paragraph summarizes his biggest concerns when statistics are applied to cases like this:
This is standard boilerplate. He is stating the obvious things here that anyone who deals with sampling has to be aware of. If he didn't include it in his report he would be remiss.

"The interpretation becomes dicult when only a small number of items are tested. What caused us to select them for testing? If the cause comes from preliminary indications of the same kind which in this case means the mere fact of beating strong players as well doing a quick check of games with an engine|then the bias in selection can upset the interpretation of probabilities. This is why my policy [12] has been to require an independent factor that determines the selection, such as behavioral or physical evidence of cheating. (It should be footnoted that the same policy can allow excluding at least the round-8 game, whose prior circumstances were different.)"

He then explains why he believes this case should be an exception. Anyway, there is a dangerous issue known as the "prosecutor's fallacy" (http://en.wikipedia.org/wiki/Prosecutor%27s_fallacy), which is the reason why care should be exercised to the maximum.

The other problem is that "condemning" (i.e. banning, etc.) based on statistics sets a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
He is basically saying that you could introduce bias by hand-picking specific games, and that the selection process itself must be considered. I think he was being thorough by stating his awareness of this principle and the caveats that should always be considered. Nobody could fault the selection of games that he chose, but it could be a major concern if people started using accusations as a kind of weapon. People gravitate immediately to that concern because we can all imagine it happening to us at the hands of a jealous opponent, and instead of being a factor to be weighed carefully it becomes an irrational bias.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Cheating suspicion at the Zadar Open in Croatia.

Post by Laskos »

michiguel wrote:
The other problem is that "condemning" (i.e. banning etc.) based on statistics set a very dangerous precedent. Not that I have a solution, but I guess my point is that this is complex.

Miguel
DNA profiling is a statistical tool too. Each STR is polymorphic, but the number of alleles is very small; each STR allele is shared by around 5-20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. The pattern of alleles can identify an individual very accurately, which is similar to what these Buffalo guys are doing with Ivanov.

DNA profiling is used in criminal investigations, often as proof.
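As a rough illustration of how the per-locus frequencies combine into one overall probability (the allele frequencies below are invented, not real forensic data):

# Rough illustration of why multi-locus STR profiling is so discriminating:
# individually common alleles multiply into a tiny random-match probability,
# much like individually plausible engine matches multiplying over a whole game.
# The allele frequencies below are invented for illustration only.

locus_frequencies = [0.15, 0.10, 0.08, 0.12, 0.20, 0.05, 0.10,
                     0.15, 0.09, 0.11, 0.07, 0.18, 0.06]   # 13 hypothetical loci

random_match_probability = 1.0
for freq in locus_frequencies:
    random_match_probability *= freq    # assuming the loci are independent

print("Random match probability: %.2e (roughly 1 in %.0f)"
      % (random_match_probability, 1.0 / random_match_probability))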