Carlsen withdrawal after loss to Niemann

DrCliche · Post by **DrCliche** » Sun Sep 11, 2022 12:08 pm

AndrewGrant wrote: ↑Sun Sep 11, 2022 10:30 am My point is to say, why should I take your analysis as true and correct, and not this guy's analysis as true and correct?

Because none of the claims I've made are obviously factually wrong, while Your Guy has committed significant and obvious blunders, and made factually incorrect claims, as detailed previously.

Usually, when one guy says multiple blatantly and factually wrong things, and another guy has not yet been shown to say any wrong things, you should place more credence in the claims and conclusions of the person who has not yet been shown to say any wrong things. (Or if you're not willing to do so, you should verify the information yourself.)

Is removing quick events proper?

They take place at a significantly shorter time control, so I'd imagine most people consider it correct to exclude them from an analysis of performance at classical time controls, as I do. They also aren't FIDE rated, and rarely have meaningful prizes, so I'd imagine most people consider there to be less incentive to cheat in quick events, as I do.

For example, suppose that the typical method of cheating requires the participation of a confidant, which imposes a nonzero upfront cost, not to mention further exposes yourself to potentially getting caught. Probably most cheaters are less likely to spend resources (or take the chance of getting caught) in order to cheat at such events.

Does the fact that you could not verify a few of the events have a significant impact? Were the few events pointed out marked correctly or incorrectly, can you answer that?

As I said in my original post, I was unable to locate any information contradicting the claimed broadcast status of any of the events in the original tweet. So if by "the few events pointed out marked correctly or incorrectly" you mean the ones that Your Guy claimed were wrong, then all I can say is that I have seen no evidence that the original tweet was wrong, and I verified what I could by going to each tournament's official website.

Your Guy gives absolutely no evidence for his corrections. His only links are to the USCF pages for the tournaments, and he provides no links to any pages that mention a live broadcast for the contested tournaments.

I can certainly find a record of the games of most of the tournaments online, for example, FollowChess has the 2019 World Open games (as noted by Your Guy), but I can find neither mention of nor links to a live broadcast of the 2019 World Open games on its official website, nor in any contemporaneous mentions of the 2019 World Open. That doesn't mean the 2019 World Open wasn't broadcast online, and that doesn't mean I didn't miss something, but I can only work with the information that I have available to me. As Your Guy actually presented no new information, and committed obvious blunders that call into question the thoroughness of his process, I see no reason to amend my analysis at this time.

I also consider it pretty unlikely that the original tweet systematically erred/lied about precisely those tournaments whose broadcast status I wasn't able to definitively confirm. While possible, it presupposes the original tweeter would somehow know exactly which information I would miss in my checks, or that he used the exact same information gathering process that I did, or that he otherwise committed the exact same errors that I did. But let's go wild and do some robustness checks, anyway.

If you flip the broadcast status of Niemann's most significant tournament in the dataset, the 2019 World Open, broadcast status still explains 34% of the variation in his performance, and its regression coefficient is large and positive, with a p-value of 0.064.

True, that's just outside the traditional significance threshold of 0.05, but that's arbitrary to begin with. The 95% confidence interval is [-1, 29], and it's overwhelmingly likely that broadcast status is the single most important factor in determining Niemann's performance across the dataset.

If you additionally flip the broadcast status of Niemann's second most significant tournament in the dataset, the 2019 Marshall Chess Club Championship, broadcast status still explains 16% of the variation in his performance, and its regression coefficient is large and positive, with a p-value of 0.267.

Obviously, we're now starting to veer well into statistical insignificance, but the 95% confidence interval is [-7, 24], and it's still significantly more likely than not that the single most important factor in determining Niemann's performance remains broadcast status.

If you additionally flip the broadcast status of Niemann's third most significant tournament in the dataset, the 2020 Charlotte Fall GM, broadcast status still explains 6% of the variation in his performances, and its regression coefficient is large and positive, with a p-value of 0.5, and a 95% confidence interval of [-10, 20]. In this scenario, broadcast status is finally unlikely to be the single most important factor in determining Niemann's performance, at last overtaken by the average rating of his opponents.
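For anyone who wants to repeat this kind of flip-and-refit check, here is a minimal sketch in Python. It is a simplification and not the regression used above: it fits performance on the broadcast indicator alone (no opponent-rating term), and the numbers in `rating_change` and `broadcast` are made-up placeholders, not Niemann's results. The function names are hypothetical.

```python
from statistics import mean


def fit_binary_ols(performance, broadcast):
    """Least-squares fit of performance on a single 0/1 indicator.

    With one binary regressor, the fitted slope equals the difference
    between the two group means. Returns (slope, r_squared).
    """
    live = [p for p, b in zip(performance, broadcast) if b == 1]
    other = [p for p, b in zip(performance, broadcast) if b == 0]
    slope = mean(live) - mean(other)
    grand = mean(performance)
    ss_total = sum((p - grand) ** 2 for p in performance)
    ss_resid = (sum((p - mean(live)) ** 2 for p in live)
                + sum((p - mean(other)) ** 2 for p in other))
    return slope, 1 - ss_resid / ss_total


def flip(broadcast, index):
    """Return a copy of the indicator list with one event's status flipped."""
    flipped = list(broadcast)
    flipped[index] = 1 - flipped[index]
    return flipped


# Placeholder data only (NOT the real dataset): rating change per event
# and whether that event is marked as broadcast live.
rating_change = [10, 8, 12, -6, -4, -8]
broadcast = [1, 1, 1, 0, 0, 0]

baseline = fit_binary_ols(rating_change, broadcast)
after_flip = fit_binary_ols(rating_change, flip(broadcast, 0))
print("baseline slope, R^2:", baseline)
print("after flipping event 0:", after_flip)
```

Swapping in the real per-event numbers and flipping the events in order of significance would reproduce the structure of the check, though not the p-values or confidence intervals, which need a proper statistics package.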

Should your analysis take into account what I mentioned before about seeded events and Swiss events?

If you mean that you believe the difference between those types of events implies a difference in preparation, which you (without evidence) believe is very strongly and positively correlated with broadcast status, you know what to do. Run the numbers and let me know what you find.

Should your analysis take into account the increased volatility of ratings of young players?

Niemann wasn't that young, and actually had very little ratings volatility (or growth) over the period. His USCF classical rating went from 2541 in December 2018 to 2569 at the end of the dataset in November 2020. His rating never dropped below 2496 or went above 2576 in that time. That's remarkably stable given USCF's K-factor of 15. (There's conflicting data online, but I believe that's the current schedule used for non-provisional players.)

No easy answer to those questions, and so I am not inclined to take anyone's spreadsheets as proof of anything more than one's ability to find data that supports their view.

I easily answered your questions.

I also didn't "find data that supports [my] view", as I more or less didn't have a view, though as I previously mentioned, I was somewhat (but not strongly) disinclined to believe Niemann after his Sinquefield interviews. I was sent the original tweet, and I thought it was interesting. I also wondered if the claims were true, and how statistically significant the observed effect was. I verified as much of the information as I reasonably could, found no verifiably contradictory information, gathered a bit more detailed data, and performed a simple and standard statistical analysis on it.

Notably, the only counterargument you've offered came from someone who made blatant, factual errors that were trivially easy to debunk.

The only people who can offer compelling arguments here are 1. Carlsen, who has refused to speak, and 2. Chesscom, who has claimed to send their information to Hans. Hans has not refuted this, so we can assume that Chesscom did indeed send Hans their information, and that Hans would prefer it not be public. I'm happy to hop on the online cheating bandwagon as a result, but OTB Carlsen must speak.

I mean, obviously there isn't and probably won't ever be any definitive proof that Niemann cheated at Sinquefield. Even the mechanical clicking caught on video just as Niemann appeared to fiddle with something behind his ear in his post round 3 interview can only ever be seen as highly suggestive. We will never know. But we can certainly use our heads and make well-informed arguments based on the information we do have.

As far as the people outside this forum are concerned, you are also a random tweet from a random person with little to no citations.

My use of "random" was obviously intended to be disparaging and connote the easily falsifiable nature of the claims made by Your Guy. In that sense, I'm not "random". My claims are verifiable to the extent that I have claimed, and neither my methods nor my results have yet been seriously challenged. I provided links directly to every single data point I used, and showed the results of the (completely standard) regressions that I ran. Repeating my analysis is as simple as clicking on the links, verifying the numbers, and typing them into Excel, or R, or whatever. If you attempt to verify the claims of Your Guy, you immediately run into the issue that he committed obvious blunders and made factually incorrect claims, for example, by failing to understand how USCF classical ratings are calculated.

AndrewGrant · Post by **AndrewGrant** » Sun Sep 11, 2022 1:07 pm

As I said in my original post, I was unable to locate any information contradicting the claimed broadcast status of any of the events in the original tweet. So if by "the few events pointed out marked correctly or incorrectly" you mean the ones that Your Guy claimed were wrong, then all I can say is that I have seen no evidence that the original tweet was wrong, and I verified what I could by going to each tournament's official website.

Okay, so neither party can claim with certainty that the data is accurate? LOL. Nothing more to be discussed here then I reckon. Also, he is not 'My Guy'. He is 'A Guy' on Twitter who happened to respond to the claim with his own counter claim. Fuck me for hearing both sides lol?

I also consider it pretty unlikely that the original tweet systematically erred/lied about precisely those tournaments whose broadcast status I wasn't able to definitively confirm. While possible, it presupposes the original tweeter would somehow know exactly which information I would miss in my checks, or that he used the exact same information gathering process that I did, or that he otherwise committed the exact same errors that I did. But let's go wild and do some robustness checks, anyway.

I find it likely, and here is why: 100 people do this exact analysis in their own way. The one result that shows the greatest disparity becomes the most publicly talked about one. As far as I know 99 other people attempted to do this exact thing (which seems likely given how many posts there are about it on reddit), and those 99 failed to produce anything meaningful enough to share. However, the outlier gets published. Similar to how the majority of published research papers are wrong and/or not repeatable.
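To put a rough number on that selection effect, here is a small sketch. It assumes the analyses are independent and that each one has a 5% chance of producing a "significant" result when there is no real effect; both are simplifying assumptions, and the figure of 100 analysts is only the hypothetical from the paragraph above. The function name is made up for illustration.

```python
def prob_at_least_one_false_positive(n_analyses, alpha=0.05):
    """Chance that at least one of n independent null analyses clears alpha."""
    return 1 - (1 - alpha) ** n_analyses


# With 100 independent attempts at the 0.05 level, a spurious "hit"
# somewhere is close to certain.
print(round(prob_at_least_one_false_positive(100), 3))
```

Real attempts would share the same underlying game data, so they would not be independent, and the true figure would differ.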

The rest of the analysis on the numbers is bunk because it is predicated on the data from OTB events being accurate. If you cannot assert them as accurate, then the data is only conditional. IF the original poster's data is full and correct, then the evidence is very strongly in favour of Hans cheating OTB. IF the original poster's data is not full or not correct, then it has no value to the conversation.

Either verify the data is correct, or acknowledge the conditional outcome that I have presented. But just blanket saying guy A is right and guy B is wrong is extremely unconvincing.

DrCliche · Post by **DrCliche** » Sun Sep 11, 2022 1:37 pm

AndrewGrant wrote: ↑Sun Sep 11, 2022 1:07 pm Either verify the data is correct, or acknowledge the conditional outcome that I have presented. But just blanket saying guy A is right and guy B is wrong is extremely unconvincing.

I already made a reasonable effort to verify the correctness of the data, and you know that.

For the original tweet, I found no verifiably contradictory information. That doesn't mean that none exists, but in the limited time I allotted for my research, I discovered none. You also know my primary method for verifying the broadcast status of tournaments, as I mentioned it in my original post. I went directly to the official tournament websites and looked for mentions of or links to a live broadcast.

So far, your criticism amounts to nothing more than, "But you might be wrong!" Yeah, I might be. You haven't shown any evidence that I am, but I might be. You know who we know to be wrong, for sure? Your Guy.

When I looked at Your Guy's spreadsheet, I immediately recognized that he made simple, factual errors. I recognized instantly that he was erroneously claiming rapid events to have been incorrectly excluded from the original tweet's analysis because he wrongly believed the excluded events were classical.

I was able to recognize this so quickly because when I was verifying the claims of the original tweet, I, too, noticed that he excluded some events. Rather than jumping to conclusions and posting a "refutation" with no further thought, I instead chose to dig a little deeper. I learned that the excluded events were exclusively rapid events, despite the fact that they resulted in (slight) adjustments to Niemann's classical rating.

In other words, I was reasonably thorough. Not exhaustive, no. But reasonably thorough. Your Guy obviously was not. The claims of the original tweet and Your Guy are not on remotely even footing, and it's absurd for you to suggest otherwise.

AndrewGrant · Post by **AndrewGrant** » Sun Sep 11, 2022 1:40 pm

DrCliche wrote: ↑Sun Sep 11, 2022 1:37 pm
AndrewGrant wrote: ↑Sun Sep 11, 2022 1:07 pm Either verify the data is correct, or acknowledge the conditional outcome that I have presented. But just blanket saying guy A is right and guy B is wrong is extremely unconvincing.
I already made a reasonable effort to verify the correctness of the data, and you know that.

For the original tweet, I found no verifiably contradictory information. That doesn't mean that none exists, but in the limited time I allotted for my research, I discovered none. You also know my primary method for verifying the broadcast status of tournaments, as I mentioned it in my original post. I went directly to the official tournament websites and looked for mentions of or links to a live broadcast.

So far, your criticism amounts to nothing more than, "But you might be wrong!" Yeah, I might be. You haven't shown any evidence that I am, but I might be. You know who we know to be wrong, for sure? Your Guy.

When I looked at Your Guy's spreadsheet, I immediately recognized that he made simple, factual errors. I recognized instantly that he was erroneously claiming rapid events to have been incorrectly excluded from the original tweet's analysis because he wrongly believed the excluded events were classical.

I was able to recognize this so quickly because when I was verifying the claims of the original tweet, I, too, noticed that he excluded some events. Rather than jumping to conclusions and posting a "refutation" with no further thought, I instead chose to dig a little deeper. I learned that the excluded events were exclusively rapid events, despite the fact that they resulted in (slight) adjustments to Niemann's classical rating.

In other words, I was reasonably thorough. Not exhaustive, no. But reasonably thorough. Your Guy obviously was not. The claims of the original tweet and Your Guy are not on remotely even footing, and it's absurd for you to suggest otherwise.

Alright that sounds good. I'll hold out and hopefully we have more information in a day or so about the correctness of the data once it becomes more publicly seen. Someone has the information, and it will come out eventually with the power of the internet.

DrCliche · Post by **DrCliche** » Sun Sep 11, 2022 2:00 pm

AndrewGrant wrote: ↑Sun Sep 11, 2022 1:07 pm I find it likely, and here is why: 100 people do this exact analysis in their own way. The one result that shows the greatest disparity becomes the most publicly talked about one. As far as I know 99 other people attempted to do this exact thing (which seems likely given how many posts there are about it on reddit), and those 99 failed to produce anything meaningful enough to share. However, the outlier gets published. Similar to how the majority of published research papers are wrong and/or not repeatable.

For your incredibly tortured logic to follow, you would need for 99 other interested parties to do the same sort of analysis, and come up short, AND not think it was worth mentioning at all (even though this issue is very polarizing and no doubt many would jump at the chance to vindicate Niemann with statistical analysis), AND (as I already explained) for the data gathering process used in the original tweet to just so happen to lie/err on precisely those bits of information that I didn't or wasn't able to falsify.

None of that is impossible, but your hypothetical superfecta is clearly on much shakier ground than my simple and well-reasoned analysis, especially given the abundance of other information we know about Niemann, his past cheating, and how easy it was to find more evidence of cheating both online (per chess.com, uncontested by Niemann) and OTB (e.g. his absurdly anomalous move statistics at Charlotte GM) once people started looking.

DrCliche · Post by **DrCliche** » Sun Sep 11, 2022 2:28 pm

Discussion of divergent performances at tournaments in New York and Charlotte, where Niemann got his second and third GM norms: https://www.youtube.com/watch?v=AG9XeSPflrU.

Andrii Punin (an FM, so not world class, but not a patzer) shows wildly aberrant T1-3 and CP-loss statistics in Charlotte, and also steps through anomalous games from both Charlotte and New York where Niemann followed the PV of contemporaneous Stockfish (12 in Charlotte, 10 in New York) for many moves in a row, in difficult positions in crucial games.

Of course anyone who earns a norm must by definition have had an unusually good tournament ... but not that good!

AdminX · Post by **AdminX** » Sun Sep 11, 2022 3:00 pm

DrCliche wrote: ↑Sun Sep 11, 2022 2:28 pm Discussion of divergent performances at tournaments in New York and Charlotte, where Niemann got his second and third GM norms: https://www.youtube.com/watch?v=AG9XeSPflrU.

Andrii Punin (an FM, so not world class, but not a patzer) shows wildly aberrant T1-3 and CP-loss statistics in Charlotte, and also steps through anomalous games from both Charlotte and New York where Niemann followed the PV of contemporaneous Stockfish (12 in Charlotte, 10 in New York) for many moves in a row, in difficult positions in crucial games.

Of course anyone who earns a norm must by definition have had an unusually good tournament ... but not that good!

Wow!

Great video, thanks for sharing. Reminds me of the quote "Figures Don't Lie, Liars Figure".

chrisw · Post by **chrisw** » Sun Sep 11, 2022 4:53 pm

DrCliche wrote: ↑Sun Sep 11, 2022 1:37 pm
AndrewGrant wrote: ↑Sun Sep 11, 2022 1:07 pm Either verify the data is correct, or acknowledge the conditional outcome that I have presented. But just blanket saying guy A is right and guy B is wrong is extremely unconvincing.
I already made a reasonable effort to verify the correctness of the data, and you know that.

For the original tweet, I found no verifiably contradictory information. That doesn't mean that none exists,

Well, leaving aside whether your data verification is sound or not, there’s a flaw in the method. There’s no control group, and without a control group the “conclusion” is without value.
Take a look back at the original tweet thread and you’ll find that the OP didn’t think to generate a control group and, incredibly, declines to provide one when challenged.
The OP is obviously not fit in a scientific/educated/skilled sense to be taken seriously, so there’s not much point in wasting time challenging his data collection and selection exercise for cherry picking and manipulation until it could be wrapped into some confirmation bias conclusion.
Moral of the story - never accept data tables and conclusions at face value, always check first before repeating.

but in the limited time I allotted for my research, I discovered none. You also know my primary method for verifying the broadcast status of tournaments, as I mentioned it in my original post. I went directly to the official tournament websites and looked for mentions of or links to a live broadcast.

So far, your criticism amounts to nothing more than, "But you might be wrong!" Yeah, I might be. You haven't shown any evidence that I am, but I might be. You know who we know to be wrong, for sure? Your Guy.

When I looked at Your Guy's spreadsheet, I immediately recognized that he made simple, factual errors. I recognized instantly that he was erroneously claiming rapid events to have been incorrectly excluded from the original tweet's analysis because he wrongly believed the excluded events were classical.

I was able to recognize this so quickly because when I was verifying the claims of the original tweet, I, too, noticed that he excluded some events. Rather than jumping to conclusions and posting a "refutation" with no further thought, I instead chose to dig a little deeper. I learned that the excluded events were exclusively rapid events, despite the fact that they resulted in (slight) adjustments to Niemann's classical rating.

In other words, I was reasonably thorough. Not exhaustive, no. But reasonably thorough. Your Guy obviously was not. The claims of the original tweet and Your Guy are not on remotely even footing, and it's absurd for you to suggest otherwise.

MonteCarlo · Post by **MonteCarlo** » Sun Sep 11, 2022 5:24 pm

As a clarification, it is not true that quick games in the USCF always change your classical rating.

Games with a total time (assuming 60 moves of increment or delay) greater than 10 minutes and less than 30 minutes only affect your quick rating.

Games with a total time between 30 and 65 minutes (inclusive of both bounds) are dual-rated (affect both quick and classical).

Games with a total time greater than 65 minutes affect only classical rating.
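In case it helps, the three ranges above can be written out as a small sketch. It assumes "total time" means base minutes plus seconds of increment or delay per move (so 60 moves of increment add that many minutes); the function name is made up for illustration, and anything at or below 10 minutes is outside the ranges listed here.

```python
def uscf_rating_systems(base_minutes, seconds_per_move=0):
    """Return which ratings a game affects, per the ranges listed above.

    Total time is base minutes plus 60 moves' worth of increment or delay,
    i.e. base_minutes + seconds_per_move when expressed in minutes.
    """
    total = base_minutes + seconds_per_move
    if total > 65:
        return {"classical"}
    if total >= 30:
        return {"quick", "classical"}  # dual-rated
    if total > 10:
        return {"quick"}
    raise ValueError("total time of 10 minutes or less is not covered here")


print(uscf_rating_systems(15))       # G/15
print(uscf_rating_systems(25, 5))    # G/25 with 5 seconds per move
print(uscf_rating_systems(90, 30))   # G/90 with 30 seconds per move
```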

I haven't processed the rest of this data enough to form an opinion yet, but figured I should at least clarify this.

Cheers!

lkaufman · Post by **lkaufman** » Sun Sep 11, 2022 5:56 pm

DrCliche wrote: ↑Sun Sep 11, 2022 2:00 pm
AndrewGrant wrote: ↑Sun Sep 11, 2022 1:07 pm I find it likely, and here is why: 100 people do this exact analysis in their own way. The one result that shows the greatest disparity becomes the most publicly talked about one. As far as I know 99 other people attempted to do this exact thing (which seems likely given how many posts there are about it on reddit), and those 99 failed to produce anything meaningful enough to share. However, the outlier gets published. Similar to how the majority of published research papers are wrong and/or not repeatable.
For your incredibly tortured logic to follow, you would need for 99 other interested parties to do the same sort of analysis, and come up short, AND not think it was worth mentioning at all (even though this issue is very polarizing and no doubt many would jump at the chance to vindicate Niemann with statistical analysis), AND (as I already explained) for the data gathering process used in the original tweet to just so happen to lie/err on precisely those bits of information that I didn't or wasn't able to falsify.

None of that is impossible, but your hypothetical superfecta is clearly on much shakier ground than my simple and well-reasoned analysis, especially given the abundance of other information we know about Niemann, his past cheating, and how easy it was to find more evidence of cheating both online (per chess.com, uncontested by Niemann) and OTB (e.g. his absurdly anomalous move statistics at Charlotte GM) once people started looking.

I must say that I find your presentation to be pretty impressive. I would even add that according to your data, in all of the 9 events that were broadcast live, he gained Elo points, while in none of the ten that were not broadcast did he gain any points (9 Elo losses, one break-even). Furthermore his worst performance in the nine live events was 2495, whereas his best in the non-live was 2475! One doesn't need a PhD in statistics to conclude (assuming the data is complete and accurate) that this could not be due to chance.

I would also rule out prep as a major factor; in all tournaments one can prep, it's just a question of how much prep time you have, and in any case the value of prep is much less at 2500 USCF level (roughly 2400 FIDE level) than at 2800 FIDE level; players simply make too many mistakes after the opening for an opening edge to translate to a much higher probability of victory, and of course prep works both ways. My own results in 3 World Senior Championships (where there was ample prep time) were about 100 Elo above my actual Elo, but this was probably due to the fact that many Seniors then had limited computer skills and were totally predictable in their opening choices, and of course to my own expertise in using engines to prep.

Anyway, this analysis together with the huge dropoff in performance from the first 3 games of the current event to the next 5 (after delay was introduced) is HIGHLY suspicious. Before reading this I had seen no evidence of OTB cheating, only online, but now that has changed. I only wonder how Magnus was able to conclude that there was OTB cheating in his game; the game itself wasn't too suspicious if you assume that the 20 moves of prep were legitimate.
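A back-of-the-envelope way to quantify the "could not be due to chance" remark: if broadcast status had nothing to do with performance, then every way of labelling 9 of the 19 events as "live" would be equally likely, and only one labelling puts all 9 live events above all 10 non-live ones. The sketch below computes that probability. It assumes the data is complete and accurate, that there are no ties, and that nobody picked this split after looking at the results; the function name is made up for illustration.

```python
from math import comb


def complete_separation_probability(n_live, n_other):
    """Chance that a random labelling ranks every live event above every other one."""
    return 1 / comb(n_live + n_other, n_live)


# 9 live events and 10 non-live events, as in the counts above.
p = complete_separation_probability(9, 10)
print(f"1 in {comb(19, 9)} (about {p:.2e})")
```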
