Crowd Sourced processing of Super GMs + Hans for engine correlation

dkappe · Post by **dkappe** » Fri Sep 30, 2022 1:26 am

jhellis3 wrote: ↑Fri Sep 30, 2022 1:03 am You want a hint?

The match % is irrelevant. Whether they even played the singular move or not is irrelevant. Also, cherry picked examples are..... you guessed it, irrelevant.

If that doesn't provide enough to make the proper discernment, well...

After skipping the first 8 moves and any positions with only 1 legal move, Aronian had 44 positions of which 11 (25%) had one good move or “singular” in your parlance. If you want to continue being prickly and obtuse, go ahead. There was a reason I had you blocked.

jhellis3 · Post by **jhellis3** » Fri Sep 30, 2022 1:34 am

I am not sure you discerned the point....

chrisw · Post by **chrisw** » Fri Sep 30, 2022 1:48 am

Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs

scchess · Post by **scchess** » Fri Sep 30, 2022 4:27 am

AndrewGrant wrote: ↑Thu Sep 29, 2022 8:30 am Hello. I have collected all games on chessgames.com for [Carlsen, Firouzja, Giri, Liren, Nakamura, Nepomniachtchi, Niemann, So]. I've uploaded them as individual files into a github repository here: https://github.com/AndyGrant/Correlation/

I have written a python utility, process_games.py, which goes through all games from those players. It performs a multipv=3 search at every position/move that the given player had to make. It saves all of this data, at depth=[16, 18, 20] to a file, one per PGN.

My goal: Process all of this data, using an assortment of engines. For example, locally I am producing the data using Ethereal 13.75. Then I will do 13.50, 13.25, 13.00. This is quite a long process. There are roughly 5,000 games in the collection. Locally, with Stockfish-15, I can process one game every minute or so per thread. On a 16 thread machine, this means about 10 games per minute. It will take ~10 hours to process them all, if I can do basic math.

I am looking for people willing to clone this repo, snag a public engine like building SF10/11/12/13/14/15 from source, or using any of the Komodo versions that are free (11/12/13), and process on their end. You can upload all the .analysis files somewhere, and I will collect them all in the repo. I am still working on the script. My progress so far is about to be committed to the repo, for anyone to view.

Please let me know if you think of more data we should be collecting.
Please let me know if you are willing to help in this process with your machine(s).
Please let me know which engine/version you would like to use, so that no one overlaps efforts.

I have dedicated hardware for chess work. If you can make the repo easy to run with a Docker image, so that I don't have to do anything but run it, I will do it for you.

M ANSARI · Post by **M ANSARI** » Fri Sep 30, 2022 9:46 am

chrisw wrote: ↑Fri Sep 30, 2022 1:48 am Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs

I think this game was extremely impressive ... very possibly we are seeing a future world chess champion ... and of course I don't mean Hans, but rather Mishra, the youngest human to ever get a GM norm. The loser of this game put up incredible resistance, and when you think about it, this was played when he was 11 years old !!! My hat's off to this kid !!!

adnoh · Post by **adnoh** » Fri Sep 30, 2022 12:15 pm

Here are the (incomplete) results for Komodo 13.3

https://1drv.ms/u/s!Aqe2-2i2H5hegZ8ysYB ... w?e=BAHPcH

2 of the PGNs did not produce valid output analysis files.
Firouzja-2378185.pgn.analysis
So-2378164.pgn.analysis

For these 2 PGNs, the script would not even start the engine task, so I manually removed the FEN and SETUP tags, thinking they should make no difference as the games start from the normal board setup. With that change to the PGNs, I see the following output.

Code:

C:\Users\Charles\Downloads\CC-Analysis>python process_games.py
error during pgn parsing
Traceback (most recent call last):
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1685, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1059, in parse_san
    return board.parse_san(san)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\__init__.py", line 3039, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Bh7' in rnbqkbnr/ppppppp1/7p/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 0 2
Finished analysis for Firouzja/2378185.pgn in 2.19 seconds
error during pgn parsing
Traceback (most recent call last):
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1685, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1059, in parse_san
    return board.parse_san(san)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\__init__.py", line 3039, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Ng3' in rnbqkbnr/pppp2pp/5p2/4p3/4P3/5P2/PPPP2PP/RNBQKBNR w KQkq - 0 3
Finished analysis for So/2378164.pgn in 4.70 seconds

C:\Users\Charles\Downloads\CC-Analysis>

Andrew: You alluded yourself to there being malformed PGNs, so how did you get a full set of working analysis files? Did you have to intervene?

adnoh · Post by **adnoh** » Fri Sep 30, 2022 12:36 pm

Following up on my own post above, I just noticed that the 2 games that failed were both Chess960 games.

https://www.uschesschamps.com/2022-cham ... x/overview

Is the solution to modify the process_games script to turn on the UCI_Chess960 option (via setoption) when it detects a non-standard starting position?
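To make the detection part of that question concrete, here is a minimal sketch that uses only the standard library. It is not a patch to process_games.py: the function names, the header regex and the sample FEN are all hypothetical, and the heuristic (a Variant tag, or a FEN tag whose back ranks are a mirrored shuffle of the normal pieces with all pawns at home) is an assumption about what "non-standard starting position" should mean here. What the script then does with the answer, such as enabling UCI_Chess960 on the engine, is left out because the thread does not show how the script drives the engine.

```python
import re

STANDARD_BACK_RANK = "rnbqkbnr"
# Variant names commonly used for Chess960 in PGN headers (assumed list).
CHESS960_VARIANT_NAMES = {
    "chess960", "chess 960", "fischerandom", "fischerrandom", "fischer random",
}
# Matches one PGN tag pair per line, e.g. [FEN "..."].
TAG_RE = re.compile(r'^\[(\w+)[ \t]+"(.*)"\][ \t]*$', re.MULTILINE)


def pgn_headers(pgn_text):
    """Return the PGN tag pairs as a dict, tag names kept as written."""
    return dict(TAG_RE.findall(pgn_text))


def looks_like_chess960(pgn_text):
    """Heuristic: True if the headers suggest a Chess960 starting position."""
    headers = pgn_headers(pgn_text)
    if headers.get("Variant", "").strip().lower() in CHESS960_VARIANT_NAMES:
        return True
    fen = headers.get("FEN")
    if not fen:
        return False  # no FEN tag: the game starts from the normal position
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        return False
    black_back, white_back = ranks[0], ranks[7]
    pawns_home = ranks[1] == "pppppppp" and ranks[6] == "PPPPPPPP"
    mirrored = white_back.lower() == black_back
    same_pieces = sorted(black_back) == sorted(STANDARD_BACK_RANK)
    shuffled = black_back != STANDARD_BACK_RANK
    return pawns_home and mirrored and same_pieces and shuffled


# Hypothetical header block, not taken from the two failing games.
SAMPLE_960 = '''[Event "Example"]
[SetUp "1"]
[FEN "nrbqkbrn/pppppppp/8/8/8/8/PPPPPPPP/NRBQKBRN w KQkq - 0 1"]

1. e4 e5 *
'''
SAMPLE_STANDARD = '''[Event "Example"]

1. e4 e5 *
'''

if __name__ == "__main__":
    print(looks_like_chess960(SAMPLE_960))
    print(looks_like_chess960(SAMPLE_STANDARD))
```

Note that this only works on the untouched PGNs: once the FEN and SETUP tags are stripped, as in the run above, nothing is left in the headers to detect.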

RobertJBarker3 · Post by **RobertJBarker3** » Fri Sep 30, 2022 4:26 pm

scchess wrote: ↑Fri Sep 30, 2022 4:27 am
I have dedicated hardware for chess work. If you can make the repo easy to run with a Docker image, so that I don't have to do anything but run it, I will do it for you.

Just pull the repo, add your selected engine's path to process_games.py and run it. It will output all results as .analysis files in the folder.

Ferdy · Post by **Ferdy** » Fri Sep 30, 2022 5:12 pm

chrisw wrote: ↑Fri Sep 30, 2022 1:48 am Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs

I analyzed every position in this game with SF15 at 10 s per position on an i7-2600K processor, single thread, with 256 MB hash.

Here is the result; this is the output from my cheater detector. There are other features that are not shown.

Code:

   game                    event        date                name  plycnt  \
0     1  Charlotte CCCSA Fall GM  2020.10.04   mishra, abhimanyu     112
1     1  Charlotte CCCSA Fall GM  2020.10.04  niemann, hans moke     112

   rating  oppr  color  b1  b2  b3  b4  b5  b6  b7  m1  m2  m3  m4  m5  dubi  \
0    2342  2465  white   0   0   0   0   0   1   0   0   0   0   0   1     3
1    2465  2342  black   0   0   0   0   0   0   0   0   0   0   0   1     5

   merr  pos   t1%   t2%   t3%  wmerr  wpos  wt1%  wfail%  wfcnt  wmate%  \
0    28   38  42.1  34.2  18.4    300     0   0.0   100.0      0     0.0
1    20   37  56.8  13.5  16.2      0    19  89.5     0.0     19    80.0

   wmatecnt  t1cplx%  cntcplx  merrcplx  point
0         0        0        1       163      0
1         5        0        2        60      1

Mishra (White) had 3 dubious moves, nothing serious: the score range for these is from -50 to -100 cp. Niemann (Black) had 5 dubious moves; see the dubi column. Both players had 1 category 5 mistake each; see column m5 (m = mistake, 5 is the category). For this category the move's score is in the range -100 to -299 while the best score according to the engine is -99 or better. Mishra had 1 category 6 blunder; see column b6. His move resulted in a score of -300 or less while the best engine score was only in the range -100 to -200, which means Mishra blundered from an already bad position.
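As a rough illustration of the three categories described above, here is a minimal sketch. It is not the detector's code: the function name is hypothetical, only the dubi, m5 and b6 buckets mentioned in this post are covered, and two points are assumptions, namely that the dubious range applies to the played move's score and that a dubious move starts from a best score better than -50 cp.

```python
def classify_move(best_cp, played_cp):
    """Classify one move from the mover's point of view (scores in centipawns).

    best_cp   -- engine score of the best move in the position
    played_cp -- engine score of the move actually played
    Returns "b6", "m5", "dubi", or None if none of the three buckets applies.
    """
    # Category 6 blunder: ends at -300 or worse from an already bad position.
    if played_cp <= -300 and -200 <= best_cp <= -100:
        return "b6"
    # Category 5 mistake: ends in -100..-299 although -99 or better was available.
    if -299 <= played_cp <= -100 and best_cp >= -99:
        return "m5"
    # Dubious move: ends in roughly -50..-100 (assumed: best score above -50).
    if -100 < played_cp <= -50 and best_cp > -50:
        return "dubi"
    return None


if __name__ == "__main__":
    print(classify_move(-150, -350))  # blunder from a bad position
    print(classify_move(-20, -150))   # mistake
    print(classify_move(0, -70))      # dubious
    print(classify_move(10, 0))       # no category
```

The other b and m columns in the table would need their own thresholds, which are not given here.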

Let's check the error in cp. When the position is not yet losing (<= -300) or winning (>= 300), Mishra had a mean error of 28 while Niemann had a mean error of 20; see column merr. A very small difference, but Niemann is better.

Top1, top2 and top3 percentages when the position is not yet losing or winning: Mishra matched the engine's top1 move 42.1% of the time while Niemann got 56.8%; see column t1%. Niemann is dominating here. t2% and t3% are also shown.

The column wmerr is the player's mean error when the position is already winning. Mishra got 300, meaning he never reached a winning position, while Niemann had 19 winning (score >= 300) positions and his mean error in cp there is zero, meaning Niemann is very accurate in a winning position.

The wt1% column is the percentage of top1 engine move matches when the position is already winning. Mishra got 0 while Niemann got 89.5%. So in a winning position, Niemann in this game is indeed dangerous and does not forgive Mishra.
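The split between "not yet decided" and "already winning" positions can be sketched as follows. This is only an illustration of the bookkeeping behind columns like merr, t1%, wmerr and wt1%, not the detector itself: the record type, the function name and the sample scores are hypothetical, and the error is assumed to be the best score minus the played move's score, floored at zero.

```python
from dataclasses import dataclass
from statistics import mean

WIN_CP = 300  # a position counts as winning at >= 300 cp, losing at <= -300 cp


@dataclass
class MoveRecord:
    best_cp: int        # engine score of the best move, mover's point of view
    played_cp: int      # engine score of the move actually played
    matched_top1: bool  # True if the played move was the engine's first choice


def summarize(records):
    """Mean error and top-1 match percentage, split by position state."""
    def stats(subset):
        if not subset:
            return {"positions": 0, "mean_error": None, "top1_pct": None}
        errors = [max(0, r.best_cp - r.played_cp) for r in subset]
        hits = sum(r.matched_top1 for r in subset)
        return {
            "positions": len(subset),
            "mean_error": round(mean(errors), 1),
            "top1_pct": round(100 * hits / len(subset), 1),
        }

    undecided = [r for r in records if -WIN_CP < r.best_cp < WIN_CP]
    winning = [r for r in records if r.best_cp >= WIN_CP]
    return {"undecided": stats(undecided), "winning": stats(winning)}


if __name__ == "__main__":
    # Made-up scores, not taken from the Mishra vs Niemann game.
    sample = [
        MoveRecord(30, 30, True),
        MoveRecord(50, 10, False),
        MoveRecord(-20, -40, False),
        MoveRecord(10, 10, True),
        MoveRecord(450, 450, True),
        MoveRecord(600, 520, False),
    ]
    print(summarize(sample))
```

With no winning positions this sketch reports None for the winning-side figures, whereas the table above shows 300 in wmerr for that case. As a sanity check on the percentages, a top1 match on 17 of 19 winning positions would round to 89.5%, which is the wt1% value shown for Niemann.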

There is a column cntcplx, which is the number of complex positions each player faced. I have an algorithm/criteria for computing position complexity; one criterion is the presence of a queen on the board. Mishra got 1 while Niemann got 2. The column merrcplx is the mean error in cp when the position is complex, which measures the player's evaluation accuracy in complex positions. Mishra got 163 cp while Niemann got 60. So Niemann is better at handling complex positions, and he even had 2 of them.

From experience analyzing some lichess and chess.com games, cheaters rated in the range 1500 to 2000 have a high top1 match percentage when the position is not yet losing or winning. Cheaters also have a high top1 match percentage when the position is already winning. Cheaters usually do not give the opponent a good position. Cheaters may play dubious moves, but not mistakes (score <= -100 cp).

These players are already strong: Mishra is rated 2342 and Niemann 2465.

Was that really true? From the YouTube title "Hans Niemann is AlphaZero CONFIRMED!"?

chrisw · Post by **chrisw** » Fri Sep 30, 2022 6:02 pm

Ferdy wrote: ↑Fri Sep 30, 2022 5:12 pm
Was that really true? From the YouTube title "Hans Niemann is AlphaZero CONFIRMED!"?

Ha! Yes, you need to watch the Agadmator video to the end to get the real point. He released it as a comment on the recent 100% match video from the French IM. Agadmator did say that Hans had a near 100% engine match score. I’m not sure what software he was using, or which engines, but possibly the CB analysis program. But then, after the comment about Hans scoring near 100%, he slyly slipped in that so did Mishra. They both scored near 100%.
I guess that this was Agadmator's advice to always check both the data and the functionality of the tools one is using.
