Crowd Sourced processing of Super GMs + Hans for engine correlation

dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by dkappe »

jhellis3 wrote: Fri Sep 30, 2022 1:03 am You want a hint?

The match % is irrelevant. Whether they even played the singular move or not is irrelevant. Also, cherry picked examples are..... you guessed it, irrelevant.

If that doesn't provide enough to make the proper discernment, well...
After skipping the first 8 moves and any positions with only 1 legal move, Aronian had 44 positions of which 11 (25%) had one good move or “singular” in your parlance. If you want to continue being prickly and obtuse, go ahead. There was a reason I had you blocked. :-)
jhellis3
Posts: 548
Joined: Sat Aug 17, 2013 12:36 am

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by jhellis3 »

I am not sure you discerned the point....
chrisw
Posts: 4764
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by chrisw »

Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs
scchess
Posts: 51
Joined: Mon Jan 04, 2021 4:12 pm
Full name: scchess

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by scchess »

AndrewGrant wrote: Thu Sep 29, 2022 8:30 am Hello. I have collected all games on chessgames.com for [Carlsen, Firouzja, Giri, Liren, Nakamura, Nepomniachtchi, Niemann, So]. I've uploaded them as individual files into a github repository here: https://github.com/AndyGrant/Correlation/

I have written a python utility, process_games.py, which goes through all games from those players. It performs a multipv=3 search at every position/move that the given player had to make. It saves all of this data, at depth=[16, 18, 20] to a file, one per PGN.

My goal: Process all of this data, using an assortment of engines. For example, locally I am producing the data using Ethereal 13.75. Then I will do 13.50, 13.25, 13.00. This is quite a long process. There are roughly 5,000 games in the collection. Locally, with Stockfish-15, I can process one game every minute or so per thread. On a 16 thread machine, this means about 10 games per minute. It will take ~10 hours to process them all, if I can do basic math.

I am looking for people willing to clone this repo, snag a public engine like building SF10/11/12/13/14/15 from source, or using any of the Komodo versions that are free (11/12/13), and process on their end. You can upload all the .analysis files somewhere, and I will collect them all in the repo. I am still working on the script. My progress so far is about to be committed to the repo, for anyone to view.

Please let me know if you think of more data we should be collecting.
Please let me know if you are willing to help in this process with your machine(s).
Please let me know which engine/version you would like to use, so that no one overlaps efforts.
I have dedicated hardware for my chess work. If you can make the repo easy to run, for example with a Docker image so that all I have to do is run it, I will process the games for you.
M ANSARI
Posts: 3734
Joined: Thu Mar 16, 2006 7:10 pm

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by M ANSARI »

chrisw wrote: Fri Sep 30, 2022 1:48 am Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs
I think this game was extremely impressive ... very possibly we are seeing a future world chess champion ... and of course I don't mean Hans, but rather Mishra, the youngest human to ever get a GM norm. The loser of this game put up incredible resistance, and when you think about it, this was played when he was 11 years old !!! My hat's off to this kid !!!
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by adnoh »

Here are the (incomplete) results for Komodo 13.3

https://1drv.ms/u/s!Aqe2-2i2H5hegZ8ysYB ... w?e=BAHPcH

2 of the PGNs did not produce valid output analysis files:
Firouzja-2378185.pgn.analysis
So-2378164.pgn.analysis

For these 2 PGNs the script would not even start the engine task, so I manually removed the FEN and SETUP tags, thinking they should make no difference because the games start from the normal board setup. With that change to the PGNs I see the following output.

Code:

C:\Users\Charles\Downloads\CC-Analysis>python process_games.py
error during pgn parsing
Traceback (most recent call last):
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1685, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1059, in parse_san
    return board.parse_san(san)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\__init__.py", line 3039, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Bh7' in rnbqkbnr/ppppppp1/7p/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 0 2
Finished analysis for Firouzja/2378185.pgn in 2.19 seconds
error during pgn parsing
Traceback (most recent call last):
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1685, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\pgn.py", line 1059, in parse_san
    return board.parse_san(san)
  File "C:\Users\Charles\AppData\Local\Programs\Python\Python310\lib\site-packages\chess\__init__.py", line 3039, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Ng3' in rnbqkbnr/pppp2pp/5p2/4p3/4P3/5P2/PPPP2PP/RNBQKBNR w KQkq - 0 3
Finished analysis for So/2378164.pgn in 4.70 seconds

C:\Users\Charles\Downloads\CC-Analysis>
Andrew: You alluded yourself to there being malformed PGNs, so how did you get a full set of working analysis files? Did you have to intervene?
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by adnoh »

Following up on my own post above: I just noticed that the 2 games that failed were both Chess960 games.

https://www.uschesschamps.com/2022-cham ... x/overview

Is the solution to modify the process_games script so that it sets the UCI_Chess960 option (via setoption) when it detects a non-standard starting position?
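A minimal sketch of that idea, using only the standard library. It is not taken from process_games.py: the helper names (read_pgn_headers, looks_like_chess960, uci_setup_commands) are hypothetical, and treating any non-standard piece placement in the FEN tag as Chess960 is only a heuristic, since a game set up from an arbitrary position would trigger it too. The setoption string uses the UCI_Chess960 option name from the UCI protocol.

```python
import re

# Standard chess starting position in FEN.
STANDARD_START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Matches one PGN tag pair such as: [FEN "..."]
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def read_pgn_headers(pgn_text):
    """Collect the tag pairs at the top of a single-game PGN string."""
    headers = {}
    for raw_line in pgn_text.splitlines():
        line = raw_line.strip()
        match = TAG_PAIR.match(line)
        if match:
            headers[match.group(1)] = match.group(2)
        elif line and headers:
            break  # first non-tag, non-blank line: the movetext starts here
    return headers


def looks_like_chess960(headers):
    """Heuristic (assumption): explicit Variant tag, or a FEN tag whose
    piece placement differs from the standard starting position."""
    if headers.get("Variant", "").lower() == "chess960":
        return True
    fen = headers.get("FEN")
    if not fen:
        return False
    return fen.split()[0] != STANDARD_START_FEN.split()[0]


def uci_setup_commands(headers):
    """UCI commands to send before analysing this game."""
    if looks_like_chess960(headers):
        return ["setoption name UCI_Chess960 value true"]
    return []
```

Note that this only works if the FEN and SetUp tags are left in the PGN; with the tags removed the moves get parsed from the normal starting position, which is what produces the "illegal san" errors above.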
RobertJBarker3
Posts: 5
Joined: Thu Sep 29, 2022 3:25 pm
Full name: Robert Barker

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by RobertJBarker3 »

scchess wrote: Fri Sep 30, 2022 4:27 am
I have dedicated hardware for my chess work. If you can make the repo easy to run, for example with a Docker image so that all I have to do is run it, I will process the games for you.
Just pull the repo, add your selected engine's path to process_games.py and run it. It will output all results as .analysis files in the folder.
Ferdy
Posts: 4851
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by Ferdy »

chrisw wrote: Fri Sep 30, 2022 1:48 am Probably time to give up on this engine correlation thing and admit that Hans is completely busted by this game with a 100% engine match.

https://youtu.be/9k9j7EkcxDs
I analyzed all positions of this game with SF15 at 10s per position on an i7-2600K processor, single thread, with 256 MB hash.

Here is the result, this is the output from my cheater detector. There are other features that are not shown.

Code:

   game                    event        date                name  plycnt  \
0     1  Charlotte CCCSA Fall GM  2020.10.04   mishra, abhimanyu     112
1     1  Charlotte CCCSA Fall GM  2020.10.04  niemann, hans moke     112

   rating  oppr  color  b1  b2  b3  b4  b5  b6  b7  m1  m2  m3  m4  m5  dubi  \
0    2342  2465  white   0   0   0   0   0   1   0   0   0   0   0   1     3
1    2465  2342  black   0   0   0   0   0   0   0   0   0   0   0   1     5

   merr  pos   t1%   t2%   t3%  wmerr  wpos  wt1%  wfail%  wfcnt  wmate%  \
0    28   38  42.1  34.2  18.4    300     0   0.0   100.0      0     0.0
1    20   37  56.8  13.5  16.2      0    19  89.5     0.0     19    80.0

   wmatecnt  t1cplx%  cntcplx  merrcplx  point
0         0        0        1       163      0
1         5        0        2        60      1
Mishra (white) had 3 dubious moves, nothing really serious: the score range for those is from -50 to -100 cp. Niemann (black) had 5 dubious moves, see the dubi column. Both players had 1 category 5 mistake each, see column m5 (m = mistake, 5 is the category). For that category the score of the played move is in the range -100 to -299 while the best score according to the engine is -99 or better. Mishra had 1 category 6 blunder, see column b6, meaning his move resulted in a score of -300 or less while the best engine score was only in the range -100 to -200. That means Mishra blundered from an already bad position.
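As a rough illustration of the three categories described above, here is a sketch that classifies a single move from two centipawn scores. It is not the detector's actual code: the function name classify_move is hypothetical, only the three categories spelled out in the post are covered, and two details are assumptions: scores are taken from the mover's point of view, and a move counts as dubious only when the engine's best score was above -50.

```python
def classify_move(best_cp, played_cp):
    """Classify one move from the mover's point of view.

    best_cp   -- engine score of its best move, in centipawns
    played_cp -- engine score of the move actually played, in centipawns
    """
    # Category 6 blunder: played move is -300 or worse while the best
    # move was only in the range -100 to -200.
    if played_cp <= -300 and -200 <= best_cp <= -100:
        return "b6"
    # Category 5 mistake: played move is in -100..-299 while the best
    # move was -99 or better.
    if -299 <= played_cp <= -100 and best_cp >= -99:
        return "m5"
    # Dubious: played move lands between -50 and -100.
    # Assumption: the best move was better than -50.
    if -100 < played_cp <= -50 and best_cp > -50:
        return "dubi"
    # The other categories (b1-b5, b7, m1-m4) are not described in the post.
    return "unclassified"
```

For example, classify_move(20, -120) falls into the m5 bucket, while classify_move(-150, -350) is a b6 blunder from an already bad position.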

Let's check the error in cp. For positions that are not yet losing (<= -300) or winning (>= 300), Mishra had a mean error of 28 while Niemann had a mean error of 20, see column merr. A very small difference, but Niemann is better.

Next, the top1, top2 and top3 percentages for positions that are not yet losing or winning. Mishra matched the engine's top1 move 42.1% of the time while Niemann matched it 56.8% of the time, see column t1%. Niemann is dominating here. t2% and t3% are also shown.
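The merr, pos and t1%/t2%/t3% columns can be sketched as below. This is an illustration under assumptions, not the detector itself: the Move record and the undecided_stats function are hypothetical names, the error of a move is taken as best score minus played score, and t2% and t3% are read as the share of moves matching exactly the engine's second and third choice (the table values are consistent with that reading, e.g. 16, 13 and 7 out of 38 positions give 42.1, 34.2 and 18.4).

```python
from dataclasses import dataclass


@dataclass
class Move:
    best_cp: int    # engine score of its best move, mover's point of view
    played_cp: int  # engine score of the move actually played
    rank: int       # 1, 2 or 3 if the move was the engine's k-th choice, else 0


def undecided_stats(moves, limit=300):
    """Mean error and top-k match rates over positions that are neither
    losing (<= -limit) nor winning (>= limit)."""
    undecided = [m for m in moves if -limit < m.best_cp < limit]
    n = len(undecided)
    if n == 0:
        return None
    mean_error = sum(m.best_cp - m.played_cp for m in undecided) / n
    stats = {"pos": n, "merr": round(mean_error)}
    for k in (1, 2, 3):
        matches = sum(1 for m in undecided if m.rank == k)
        stats[f"t{k}%"] = round(100 * matches / n, 1)
    return stats
```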

The column wmerr is the player's mean error when the position is already winning. Mishra got 300, meaning he never had a winning position, while Niemann had 19 winning (score >= 300) positions, see column wpos, and his mean error in cp is zero, meaning Niemann is very accurate in a winning position.

The wt1% column is the percentage of top1 engine move matches when the position is already winning. Mishra got 0 while Niemann got 89.5%. So in a winning position Niemann, in this game, is indeed dangerous: he does not forgive Mishra.
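The winning-position columns follow the same pattern with a different filter. A sketch, again with hypothetical names and assumptions: each move is a (best_cp, played_cp, rank) tuple from the mover's point of view, and a player with no winning positions gets wmerr equal to the 300 threshold as a placeholder, which is how the Mishra row reads.

```python
def winning_stats(moves, limit=300):
    """wpos, wmerr and wt1% over positions where the best score >= limit.

    moves -- list of (best_cp, played_cp, rank) tuples, where rank is 1
             if the played move was the engine's first choice.
    """
    winning = [m for m in moves if m[0] >= limit]
    if not winning:
        # No winning position at all: placeholder values (assumption
        # based on the wmerr = 300, wt1% = 0.0 row in the table).
        return {"wpos": 0, "wmerr": limit, "wt1%": 0.0}
    n = len(winning)
    mean_error = sum(best - played for best, played, _ in winning) / n
    top1 = sum(1 for _, _, rank in winning if rank == 1)
    return {"wpos": n, "wmerr": round(mean_error), "wt1%": round(100 * top1 / n, 1)}
```

With 19 winning positions of which 17 match the engine's first choice, this gives wt1% = 89.5, the value shown for Niemann.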

There is a column cntcplx, the number of complex positions each player faced. I have an algorithm/criteria for computing position complexity; one criterion is the presence of a queen on the board. Mishra got 1, while Niemann got 2. There is a column merrcplx, the mean error in cp when the position is complex. This measures the player's evaluation accuracy in complex positions. Mishra got 163 cp, while Niemann got 60. So Niemann is better at handling complex positions, and he even had 2 of them.

From experience analyzing some lichess and chess.com games, cheaters rated in the range 1500 to 2000 have a high top1 match percentage when the position is not yet losing or winning. Cheaters also have a high top1 match percentage when the position is already winning. Cheaters usually do not give the opponent a good position. Cheaters may play dubious moves, but not mistakes (score <= -100 cp).

These players are already strong: Mishra is rated 2342 and Niemann 2465.

Was that really true? From youtube title "Hans Niemann is AlphaZero CONFIRMED!"?
chrisw
Posts: 4764
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Crowd Sourced processing of Super GMs + Hans for engine correlation

Post by chrisw »

Ferdy wrote: Fri Sep 30, 2022 5:12 pm
Was that really true? From youtube title "Hans Niemann is AlphaZero CONFIRMED!"?
Ha! Yes, you need to watch the Agadmator video to the end to get the real point. He released it as a comment on the recent 100% match video from the French IM. Agadmator did say that Hans had a near 100% engine match score. I'm not sure what software he was using, or which engines, but possibly the CB analysis program. But then, right after the comment about Hans, he slyly slipped in that so did Mishra. They both scored near 100%.
I guess this was Agadmator's advice to always check both the data and the functionality of the tools one is using.