Hello. I have collected all games on chessgames.com for [Carlsen, Firouzja, Giri, Liren, Nakamura, Nepomniachtchi, Niemann, So]. I've uploaded them as individual files into a github repository here: https://github.com/AndyGrant/Correlation/
I have written a python utility, process_games.py, which goes through all games from those players. It performs a multipv=3 search at every position/move that the given player had to make. It saves all of this data, at depth=[16, 18, 20] to a file, one per PGN.
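As a rough illustration of what "multipv=3 at depth 16, 18, 20" means at the engine-protocol level, here is a minimal sketch that builds the standard UCI commands for one position. The function name `uci_commands` is hypothetical and this is not taken from process_games.py, which may well issue a single search and read the intermediate depths instead.

```python
def uci_commands(fen, depths=(16, 18, 20), multipv=3):
    """Build the UCI command sequence for analysing one position.

    Hypothetical helper for illustration only; the real script may
    drive the engine differently.
    """
    # Ask the engine to report its best `multipv` lines instead of one.
    commands = [f"setoption name MultiPV value {multipv}"]
    for depth in depths:
        # Set up the position, then search it to a fixed depth.
        commands.append(f"position fen {fen}")
        commands.append(f"go depth {depth}")
    return commands


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

for line in uci_commands(START_FEN):
    print(line)
```

Each `go depth N` makes the engine emit `info ... multipv k ... score ... pv ...` lines, from which the top three moves and their evaluations can be read.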
My goal: Process all of this data, using an assortment of engines. For example, locally I am producing the data using Ethereal 13.75. Then I will do 13.50, 13.25, 13.00. This is quite a long process. There are roughly 5,000 games in the collection. Locally, with Stockfish-15, I can process one game every minute or so per thread. On a 16 thread machine, this means about 10 games per minute. It will take ~10 hours to process them all, if I can do basic math.
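For anyone sizing up their own machine, the estimate above can be reproduced with a couple of lines. The inputs are the rough figures from this post (about 5,000 games, about 10 games per minute), so treat the result as order-of-magnitude only; the function name is just for illustration.

```python
def hours_to_process(total_games, games_per_minute):
    """Wall-clock hours needed at a steady processing rate."""
    return total_games / games_per_minute / 60


# Rough figures from the post: ~5,000 games at ~10 games per minute.
estimate = hours_to_process(5000, 10)
print(f"{estimate:.1f} hours")
```

That comes out a little over 8 hours, which is consistent with the ~10 hour figure once malformed PGNs and slower games are allowed for.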
I am looking for people willing to clone this repo, grab a public engine (for example by building SF10/11/12/13/14/15 from source, or by using any of the free Komodo versions, 11/12/13), and process the games on their end. You can upload all the .analysis files somewhere, and I will collect them all in the repo. I am still working on the script. My progress so far is about to be committed to the repo, for anyone to view.
Please let me know if you think of more data we should be collecting.
Please let me know if you are willing to help in this process with your machine(s).
Please let me know which engine/version you would like to use, so that no one overlaps efforts.
Crowd Sourced processing of Super GMs + Hans for engine correlation
Moderator: Ras
-
AndrewGrant
- Posts: 1966
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
-
AndrewGrant
- Posts: 1966
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
I think the metric of "How often does player X match engine Y's top 3 moves" is not great for cheating detection by itself. However, the data in general is interesting. It would be nice to have a collection for these players, from many engines/versions, at various depths. Then you can compare the match rates per engine, per player, etc.
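To make the "top 3 match" metric concrete, here is a small sketch of how it could be computed. The record layout (`played`, `top_moves`) is an assumption made up for this example, not the actual format of the output files.

```python
def top_n_match_rate(positions, n=3):
    """Fraction of positions where the played move is among the engine's top n.

    `positions` is a list of dicts with hypothetical keys:
      "played"    - the move the player chose (UCI notation)
      "top_moves" - the engine's moves, best first
    """
    if not positions:
        return 0.0
    hits = sum(1 for p in positions if p["played"] in p["top_moves"][:n])
    return hits / len(positions)


# Made-up sample data, for illustration only.
sample = [
    {"played": "e2e4", "top_moves": ["e2e4", "d2d4", "g1f3"]},
    {"played": "a2a3", "top_moves": ["e2e4", "d2d4", "g1f3"]},
    {"played": "g1f3", "top_moves": ["d2d4", "c2c4", "g1f3"]},
    {"played": "h2h4", "top_moves": ["d2d4", "c2c4", "g1f3"]},
]

print(top_n_match_rate(sample, n=3))  # 0.5
print(top_n_match_rate(sample, n=1))  # 0.25
```

Running the same function over the output of several engines and depths would give the per-engine, per-player comparison described above.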
The data I am collecting as of now also includes the evaluation for the top 3 moves. So someone could filter the data a bit, for example by tossing out positions where there is an "only" move.
No clue yet how to process the final product and make something meaningful, but every PGN getting read will produce a .json file that can be read later.
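One possible way to do the "only move" filtering, sketched under assumptions: evaluations are in centipawns from the side to move's point of view, mate scores are ignored, and the 150 cp margin and the `evals_cp` key are hypothetical choices for this example.

```python
def is_only_move(evals_cp, margin_cp=150):
    """True if the best move beats the second best by at least margin_cp.

    `evals_cp` holds the centipawn evaluations of the engine's top moves.
    The 150 cp default is an arbitrary illustrative threshold.
    """
    if len(evals_cp) < 2:
        return True  # a single legal move is trivially forced
    best, second = sorted(evals_cp, reverse=True)[:2]
    return best - second >= margin_cp


def drop_only_moves(positions, margin_cp=150):
    """Keep only positions where the player had a real choice."""
    return [p for p in positions if not is_only_move(p["evals_cp"], margin_cp)]


# Made-up sample: the first position has one clearly best move.
sample = [
    {"played": "d1h5", "evals_cp": [320, -40, -95]},
    {"played": "g1f3", "evals_cp": [35, 30, 22]},
]

kept = drop_only_moves(sample)
print(len(kept))  # 1
```

Matching the engine in a forced position says little about the player, so a match rate computed on the filtered list may be more informative than one computed on every position.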
-
Scally
- Posts: 232
- Joined: Thu Sep 28, 2017 9:34 pm
- Location: Bermondsey, London
- Full name: Alan Cooper
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
Hi Andrew,
I’ll do the Stockfish searches as I have all versions, as long as you don’t mind this being done on a Raspberry Pi 4 running at 2 GHz (my development RPi for building & testing Picochess Engines, NNUEs & running Cutechess tournaments)
Al
-
AndrewGrant
- Posts: 1966
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
That would be fine, but I'm guessing it is going to take you many many days to execute this on the Pi.
The whole dataset looks like 10 hrs for a Ryzen 3700x. So it might not be tenable for you on your setup.
-
AndrewGrant
- Posts: 1966
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
Assuming no broken parsing, the process is to clone that repo, modify the "ENGINE =" line of the .py file to point to your compiled engine, and then run it. It should create ~5k .analysis files. A few errors will be thrown while running, since it seems some PGNs are malformed. After completion, you have to upload the files somewhere; presumably it should not be that much data. The process will be slower for engines that take longer to reach a given depth. It really depends on the engine.
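For the upload step, one convenient option is to bundle the generated files into a single archive first. This is only a sketch: the helper name and the archive name are hypothetical, and it assumes the .analysis files end up somewhere under the directory you point it at.

```python
import zipfile
from pathlib import Path


def bundle_analysis_files(root, archive_path):
    """Zip every *.analysis file under `root`; return how many were added.

    Hypothetical helper, not part of the repository.
    """
    root = Path(root)
    files = sorted(root.rglob("*.analysis"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to root so per-player folders are preserved.
            archive.write(path, arcname=path.relative_to(root).as_posix())
    return len(files)
```

For example, `bundle_analysis_files(".", "stockfish15.zip")` run from the repo directory would produce one file to upload, named after the engine that was used.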
I'm running Ethereal 13.00 and Ethereal 13.25 on my two boxes now.
-
Scally
- Posts: 232
- Joined: Thu Sep 28, 2017 9:34 pm
- Location: Bermondsey, London
- Full name: Alan Cooper
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
-
Scally
- Posts: 232
- Joined: Thu Sep 28, 2017 9:34 pm
- Location: Bermondsey, London
- Full name: Alan Cooper
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
I stopped and restarted it without the -j4 as your program works out the threads/cores
Al.
-
chrisw
- Posts: 4768
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
I could do this using the current dev version of Chess System Tal (about 3580 Elo on the Rebel gambit list). Inclusion of an unpublished reference engine that could NOT possibly have been used by any player might be useful.
AndrewGrant wrote: ↑Thu Sep 29, 2022 8:30 am
Hello. I have collected all games on chessgames.com for [Carlsen, Firouzja, Giri, Liren, Nakamura, Nepomniachtchi, Niemann, So]. I've uploaded them as individual files into a github repository here: https://github.com/AndyGrant/Correlation/
I have written a python utility, process_games.py, which goes through all games from those players. It performs a multipv=3 search at every position/move that the given player had to make. It saves all of this data, at depth=[16, 18, 20] to a file, one per PGN.
My goal: Process all of this data, using an assortment of engines. For example, locally I am producing the data using Ethereal 13.75. Then I will do 13.50, 13.25, 13.00. This is quite a long process. There are roughly 5,000 games in the collection. Locally, with Stockfish-15, I can process one game every minute or so per thread. On a 16 thread machine, this means about 10 games per minute. It will take ~10 hours to process them all, if I can do basic math.
I am looking for people willing to clone this repo, grab a public engine (for example by building SF10/11/12/13/14/15 from source, or by using any of the free Komodo versions, 11/12/13), and process the games on their end. You can upload all the .analysis files somewhere, and I will collect them all in the repo. I am still working on the script. My progress so far is about to be committed to the repo, for anyone to view.
Please let me know if you think of more data we should be collecting.
Please let me know if you are willing to help in this process with your machine(s).
Please let me know which engine/version you would like to use, so that no one overlaps efforts.
-
Scally
- Posts: 232
- Joined: Thu Sep 28, 2017 9:34 pm
- Location: Bermondsey, London
- Full name: Alan Cooper
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
Hi Andrew,
I’m afraid you were right, it’s just not tenable for me on a Raspberry Pi 4.
After 3 hrs I have only processed 24 of Carlsen’s 1015 games, so it would take 5.29 days to do Carlsen's games alone, and a total of 25.84 days to do all 8 players (4962 games). That’s with Stockfish15-NN, and there would still be the other 5 Stockfish versions to run, so a total of 155 days or 5.17 months for all 6.
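The extrapolation can be checked with a short calculation. It assumes the observed rate (24 games in 3 hours) stays constant and that every Stockfish version runs at the same speed, which is a simplification; the function name is just for illustration.

```python
HOURS_PER_DAY = 24


def days_needed(games, games_done, hours_elapsed):
    """Extrapolate total days from an observed processing rate."""
    games_per_hour = games_done / hours_elapsed
    return games / games_per_hour / HOURS_PER_DAY


carlsen_days = days_needed(1015, 24, 3)   # Carlsen's games only
all_days = days_needed(4962, 24, 3)       # all 8 players
six_versions_days = 6 * all_days          # Stockfish 15 plus 5 other versions

print(round(carlsen_days, 2), round(all_days, 2), round(six_versions_days))
```

The three values agree with the 5.29, 25.84 and 155 day figures quoted in the post.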

So I’m sorry, but I’ve cancelled the run, let’s hope someone else with a faster processor can help you.
Al.
-
RobertJBarker3
- Posts: 5
- Joined: Thu Sep 29, 2022 3:25 pm
- Full name: Robert Barker
Re: Crowd Sourced processing of Super GMs + Hans for engine correlation
I'm doing stockfish7
