I checked the documentation of pgn-extract and found that, through a combination of options, there is a way to output the games in 2.pgn that are not in 1.pgn. There is no need to delete the common games found in 2.pgn yourself. Use the command at the end of this post.

JManion wrote:
Ferdy wrote:
JManion wrote:
Ok I have 4 pgn files that are all about 2GB. If I run the program it says 1 has no duplicates, 2 has no duplicates, 3 has no duplicates, and 4 has none.
However I want to find some way to compare files 1 and 2 to see if they have any duplicates (and 1 and 3, 1 and 4, etc.).

Say you have unique games in each of the 4 pgn files:
1. 1.pgn
2. 2.pgn
3. 3.pgn
4. 4.pgn
First Phase:
A. To check whether 1.pgn and 2.pgn have any games in common, use the following command:
Code:
pgn-extract -U -ddup.pgn 1.pgn 2.pgn
Check the output dup.pgn. Any game in it is common to 1.pgn and 2.pgn. Locate the game or games in 2.pgn and delete them. The common games will be retained in 1.pgn. Rename the revised 2.pgn to R1-2.pgn.
B. To check whether 1.pgn and 3.pgn have any games in common, use the following command:
Code:
pgn-extract -U -ddup.pgn 1.pgn 3.pgn
Check the output dup.pgn. Any game in it is common to 1.pgn and 3.pgn. Locate the game or games in 3.pgn and delete them. The common games will be retained in 1.pgn. Rename the revised 3.pgn to R1-3.pgn.
C. Do the same for 1.pgn vs 4.pgn, and rename the revised 4.pgn to R1-4.pgn.
Second Phase:
D. Next do it with R1-2.pgn vs R1-3.pgn
R1-2.pgn refers to 1st revision of 2.pgn, if 2.pgn was revised in 1st phase.
R1-3.pgn refers to 1st revision of 3.pgn, if 3.pgn was revised in 1st phase.
E. Next do it with R1-2.pgn vs R1-4.pgn
Third Phase:
F. Do it with R2-3.pgn vs R2-4.pgn
R2-3.pgn refers to the 2nd revision of 3.pgn, if it was revised in the 2nd phase. If a file was not revised in a phase, always use its latest version.
So what we have done is:
1. compare 1.pgn vs 2.pgn
2. 1.pgn vs 3.pgn
3. 1.pgn vs 4.pgn
4. 2.pgn vs 3.pgn (use revised files)
5. 2.pgn vs 4.pgn (use revised files)
6. 3.pgn vs 4.pgn (use revised files)
*If there are common games, always delete them from the higher-numbered pgn file. For example, when comparing 1.pgn vs 2.pgn, delete the common games from 2.pgn.
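The comparison order listed above can also be generated by a small shell loop rather than written out by hand. This is only a sketch: it prints the pairs and runs nothing, and the function name list_pairs is made up for illustration (it is not part of pgn-extract).

```shell
#!/bin/sh
# Print every "lower vs higher" pair of the given files, lowest file first.
# list_pairs is a hypothetical helper name, not a pgn-extract feature.
list_pairs() {
    while [ "$#" -gt 1 ]; do
        first=$1
        shift                       # "$@" now holds only the higher-numbered files
        for other in "$@"; do
            echo "$first vs $other"
        done
    done
}

list_pairs 1.pgn 2.pgn 3.pgn 4.pgn
```

For four files this prints the six comparisons in the same order as the list above, starting with 1.pgn vs 2.pgn and ending with 3.pgn vs 4.pgn.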
Thank you Ferdinand.
After I ran 1.pgn vs 2.pgn, I had a dupe file with 1344 dupes. Is there an easy command that can delete those 1344 games from 2.pgn?
Thanks again.
Code:
pgn-extract -c1.pgn -dCommon.pgn -oR1-2.pgn 2.pgn
-c (for check file)
-d (to output common or duplicate games)
-o (to output unique games that are not found in 1.pgn)
1.pgn (master file)
Common.pgn (dupes are found here for inspection)
R1-2.pgn (revised 2.pgn without the common games in it; this is the file you are interested in)
2.pgn (the file you want to check with the master file 1.pgn)
I tested this command only on small files and it works.
By combining options, I guess there are more things pgn-extract can do that we have not yet discovered.
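If the same command is to be repeated for all the phases described earlier, the commands can be generated with a short script. This is a rough sketch, not something tested on the 2GB files: it only prints the commands (a dry run), it assumes the files are named 1.pgn to N.pgn, and it assumes pgn-extract writes the -o file even when no common games are found. The function name print_dedupe_plan and the Common-P-T.pgn file names are made up for illustration.

```shell
#!/bin/sh
# Dry run: print one pgn-extract command per comparison, phase by phase.
# Nothing is executed. print_dedupe_plan and the Common-*.pgn names are
# hypothetical; the -c/-d/-o usage is the one from the command above.
print_dedupe_plan() {
    count=$1                        # number of files, assumed named 1.pgn .. N.pgn
    phase=1
    while [ "$phase" -lt "$count" ]; do
        if [ "$phase" -eq 1 ]; then
            prefix=""               # phase 1 reads the original files
        else
            prefix="R$((phase - 1))-"   # later phases read the previous revision
        fi
        master="${prefix}${phase}.pgn"
        target=$((phase + 1))
        while [ "$target" -le "$count" ]; do
            echo "pgn-extract -c${master} -dCommon-${phase}-${target}.pgn -oR${phase}-${target}.pgn ${prefix}${target}.pgn"
            target=$((target + 1))
        done
        phase=$((phase + 1))
    done
}

print_dedupe_plan 4
```

With 4 files this prints six commands. The first one is the command above with a different name for the dupes file, and the last one checks R2-4.pgn against R2-3.pgn and writes R3-4.pgn. After reviewing the printed commands, they can be run one by one; the final unique sets would then be 1.pgn, R1-2.pgn, R2-3.pgn and R3-4.pgn.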