JManion wrote: OK, I have 4 PGN files that are each about 2 GB. If I run the program, it says 1 has no duplicates, 2 has no duplicates, 3 has no duplicates, and 4 has none.
However, I want to find some way to compare files 1 and 2 to see if they share any duplicates (and 1 and 3, 1 and 4, etc.).
Say you have 4 PGN files, each containing only unique games:
1. 1.pgn
2. 2.pgn
3. 3.pgn
4. 4.pgn
First Phase:
A. To check whether 1.pgn and 2.pgn share any games, use the following command:
pgn-extract -U -ddup.pgn 1.pgn 2.pgn
Check the output file dup.pgn. Any game in it is common to 1.pgn and 2.pgn. Locate that game (or games) in 2.pgn and delete it; the common game is retained in 1.pgn. Rename the revised 2.pgn to R1-2.pgn.
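If you would rather not hunt for the common games by hand, the check-and-delete in step A can be sketched in Python. This is a minimal illustration under stated assumptions, not what pgn-extract does internally: it assumes every game starts with an `[Event ` tag line, and it treats two games as "the same" when their move text (everything except the tag lines) matches exactly. pgn-extract's own duplicate test may differ. The helper names `split_games`, `movetext` and `remove_common` are hypothetical. It also reads whole files into memory, which would need reworking into a streaming version for 2 GB files.

```python
import re


def split_games(pgn_text: str) -> list[str]:
    """Split PGN text into games, assuming each game begins with an '[Event ' tag line."""
    parts = re.split(r'(?m)^(?=\[Event )', pgn_text)
    return [p.strip() for p in parts if p.strip()]


def movetext(game: str) -> str:
    """Return the game's move text with tag lines dropped and whitespace collapsed."""
    lines = [ln for ln in game.splitlines() if not ln.startswith('[')]
    return ' '.join(' '.join(lines).split())


def remove_common(first: str, second: str) -> tuple[list[str], list[str]]:
    """Return (games of `second` to keep, games of `second` also found in `first`).

    Mirrors step A: common games stay in the lower-numbered file (`first`)
    and are removed from the higher-numbered one (`second`).
    """
    seen = {movetext(g) for g in split_games(first)}
    kept: list[str] = []
    common: list[str] = []
    for game in split_games(second):
        (common if movetext(game) in seen else kept).append(game)
    return kept, common
```

Joining `kept` with blank lines and writing it to a new file would give the revised R1-2.pgn, while `common` plays the role of dup.pgn.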
B. To check whether 1.pgn and 3.pgn share any games, use the following command:
pgn-extract -U -ddup.pgn 1.pgn 3.pgn
Check the output file dup.pgn. Any game in it is common to 1.pgn and 3.pgn. Locate that game (or games) in 3.pgn and delete it; the common game is retained in 1.pgn. Rename the revised 3.pgn to R1-3.pgn.
C. Do the same for 1.pgn vs 4.pgn, renaming the revised 4.pgn to R1-4.pgn.
Second Phase:
D. Next, do the same with R1-2.pgn vs R1-3.pgn.
R1-2.pgn refers to the 1st revision of 2.pgn, if 2.pgn was revised in the first phase.
R1-3.pgn refers to the 1st revision of 3.pgn, if 3.pgn was revised in the first phase.
E. Then do the same with R1-2.pgn vs R1-4.pgn.
Third Phase:
F. Finally, do the same with R2-3.pgn vs R2-4.pgn.
R2-3.pgn refers to the 2nd revision of 3.pgn, if it was revised; otherwise, always use the latest version of the file.
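To see which file names each step uses, here is a small Python sketch that lists the comparisons in the same order as steps A to F. It assumes the worst case, where every comparison finds common games, so the higher-numbered file is revised every time. In that case the revision number equals the phase number and the names come out as R<phase>-<file>.pgn. The function name `comparison_schedule` is hypothetical, and the printed commands simply reuse the pgn-extract invocation from step A.

```python
def comparison_schedule(n_files: int) -> list[tuple[str, str]]:
    """List the pairwise comparisons in phase order (steps A to F for 4 files).

    Worst-case assumption: every comparison finds common games, so the
    higher-numbered file is revised and renamed R<phase>-<file>.pgn each time.
    """
    # Latest name of each file; file k starts out as "k.pgn".
    latest = {k: f"{k}.pgn" for k in range(1, n_files + 1)}
    steps = []
    for phase in range(1, n_files):
        # Phase p compares the latest version of file p with every higher-numbered file.
        for j in range(phase + 1, n_files + 1):
            steps.append((latest[phase], latest[j]))
            # Common games are deleted from file j, producing its next revision.
            latest[j] = f"R{phase}-{j}.pgn"
    return steps


for left, right in comparison_schedule(4):
    print(f"pgn-extract -U -ddup.pgn {left} {right}")
```

For 4 files this yields 6 comparisons, one per pair of files. With n files there would be n(n-1)/2.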
So, in total, we compared:
1. 1.pgn vs 2.pgn
2. 1.pgn vs 3.pgn
3. 1.pgn vs 4.pgn
4. 2.pgn vs 3.pgn (use revised files)
5. 2.pgn vs 4.pgn (use revised files)
6. 3.pgn vs 4.pgn (use revised files)
*If there are common games, always delete them from the higher-numbered PGN file. For example, when comparing 1.pgn vs 2.pgn, delete any common games from 2.pgn.
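The deletion rule can be checked on toy data with a short Python sketch. Here each file is modelled as a hypothetical list of game keys (for example, normalized move text) rather than real PGN, and the function name `pairwise_dedup` is made up for illustration. Because each comparison uses the already-revised lower-numbered file, every game ends up only in the lowest-numbered file that contained it.

```python
def pairwise_dedup(files: list[list[str]]) -> list[list[str]]:
    """Apply the rule: for every pair i < j, drop from file j the games also in file i.

    `files[0]` stands for 1.pgn, `files[1]` for 2.pgn, and so on; each entry is a
    hypothetical game key. Each file is assumed to have no internal duplicates.
    """
    files = [list(f) for f in files]  # work on copies, leave the input untouched
    for i in range(len(files) - 1):
        lower = set(files[i])  # latest (already revised) version of file i
        for j in range(i + 1, len(files)):
            # Common games stay in the lower-numbered file and are removed from file j.
            files[j] = [g for g in files[j] if g not in lower]
    return files


print(pairwise_dedup([["a", "b"], ["b", "c"], ["a", "c", "d"], ["d", "e"]]))
```

On this input, "b" is removed from file 2, "a" and "c" from file 3, and "d" from file 4, so each game survives only once.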