PGN Extract

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

Denis P. Mendoza
Posts: 415
Joined: Fri Dec 15, 2006 8:46 pm
Location: Philippines

Re: PGN Extract

Post by Denis P. Mendoza » Tue Oct 25, 2011 12:53 am

JManion wrote: I would try in DOS but I have no idea how to do that.
OK, sorry about that. The easiest way is to use "Drop to DOS" by TeraByte.
http://www.terabyteunlimited.com/downloads/DOSDROP.ZIP
Install it, and whenever you want to drop to DOS in one of your folders, just right-click the folder and choose "Drop to DOS" from the context menu. You can then run any executable or batch file you like. I hope this helps.

JManion
Posts: 195
Joined: Wed Dec 23, 2009 7:53 am

Re: PGN Extract

Post by JManion » Tue Oct 25, 2011 7:06 am

OK

I have another dumb question. Is there a way to make a PGN file larger than 2 GB? I know the answer will be no... but...


If I break my database up into 4 or 5 files of 2 GB each, I might have duplicate games across the different files.
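If the database does get split, the cut should fall between games so that no game is broken in half. Here is a minimal Python sketch of that (not a pgn-extract feature), assuming a new game starts at a tag line such as [Event "..."] that follows movetext; that is a heuristic, not a full PGN parser. The function names and the part1.pgn, part2.pgn output names are hypothetical.

```python
import os
import tempfile


def iter_games(lines):
    """Yield PGN games one at a time from an iterable of lines.

    Heuristic (assumption): a tag line such as '[Event "..."]' that
    follows movetext starts a new game.
    """
    current, in_moves = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and in_moves:
            yield "".join(current)
            current, in_moves = [], False
        elif stripped and not stripped.startswith("["):
            in_moves = True
        current.append(line)
    if any(l.strip() for l in current):
        yield "".join(current)


def split_pgn(src_path, out_dir, max_bytes):
    """Copy src_path into out_dir/part1.pgn, part2.pgn, ... (hypothetical names).

    A game is never cut in half: a new part starts when the next game would
    push the current part past max_bytes. Latin-1 with newline="" makes every
    byte round-trip unchanged, so len(game) equals its size in bytes.
    """
    paths, out, size = [], None, 0
    with open(src_path, encoding="latin-1", newline="") as src:
        for game in iter_games(src):
            if out is None or (size > 0 and size + len(game) > max_bytes):
                if out is not None:
                    out.close()
                paths.append(os.path.join(out_dir, f"part{len(paths) + 1}.pgn"))
                out = open(paths[-1], "w", encoding="latin-1", newline="")
                size = 0
            out.write(game)
            size += len(game)
    if out is not None:
        out.close()
    return paths


# Usage: five 32-byte games, at most 70 bytes per part.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "big.pgn")
with open(src, "w", encoding="latin-1", newline="") as f:
    for i in range(5):
        f.write(f'[Event "Game {i}"]\n\n1. e4 e5 1-0\n\n')
parts = split_pgn(src, tmp, max_bytes=70)
print([os.path.getsize(p) for p in parts])  # [64, 64, 32]
```

For a real database you would pass a value safely below the limit, e.g. max_bytes=2_000_000_000. The file is streamed one game at a time, so memory use stays small even for very large inputs.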

Ferdy
Posts: 4591
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: PGN Extract

Post by Ferdy » Tue Oct 25, 2011 9:07 am

JManion wrote: If I take my database and break it up into 4 or 5, 2 GB files, I might have duplicate games over the different files
From the pgn-extract help file.
The -U flag suppresses output of the first occurrence of a particular game. This is useful when combined with the -d flag as a means of identifying just those games that are duplicated in a list of multiple files. As the duplicate games are commented with the file in which they were located, it then becomes possible to prune a set of files containing common games. For instance, suppose oldfile.pgn contains a set of games without duplicates, and you wish to know which games in newfile.pgn already occur in oldfile.pgn:

pgn-extract -U -ddupes.pgn oldfile.pgn newfile.pgn

will write to dupes.pgn the duplicate games so that you can go through newfile.pgn and remove them. Of course, if you simply want to hold the combined set of unique games in a single file you would use something like:

pgn-extract -D -onewset.pgn oldfile.pgn newfile.pgn
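To make the difference between the two commands concrete, here is a rough Python model of what ends up in dupes.pgn (-d) versus newset.pgn (-D -o). It is only an approximation: as a simplifying assumption, two games count as duplicates when their movetext is identical after whitespace normalization, whereas pgn-extract applies its own matching rules. The names iter_games, movetext_key and dedupe are hypothetical.

```python
def iter_games(lines):
    """Yield PGN games; a tag line following movetext starts a new game (heuristic)."""
    current, in_moves = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and in_moves:
            yield "".join(current)
            current, in_moves = [], False
        elif stripped and not stripped.startswith("["):
            in_moves = True
        current.append(line)
    if any(l.strip() for l in current):
        yield "".join(current)


def movetext_key(game):
    """Assumed duplicate test: identical movetext after whitespace normalization."""
    moves = [l.strip() for l in game.splitlines()
             if l.strip() and not l.strip().startswith("[")]
    return " ".join(" ".join(moves).split())


def dedupe(files):
    """files: (name, pgn_text) pairs in command-line order.

    Returns (unique, dupes): unique keeps the first occurrence of each game
    (roughly what -D -onewset.pgn keeps); dupes holds (file name, game) for
    every later occurrence (roughly what -ddupes.pgn receives).
    """
    seen, unique, dupes = set(), [], []
    for name, text in files:
        for game in iter_games(text.splitlines(keepends=True)):
            key = movetext_key(game)
            if key in seen:
                dupes.append((name, game))
            else:
                seen.add(key)
                unique.append(game)
    return unique, dupes


old = ('[White "A"]\n[Black "B"]\n\n1. e4 e5 1-0\n\n'
       '[White "C"]\n[Black "D"]\n\n1. d4 d5 0-1\n')
new = ('[White "A"]\n[Black "B"]\n\n1. e4 e5 1-0\n\n'
       '[White "E"]\n[Black "F"]\n\n1. c4 c5 1/2-1/2\n')
unique, dupes = dedupe([("oldfile.pgn", old), ("newfile.pgn", new)])
print(len(unique), [name for name, _ in dupes])  # 3 ['newfile.pgn']
```

In this model, file order matters: the copy in the first file listed is treated as the original, so the duplicate is reported against newfile.pgn.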

JManion

Re: PGN Extract

Post by JManion » Tue Oct 25, 2011 7:16 pm

OK, I missed this, so thank you. So let's say I went through 5 PGN files.

Now, do I:

1. Add all the duplicates into one file named dupes.pgn and put it in the extract folder.

2. Take all the Uniquegames.pgn files and put them together... that looks like 4 PGNs.

3. Add this line pgn-extract -U -ddupes.pgn oldfile.pgn newfile.pgn to the .bat script.

4. Re-run the .bat with each of my PGNs?

Thanks again for the help.

bob
Posts: 20923
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: PGN Extract

Post by bob » Tue Oct 25, 2011 8:21 pm

JManion wrote: Is there a way to make a PGN file larger than 2 GB?
Yes. Use a real operating system. :)

JManion

Re: PGN Extract

Post by JManion » Thu Oct 27, 2011 11:56 pm

OK, this is not right. Could anyone give me some help? Thanks!

Ferdy

Re: PGN Extract

Post by Ferdy » Fri Oct 28, 2011 2:48 am

Can you describe your problem? What exactly do you want to achieve with your PGN files?

JManion

Re: PGN Extract

Post by JManion » Fri Oct 28, 2011 4:10 am

OK, I have 4 PGN files that are each about 2 GB. If I run the program on each one, it says file 1 has no duplicates, file 2 has no duplicates, file 3 has no duplicates, and file 4 has none.

However, I want some way to compare files 1 and 2 to see whether they share any duplicates (and likewise 1 and 3, 1 and 4, etc.).

Ferdy

Re: PGN Extract

Post by Ferdy » Fri Oct 28, 2011 6:55 am

Say each of your 4 PGN files contains only unique games:
1. 1.pgn
2. 2.pgn
3. 3.pgn
4. 4.pgn

First Phase:
A. To check whether 1.pgn and 2.pgn have any games in common, use the following command:

Code:

pgn-extract -U -ddup.pgn 1.pgn 2.pgn 
Check the output file dup.pgn. Any game in it is common to 1.pgn and 2.pgn. Locate that game (or those games) in 2.pgn and delete them; the common games are retained in 1.pgn. Rename the revised 2.pgn to R1-2.pgn.

B. To check whether 1.pgn and 3.pgn have any games in common, use the following command:

Code:

pgn-extract -U -ddup.pgn 1.pgn 3.pgn 
Check the output file dup.pgn. Any game in it is common to 1.pgn and 3.pgn. Locate those games in 3.pgn and delete them; the common games are retained in 1.pgn. Rename the revised 3.pgn to R1-3.pgn.

C. Do the same for 1.pgn vs 4.pgn (the revised 4.pgn becomes R1-4.pgn).

Second Phase:
D. Next, compare R1-2.pgn vs R1-3.pgn.
R1-2.pgn refers to the first revision of 2.pgn, if 2.pgn was revised in the first phase.
R1-3.pgn refers to the first revision of 3.pgn, if 3.pgn was revised in the first phase.

E. Then compare R1-2.pgn vs R1-4.pgn.

Third Phase:
F. Compare R2-3.pgn vs R2-4.pgn.
R2-3.pgn refers to the second revision of 3.pgn, if it was revised; otherwise, always use the latest version of each file.

So, in summary, we compared:
1. 1.pgn vs 2.pgn
2. 1.pgn vs 3.pgn
3. 1.pgn vs 4.pgn

4. 2.pgn vs 3.pgn (use revised files)
5. 2.pgn vs 4.pgn (use revised files)

6. 3.pgn vs 4.pgn (use revised files)

*If there are common games, always delete them from the higher-numbered PGN file. For example, when comparing 1.pgn vs 2.pgn, delete the common games from 2.pgn.
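The three phases above amount to a loop over every pair of files, always deleting from the higher-numbered one. Below is a minimal in-memory Python sketch of that loop, under the simplifying assumption that duplicates are games with identical movetext (pgn-extract's own matching may differ). The function and helper names are hypothetical; for 2 GB files you would stream the games rather than hold them all in memory.

```python
def iter_games(lines):
    """Yield PGN games; a tag line following movetext starts a new game (heuristic)."""
    current, in_moves = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and in_moves:
            yield "".join(current)
            current, in_moves = [], False
        elif stripped and not stripped.startswith("["):
            in_moves = True
        current.append(line)
    if any(l.strip() for l in current):
        yield "".join(current)


def movetext_key(game):
    """Assumed duplicate test: identical movetext after whitespace normalization."""
    moves = [l.strip() for l in game.splitlines()
             if l.strip() and not l.strip().startswith("[")]
    return " ".join(" ".join(moves).split())


def remove_cross_duplicates(texts):
    """texts: PGN text of 1.pgn, 2.pgn, ... in order.

    Visits the pairs in the same order as the phases (1 vs 2, 1 vs 3,
    1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4), always using the latest revision, and
    drops common games from the higher-numbered file.
    """
    revised = [list(iter_games(t.splitlines(keepends=True))) for t in texts]
    for i in range(len(revised)):
        keys_i = {movetext_key(g) for g in revised[i]}
        for j in range(i + 1, len(revised)):
            revised[j] = [g for g in revised[j] if movetext_key(g) not in keys_i]
    return revised


def make_game(moves):
    # Hypothetical helper that builds a minimal one-game PGN string.
    return f'[Event "?"]\n\n{moves} *\n\n'


files = [make_game("1. e4") + make_game("1. d4"),
         make_game("1. e4") + make_game("1. c4"),
         make_game("1. d4") + make_game("1. c4") + make_game("1. Nf3")]
result = remove_cross_duplicates(files)
print([len(games) for games in result])  # [2, 1, 1]
```

Since each file has no internal duplicates, this gives the same result as a single pass that keeps one set of every game seen so far and drops any game already in it.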

JManion

Re: PGN Extract

Post by JManion » Sat Oct 29, 2011 1:13 am


Thank you, Ferdinand.

After I ran 1.pgn vs 2.pgn,

I had a dupe file with 1344 duplicates. Is there an easy command that can delete those 1344 games from 2.pgn?

Thanks again.
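One way to avoid deleting the games from 2.pgn by hand is a small script. Below is a minimal Python sketch, assuming duplicates are games with identical movetext (a simplification; pgn-extract's own matching may differ). It compares 2.pgn directly against 1.pgn rather than against dup.pgn, because the help text quoted earlier says pgn-extract comments the duplicate games with the file they came from, and that added comment could prevent an exact text match. The function names are hypothetical; the output goes to a new file, R1-2.pgn, following Ferdy's naming. (If one combined file is acceptable, the pgn-extract -D -onewset.pgn command from the help text already does the merge.)

```python
import hashlib
import os
import tempfile


def iter_games(lines):
    """Yield PGN games; a tag line following movetext starts a new game (heuristic)."""
    current, in_moves = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and in_moves:
            yield "".join(current)
            current, in_moves = [], False
        elif stripped and not stripped.startswith("["):
            in_moves = True
        current.append(line)
    if any(l.strip() for l in current):
        yield "".join(current)


def game_digest(game):
    """SHA-1 of the whitespace-normalized movetext (assumed duplicate test).

    Keeping 20-byte digests instead of full movetext saves memory on big files.
    """
    moves = [l.strip() for l in game.splitlines()
             if l.strip() and not l.strip().startswith("[")]
    key = " ".join(" ".join(moves).split())
    return hashlib.sha1(key.encode("latin-1")).digest()


def remove_games_found_in(reference_path, target_path, out_path):
    """Write every game of target_path that does not occur in reference_path
    to out_path. Returns the number of games removed."""
    with open(reference_path, encoding="latin-1", newline="") as ref:
        known = {game_digest(g) for g in iter_games(ref)}
    removed = 0
    with open(target_path, encoding="latin-1", newline="") as src, \
            open(out_path, "w", encoding="latin-1", newline="") as out:
        for g in iter_games(src):
            if game_digest(g) in known:
                removed += 1
            else:
                out.write(g)
    return removed


# Usage with two tiny files standing in for 1.pgn and 2.pgn.
tmp = tempfile.mkdtemp()
paths = {name: os.path.join(tmp, name) for name in ("1.pgn", "2.pgn", "R1-2.pgn")}
with open(paths["1.pgn"], "w", encoding="latin-1", newline="") as f:
    f.write('[Event "a"]\n\n1. e4 e5 1-0\n\n[Event "b"]\n\n1. d4 d5 0-1\n\n')
with open(paths["2.pgn"], "w", encoding="latin-1", newline="") as f:
    f.write('[Event "a"]\n\n1. e4 e5 1-0\n\n[Event "c"]\n\n1. c4 c5 1-0\n\n')
removed = remove_games_found_in(paths["1.pgn"], paths["2.pgn"], paths["R1-2.pgn"])
print(removed)  # 1
```

Only the digests of 1.pgn are held in memory; 2.pgn is streamed game by game, so the same approach should scale to files of a few GB.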
