Need Help with Large PGN's

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Need Help with Large PGN's

Post by Adam Hair »

I am trying to organize the PGN files associated with the rating list that I
am constructing, which has proven to be a bit of a headache. What is the
best way, using free utilities, to do the following things:

1. Remove annotations
2. Edit engine names
3. Combine the files
4. Remove games involving older engine versions

These are the utilities I have been using: Arena, NotePad, ChessDB, Trim,
and a simple bat file to combine the files.

There are 104,000+ games involved.
muxecoid
Posts: 150
Joined: Sat Jan 30, 2010 10:54 am
Location: Israel

Re: Need Help with Large PGN's

Post by muxecoid »

Why not try SCID?
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Need Help with Large PGN's

Post by Adam Hair »

muxecoid wrote:Why not try SCID?
ChessDB and SCID are similar. I did not know that engine names could be
edited with SCID. I use Arena to remove the older engine versions because
I can delete filtered games. I don't think I can do that with ChessDB.
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Need Help with Large PGN's

Post by Norm Pollock »

Adam Hair wrote:I am trying to organize the PGN files associated with the rating list that I
am constructing, which has proven to be a bit of a headache. What is the
best way, using free utilities, to do the following things:

1. Remove annotations
2. Edit engine names
3. Combine the files
4. Remove games involving older engine versions

These are the utilities I have been using: Arena, NotePad, ChessDB, Trim,
and a simple bat file to combine the files.

There are 104,000+ games involved.
1. Remove annotations
Use "trim" from "40H"

2. Edit engine names
Use "nameChange" from "40H"

3. Combine the files
"copy a.pgn + b.pgn c.pgn" in a command window

4. Remove games involving older engine versions
Use "listExtract" from "40H"
Updated links for 40H Tools and Databases
http://40Hchess.epizy.com
http://nk-qy.info/40h
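
For anyone who would rather script step 3 than use "copy", here is a minimal Python sketch. It is not part of the 40H tools; the function name combine_pgn and the file names in the usage note are made up for illustration. As far as I know, "copy" with "+" works in text (ASCII) mode by default and can leave a Ctrl-Z end-of-file character in the result, so the sketch strips that byte when it appears at the end of an input file. It reads each input fully into memory, which is an assumption about file sizes rather than a requirement.

```python
from pathlib import Path


def combine_pgn(sources, destination):
    """Concatenate PGN files in binary mode, one blank line between files."""
    with open(destination, "wb") as out:
        for src in sources:
            data = Path(src).read_bytes()
            # Drop trailing whitespace and any Ctrl-Z (0x1A) end-of-file marker.
            data = data.rstrip(b"\x1a\r\n \t")
            if data:
                # Terminate the last game and leave an empty line after it.
                out.write(data + b"\n\n")
```

Usage would look like combine_pgn(["a.pgn", "b.pgn"], "c.pgn"). The bytes of each game are copied unchanged, so existing line endings and character encoding are left alone.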
David Dahlem
Posts: 900
Joined: Wed Mar 08, 2006 9:06 pm

Re: Need Help with Large PGN's

Post by David Dahlem »

Adam Hair wrote:I am trying to organize the PGN files associated with the rating list that I
am constructing, which has proven to be a bit of a headache. What is the
best way, using free utilities, to do the following things:

1. Remove annotations
2. Edit engine names
3. Combine the files
4. Remove games involving older engine versions

These are the utilities I have been using: Arena, NotePad, ChessDB, Trim,
and a simple bat file to combine the files.

There are 104,000+ games involved.
PGN Extract is what I use.

http://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Need Help with Large PGN's

Post by Vinvin »

Here's what I used to make the big rating list (716,226 games; see http://www.talkchess.com/forum/viewtopi ... 409#331409 ):

1. Remove annotations
- Use the 40H PGN Utilities -> http://www.hoflink.com/~npollock/chess.html

2. Edit engine names
- TextPad is very efficient at editing text files of several GB (!) -> http://www.textpad.com/

3. Combine the files
- You can use the "copy" command from the Windows command line, for example: "copy file1.pgn+file2.pgn+file3.pgn bigfile.txt"

4. Remove games involving older engine versions
- The 40H PGN Utilities should do this too; see the documentation here: http://www.hoflink.com/~npollock/overview-40H.txt

My best,
Vincent
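
As an alternative to hand-editing names in a text editor (step 2), the renaming can be scripted so that only the White and Black tag lines are touched and the file is streamed line by line. This is a rough Python sketch, not what TextPad or the 40H "nameChange" tool does internally; the function names and the engine names in the usage note are hypothetical. It assumes each tag pair sits on its own line, which is the usual layout but not something every PGN writer guarantees.

```python
import re

# Matches a White or Black tag line and captures the tag name, the player
# name, and the trailing whitespace (so the original line ending is kept).
TAG_RE = re.compile(r'^\[(White|Black) "(.*)"\](\s*)$')


def rename_line(line, renames):
    """Return the line with the player name replaced if it is in renames."""
    m = TAG_RE.match(line)
    if m and m.group(2) in renames:
        return f'[{m.group(1)} "{renames[m.group(2)]}"]{m.group(3)}'
    return line


def rename_players(src, dst, renames):
    """Stream src to dst, rewriting only matching White/Black tag lines."""
    # latin-1 maps every byte to one character, so unknown encodings
    # survive the round trip; newline="" keeps line endings untouched.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            fout.write(rename_line(line, renames))
```

A call such as rename_players("in.pgn", "out.pgn", {"EngineX 1.0 32-bit": "EngineX 1.0"}) leaves movetext and all other tags alone, which avoids the accidental matches a plain search-and-replace can produce.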
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Need Help with Large PGN's

Post by Vinvin »

Hi Norm! Thanks for your utilities.

I ran into a small bug some time ago when I removed moves from games: IIRC, when there is a "{}" comment on the same line as the result (at the end of the game), the program stops just after it and I lose the rest of the file. If you are interested, I can try to reproduce it.

My best,
Vincent

Norm Pollock wrote: 1. Remove annotations
Use "trim" from "40H"

2. Edit engine names
Use "nameChange" from "40H"

3. Combine the files
"copy a.pgn + b.pgn c.pgn" in a command window

4. Remove games involving older engine versions
Use "listExtract" from "40H"
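
To illustrate the annotation-stripping step, including the empty "{}" case described above, here is a small Python sketch. It is an independent illustration and makes no claim about how "trim" works or where the reported bug comes from. The function name strip_annotations is made up, and the function is meant to be applied to movetext only, since tag values may legitimately contain braces or parentheses.

```python
import re


def strip_annotations(movetext):
    """Remove {comments}, (variations) and $NAGs from PGN movetext."""
    out = []
    depth = 0           # nesting depth of ( ... ) variations
    in_comment = False  # inside a { ... } comment; these do not nest
    for ch in movetext:
        if in_comment:
            if ch == "}":
                in_comment = False
        elif ch == "{":
            in_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    text = "".join(out)
    text = re.sub(r"\$\d+", "", text)  # numeric annotation glyphs
    # Collapse the runs of spaces left behind; newlines are kept.
    return re.sub(r"[ \t]+", " ", text).strip()
```

For example, strip_annotations("12. Qh5# {} 1-0") gives "12. Qh5# 1-0", so an empty comment next to the result does not swallow the result itself. Rest-of-line comments starting with ";" are not handled in this sketch.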
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Need Help with Large PGN's

Post by Adam Hair »

Hi Norm

I use "trim", but for some reason I was unaware of "nameChange" and
"listExtract". I will give these a try.
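
For step 4, the idea of dropping every game that involves an older engine version can also be sketched in a few lines of Python. This is not how "listExtract" is implemented; it is a simplified illustration with made-up function names, and it assumes every game begins with an [Event tag at the start of a line and that the exact names to exclude are known in advance.

```python
import re

# Captures the player name from each White/Black tag line in a game.
PLAYER_RE = re.compile(r'^\[(?:White|Black) "(.*)"\][ \t\r]*$', re.MULTILINE)


def split_games(lines):
    """Yield each game as one string; a new game starts at an [Event tag."""
    game = []
    for line in lines:
        if line.startswith("[Event ") and game:
            yield "".join(game)
            game = []
        game.append(line)
    if game:
        yield "".join(game)


def keep_game(game, excluded):
    """True when neither player name is in the excluded set."""
    return not any(name in excluded for name in PLAYER_RE.findall(game))


def filter_games(lines, excluded):
    """Yield only the games that do not involve an excluded player."""
    for game in split_games(lines):
        if keep_game(game, excluded):
            yield game
```

Because filter_games accepts any iterable of lines, an open file object can be passed directly and the surviving games written out one at a time, so the whole collection never has to sit in memory.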
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Need Help with Large PGN's

Post by Adam Hair »

David Dahlem wrote:
Adam Hair wrote:I am trying to organize the PGN files associated with the rating list that I
am constructing, which has proven to be a bit of a headache. What is the
best way, using free utilities, to do the following things:

1. Remove annotations
2. Edit engine names
3. Combine the files
4. Remove games involving older engine versions

These are the utilities I have been using: Arena, NotePad, ChessDB, Trim,
and a simple bat file to combine the files.

There are 104,000+ games involved.
PGN Extract is what I use.

http://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/
I have to admit that I have had some trouble using PGN Extract. I understand
the commands but I could not get it to run.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Need Help with Large PGN's

Post by Adam Hair »

Vinvin wrote:Here's what I used to make the big rating list (716,226 games; see http://www.talkchess.com/forum/viewtopi ... 409#331409 ):

1. Remove annotations
- Use the 40H PGN Utilities -> http://www.hoflink.com/~npollock/chess.html

2. Edit engine names
- TextPad is very efficient at editing text files of several GB (!) -> http://www.textpad.com/

3. Combine the files
- You can use the "copy" command from the Windows command line, for example: "copy file1.pgn+file2.pgn+file3.pgn bigfile.txt"

4. Remove games involving older engine versions
- The 40H PGN Utilities should do this too; see the documentation here: http://www.hoflink.com/~npollock/overview-40H.txt

My best,
Vincent
If TextPad works with large files better than NotePad then I will definitely
use it.