clean pgn tool

Discussion of chess software programming and technical issues.

Moderator: Ras

lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

clean pgn tool

Post by lucasart »

Does anyone have a script to clean a PGN? A Python script would be great.

The idea is to shrink the PGN from

Code:

[Event "40th Amateur D3"]
[Site "ChessGUI3"]
[Date "2013.03.27"]
[Round "10.3"]
[White "DiscoCheck 4.1 64-bit"]
[Black "RedQueen 1.1.3 64-bit"]
[Result "1-0"]
[ECO "B94"]
[PlyCount "49"]
[EventDate "2013.??.??"]
[TimeControl "40/1500:40/1500:40/1500"]

{i5 Quad} 1. e4 {[%eval 0,1] [%emt 0:00:00]} c5 {[%eval 0,1] [%emt 0:00:00]} 2.
Nf3 {[%eval 0,1] [%emt 0:00:00]} d6 {[%eval 0,1] [%emt 0:00:00]} 3. d4 {
[%eval 0,1] [%emt 0:00:00]} cxd4 {[%eval 0,1] [%emt 0:00:00]} 4. Nxd4 {
[%eval 0,1] [%emt 0:00:00]} Nf6 {[%eval 0,1] [%emt 0:00:00]} 5. Nc3 {
[%eval 0,1] [%emt 0:00:00]} a6 {[%eval 0,1] [%emt 0:00:00]} 6. Bg5 {
[%eval 0,1] [%emt 0:00:00]} Nbd7 {[%eval 0,1] [%emt 0:00:00]} 7. Bc4 {
[%eval 0,1] [%emt 0:00:00]} Qb6 {[%eval 0,1] [%emt 0:00:00]} 8. Bb3 {
[%eval 0,1] [%emt 0:00:00]} e6 {[%eval 0,1] [%emt 0:00:00]} 9. a4 {
[%eval 24,18] [%emt 0:00:46]} h6 {(Qa5) [%eval -2,19] [%emt 0:01:04]} 10. a5 {
(a5) [%eval 21,19] [%emt 0:00:46]} Qb4 {(Qd8) [%eval -16,19] [%emt 0:01:15]}
11. Bxf6 {(Ra4) [%eval 26,19] [%emt 0:00:46]} Nxf6 {
(Nxf6) [%eval 0,20] [%emt 0:00:50]} 12. Ba4+ {
(O-O) [%eval 16,19] [%emt 0:00:46]} Nd7 {(Bd7) [%eval 0,19] [%emt 0:01:54]} 13.
O-O {(O-O) [%eval 0,20] [%emt 0:00:46]} Be7 {
(Qxb2) [%eval -5,20] [%emt 0:00:52]} 14. Nf5 {
(Nce2) [%eval 43,19] [%emt 0:00:46]} Bf8 {(Bf8) [%eval -39,19] [%emt 0:02:02]}
15. Ne3 {(Qd4) [%eval 51,19] [%emt 0:00:46]} Be7 {
(Qc5) [%eval -78,20] [%emt 0:00:48]} 16. Ncd5 {
(Ncd5) [%eval 162,21] [%emt 0:00:46]} exd5 {
(exd5) [%eval -139,22] [%emt 0:01:14]} 17. Nxd5 {
(Nxd5) [%eval 228,21] [%emt 0:00:50]} Qc5 {(Qc5) [%eval -176,20] [%emt 0:01:00]
} 18. Nb6 {(Qg4) [%eval 232,22] [%emt 0:00:46]} Rb8 {
(Bd8) [%eval -176,21] [%emt 0:01:23]} 19. Qg4 {
(Qg4) [%eval 226,23] [%emt 0:00:46]} Qc7 {(O-O) [%eval -155,20] [%emt 0:01:28]}
20. Qxg7 {(Nd5) [%eval 336,20] [%emt 0:00:23]} Rf8 {
(Rf8) [%eval -431,19] [%emt 0:00:45]} 21. e5 {
(e5) [%eval 451,20] [%emt 0:00:47]} Kd8 {(dxe5) [%eval -515,18] [%emt 0:01:09]}
22. exd6 {(Rad1) [%eval 548,20] [%emt 0:00:47]} Bxd6 {
(Bxd6) [%eval -689,18] [%emt 0:00:49]} 23. Rad1 {
(Rad1) [%eval 670,18] [%emt 0:00:47]} Qc5 {(Re8) [%eval -713,18] [%emt 0:02:00]
} 24. Rd5 {(Rfe1) [%eval 788,20] [%emt 0:00:47]} Qxd5 {
(Qb4) [%eval -1043,17] [%emt 0:01:04]} 25. Nxd5 {
(Nxd5) [%eval 864,19] [%emt 0:00:34]} 1-0
to

Code:

[Event "40th Amateur D3"]
[Site "ChessGUI3"]
[Date "2013.03.27"]
[Round "10.3"]
[White "DiscoCheck 4.1 64-bit"]
[Black "RedQueen 1.1.3 64-bit"]
[Result "1-0"]
[ECO "B94"]
[PlyCount "49"]
[EventDate "2013.??.??"]
[TimeControl "40/1500:40/1500:40/1500"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Bg5 Nbd7 7. Bc4 Qb6 8. Bb3 e6 9. a4 h6 10. a5 Qb4 11. Bxf6 Nxf6 12. Ba4+ Nd7 13. O-O Be7 14. Nf5 Bf8 15. Ne3 Be7 16. Ncd5 exd5 17. Nxd5 Qc5 18. Nb6 Rb8 19. Qg4 Qc7 20. Qxg7 Rf8 21. e5 Kd8 22. exd6 Bxd6 23. Rad1 Qc5 24. Rd5 Qxd5 25. Nxd5 1-0
?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Maarten

Re: clean pgn tool

Post by Maarten »

You could take a look at: http://www.hoflink.com/~npollock/chess.html
Regards,
Maarten
jdart
Posts: 4420
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: clean pgn tool

Post by jdart »

pgn-extract -C -N -V will strip out all comments, annotations and variations.

http://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/

--Jon
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: clean pgn tool

Post by Adam Hair »

This is from Norm's readme:

"40H-PGN" is a collection of 46 PGN utility chess programs written by
Norman Pollock for use in "Windows" or in a "Java Runtime Environment".


If Norm's utilities can run in Java (I have not tried yet), then use the "trim" utility. It removes comments from pgn bodies.


Or there is pgn-extract - http://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/ . I do not know how portable the source is, but I have seen Linux compiles out there. If it is suitable for you, then use the flags -C -N -V. Those will remove comments, NAGs, and variations.


If all else fails and you do not feel like writing your own tool, then you could use Scid vs PC.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: clean pgn tool

Post by lucasart »

jdart wrote: pgn-extract -C -N -V will strip out all comments, annotations and variations.

http://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/

--Jon
Very nice tool!
* the C source code compiles at the speed of light, without a single error or warning message
* it runs very fast. In fact, what slows it down is that it prints to stderr every time it parses a game. If I redirect stderr to /dev/null, it's uber-fast. Example with a PGN of 20 MB, containing 5814 games:

Code:

$ time ./pgn-extract -C -N -V ./test.pgn 1> ./t.pgn 2>/dev/null
real	0m1.940s
user	0m1.920s
sys	0m0.016s
8-)
jdart
Posts: 4420
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: clean pgn tool

Post by jdart »

add -s to suppress progress output.
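
If the cleaning step has to be called from a Python script, a thin wrapper around the command could look like the sketch below. It uses only the flags quoted in this thread (-s, -C, -N, -V). The helper names `build_command` and `clean_with_pgn_extract` are made up for illustration, and the sketch assumes that a `pgn-extract` executable is on the PATH and that it writes the games to stdout, as in the timing example above.

```python
import subprocess


def build_command(src, exe="pgn-extract"):
    """Return the argument list for a quiet pgn-extract run on src."""
    # -s: no progress output, -C: no comments, -N: no NAGs, -V: no variations.
    # Assumption: exe is the name or path of a local pgn-extract build.
    return [exe, "-s", "-C", "-N", "-V", src]


def clean_with_pgn_extract(src, dst, exe="pgn-extract"):
    """Run pgn-extract on src and write its stdout (the cleaned games) to dst."""
    with open(dst, "wb") as out:
        # check=True raises CalledProcessError if pgn-extract exits with an error.
        subprocess.run(build_command(src, exe), stdout=out, check=True)
```

Something like `clean_with_pgn_extract("test.pgn", "t.pgn")` would then stand in for the shell redirection, with no need to send stderr to /dev/null.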

--Jon
jshriver
Posts: 1371
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: clean pgn tool

Post by jshriver »

lucasart wrote: Very nice tool!
Agreed, pgn-extract is an amazing program. I use it in more projects than I could easily count.

Should be called pgn-swiss-army-knife :)