Looking for a PGN/EPD utility

Norm Pollock
Posts: 1029
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: Looking for a PGN/EPD utility

Post by Norm Pollock » Thu Feb 16, 2017 1:51 pm

Note: pgn-extract is now up to 17-34, and the "--version" option works correctly.

Ozymandias
Posts: 1190
Joined: Sun Oct 25, 2009 12:30 am

Re: Looking for a PGN/EPD utility

Post by Ozymandias » Thu Feb 16, 2017 2:28 pm

Norm Pollock wrote: Based on the output of "manifest-ps" that you sent me, my first suspicion is that the "pgn" files you used were not up to PGN standards. Perhaps they did not have a blank line separating games.

Please use "trim" on those files and then rename.
trim alpha.pgn
rename outr.pgn alpha-t.pgn

I will try that, but I thought that cleaning the PGNs with pgn-extract was enough. They certainly work fine with every other program.
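
Norm's first suspicion is a missing blank line between games. One rough way to look for that is to flag any tag line (a line starting with "[") that directly follows movetext. The Python sketch below is a hypothetical helper, not part of pgn-extract, trim or Norm's other tools, and it is only a heuristic: it assumes tag pairs start at the beginning of a line and that no line inside a multi-line comment starts with "[".

```python
import sys


def find_missing_separators(lines):
    """Return 1-based line numbers where a tag line follows movetext
    with no blank line in between (a likely sign of a malformed PGN)."""
    problems = []
    prev_kind = "blank"  # kind of the previous line: "blank", "tag" or "move"
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            kind = "blank"
        elif line.startswith("["):
            kind = "tag"
        else:
            kind = "move"
        # A new game's tag section should be preceded by a blank line.
        if kind == "tag" and prev_kind == "move":
            problems.append(number)
        prev_kind = kind
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_separators.py alpha.pgn   (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as pgn:
        for n in find_missing_separators(pgn):
            print(f"line {n}: tag line follows movetext without a blank line")
```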

Norm Pollock wrote: I am surprised that you were able to get output from manifest-ps during execution. It must have been due to an overflow of the buffer, because during execution of my test the files "manifest-ps" and "numbers", the two output files of epdPosition, had 0 bytes.

I use a trick to peek at the progress: I open the file's properties, which sometimes updates the size, and then I copy the file to another directory and open the copy.
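
The same peek can be scripted. The Python sketch below is a hypothetical helper (the "peek" directory name is only illustrative): it reads the file's current size and copies the file so the copy can be opened while the other program keeps writing. As Norm notes, output still sitting in the writer's buffer is not in the file yet, so the snapshot can lag behind or even be 0 bytes, and on Windows the copy may fail if the writer holds the file open exclusively.

```python
import os
import shutil


def snapshot(path, dest_dir):
    """Return the current size of a file another program is still writing,
    plus the path of a copy placed in dest_dir that is safe to open."""
    size = os.stat(path).st_size  # only counts data already written to the file
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(path))
    shutil.copyfile(path, dest)  # the copy may be a partial, mid-write view
    return size, dest
```

For example, `size, copy = snapshot("manifest-ps", "peek")` leaves a copy in peek/manifest-ps.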

Norm Pollock wrote: Based on the size of your test, it will take a very long time for it to finish. Perhaps a month, in which time there could be a power blip that wipes everything out. I would do the project in small parts.

I'm using a PC plugged into a UPS, but that doesn't mean I'm willing to wait a month :wink: By doing "the project in small parts", do you also mean to split the temp.epd file?

Norm Pollock wrote: And finally, are you using a recent version of pgn-extract? Use "17-21" or later. I noticed D. Barnes' new version "17-30" outputs "17-26" when using the "--version" option.

The latest one, 17-34.

Norm Pollock
Posts: 1029
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: Looking for a PGN/EPD utility

Post by Norm Pollock » Thu Feb 16, 2017 3:03 pm

Assuming my latest suggestion does not help, here is what we can do.

Can you send me a sample of the two PGNs: about 25K games of the search PGN and 100K games of the data PGN? Maybe you can upload them somewhere I can download them.
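
Cutting such a sample from the front of a large file is easy to script. A minimal Python sketch, assuming every game begins with its tag section; first_games and the script and file names are hypothetical. Note that taking the first N games gives a sample from the start of the file, not a random one.

```python
import sys


def first_games(lines, n):
    """Yield the lines of the first n games of a PGN stream.
    Assumes every game starts with a tag section ("[...]" lines)."""
    games = 0
    in_movetext = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and (in_movetext or games == 0):
            # First tag line of the file, or a tag line after movetext: new game.
            games += 1
            in_movetext = False
            if games > n:
                return
        elif line and not line.startswith("["):
            in_movetext = True
        yield raw


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python sample_pgn.py data.pgn sample.pgn 100000   (names hypothetical)
    src, dst, count = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(first_games(fin, count))
```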

The problem could be in the PGNs. If you can, run "txtChar" to see whether there are any unexpected (hidden) characters in the PGN files. Concatenating files sometimes brings in extra characters. There could also be a UTF-8 problem.
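
I don't know exactly what txtChar reports, so the following is only a hypothetical stand-in in Python that shows the kind of check meant here. It counts control characters (for example a Ctrl-Z end-of-file marker, 0x1A, which some DOS-style copy operations can append), non-ASCII bytes and carriage returns, notes a leading UTF-8 byte-order mark, and tests whether the file decodes as UTF-8 at all. It reads the whole file into memory, so run it on a subset of a very large PGN.

```python
import sys


def scan_bytes(data):
    """Summarise bytes in a PGN file that are not plain printable ASCII."""
    report = {
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        "control": 0,           # bytes 0-31 and 127, except tab, LF and CR
        "non_ascii": 0,         # bytes 128-255 (includes any BOM bytes)
        "carriage_returns": data.count(b"\r"),
        "valid_utf8": True,
    }
    for byte in data:           # iterating over bytes yields ints
        if byte >= 128:
            report["non_ascii"] += 1
        elif (byte < 32 and byte not in (9, 10, 13)) or byte == 127:
            report["control"] += 1
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        report["valid_utf8"] = False
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as pgn:
        for name, value in scan_bytes(pgn.read()).items():
            print(f"{name}: {value}")
```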

Also try using "join", which you can download from my page. You only need to test it on a subset of your PGN files, about 100K games.
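
As for what a join step has to get right: when PGN files are concatenated, each one should end cleanly and be followed by exactly one blank line. I don't know how Norm's join works internally; the Python sketch below is a hypothetical illustration that also drops a leading UTF-8 byte-order mark and trailing Ctrl-Z bytes, and normalises CRLF line endings to LF (a design choice; skip that step if your other tools expect CRLF).

```python
def join_pgn(chunks):
    """Concatenate PGN file contents (as bytes) so that each file's games
    are separated from the next file's by exactly one blank line."""
    parts = []
    for data in chunks:
        if data.startswith(b"\xef\xbb\xbf"):
            data = data[3:]                 # drop a UTF-8 byte-order mark
        data = data.rstrip(b"\x1a")         # drop trailing Ctrl-Z EOF markers
        data = data.replace(b"\r\n", b"\n").strip(b"\n")
        if data:                            # skip empty files
            parts.append(data + b"\n")
    return b"\n".join(parts)
```

Usage: read each file in binary mode, pass the contents as a list, and write the returned bytes to the combined file.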

Norm Pollock
Posts: 1029
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: Looking for a PGN/EPD utility

Post by Norm Pollock » Thu Feb 16, 2017 3:15 pm

Ozymandias wrote: By doing "the project in small parts", do you also mean to split the temp.epd file?

Split the "search.epd" file; that is the one with the 100000 limit. "temp.epd" is unlimited in size, but of course, the bigger it is, the longer the program takes to finish.
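
Because EPD stores one position per line, splitting search.epd is a matter of counting lines. A Python sketch, assuming the 100000 limit refers to positions (non-blank lines); split_epd and the numbered output names are hypothetical. Each piece can then be run against temp.epd on its own, so an interruption only costs one piece.

```python
import os


def split_epd(path, limit=100000):
    """Split an EPD file (one position per line) into numbered files holding
    at most `limit` positions each; blank lines are skipped.
    Returns the list of files written."""
    stem = os.path.splitext(path)[0]
    written = []
    out = None
    count = 0
    try:
        with open(path, encoding="utf-8", errors="replace") as src:
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                if out is None or count == limit:
                    if out is not None:
                        out.close()
                    # e.g. search.epd -> search-001.epd, search-002.epd, ...
                    name = f"{stem}-{len(written) + 1:03d}.epd"
                    out = open(name, "w", encoding="utf-8")
                    written.append(name)
                    count = 0
                out.write(line if line.endswith("\n") else line + "\n")
                count += 1
    finally:
        if out is not None:
            out.close()
    return written
```

For example, `split_epd("search.epd")` writes search-001.epd, search-002.epd, ... next to the original.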

Ozymandias
Posts: 1190
Joined: Sun Oct 25, 2009 12:30 am

Re: Looking for a PGN/EPD utility

Post by Ozymandias » Thu Feb 16, 2017 9:30 pm

Norm Pollock wrote:
Ozymandias wrote: By doing "the project in small parts", do you also mean to split the temp.epd file?
split the "search.epd" file, that is the one that is limited by 100000. "temp.epd" is unlimited in size, but of course, the bigger it is, the longer it takes for the program to finish.

I figured out how to do it:
pgn-extract --fuzzydepth 0 -U -ddupes.pgn set1a20.pgn pyramid_ply40.pgn

Since I know the games are identical up to the truncation ply, this gives me the duplicate games (dupes.pgn). Now that they are exact duplicates, and not just positional ones (same tags), it's easy to detect them, mark them for deletion and compact the DB, getting the new unique positions/games.

pgn-extract took less than an hour to perform this task!
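
The approach works because truncated games that are identical become exact duplicates. The same idea in miniature: reduce each game's movetext to its sequence of moves, optionally cut at a fixed ply, and report any game whose sequence was already seen. This Python sketch is a hypothetical illustration, not how pgn-extract implements --fuzzydepth or -d. It assumes the movetext has already been separated from the tags, handles only flat {...} comments (not variations in parentheses), and, unlike a positional comparison, does not treat transpositions as duplicates.

```python
import re

COMMENT = re.compile(r"\{[^}]*\}")           # {brace comments}
NAG = re.compile(r"\$\d+")                   # numeric annotation glyphs like $1
MOVE_NUMBER = re.compile(r"\d+\.(?:\.\.)?")  # "12." or "12..."
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def movetext_key(movetext, max_plies=None):
    """Reduce movetext to a tuple of SAN moves, optionally cut at max_plies."""
    text = COMMENT.sub(" ", movetext)
    text = NAG.sub(" ", text)
    text = MOVE_NUMBER.sub(" ", text)
    moves = [tok for tok in text.split() if tok not in RESULTS]
    return tuple(moves[:max_plies] if max_plies is not None else moves)


def find_duplicates(games, max_plies=None):
    """Return indices of games whose (truncated) move sequence was already seen."""
    first_seen = {}
    dupes = []
    for index, movetext in enumerate(games):
        key = movetext_key(movetext, max_plies)
        if key in first_seen:
            dupes.append(index)
        else:
            first_seen[key] = index
    return dupes
```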
