Looking for a PGN/EPD utility
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Looking for a PGN/EPD utility
Note: pgn-extract is now up to 17-34, and the "--version" option works correctly.
-
- Posts: 1535
- Joined: Sun Oct 25, 2009 2:30 am
Re: Looking for a PGN/EPD utility
Norm Pollock wrote: Based on the output of "manifest-ps" that you sent me, my first suspicion is that the "pgn" files you used were not up to PGN Standards. Perhaps they did not have a blank line separating games.
Please use "trim" on those files and then rename.
trim alpha.pgn
outr.pgn alpha-t.pgn

I will try that, but I thought that cleaning the PGNs with pgn-extract was enough. They certainly behave with any other program.
Norm Pollock wrote: I am surprised that you were able to get output from manifest-ps during execution. It must have been due to an overflow of the buffer because during execution of my test the files "manifest-ps" and "numbers", the two output files of epdPosition, had 0 bytes.

I use a trick to peek at the progress: I ask for the properties of the file, which sometimes updates the size, and then I can copy the file to another dir and open it.
Norm Pollock wrote: Based on the size of your test, it will take a very long time for it to finish. Perhaps a month. In which time there could be a power blip that wipes everything out. I would do the project in small parts.

I'm using a PC plugged into a UPS, but that doesn't mean I'm willing to wait a month. By doing "the project in small parts", do you also mean to split the temp.epd file?
Norm Pollock wrote: And finally, are you using a recent version of pgn-extract? Use "17-21" or later. I noticed D. Barnes' new version "17-30" outputs "17-26" when using the "--version" command.

The latest one, 17-34.
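The blank-line requirement mentioned in the quote above can be checked with a short script before running any of the tools. The following is only a minimal sketch, not part of "trim" or pgn-extract: the function name `missing_separators` is hypothetical, and it assumes that a tag line is any line that starts with "[" and ends with "]", which can misfire on unusual comments in the movetext.

```python
def missing_separators(lines):
    """Return 1-based line numbers where a game's first tag line
    directly follows movetext without a blank line in between."""
    problems = []
    prev_blank = True   # treat the start of the file as a separator
    prev_tag = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Assumption: a tag line looks like [Name "Value"]
        is_tag = line.startswith("[") and line.endswith("]")
        if is_tag and not prev_tag and not prev_blank:
            problems.append(number)
        prev_blank = (line == "")
        prev_tag = is_tag
    return problems


if __name__ == "__main__":
    sample = [
        '[Event "A"]',
        "",
        "1. e4 e5 1-0",
        '[Event "B"]',   # no blank line before this game
        "",
        "1. d4 d5 0-1",
    ]
    print(missing_separators(sample))
```

To check a real file, the lines can be read with `open(path, encoding="latin-1").read().splitlines()` and passed to the function.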
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Looking for a PGN/EPD utility
Assuming my latest suggestion does not help, here is what we can do.
Can you send me a sample of the 2 pgns: about 25K games of search pgn, 100K games of data pgn? Maybe you can upload somewhere I can download.
The problem could be in the pgns. If you can, try "txtChar" to see if there are any unexpected (hidden) characters in the pgn files. Sometimes concatenating files brings in extra characters. Then there is a possible UTF-8 problem.
Also try using "join" which you can download from my page. Only need to test on a subset of your pgn files, of about 100K games.
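As a rough stand-in for the hidden-character check suggested above, the sketch below scans raw bytes for anything outside printable ASCII, tab, LF and CR, and separately reports a UTF-8 byte order mark at the start of the file. It is not the "txtChar" utility and makes no claim about how that tool works; the function names are hypothetical, and the ASCII-only rule is deliberately strict, so accented player names will also be reported.

```python
def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(b"\xef\xbb\xbf")


def unexpected_bytes(data: bytes):
    """Return (offset, byte value) pairs for bytes that are not
    printable ASCII, tab, line feed or carriage return."""
    allowed_controls = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


if __name__ == "__main__":
    # In practice: data = open("some-file.pgn", "rb").read()
    # (the file name here is a placeholder)
    data = b'\xef\xbb\xbf[Event "A"]\n\n1. e4 \x00e5 1-0\n'
    print("BOM present:", has_utf8_bom(data))
    for offset, value in unexpected_bytes(data):
        print(f"offset {offset}: 0x{value:02x}")
```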
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Looking for a PGN/EPD utility
Ozymandias wrote: By doing "the project in small parts", do you also mean to split the temp.epd file?

Split the "search.epd" file; that is the one that is limited to 100000. "temp.epd" is unlimited in size, but of course, the bigger it is, the longer it takes for the program to finish.
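Splitting "search.epd" into pieces could be done with a few lines of Python. This is a sketch under two assumptions that the thread does not confirm: that the 100000 limit counts positions, and that the file holds one position per line. The function names and the numbered output file names are made up for illustration.

```python
def split_into_chunks(lines, limit=100_000):
    """Split a list of lines into consecutive chunks of at most `limit` lines."""
    return [lines[i:i + limit] for i in range(0, len(lines), limit)]


def split_epd_file(path, limit=100_000):
    """Write path-001.epd, path-002.epd, ... next to the input file.
    Assumption: one EPD position per line."""
    with open(path, encoding="latin-1") as handle:
        lines = handle.readlines()
    stem = path[:-4] if path.lower().endswith(".epd") else path
    names = []
    for index, chunk in enumerate(split_into_chunks(lines, limit), start=1):
        name = f"{stem}-{index:03d}.epd"   # hypothetical naming scheme
        with open(name, "w", encoding="latin-1") as out:
            out.writelines(chunk)
        names.append(name)
    return names


if __name__ == "__main__":
    demo = [f"position {n}\n" for n in range(250)]
    print([len(chunk) for chunk in split_into_chunks(demo, limit=100)])
```

Each resulting piece can then be run against the full "temp.epd" separately, so an interrupted run only loses one piece of work.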
-
- Posts: 1535
- Joined: Sun Oct 25, 2009 2:30 am
Re: Looking for a PGN/EPD utility
Ozymandias wrote: By doing "the project in small parts", do you also mean to split the temp.epd file?
Norm Pollock wrote: split the "search.epd" file, that is the one that is limited by 100000. "temp.epd" is unlimited in size, but of course, the bigger it is, the longer it takes for the program to finish.

I figured out how to do it:
pgn-extract --fuzzydepth 0 -U -ddupes.pgn set1a20.pgn pyramid_ply40.pgn
Knowing that the games are the same at the truncated ply, this gives me the duplicate games (dupes.pgn). Since they are now exact duplicates, and not just positional ones (same tags), it's easy to detect them, mark them for deletion and compact the DB, getting the new unique positions/games.
pgn-extract took less than an hour to perform this task!