...strip duplicate EPDs from a file?
And is there a program to break a PGN into EPD records?
Regards,
Zenmastur
Is there an easy way to...
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Is there an easy way to...
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
-
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Is there an easy way to...
The Unix command-line tool 'sort'.
Under Windows it is installed with MSYS2.
sort {file-name} | uniq
For Q#2: pgn extract does that
https://github.com/MichaelB7/pgn-extract
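For anyone without a Unix shell handy, the same sort-and-uniq idea can be sketched in a few lines of Python. This is only a minimal sketch: the function name unique_lines is made up for illustration, and it treats two records as duplicates only when the whole line matches after trimming surrounding whitespace.

```python
def unique_lines(lines):
    """Return the records with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for line in lines:
        record = line.strip()
        # Skip blank lines and any record that has already been kept.
        if record and record not in seen:
            seen.add(record)
            result.append(record)
    return result


# Small in-memory demo; in practice the lines would come from an EPD file.
records = [
    "8/8/8/8/8/8/8/K6k w - - bm Kb1;\n",
    "8/8/8/8/8/8/8/K6k b - - bm Kg1;\n",
    "8/8/8/8/8/8/8/K6k w - - bm Kb1;\n",
]
for record in unique_lines(records):
    print(record)
```

Unlike sort, this keeps the records in their original order, which may or may not matter for your file.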
-
- Posts: 12538
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Is there an easy way to...
For number 1, it is very important to strip irrelevant e.p. flags first, since more than 95% of e.p. flags in the wild are irrelevant.
After that, pipe to sort and uniq works well, or (if you store your data in SQL like I do), simply SELECT DISTINCT.
There are a bunch of tools to accomplish number two, but I am partial to pgn2fen (of the publicly available tools).
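To make the e.p. point concrete, here is a rough Python sketch of that clean-up followed by de-duplication. It is a simplification under stated assumptions: it only checks whether a pawn of the side to move stands next to the e.p. file on the capturing rank, and it does not check pins or other legality details, so a few flags it keeps may still be irrelevant in the strict sense. The function names are hypothetical, the input is assumed to be well-formed EPD, and using the first four fields as the position key is an assumption as well.

```python
def expand_rank(rank_text):
    """Expand one FEN rank, e.g. '3pP3' -> '...pP...'."""
    squares = []
    for ch in rank_text:
        squares.append("." * int(ch) if ch.isdigit() else ch)
    return "".join(squares)


def strip_irrelevant_ep(epd):
    """Replace the e.p. field with '-' when no pawn stands ready to capture."""
    fields = epd.strip().split(None, 4)  # placement, side, castling, e.p., rest
    if len(fields) < 4 or fields[3] == "-":
        return " ".join(fields)
    placement, side, ep = fields[0], fields[1], fields[3]
    ranks = placement.split("/")  # ranks[0] is rank 8, ranks[7] is rank 1
    # White to move captures from rank 5 (index 3); Black from rank 4 (index 4).
    capturer, row = ("P", 3) if side == "w" else ("p", 4)
    squares = expand_rank(ranks[row])
    file_index = ord(ep[0]) - ord("a")
    neighbours = [f for f in (file_index - 1, file_index + 1) if 0 <= f <= 7]
    if not any(squares[f] == capturer for f in neighbours):
        fields[3] = "-"
    return " ".join(fields)


def dedupe_positions(records):
    """Clean e.p. flags, then keep the first record seen for each position."""
    seen = set()
    kept = []
    for record in records:
        cleaned = strip_irrelevant_ep(record)
        key = " ".join(cleaned.split()[:4])  # assumption: opcodes are ignored
        if key and key not in seen:
            seen.add(key)
            kept.append(cleaned)
    return kept
```

After 1.e4 from the start position, for example, the e3 flag is dropped because no black pawn stands on d4 or f4, so that record now compares equal to the same position stored without the flag.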
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: Is there an easy way to...
Dann Corbit wrote: Sun Mar 08, 2020 4:18 am
For number 1, it is very important to strip irrelevant e.p. flags first, since more than 95% of e.p. flags in the wild are irrelevant.
After that, pipe to sort and uniq works well, or (if you store your data in SQL like I do), simply SELECT DISTINCT.
There are a bunch of tools to accomplish number two, but I am partial to pgn2fen (of the publicly available tools).
Thanks, all! I think I'll try pgn2fen. I don't use SQL; it seems a bit overkill for what I want to do. I think I found a way to get rid of duplicates that is easy, and I already have the software to do it. I'm only going to be dealing with a few thousand FENs, so I can just load them in a spreadsheet, sort them, and then delete the dups.
Regards,
Zenmastur
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: Is there an easy way to...
SOMU -> Remove Doubles
"And is there a program to break a PGN into EPD records?"
SOMU -> F9 -> PGN to MEA or SIMEX
http://rebel13.nl/download/utilities.html
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: Is there an easy way to...
Rebel wrote: Sun Mar 08, 2020 8:41 am
SOMU -> Remove Doubles
"And is there a program to break a PGN into EPD records?"
SOMU -> F9 -> PGN to MEA or SIMEX
http://rebel13.nl/download/utilities.html
Thanks! I'll check them out.
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: Is there an easy way to...
Rebel wrote: Sun Mar 08, 2020 8:41 am
SOMU -> Remove Doubles
"And is there a program to break a PGN into EPD records?"
SOMU -> F9 -> PGN to MEA or SIMEX
http://rebel13.nl/download/utilities.html
That's a pretty nice utilities package you have there. It seems to work great. I just set it up for an overnight run of a few thousand EPD records using a bunch of threads. The only thing I wish it had was some kind of timer, so I could see how much time each thread takes and size the job for approximately 8-hour runs. Other than that it has everything I need.
Thanks a bunch!
Regards,
Zenmastur
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: Is there an easy way to...
No clock possible, but you can easily calculate it yourself. Say each thread has 1000 EPDs and the time control is one second per position; then when the counter is on 500 you have 500 seconds to go.
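That back-of-the-envelope calculation can be written down as a tiny Python helper. It is only a sketch with made-up function names, and it assumes a fixed time per position; at a fixed search depth the time per position varies, so the estimate would only be approximate.

```python
def seconds_remaining(total_positions, positions_done, seconds_per_position):
    """Estimate the time left for one thread at a fixed time per position."""
    return (total_positions - positions_done) * seconds_per_position


def positions_for_budget(budget_seconds, seconds_per_position):
    """Estimate how many positions one thread can finish within a time budget."""
    return int(budget_seconds // seconds_per_position)


# The example from the post: 1000 EPDs, one second each, counter at 500.
print(seconds_remaining(1000, 500, 1.0))  # 500.0
# Roughly how many one-second positions fit in an 8-hour run per thread.
print(positions_for_budget(8 * 3600, 1.0))  # 28800
```

Working backwards from the time budget like this is one way to size each thread's EPD file for an overnight run.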
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: Is there an easy way to...
It was late last night and I wasn't quite ready to do a full run of EPDs. I also didn't want the CPU to sit idle all night. So I took 1,000 PGNs, wrote them to a file, processed them into EPDs, and removed the duplicates. I then started a bunch of threads at depth 45 to process them.
I guess this was a poor choice of depth for a test run. When I got up this morning each thread had processed about 9 positions. It looks like it will be a while before any of the threads finish.
Not sure if I should let them run or stop them all. I hate to lose the work but I didn't leave enough threads free that I can do normal work during the day.
On top of this, Windows won't let me view all the threads, as there are too many of them! It will only show me the first 20.
In the future I'll start a limited number of threads, say 4 or 8. That way they will be easier to check on and I can let them run as long as needed.
The only thing I haven't figured out yet is how to process the EPDs into a dot.bin book when I'm done. Any suggestions?
Regards,
Zenmastur