Is there an easy way to...

Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Is there an easy way to...

Post by Zenmastur

...strip duplicate EPDs from a file?

And is there a program to break a PGN into EPD records?

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Is there an easy way to...

Post by Ovyron

Zenmastur wrote: Sun Mar 08, 2020 3:05 am And is there a program to break a PGN into EPD records?
Chess Openings Wizard lets you import PGN files up to a given move, export all the leaf nodes as EPD files, and, if you want, later have an engine score all of them.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Is there an easy way to...

Post by MikeB

Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?

And is there a program to break a PGN into EPD records?

Regards,

Zenmastur
The Unix command-line tool 'sort'.
On Windows, it comes with MSYS2.
sort {file-name} | uniq
(or simply sort -u {file-name}; note that uniq -u would instead discard every copy of any duplicated line)
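If you would rather keep the positions in their original order (sort | uniq reorders them), a minimal Python sketch of the same exact-match deduplication could look like this. The function names are hypothetical; lines count as duplicates only when they match exactly after trimming trailing whitespace, and blank lines are dropped.

```python
from typing import Iterable, List


def dedupe_lines(lines: Iterable[str]) -> List[str]:
    """Drop exact duplicate lines, keeping the first occurrence in original order."""
    seen = set()
    unique = []
    for line in lines:
        key = line.rstrip()  # ignore trailing spaces and the newline
        if key and key not in seen:  # blank lines are skipped
            seen.add(key)
            unique.append(key)
    return unique


def dedupe_file(in_path: str, out_path: str) -> int:
    """Deduplicate an EPD file line by line; returns the number of lines kept."""
    with open(in_path, encoding="utf-8") as src:
        unique = dedupe_lines(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        for line in unique:
            dst.write(line + "\n")
    return len(unique)
```

For example, dedupe_file("positions.epd", "unique.epd") (hypothetical file names). Like sort | uniq, this only merges identical records, so two EPDs that differ only in the e.p. field or in trailing operations stay separate.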

For Q#2: pgn-extract does that.

https://github.com/MichaelB7/pgn-extract
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Is there an easy way to...

Post by Dann Corbit

Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?

And is there a program to break a PGN into EPD records?

Regards,

Zenmastur
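For number 1, it is very important to strip irrelevant e.p. flags first (an en passant square recorded when no en passant capture is actually possible), since otherwise identical positions will not match as duplicates; more than 95% of e.p. flags in the wild are irrelevant.
After that, piping to sort and uniq works well, or (if you store your data in SQL like I do) simply SELECT DISTINCT.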
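To illustrate the e.p. point, here is a minimal Python sketch (function names are hypothetical) that clears the en passant field of a FEN/EPD record when no pawn of the side to move stands beside the double-pushed pawn, and then removes duplicates. It is a simplified, pseudo-legal test: it does not check whether the capturing pawn is pinned or whether the capture would expose its own king, so a few irrelevant flags can survive.

```python
from typing import Dict, Iterable, List, Tuple


def _pieces(placement: str) -> Dict[Tuple[int, int], str]:
    """Map (file, rank), both 0-7 with a1 = (0, 0), to the FEN piece letter."""
    board = {}
    for row_index, row in enumerate(placement.split("/")):
        rank = 7 - row_index  # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                board[(file, rank)] = ch
                file += 1
    return board


def strip_irrelevant_ep(record: str) -> str:
    """Replace the e.p. square with '-' unless a pawn of the side to move could capture onto it."""
    fields = record.strip().split(None, 4)  # board, side, castling, e.p., rest kept intact
    if len(fields) < 4 or fields[3] == "-":
        return " ".join(fields)
    placement, side, ep = fields[0], fields[1], fields[3]
    ep_file = ord(ep[0]) - ord("a")
    ep_rank = int(ep[1]) - 1
    # White captures from the rank below the e.p. square, Black from the rank above.
    pawn, from_rank = ("P", ep_rank - 1) if side == "w" else ("p", ep_rank + 1)
    board = _pieces(placement)
    if not any(board.get((ep_file + df, from_rank)) == pawn for df in (-1, 1)):
        fields[3] = "-"
    return " ".join(fields)


def dedupe_positions(records: Iterable[str]) -> List[str]:
    """Normalize e.p. flags first, then keep the first copy of each record."""
    normalized = (strip_irrelevant_ep(r) for r in records if r.strip())
    return list(dict.fromkeys(normalized))
```

Whatever follows the e.p. field (EPD operations or FEN move counters) is kept as-is, so for full FENs you would also want to drop the two move counters before comparing.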
There are a bunch of tools to accomplish number two, but I am partial to pgn2fen (of the publicly available tools).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Is there an easy way to...

Post by Zenmastur

Dann Corbit wrote: Sun Mar 08, 2020 4:18 am
Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?

And is there a program to break a PGN into EPD records?

Regards,

Zenmastur
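For number 1, it is very important to strip irrelevant e.p. flags first (an en passant square recorded when no en passant capture is actually possible), since otherwise identical positions will not match as duplicates; more than 95% of e.p. flags in the wild are irrelevant.
After that, piping to sort and uniq works well, or (if you store your data in SQL like I do) simply SELECT DISTINCT.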
There are a bunch of tools to accomplish number two, but I am partial to pgn2fen (of the publicly available tools).
Thanks, all! I think I'll try pgn2fen. I don't use SQL; it seems a bit of overkill for what I want to do. I think I've found an easy way to get rid of duplicates with software I already have. I'm only going to be dealing with a few thousand FENs, so I can just load them into a spreadsheet, sort them, and then delete the duplicates.

Regards,

Zenmastur
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Is there an easy way to...

Post by Rebel

Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?
SOMU -> Remove Doubles
And is there a program to break a PGN into EPD records?
SOMU -> F9 -> PGN to MEA or SIMEX

http://rebel13.nl/download/utilities.html
90% of coding is debugging, the other 10% is writing bugs.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Is there an easy way to...

Post by Zenmastur

Rebel wrote: Sun Mar 08, 2020 8:41 am
Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?
SOMU -> Remove Doubles
And is there a program to break a PGN into EPD records?
SOMU -> F9 -> PGN to MEA or SIMEX

http://rebel13.nl/download/utilities.html
Thanks! I'll check them out.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Is there an easy way to...

Post by Zenmastur

Rebel wrote: Sun Mar 08, 2020 8:41 am
Zenmastur wrote: Sun Mar 08, 2020 3:05 am ...strip duplicate EPDs from a file?
SOMU -> Remove Doubles
And is there a program to break a PGN into EPD records?
SOMU -> F9 -> PGN to MEA or SIMEX

http://rebel13.nl/download/utilities.html
That's a pretty nice utilities package you have there. It seems to work great. I just set it up for an overnight run of a few thousand EPD records using a bunch of threads. The only thing I wish it had is some kind of timer, so I could see how long each thread takes and size the runs to approximately 8 hours. Other than that, it has everything I need. :D :D :D

Thanks a bunch!

Regards,

Zenmastur
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Is there an easy way to...

Post by Rebel

No clock is possible, but you can easily calculate it yourself. Say each thread has 1,000 EPDs and the time control is one second per position; then when the counter reaches 500, you have 500 seconds to go.
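That arithmetic is easy to script. A minimal sketch (function names are hypothetical): the first two functions assume a fixed time per position, as in the example above; the third extrapolates from the observed rate instead, which is the only option for fixed-depth runs where the time per position varies.

```python
def remaining_seconds(total: int, done: int, seconds_per_position: float) -> float:
    """Time left for one thread under a fixed per-position time control."""
    if not 0 <= done <= total:
        raise ValueError("done must be between 0 and total")
    return (total - done) * seconds_per_position


def positions_for_budget(hours: float, seconds_per_position: float) -> int:
    """How many positions one thread can analyse within a time budget."""
    return int(hours * 3600 // seconds_per_position)


def eta_from_rate(total: int, done: int, elapsed_seconds: float) -> float:
    """Extrapolate time left from the average time per finished position."""
    if done <= 0:
        raise ValueError("need at least one finished position to estimate a rate")
    return elapsed_seconds / done * (total - done)
```

For example, remaining_seconds(1000, 500, 1) gives 500, and positions_for_budget(8, 1) gives 28800, i.e. an 8-hour run at one second per position fits 28,800 positions per thread.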
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Is there an easy way to...

Post by Zenmastur

Rebel wrote: Tue Mar 10, 2020 10:07 am No clock is possible, but you can easily calculate it yourself. Say each thread has 1,000 EPDs and the time control is one second per position; then when the counter reaches 500, you have 500 seconds to go.
It was late last night and I wasn't quite ready to do a full run of EPDs, but I also didn't want the CPU to sit idle all night. So I took 1,000 PGNs, wrote them to a file, converted them to EPDs, and removed the duplicates. I then started a bunch of threads at depth 45 to process them. :D :D :D

I guess this was a poor choice of depth for a test run. When I got up this morning each thread had processed about 9 positions. It looks like it will be a while before any of the threads finish.

I'm not sure whether I should let them run or stop them all. I hate to lose the work, but I didn't leave enough threads free to do normal work during the day. :( :( :(

On top of this, Windows won't let me view all the threads because there are too many of them! :shock: :shock: :shock: It only shows me the first 20.

In the future I'll start a limited number of threads, say 4 or 8. That way they will be easier to check on and I can let them run as long as needed.
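With a capped thread count, giving each thread an even share of the positions keeps the runs finishing at about the same time. A minimal sketch (hypothetical helper; whether SOMU wants one input file per thread is not stated in this thread) that cuts a list of EPD lines into contiguous chunks whose sizes differ by at most one:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(items: Sequence[T], n_threads: int) -> List[List[T]]:
    """Split items into n_threads contiguous chunks whose sizes differ by at most one."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    base, extra = divmod(len(items), n_threads)
    chunks, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks
```

With 1,000 positions and 8 threads, each chunk gets 125 positions.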

The only thing I haven't figured out yet is how to turn the EPDs into a .bin book when I'm done. Any suggestions?
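Assuming ".bin" here means a Polyglot-format opening book, the file itself is simple: a sequence of 16-byte big-endian entries (8-byte position key, 2-byte move, 2-byte weight, 4-byte learn field) sorted by key. Below is a minimal writer sketch. It assumes you already have, for each position, its Polyglot Zobrist key (computed with the standard Polyglot random-number table, not reproduced here) and a chosen move in UCI notation; the helper names are hypothetical.

```python
import struct
from typing import Iterable, Tuple

# Polyglot promotion codes (0 = no promotion).
PROMOTION_CODES = {None: 0, "n": 1, "b": 2, "r": 3, "q": 4}


def encode_move(uci: str) -> int:
    """Pack a UCI move ('e2e4', 'e7e8q') into Polyglot's 16-bit layout:
    bits 0-2 to-file, 3-5 to-rank, 6-8 from-file, 9-11 from-rank, 12-14 promotion.
    Note: Polyglot stores castling as king-takes-own-rook (e1h1, not e1g1),
    so castling moves must be converted before calling this."""
    from_file = ord(uci[0]) - ord("a")
    from_rank = int(uci[1]) - 1
    to_file = ord(uci[2]) - ord("a")
    to_rank = int(uci[3]) - 1
    promo = PROMOTION_CODES[uci[4] if len(uci) > 4 else None]
    return to_file | (to_rank << 3) | (from_file << 6) | (from_rank << 9) | (promo << 12)


def write_polyglot_book(path: str, entries: Iterable[Tuple[int, str, int]]) -> None:
    """entries: (polyglot_key, uci_move, weight) tuples; weight must fit in 16 bits."""
    packed = sorted((key, encode_move(move), weight) for key, move, weight in entries)
    with open(path, "wb") as book:
        for key, move, weight in packed:
            # key (uint64), move (uint16), weight (uint16), learn (uint32), big-endian
            book.write(struct.pack(">QHHI", key, move, weight, 0))
```

Sorting by key matters because book readers binary-search the file for a position's key.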

Regards,

Zenmastur