Need A Utility Program to Remove Duplicate EPD Positions


Amstaff
Posts: 148
Joined: Thu Nov 19, 2009 4:58 pm
Location: College Station, Texas

Post by Amstaff

Any help would be appreciated.
Thanks,
Gerald
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob

Amstaff wrote: "Any help would be appreciated. Thanks, Gerald"
On Unix, what about:

sort file.epd | uniq

:)
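One caveat with `sort file.epd | uniq`: it only drops lines that are identical character for character, so two records with the same position but different opcodes (say, a different `id` or `bm`) both survive, and the original order of the file is lost. Below is a minimal Python sketch of an alternative. It assumes that a position is identified by the first four EPD fields (piece placement, side to move, castling rights, en passant square) and that the first occurrence should be kept in its original place. The script name `dedup_epd.py` is hypothetical:

```python
import sys
from typing import Iterable, Iterator


def position_key(record: str) -> str:
    """First four EPD fields: placement, side to move, castling, en passant."""
    return " ".join(record.split()[:4])


def dedup_epd(lines: Iterable[str]) -> Iterator[str]:
    """Yield records whose position has not appeared earlier, in input order."""
    seen = set()
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        key = position_key(record)  # opcodes after field 4 are ignored
        if key not in seen:
            seen.add(key)
            yield record


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for record in dedup_epd(f):
            print(record)
```

Run it as `python dedup_epd.py dups.epd > nodups.epd`. Keying on the first four fields matches the notion of a duplicate that Norm describes for epdList later in the thread (same position including en passant rights, opcodes ignored).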
F.Huber
Posts: 853
Joined: Thu Mar 09, 2006 4:50 pm
Location: Austria

Post by F.Huber

Amstaff wrote: "Any help would be appreciated. Thanks, Gerald"
Hi Gerald,

you can do this with my tool 'EPDUtil', which is included in the ChestUCI v5.1 package.
Look here:
http://fhub.110mb.com/

It's quite easy to use: for example, if your file is 'dups.epd', just enter the following on the command line:
EPDUtil dups.epd > nodups.epd

There are many more options, all described in the included file EPDUtil.txt.

Regards,
Franz
Amstaff
Posts: 148
Joined: Thu Nov 19, 2009 4:58 pm
Location: College Station, Texas

Post by Amstaff

Thanks, Mr. Huber, for your help. That worked perfectly. One more question: if I have a file of PGN games and want to export only the first 10 moves of each game, is that something I can do with your program?

Thanks,
Gerald
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Post by Frank Quisinsky

Hi Gerald,

check the EPD utility collection by Norm Pollock.

Great software by Norm Pollock
http://www.hoflink.com/~npollock/chess.html

Good luck

Best
Frank
Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 10:09 am
Location: Spain

Post by Aser Huerga

Hi Gerald,

Norm Pollock's PGN utilities suite includes such a utility:

=========================(42) truncate =============================

"truncate" counts the number of plies (half-moves) in each game and
then removes any plies occurring after a user-specified number. If a
game finishes with fewer plies than the user-specified number, that
game is output as is.

"truncate" removes comments, NAGs, variations and major symbolic
annotation symbols (see "trim" above for the list).

Syntax: truncate filename.pgn maximum_plies

Usage: truncate alpha.pgn 50

Output: outU.pgn

Comments:

1. "truncate" only counts those plies that are actually present
in the game description, and does not rely on the values in
"PlyCount" tags.

2. "truncate" may require extra waiting time when it is used
with a very large "pgn" input file.

====================================================================


http://www.hoflink.com/~npollock/chess.html

Cheers.

Edit: Frank, our posts crossed. Greetings.
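For readers who want to see the idea behind "truncate" in code, here is a rough Python sketch. It is not Norm's implementation. It assumes each game starts from the standard position with White to move, strips brace and semicolon comments, variations, NAGs, move numbers and trailing `!`/`?` symbols, counts the plies that remain (not the "PlyCount" tag), and keeps the game's original result marker. The script name `truncate_pgn.py` and all function names are hypothetical, and it prints to standard output instead of writing `outU.pgn`:

```python
import re
import sys

# PGN game termination markers
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def strip_annotations(movetext: str) -> str:
    """Remove comments, variations and NAGs from PGN movetext."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # {brace comments}
    text = re.sub(r";[^\n]*", " ", text)        # ; rest-of-line comments
    previous = None
    while previous != text:                     # (variations), innermost first
        previous = text
        text = re.sub(r"\([^()]*\)", " ", text)
    return re.sub(r"\$\d+", " ", text)          # $NAGs


def truncate_movetext(movetext: str, max_plies: int) -> str:
    """Keep at most max_plies half-moves, numbered from move 1 (assumes White starts)."""
    plies, result = [], ""
    for token in strip_annotations(movetext).split():
        if token in RESULTS:
            result = token
            continue
        token = re.sub(r"^\d+\.+", "", token)  # drop "12." or "12..." prefixes
        token = token.rstrip("!?")              # drop !, ?, !!, !?, ?! and ??
        if token:
            plies.append(token)
    parts = []
    for index, san in enumerate(plies[:max_plies]):
        if index % 2 == 0:
            parts.append(f"{index // 2 + 1}.")
        parts.append(san)
    if result:
        parts.append(result)  # assumption: keep the original result marker
    return " ".join(parts)


def truncate_pgn(pgn_text: str, max_plies: int) -> str:
    """Truncate every game; tag pairs are copied unchanged.

    Simplification: any line starting with "[" is treated as a tag pair,
    so a multi-line comment with a line beginning "[" would confuse it.
    """
    games, tags, moves = [], [], []
    for line in pgn_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and moves:  # tag after movetext: new game
            games.append((tags, moves))
            tags, moves = [], []
        if stripped.startswith("["):
            tags.append(stripped)
        elif stripped:
            moves.append(stripped)
    if tags or moves:
        games.append((tags, moves))
    out = []
    for game_tags, game_moves in games:
        # Join with newlines so ";" comments still end at their own line.
        body = truncate_movetext("\n".join(game_moves), max_plies)
        out.append("\n".join(game_tags) + "\n\n" + body)
    return "\n\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(truncate_pgn(f.read(), int(sys.argv[2])))
```

For the first 10 moves of each game, the call would be `python truncate_pgn.py games.pgn 20 > first10.pgn`, since 10 moves are 20 plies.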
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Post by Frank Quisinsky

Hello Gerald,

this can also be done with Norm Pollock's PGN collection; see the URL in my previous message.

Norm has separate collections of PGN and EPD tools. The downloads include a very good readme that explains everything.

If you are using a 32-bit Windows OS (the tool is not x64-compatible), you can also do this with a tool by George Lyapko (programmer of the UCI engine "Bestia"):

George Lyapko:
http://lyapko.110mb.com/ (truncate utility)

Best
Frank
Last edited by Frank Quisinsky on Mon Aug 30, 2010 7:30 pm, edited 2 times in total.
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Post by Frank Quisinsky

Hi Aser,

my greetings too :-)

Best
Frank
F.Huber
Posts: 853
Joined: Thu Mar 09, 2006 4:50 pm
Location: Austria

Post by F.Huber

Amstaff wrote: "One more question, if I have a file of pgn games and want to export only the first 10 moves of each game, is that something I can do with your program?"
Hi Gerald,

no, EPDUtil is made specifically for ChestUCI to manipulate EPD files.
For PGN files, its only feature is extracting EPD positions for further use.

But there have already been many other good suggestions here for your request. :)

Regards,
Franz
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Post by Norm Pollock

Gerald,

My utility "epdList" in the 40H-EPD suite identifies duplicate positions within an EPD file and gives the line numbers of the duplicates. The opcodes can differ, but I still consider two records duplicates if the position (including en passant rights) is the same.

For the first 10 moves of games in PGN format, use "truncate" in the 40H-PGN suite. Since it counts half-moves (plies), 10 moves means 20 plies, so the command would be:
truncate filename.pgn 20


=========================(7) epdList ===============================

"epdList" lists each distinct position together with the number
of times each occurred, and the line numbers where they occurred.

Existing opcodes are removed. The output file is sorted by decreasing
number of occurrences, and then alphanumerically.

"epdList" is useful for removing duplicate positions.

"epdList" removes blank lines.

Sample output record:

3Q4/p3b1k1/2p2rPp/2q5/4B3/P2P4/8/6RK w - - c0 2; c1 line(s): 467 468;

The above output record shows that this position occurred 2 times,
and that the position occurred in lines 467 and 468 of the "epd"
input file.

Usage:  epdList alpha.epd

Output: outL.epd 

Comments:

     1. Processing time can be long if the "epd" input file is very
        large.

-Norm
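A minimal Python sketch that reproduces the epdList output record format shown in the readme above: distinct position, `c0` count, `c1` line numbers, sorted by decreasing count and then by position. The following are assumptions, not taken from the readme: the position is the first four EPD fields, line numbers are 1-based physical lines of the input (blank lines are skipped but still counted), "alphanumerically" means plain string ordering, and results go to standard output rather than `outL.epd`. The script name `epd_list.py` is hypothetical:

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def epd_list(lines: Iterable[str]) -> List[str]:
    """List each distinct position with its count and 1-based line numbers."""
    occurrences: Dict[str, List[int]] = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are dropped but still counted
        position = " ".join(fields[:4])  # opcodes after field 4 are discarded
        occurrences[position].append(number)
    # Most frequent first, ties broken by plain string order of the position.
    ordered = sorted(occurrences.items(), key=lambda item: (-len(item[1]), item[0]))
    return [
        f"{position} c0 {len(numbers)}; c1 line(s): {' '.join(map(str, numbers))};"
        for position, numbers in ordered
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for record in epd_list(f):
            print(record)
```

Keeping only the text before ` c0` on each output line yields one record per distinct position, which is how such a listing can be used to remove duplicates.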