Ferdy wrote: ↑Thu Apr 11, 2019 2:39 am
Spill_The_Tea wrote: ↑Thu Apr 11, 2019 12:53 am
Hi Ferdy,
Big Fan of your work and chess tools.
If this thread is still active, I would like to make a request.
Recently, I have been collecting epd testsuites; however, testsuites are notoriously incestuous, meaning they almost always include identical positions, often with different ids.
In contrast, removing duplicate games from pgn databases is a bit easier with tools such as pgn-extract, but I have not found any similar tool for managing epd databases (specifically, for removing duplicate fen positions from epd files).
Right now, the easiest way to accomplish this is to strip all op codes from a database (using scidvspc tools) and then use a simple awk expression in terminal to remove duplicate lines: awk '!seen[$0]++' Input.epd > Output.epd
But this simply perpetuates the same problem I am describing, because I end up assigning a different ID to positions already accounted for.
Could you create a tool to manage epd databases, that parses for the FEN of each epd position, and remove the lines of any subsequent duplicates?
Given epd's
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "my test 1";
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm d4; id "my test 2";
Are the two epd's above identical? Perhaps not.
But these two are, and the 2nd epd can be removed.
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "my test 1";
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "my test 2";
What if there are other opcodes? It seems that [piece locations] [stm] [castle right] [ep square] bm [move] can be considered independently of the other opcodes and used to identify identical epd's, though this depends on your criteria too.
I think you need to give me some examples of which epd's are considered identical.
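For illustration only, the criterion described above (the four position fields plus selected opcodes such as bm) could be expressed as a comparison key. This is a minimal Python sketch, not the tool discussed in this thread. The name epd_key and its opcodes parameter are hypothetical, and the parsing assumes that no quoted operand contains a semicolon.

```python
def epd_key(line, opcodes=("bm",)):
    """Build a comparison key for one EPD line.

    The key is the four position fields (piece placement, side to move,
    castling rights, en passant square) plus the operands of the opcodes
    named in `opcodes`. Passing opcodes=() compares positions only.
    Assumption: operations are split naively on ';', so a quoted operand
    containing ';' would be parsed incorrectly.
    """
    parts = line.split(None, 4)
    if len(parts) < 4:
        raise ValueError("not an EPD line: %r" % line)
    position = tuple(parts[:4])
    operations = parts[4] if len(parts) > 4 else ""

    selected = []
    for operation in operations.split(";"):
        operation = operation.strip()
        if not operation:
            continue
        name, _, operand = operation.partition(" ")
        if name in opcodes:
            selected.append((name, operand.strip()))

    # Sort so that the order of opcodes within a line does not matter.
    return position + tuple(sorted(selected))
```

With the default opcodes=("bm",), the e4/d4 pair above gets different keys, while the pair that differs only in id gets the same key. Note that the operand is compared as plain text, so "Bh5+" and "Bh5" would count as different best moves.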
Hi Ferdy,
Right, I have found situations like this in my database before, where identical positions had different best moves. So I am asking for an imperfect solution that ignores the bm/am/pm opcodes. Here is an example I found in the various sources I pulled together:
Example Pooled EPD File (i.e. Input):
line 1: 8/2N4r/1p3pkp/8/5K1p/2P4N/P3Bn2/8 w - - bm Bh5+; id "W.Eigenmann: Brillant 002";
line 2 : 8/2N4r/1p3pkp/8/5K1p/2P4N/P3Bn2/8 w - - id ?; bm Bh5;
line 3 : 8/2p1k3/3p3p/2PP1pp1/1P1K1P2/6P1/8/8 w - - bm g4; id "arasan19.45"; c0 "Camacho Martinez-An. C. Hernandez, Cuba 1995";
line 4 : 8/2p1k3/3p3p/2PP1pp1/1P1K1P2/6P1/8/8 w - - bm g4; id peg124;
Desired Output:
line 1: 8/2N4r/1p3pkp/8/5K1p/2P4N/P3Bn2/8 w - - bm Bh5+; id "W.Eigenmann: Brillant 002";
line 2 : 8/2p1k3/3p3p/2PP1pp1/1P1K1P2/6P1/8/8 w - - bm g4; id "arasan19.45"; c0 "Camacho Martinez-An. C. Hernandez, Cuba 1995";
And if the lines in the example input file were reversed, then lines 4 and 2 would be printed instead:
line 1 : 8/2N4r/1p3pkp/8/5K1p/2P4N/P3Bn2/8 w - - id ?; bm Bh5;
line 2 : 8/2p1k3/3p3p/2PP1pp1/1P1K1P2/6P1/8/8 w - - bm g4; id peg124;
Please note that there are also situations with more than two replicates, for example:
8/2p1p1pp/2P1P2k/6pP/p7/P4Bp1/1P4P1/n5K1 w - - bm Bd1; id "MES.256";
8/2p1p1pp/2P1P2k/6pP/p7/P4Bp1/1P4P1/n5K1 w - - bm Bd1; id "Holmes Endgame Pos. 1362"; c0 "white pieces=8 black pieces=9"; c1 "material balance: -1,0";
8/2p1p1pp/2P1P2k/6pP/p7/P4Bp1/1P4P1/n5K1 w - - bm Bd1; c0 "level: med-3"; id "EG 0866";
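The behaviour requested above (keep the first line seen for each position, ignore every opcode, and drop all later replicates) can be sketched in Python as follows. This is only a rough illustration under stated assumptions, not an existing tool: the names dedupe_epd and dedupe_epd_file are hypothetical, each input line is assumed to start directly with the four position fields (without the "line 1:" labels used in the example), and blank or malformed lines are skipped.

```python
def dedupe_epd(lines):
    """Return the EPD lines with later duplicate positions removed.

    Two lines are duplicates when their first four whitespace-separated
    fields (piece placement, side to move, castling rights, en passant
    square) match. All opcodes (bm, am, pm, id, c0, ...) are ignored.
    The first occurrence of each position is the one that is kept.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line: skip it
        key = tuple(fields[:4])
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


def dedupe_epd_file(src_path, dst_path):
    """Read src_path, write the de-duplicated lines to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        kept = dedupe_epd(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in kept:
            dst.write(line.rstrip("\n") + "\n")
    return len(kept)
```

Calling dedupe_epd_file("Input.epd", "Output.epd") would play the same role as the awk one-liner, except that the key is the position rather than the whole line, so the opcodes do not need to be stripped first and the surviving line keeps its original id. Because the first occurrence wins, reversing the input changes which replicate survives, as in the example above.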