Hi everyone, I am sorry if this is a FAQ, but I am looking for a utility to search files with thousands of EPD lines for themes like Black queenside castling, a given pawn structure, etc., maybe combined with scores (ce, centipawn evaluation) higher than a threshold value. Is there any utility, preferably command-line Linux, that would allow me to do so?
Thanks in advance.
Searching EPD lines
Moderator: Ras
-
Norm Pollock
- Posts: 1087
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Searching EPD lines
Giovanni,
I might be interested in writing something along the lines of what you are asking for. However, I only write command-line "Windows" tools with "Java" versions. With Linux, you would have to use a Java environment.
You are asking for 2 different extractions. The first one would require you to provide a "mask" of the structure you want to extract. For example:
2kr4/8/8/8/8/8/8/8
8/8/8/8/3PP3/P1P5/1P3PPP/8
Then the tool would extract epd records matching the mask.
A single mask, rather than a file of many masks, would be better because the latter would not separate different structures. Also some records could have more than one of the structures you are looking for, and if multiple masks were used, the record would only be extracted once.
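For readers who want to see the mask idea in code, here is a minimal Python sketch. It is not how any existing tool is implemented. It assumes that empty squares in the mask act as wildcards and that every piece named in the mask must sit on the same square in the record; the function names `expand_board` and `matches_mask` are made up for illustration.

```python
def expand_board(board: str) -> list[str]:
    """Expand a FEN/EPD piece-placement field into 64 one-character squares."""
    squares = []
    for rank in board.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares.extend("." * int(ch))  # a digit is a run of empty squares
            else:
                squares.append(ch)
    if len(squares) != 64:
        raise ValueError(f"bad board field: {board!r}")
    return squares


def matches_mask(epd_line: str, mask: str) -> bool:
    """True if every piece in the mask is on the same square in the record.

    Assumption: empty mask squares match anything.
    """
    fields = epd_line.split()
    if not fields:
        return False  # blank line
    try:
        board = expand_board(fields[0])
    except ValueError:
        return False  # not an EPD record
    return all(m == "." or m == b for m, b in zip(expand_board(mask), board))
```

A filter over a file would then keep each line for which `matches_mask(line, "2kr4/8/8/8/8/8/8/8")` is true.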
The other aspect of your request is to extract records based on centipawn evaluation. Centipawn evaluation is in a different section of the epd record -- the optional opcode section. Handling centipawn evaluation requires a different tool. For that, I have written a tool, but it is a command-line Windows tool with a Java version. See "epdOrder" in "40H-EPD" utility suite. (link at www below). It does not extract, but instead sorts the records by ce value. From a text editor you can copy and paste the records you want. From its readme:
"epdOrder" sorts the records in descending order based on the
value of the "centipawn evaluation" ("ce") opcode.
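As a rough illustration of working with the "ce" opcode, the Python sketch below pulls the value out of each record, keeps those above a threshold, and sorts them in descending order. It is only a sketch under assumptions: it expects the opcode written as `ce <integer>;`, and the names `ce_value` and `filter_by_ce` are hypothetical, not part of any existing tool.

```python
import re

# Assumption: the opcode appears as "ce <integer>;" after whitespace or ";".
CE_RE = re.compile(r"(?:^|[\s;])ce\s+([+-]?\d+)\s*;")


def ce_value(epd_line: str):
    """Return the centipawn evaluation of a record, or None if it has no ce opcode."""
    m = CE_RE.search(epd_line)
    return int(m.group(1)) if m else None


def filter_by_ce(lines, threshold: int):
    """Keep records with ce strictly above threshold, highest ce first."""
    scored = [(ce_value(line), line) for line in lines]
    kept = [(v, line) for v, line in scored if v is not None and v > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in kept]
```

Records without a "ce" opcode are dropped rather than treated as zero, which seemed the safer default for a threshold search.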
-Norm
-
giovanni
- Posts: 142
- Joined: Wed Jul 08, 2015 12:30 pm
Re: Searching EPD lines
Hi, Norm.
Many thanks for this further contribution you could give to our community. It would perfectly suit my needs.
BTW, I am a Linux guy, but I found out that your utilities run well under Wine. So they are probably used even more than you think.
Looking forward to hearing from you again.
-
Norm Pollock
- Posts: 1087
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Searching EPD lines
Here is v1.00 of epdMask.exe
http://www.mediafire.com/download/u8r92 ... pdMask.exe
It seems to be ok, but needs more testing.
usage:
epdMask alpha.epd mask_position
example:
epdMask alpha.epd 2kr4/8/8/8/8/8/8/8
output:
outK.epd (no input line number opcode)
outK2.epd (with input line number opcode)
-
giovanni
- Posts: 142
- Joined: Wed Jul 08, 2015 12:30 pm
Re: Searching EPD lines
Hi Norm. Thanks for your file. It works great, but it choked on files with more than 2000 EPD lines, complaining that it could not find the file, i.e.,
......
wine epdMask.exe Results/all.epd 2kr4/8/8/8/8/8/8/8
File not found!
......
Thanks again for your help and assistance.
Giovanni
-
Norm Pollock
- Posts: 1087
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Searching EPD lines
I just tested it on an epd file with over 9 million lines (records plus blank lines) and 9 million records. Everything worked fine.
I suspect 2 things may have happened in your testing.
First, your epd file may have had some strange characters or whatever, at about line 2000. epdMask only accepts epd records and blank lines. Other stuff should be ignored. Or it could be UTF-8 encoding (check 1st 3 characters on 1st line with a hex editor). I have run into UTF-8 encoding in pgn files, and they screwed up my PGN tools until I found out how to remove that encoding.
Second, it could be Wine. I don't mention Wine in the readme because other users have had occasional trouble using it with my tools. Try using a Java environment instead.
Try other epd files, particularly ones that you download, not created by you.
If the problem persists, I would have to see the data files you are using.
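For anyone without a hex editor at hand, the two checks suggested above can be approximated with a short Python sketch. It looks for a UTF-8 byte order mark in the first 3 bytes and lists the lines that are not valid UTF-8. The function name `diagnose_encoding` is made up for this example, and the sketch says nothing about what a particular tool does with such bytes.

```python
import codecs


def diagnose_encoding(data: bytes) -> dict:
    """Report a leading UTF-8 BOM and the 1-based numbers of undecodable lines."""
    has_bom = data.startswith(codecs.BOM_UTF8)  # bytes EF BB BF
    bad_lines = []
    # Splitting on b"\n" is safe: byte 0x0A never occurs inside a multi-byte UTF-8 sequence.
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return {"bom": has_bom, "bad_lines": bad_lines}


# Usage (file name is only an example):
# with open("all.epd", "rb") as f:
#     print(diagnose_encoding(f.read()))
```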
-
giovanni
- Posts: 142
- Joined: Wed Jul 08, 2015 12:30 pm
Re: Searching EPD lines
Hi Norm. Thanks for your troubleshooting. You were indeed right: the problem was with UTF-8 encoding. Things improved considerably when I cleaned the file with the command:
iconv -f utf-8 -t utf-8 -c all.epd >all_clean.epd
When fed the cleaned file, the program sometimes still complains or exits silently, but in the end it still produces the two output files. This is already good enough for me, but I'll also try your suggestion of running things in a Java environment.
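Where iconv is not available, roughly the same cleaning can be sketched in Python: decode as UTF-8 while dropping bytes that do not decode, then write the result back out. As an extra step that the iconv command does not necessarily perform, this sketch also strips a leading byte order mark. The function name `clean_utf8` is hypothetical.

```python
def clean_utf8(data: bytes) -> bytes:
    """Drop bytes that are not valid UTF-8 and remove a leading BOM, if any."""
    text = data.decode("utf-8", errors="ignore")  # silently skip undecodable bytes
    return text.lstrip("\ufeff").encode("utf-8")  # extra step: strip the BOM


# Usage (file names are only examples):
# with open("all.epd", "rb") as src, open("all_clean.epd", "wb") as dst:
#     dst.write(clean_utf8(src.read()))
```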
Thanks again.
Giovanni
-
Norm Pollock
- Posts: 1087
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Searching EPD lines
Hi Giovanni,
Please try this updated version 1.01 of "epdMask" and let me know if it is an improvement. It avoids the issue with UTF-8 encoding. Output does not have UTF-8 encoding.
http://www.mediafire.com/download/u8r92 ... pdMask.exe
-Norm
-
giovanni
- Posts: 142
- Joined: Wed Jul 08, 2015 12:30 pm
Re: Searching EPD lines
Hi, Norm.
Thanks for the new file. It definitely solves all my UTF-8 problems, since it can read all the files without prior cleaning. However, there is still one stubborn file left that I can't process:
giovanni@giovanni-Lenovo-G505s:~/Downloads$ wine epdMask.exe Results/KaroKann_2.epd 2kr4/8/8/8/8/8/8/8
err:seh:setup_exception_record stack overflow 1152 bytes in thread 0043 eip 7bc462ef esp 009d0eb0 stack 0x9d0000-0x9d1000-0xbd0000
Sometimes I also get this other error message:
giovanni@giovanni-Lenovo-G505s:~/Downloads$ wine epdMask.exe Results/KaroKann_2.epd 2kr4/8/8/8/8/8/8/8
Exception in thread "LibgcjInternalFinalizerThread" err:seh:raise_exception Unhandled exception code c0000005 flags 0 addr 0x448f70
This is a 4500-line epd file and, if needed, I could send it over via WeTransfer.
Thanks again for your help.
Giovanni
-
Norm Pollock
- Posts: 1087
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Searching EPD lines
Giovanni,
I sent you a PM
-Norm