Dann Corbit wrote:Use Arena.
You can run an EPD test suite with that.
I think that ChessGui can do it also, but I did not try it myself.
can00336 wrote:I have tried three different versions of Arena. They all stop communicating with the engine at some random point during long runs.
I haven't tried ChessGui, but I would prefer a cmd line interface, if possible.
Thanks for the suggestion!
I have seen the same thing (loss of connection to the output file).
Best EPD Testing Software
-
- Posts: 12564
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Best EPD Testing Software
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 658
- Joined: Wed Mar 08, 2006 8:58 pm
Re: Best EPD Testing Software
You may do
Code: Select all
polyglot.exe epd-test -min-time 0.1 -max-time 0.50 -min-depth 12 -max-depth 127 -depth-delta 5 -epd G:\Testsets\EET_1.epd
Kind regards
Bernhard
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Best EPD Testing Software
There are 5 output files generated.
1. AET_result_summary.txt
Append mode: all engine tests are recorded here. The tool can take any normal epd file with am or bm too.
2. <engine>_log.txt
Overwrite mode: this is for manual inspection of the engine analysis and solutions.
3. <engine>_test_details.txt
Overwrite mode: some details on the test conditions, the engine's score (cp/mate) and the elapsed-time calculations.
4. <engine>_not_solved.epd
5. <engine>_solved.epd
Command line options:
The --movetime value is in milliseconds.
The --log flag is for the engine log only; the solved, unsolved and other files are always written.
The --option value has the format "<option name> value <option value>" for a single option. Note the double quotes.
For two or more options, separate them with a comma.
There is also a --name <engine name> option, for naming an engine that runs with customized options.
That name is displayed in the summary and is also used in <name>_log.txt and the other file names.
1. AET_result_summary.txt
Code: Select all
Engine                              Hash(mb)  Thre  Time/pos(s)  TotalTime          Positions  Correct      %  TestFile
Stockfish 7Beta1 64 POPCNT               128     1        0.200  00h:00m:50s:448ms        250       40   16.0  arasan18.epd
Houdini 4 x64 Tactical                   128     1        0.200  00h:00m:50s:000ms        250       25   10.0  arasan18.epd
Houdini 4 x64                            128     1        0.200  00h:00m:50s:000ms        250       15    6.0  arasan18.epd
Deuterium v2015.1.35.321 offensive       128     1        0.200  00h:00m:50s:699ms        250       68   27.2  arasan18.epd
Deuterium v2015.1.35.321                 128     1        0.200  00h:00m:50s:699ms        250       11    4.4  arasan18.epd
Deuterium v2015.1.35.321                 128     1        0.200  00h:00m:20s:280ms        100       32   32.0  sts15.epd
Deuterium v2015.1.35.321 offensive       128     1        0.200  00h:00m:20s:279ms        100       47   47.0  sts15.epd
Arasan 18.2                              128     1        0.200  00h:00m:50s:700ms        250       10    4.0  arasan18.epd
Hakkapeliitta 3.0 x64                    128     1        0.200  00h:00m:50s:407ms        250       30   12.0  arasan18.epd
Fire 4 x64                               128     1        0.200  00h:00m:29s:454ms        250       21    8.4  arasan18.epd
2. <engine>_log.txt
Code: Select all
Starting engine stockfish_15122720_x64_modern.exe ...
>> uci
<< Stockfish 7Beta1 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
<< id name Stockfish 7Beta1 64 POPCNT
<< id author T. Romstad, M. Costalba, J. Kiiski, G. Linscott
<<
<< option name Write Debug Log type check default false
<< option name Contempt type spin default 0 min -100 max 100
<< option name Threads type spin default 1 min 1 max 128
<< option name Hash type spin default 16 min 1 max 1048576
<< option name Clear Hash type button
<< option name Ponder type check default false
<< option name MultiPV type spin default 1 min 1 max 500
<< option name Skill Level type spin default 20 min 0 max 20
<< option name Move Overhead type spin default 30 min 0 max 5000
<< option name Minimum Thinking Time type spin default 20 min 0 max 5000
<< option name Slow Mover type spin default 84 min 10 max 1000
<< option name nodestime type spin default 0 min 0 max 10000
<< option name UCI_Chess960 type check default false
<< option name SyzygyPath type string default <empty>
<< option name SyzygyProbeDepth type spin default 1 min 1 max 100
<< option name Syzygy50MoveRule type check default true
<< option name SyzygyProbeLimit type spin default 6 min 0 max 6
<< uciok
>> setoption name Hash value 128
>> setoption name Threads value 1
Pos 1
r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - bm g4; id "arasan18.1"; c0 "J. Polgar-Berkes, Budapest Hunguest Hotels 2003";
2015-12-30T05:58:48.253000 >> isready
2015-12-30T05:58:48.253000 << readyok
2015-12-30T05:58:48.253000 >> ucinewgame
2015-12-30T05:58:48.253000 >> position fen r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - 0 1
2015-12-30T05:58:48.253000 >> go movetime 200
2015-12-30T05:58:48.284000 << info depth 1 seldepth 1 multipv 1 score cp 120 nodes 41 nps 41000 tbhits 0 time 1 pv e4a8
2015-12-30T05:58:48.284000 << info depth 2 seldepth 2 multipv 1 score cp 68 nodes 90 nps 90000 tbhits 0 time 1 pv e4a8 g5g4
2015-12-30T05:58:48.284000 << info depth 3 seldepth 3 multipv 1 score cp -125 nodes 160 nps 160000 tbhits 0 time 1 pv e4a8 g5g4 c2c3 g4f3
2015-12-30T05:58:48.284000 << info depth 4 seldepth 4 multipv 1 score cp 49 nodes 351 nps 351000 tbhits 0 time 1 pv g2g4 d7f6 e4a8 f6g4
2015-12-30T05:58:48.284000 << info depth 5 seldepth 5 multipv 1 score cp 73 nodes 619 nps 309500 tbhits 0 time 2 pv e4a8 c8a6 a8c6 g5g4 f3e1
2015-12-30T05:58:48.284000 << info depth 6 seldepth 7 multipv 1 score cp 123 nodes 1178 nps 589000 tbhits 0 time 2 pv e4a8 e7f6 a8c6 g5g4 f3e1 a7a6
2015-12-30T05:58:48.284000 << info depth 7 seldepth 8 multipv 1 score cp -129 nodes 3152 nps 1050666 tbhits 0 time 3 pv h2h4 g5g4 e4a8 g4f3 a8f3 c8a6 g2g3
2015-12-30T05:58:48.284000 << info depth 8 seldepth 8 multipv 1 score cp -112 nodes 3993 nps 998250 tbhits 0 time 4 pv h2h4 g5g4 e4a8 c8a6 a8c6 g4f3 c6f3 e7h4
2015-12-30T05:58:48.284000 << info depth 9 seldepth 12 multipv 1 score cp -92 nodes 6562 nps 1312400 tbhits 0 time 5 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 a7a6 h2h4
2015-12-30T05:58:48.299000 << info depth 10 seldepth 13 multipv 1 score cp -107 nodes 13477 nps 1497444 tbhits 0 time 9 pv e4a8 g5g4 d2e2 g4f3 a8f3 a7a5 c1b1 d7f6 h2h4 h8g8 g2g3 c8d7
2015-12-30T05:58:48.299000 << info depth 11 seldepth 16 multipv 1 score cp -106 nodes 21058 nps 1504142 tbhits 0 time 14 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 h8g8 h2h4 c8d7 g2g4 f6d5 g4g5
2015-12-30T05:58:48.299000 << info depth 12 seldepth 18 multipv 1 score cp -99 nodes 31810 nps 1590500 tbhits 0 time 20 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 a7a5 g2g4 f6d5 f3d5 e6d5
2015-12-30T05:58:48.315000 << info depth 13 seldepth 19 multipv 1 score cp -89 nodes 56711 nps 1667970 tbhits 0 time 34 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 a7a5 g2g4 f6d5 g4g5 h8g8 a2a3
2015-12-30T05:58:48.377000 << info depth 14 seldepth 20 multipv 1 score cp -91 nodes 154867 nps 1683336 tbhits 0 time 92 pv e4a8 g5g4 d2e2 g4f3 a8f3 d7f6 c1b1 c8d7 h2h4 f6d5 g2g3 d7a4 a2a3 a7a5 h4h5 e7g5 h5h6 g5h6
2015-12-30T05:58:48.487000 << info nodes 346834 time 202
2015-12-30T05:58:48.487000 << bestmove e4a8 ponder g5g4
epd bm: g4
Engine bestmove in san: Bxa8
Engine bestmove does not match the epd bm??
Total position to be evaluated: 250
Correct: 0, Evaluated: 1, CorrectRate: 0.0%
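For anyone who wants to post-process a log like the one above, here is a minimal sketch of pulling depth, score and pv out of the `info` lines. It assumes the log prefix format shown above (timestamp, then `<< `); the names `parse_info` and `final_score` are made up for this example and are not part of the tool.

```python
def parse_info(line):
    """Return depth, multipv, score and pv from one UCI 'info' line, else None."""
    # Drop the "<timestamp> << " prefix used in the log above, if present.
    if "<< " in line:
        line = line.split("<< ", 1)[1]
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return None
    out = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("depth", "multipv") and i + 1 < len(tokens):
            out[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens):
            out["score_type"] = tokens[i + 1]  # "cp" or "mate"
            out["score"] = int(tokens[i + 2])
            i += 3
        elif tok == "pv":
            out["pv"] = tokens[i + 1:]  # the rest of the line is the pv
            break
        else:
            i += 1
    return out


def final_score(log_lines):
    """Return the last parsed info line that carries a pv (hypothetical helper)."""
    last = None
    for line in log_lines:
        info = parse_info(line)
        if info and "pv" in info:
            last = info
    return last
```

The last pv-carrying `info` line before `bestmove` is what would normally supply the score reported for a position.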
3. <engine>_test_details.txt
Code: Select all
AET - Arasan EPD Tester v1.0
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Physical Cores: 4, Hyper-Threading: ON
Physical Memory: Total = 12 GB, Available = 8 GB
Engine: Stockfish 7Beta1 64 POPCNT
Hash: 128, Threads: 1, Time: 0.2s/pos
Test file: arasan18.epd, TotalPos 250
AnalyzedPos : 250, Correct: 40 (16.00%)
Total time as reported by engine : 00h:00m:50s:448ms
Expected time based on time/pos : 00h:00m:50s:000ms
Engine start/quit wall time elapsed : 00h:00m:54s:509ms
Pos  Correct  EngineBM  ScoreCP  Mate  EPD
  1        0  Bxa8          -91     -  r1bq1r1k/p1pnbpp1/1p2p3/6p1/3PB3/5N2/PPPQ1PPP/2KR3R w - - bm g4; id "arasan18.1"; c0 "J. Polgar-Berkes, Budapest Hunguest Hotels 2003";
  2        0  Bb3           -21     -  r1b2rk1/1p1nbppp/pq1p4/3B4/P2NP3/2N1p3/1PP3PP/R2Q1R1K w - - bm Rxf7; id "arasan18.2"; c0 "Van der Wiel-Ribli, IBM Amsterdam 1980";
  3        1  g5           +136     -  r1br2k1/pp2qpp1/1b2p2p/3nB3/6P1/3B4/PPPNQP1P/1K1R3R w - - bm g5; id "arasan18.3"; c0 "Victorious (Stockfish 191013SL)-AKIM(Houdini 3 Pro), playchess.com 2013";
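The time and percentage figures in the details above can be reproduced with a few lines. This is only a sketch of the arithmetic; the function names are invented here and the tool's own code may differ.

```python
def format_ms(total_ms):
    """Format milliseconds in the 00h:00m:00s:000ms style used in the reports."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}h:{minutes:02d}m:{seconds:02d}s:{ms:03d}ms"


def expected_time(positions, movetime_ms):
    """Expected total time if every position uses exactly movetime_ms."""
    return format_ms(positions * movetime_ms)


def correct_rate(correct, analyzed):
    """Percentage of analyzed positions that were solved."""
    return 100.0 * correct / analyzed if analyzed else 0.0
```

With 250 positions at 200 ms each, `expected_time(250, 200)` gives the "Expected time based on time/pos" line, and `correct_rate(40, 250)` gives the 16.0 shown for Stockfish.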
Command line options:
Code: Select all
ArasanEpdTester_v1 -f "arasan18.epd" -e "stockfish_15122720_x64_modern.exe" --movetime 200 --log --option "Hash value 128, Threads value 1"
The --option has the format
Code: Select all
--option "<option name> value <option value>"
For two or more options, separate them with a comma.
Code: Select all
ArasanEpdTester_v1 -f "arasan18.epd" -e "stockfish_15122720_x64_modern.exe" --movetime 200 --log --option "Hash value 128, Threads value 1"
Example with --name:
Code: Select all
ArasanEpdTester_v1 -f "arasan18.epd" -e "H4.exe" --movetime 200 --log --option "Hash value 128, Threads value 1, Tactical Mode value true" --name "Houdini 4 x64 Tactical"
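A rough sketch of how an --option string in this format could be turned into the `setoption` commands seen in the engine log. This is an assumption about the mapping, not the tool's actual code, and `option_to_setoption` is a name invented for the example.

```python
def option_to_setoption(option_arg):
    """Turn 'Hash value 128, Threads value 1' into UCI setoption commands."""
    commands = []
    for item in option_arg.split(","):
        item = item.strip()
        if not item:
            continue
        # Each item is expected to look like "<option name> value <option value>".
        name, sep, value = item.partition(" value ")
        if not sep:
            raise ValueError(f"expected '<name> value <value>', got {item!r}")
        commands.append(f"setoption name {name.strip()} value {value.strip()}")
    return commands
```

Splitting on commas is the simplest reading of the format; an option value that itself contains a comma would need different handling.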
Limitations:
1. This tool cannot handle an epd with both am and bm in it. It is intended only for an epd that is either all am or all bm. Positions without bm or am are skipped.
2. It does not interrupt the engine search even when the solution is seen early. It assumes that the uci engine obeys the command
go movetime <time in millisec>
and stops searching once it reaches that time limit.
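To make limitation 1 concrete, here is a minimal sketch of reading one EPD line and deciding how it would be treated. Applying the rule per position, the "unsupported" label, and the names `parse_epd`, `classify` and `to_uci_fen` are all assumptions made for this example. The " 0 1" suffix mirrors the `position fen` command in the engine log.

```python
import re

# One EPD operation: an opcode, then a quoted string or plain text, then ';'.
OP_RE = re.compile(r'(\w+)\s+("[^"]*"|[^;]*);')


def parse_epd(line):
    """Split an EPD line into its four position fields and a dict of opcodes."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD line needs four position fields")
    position = " ".join(fields[:4])
    ops = {}
    for opcode, operand in OP_RE.findall(fields[4] if len(fields) > 4 else ""):
        operand = operand.strip()
        if opcode in ("bm", "am"):
            ops[opcode] = operand.split()  # may hold several SAN moves
        else:
            ops[opcode] = operand.strip('"')
    return position, ops


def classify(ops):
    """Return 'bm', 'am', 'unsupported' (both present) or 'skip' (neither)."""
    has_bm, has_am = "bm" in ops, "am" in ops
    if has_bm and has_am:
        return "unsupported"
    if has_bm:
        return "bm"
    if has_am:
        return "am"
    return "skip"


def to_uci_fen(position):
    """Append default move counters so the engine gets a full six-field FEN."""
    return position + " 0 1"
```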
This was only tested on Windows 7.
Please test it; you may find errors, especially in the engine log output. The next version will write its output in csv format for viewing in a spreadsheet app.
This tool uses the python-chess library to convert the engine's uci move to a SAN move. It can then compare that move with the am and bm in the epd to determine whether it matched. The script was converted to an exe using py2exe.
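The comparison step could look roughly like this once the engine move is already in SAN (with python-chess that would typically come from `board.san(chess.Move.from_uci(...))`). This sketch covers only the matching. Stripping check and annotation suffixes before comparing is an assumption, and `normalize_san` and `is_solved` are invented names.

```python
def normalize_san(san):
    """Drop trailing check, mate and annotation marks so 'Rxf7+' equals 'Rxf7'."""
    return san.strip().rstrip("+#!?")


def is_solved(engine_san, bm=(), am=()):
    """A bm position is solved by playing a listed move, an am one by avoiding them."""
    move = normalize_san(engine_san)
    if bm:
        return move in {normalize_san(m) for m in bm}
    if am:
        return move not in {normalize_san(m) for m in am}
    raise ValueError("position has neither bm nor am")
```

For position 1 in the details above, `is_solved("Bxa8", bm=["g4"])` is False, which matches the 0 in the Correct column.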
I will release the messy script once this tool is usable and stable.
Download the exe file.
https://app.box.com/s/fm0pv5s9gfvymnek2loaggp7mjegeeg5
Download the sample batch file.
https://app.box.com/s/ses501locnbgaaytdjr6qjbua3g4q8tz
-
- Posts: 4368
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Best EPD Testing Software
Can polyglot output multi-pv solutions for a test suite?
--Jon
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Best EPD Testing Software
jdart wrote:Can polyglot output multi-pv solutions for a test suite?
--Jon
I can't understand that question. Can you describe a sample situation?
-
- Posts: 12564
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Best EPD Testing Software
jdart wrote:Can polyglot output multi-pv solutions for a test suite?
--Jon
Ferdy wrote:I can't understand that question. Can you describe a sample situation?
Perhaps like gradualtest.
Some test sets have different scores for different move choices. Like Tony Hedlund's positional test suite:
http://privat.bahnhof.se/wb432434/fentest.htm
or STS:
https://sites.google.com/site/strategictestsuite/
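As an illustration of graded scoring, here is a sketch that assumes the per-move points are stored in a comment opcode as "move=points" pairs. That layout, the sample string in the usage note, and the names `parse_move_points` and `graded_score` are assumptions for the example; check the actual test file before relying on it.

```python
def parse_move_points(c0):
    """Parse a string like 'f5=10, Bf2=3' into {'f5': 10, 'Bf2': 3}."""
    points = {}
    for pair in c0.split(","):
        # rpartition keeps promotions such as 'e8=Q=10' intact on the move side.
        move, sep, value = pair.strip().rpartition("=")
        if sep and value.strip().isdigit():
            points[move.strip()] = int(value)
    return points


def graded_score(engine_san, c0):
    """Points for the engine's move, or 0 if the move is not listed."""
    return parse_move_points(c0).get(engine_san, 0)
```

With an illustrative string such as "f5=10, Be5+=2, Bf2=3, Bg4=2", an engine choosing Bf2 would get 3 points instead of a plain miss.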
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Best EPD Testing Software
jdart wrote:Can polyglot output multi-pv solutions for a test suite?
--Jon
Ferdy wrote:I can't understand that question. Can you describe a sample situation?
Dann Corbit wrote:Perhaps like gradualtest.
Some test sets have different scores for different move choices. Like Tony Hedlund's positional test suite:
http://privat.bahnhof.se/wb432434/fentest.htm
or STS:
https://sites.google.com/site/strategictestsuite/
I thought of something like this.
Given an epd with bm e2e4, let the engine run in multipv mode with, say, 3 lines.
If pvmove1 and the epd bm e2e4 are not the same, then compare the bm with pvmove2, then with pvmove3. The engine gets points if any of the 3 pvmoves from multipv is a match: pvmove1 gets the highest score if it matches, pvmove3 the lowest.
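The scheme described above can be sketched as follows. The point values (3, 2, 1) are placeholders, since only "high" and "lowest" are specified, and `multipv_points` is a made-up name. The bm and the pv moves are assumed to be in the same notation, as in the e2e4 example.

```python
def multipv_points(bm_moves, pv_moves, points=(3, 2, 1)):
    """Score by the rank of the first pv move that matches a bm move."""
    for rank, move in enumerate(pv_moves[:len(points)]):
        if move in bm_moves:
            return points[rank]  # earlier rank means more points
    return 0  # no multipv line started with a bm move
```

For example, with bm e2e4 and first pv moves d2d4, e2e4, g1f3, the match is at pvmove2, so the position earns the middle value.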
-
- Posts: 4368
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Best EPD Testing Software
I mean, output the n best solutions, not just the single best, with scores.
--Jon
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Best EPD Testing Software
jdart wrote:I mean, output the n best solutions, not just the single best, with scores.
--Jon
What if the epd has only one bm?
I think polyglot can be revised to output the n best moves though.
-
- Posts: 4368
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Best EPD Testing Software
By n best, I mean what the engine thinks are the best moves, regardless of the bm tag.
You really need to do this if you are not certain of the quality of the testsuite. Many tests are "busted" in that the allegedly best move is not actually the best, or there is an alternate solution that is practically as good as the best one.
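One way to collect the n best moves is to run the engine with MultiPV set to n (for example `setoption name MultiPV value 3`) and keep the deepest `info` line per multipv index. The sketch below only does the collecting, from lines already read from the engine; `n_best` is a made-up name and the sample lines in the usage note are invented, not real engine output.

```python
import re

# depth, multipv index, score kind, score value and first pv move of an info line.
INFO_RE = re.compile(
    r"\bdepth (\d+)\b.*?\bmultipv (\d+)\b.*?\bscore (cp|mate) (-?\d+)\b.*?\bpv (\S+)"
)


def n_best(info_lines):
    """Keep the deepest line seen for each multipv index; return them by rank."""
    best = {}
    for line in info_lines:
        m = INFO_RE.search(line)
        if not m:
            continue
        depth, rank = int(m.group(1)), int(m.group(2))
        if rank not in best or depth >= best[rank][0]:
            best[rank] = (depth, m.group(5), m.group(3), int(m.group(4)))
    return [(rank, move, kind, score)
            for rank, (depth, move, kind, score) in sorted(best.items())]
```

Fed lines such as "info depth 11 seldepth 14 multipv 1 score cp 40 nodes 1 pv e4a8 g5g4", it returns tuples of (rank, move, score type, score) that can be listed next to the bm tag to spot busted positions.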
--Jon