I have used the ECM.epd to test different versions of my engine at a fixed depth. I measure how many nodes were visited, how fast the test terminated and of course how many positions my engine solved.
I ran a test today of a promising new version which was considerably faster, but it solved 5 fewer positions. Disappointed, I looked closer into the positions in question and had them all analyzed by chess.com, and to my dismay my new version was often giving an equal or even better best move than what was supposed to be best according to the EPD. That means it "failed" tests even though it solved the positions adequately.
Because I never really looked into these details, I may have discarded many versions for solving fewer positions when they didn't do anything wrong but just found valid alternatives instead. If two moves guarantee a draw (according to chess.com), why would the EPD list only one of them as best move?
Can anyone point me to a better testsuite for my engine where the given best move is actually best - or where there are multiple best moves given if there is not one clearly correct answer?
Here I found a repository with lots of tests... but I don't know which ones are "good" to use.
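For reference, the EPD bm opcode can already carry several moves separated by spaces, so a suite may list every acceptable answer. A minimal sketch of counting a position as solved when the engine's move matches any of them; the function names and the sample line are illustrative assumptions (the sample is not taken from any real suite and does not claim those moves are best):

```python
import re

def normalize_san(move):
    """Drop check, mate and annotation suffixes so 'Be5+' and 'Be5' compare equal."""
    return move.rstrip("+#!?")

def parse_best_moves(epd_line):
    """Return the set of SAN moves listed in the bm opcode of one EPD line."""
    match = re.search(r"\bbm\s+([^;]+);", epd_line)
    if match is None:
        return set()
    return {normalize_san(move) for move in match.group(1).split()}

def is_solved(epd_line, engine_move_san):
    """True if the engine's SAN move is one of the listed best moves."""
    return normalize_san(engine_move_san) in parse_best_moves(epd_line)

# Illustrative line only: two moves listed under bm.
SAMPLE = ('rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - '
          'bm e4 d4; id "illustrative.001";')

print(is_solved(SAMPLE, "d4"))   # either listed move counts as solved
print(is_solved(SAMPLE, "Nf3"))  # a move outside the list does not
```

This assumes the engine's move has already been converted to SAN; comparing UCI moves would need the position to translate them.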
Any Testsuites in EPD format you can recommend?
-
- Posts: 5239
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Any Testsuites in EPD format you can recommend?
Hi Lithander,
From the page https://www.chessprogramming.org/Test-P ... est_Suites
I see 2 interesting test suites :
1) The Win at Chess (aka "WAC") revision from 2018 (200 positions) where a lot of bad positions were deleted : http://www.talkchess.com/forum3/viewtop ... 80#p762480
2) The Strategic_Test_Suite (aka "STS") https://www.chessprogramming.org/Strategic_Test_Suite
It has a lot of positions, but you can use a short time limit (5 to 10 seconds per position).
-
- Posts: 625
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Any Testsuites in EPD format you can recommend?
I'm using the Arasan test suite for benchmarking progress, spending 60 seconds on every position.
But don't put too much weight on how many positions are solved. That number can decrease even though the engine got better.
What helped me in several cases was using the test suites to debug why the best move was not found (or was found very late) in certain positions. This may lead to code or parameter changes that avoid bad pruning or extend important lines, and if these changes pass a usual SPRT test... Elo!
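A rough sketch of how "found very late" could be measured, assuming the engine's best move per iteration has already been collected (for instance from UCI info lines). The record layout and the sample log are hypothetical:

```python
def first_stable_solution(iterations, best_moves):
    """Return (depth, seconds) of the iteration from which the engine's move
    stays inside best_moves until the search ends, or None if it never does.

    iterations: list of (depth, seconds, move) tuples in search order.
    best_moves: set of accepted moves in the same notation as `move`.
    """
    solved_at = None
    for depth, seconds, move in iterations:
        if move in best_moves:
            if solved_at is None:
                solved_at = (depth, seconds)
        else:
            # The engine switched away again, so the earlier hit does not count.
            solved_at = None
    return solved_at

# Hypothetical iteration log for one position.
log = [(1, 0.01, "d4e5"), (2, 0.02, "f4f5"), (3, 0.05, "d4f2"),
       (4, 0.20, "f4f5"), (5, 0.90, "f4f5")]
print(first_stable_solution(log, {"f4f5"}))
```

Comparing this depth or time between two versions points at the positions worth debugging, independent of the raw solved count.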
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Any Testsuites in EPD format you can recommend?
lithander wrote: ↑Thu Jul 22, 2021 4:45 pm
I have used the ECM.epd to test different versions of my engine at a fixed depth. [...]
You might want to try this revised STS. The sts_rating.py in that repo with the revised STS can approximate an engine's rating, based on the CCRL rating list, for engines of different strengths. The sample output has summaries for the top and worst test suite numbers.
If you are just interested in the EPD, it has fields with the solutions in UCI move format, saving you from parsing the SAN format; see opcode c9. Example:
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; id "STS(v1.0) Undermine.001"; c0 "f5=10, Be5+=2, Bf2=3, Bg4=2"; c7 "f5 Be5+ Bf2 Bg4"; c8 "10 2 3 2"; c9 "f4f5 d4e5 d4f2 f3g4";
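A minimal sketch of reading the c8 and c9 opcodes from the line above into a move-to-points table. The helper names are assumptions made for illustration and are not taken from sts_rating.py:

```python
import re

EPD_LINE = (
    '1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; '
    'id "STS(v1.0) Undermine.001"; '
    'c0 "f5=10, Be5+=2, Bf2=3, Bg4=2"; c7 "f5 Be5+ Bf2 Bg4"; '
    'c8 "10 2 3 2"; c9 "f4f5 d4e5 d4f2 f3g4";'
)

def parse_operations(epd_line):
    """Split an EPD line into its 4-field position and a dict opcode -> operand."""
    fields = epd_line.split(None, 4)
    position = " ".join(fields[:4])
    operations = {}
    # Each operation is "opcode operand;" where the operand may be quoted.
    pattern = r'(\w+)\s+(?:"([^"]*)"|([^;]*));'
    for opcode, quoted, bare in re.findall(pattern, fields[4]):
        operations[opcode] = quoted if quoted else bare.strip()
    return position, operations

def move_points(epd_line):
    """Map each UCI move in c9 to the points listed at the same index in c8."""
    _, ops = parse_operations(epd_line)
    return dict(zip(ops["c9"].split(), (int(p) for p in ops["c8"].split())))

def points_for(epd_line, engine_move_uci):
    """Points earned by the engine's move; moves not listed score 0."""
    return move_points(epd_line).get(engine_move_uci, 0)

print(move_points(EPD_LINE))
print(points_for(EPD_LINE, "d4f2"))
```

Because c9 is already in UCI notation, the engine's bestmove string can be looked up directly without a SAN converter.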
Perhaps some solutions are no longer optimal if we reanalyze them with SF14, for example. But overall the scoring method can differentiate which engine is stronger and which is weaker.
The regression slope and intercept are at https://github.com/fsmosca/STS-Rating/b ... ng.py#L694 in the sts_rating.py source. They were derived from older chess engines.
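A sketch of how a slope and intercept of this kind are typically obtained and applied, using an ordinary least-squares line. The actual values live in sts_rating.py and are not reproduced here; the calibration points below are invented purely to show the mechanics and are not CCRL data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_rating(score, slope, intercept):
    """Apply the fitted line to a test suite score."""
    return slope * score + intercept

# Invented (score, rating) pairs, for illustration only.
scores = [6000, 8000, 10000]
ratings = [2000, 2400, 2800]

slope, intercept = fit_line(scores, ratings)
print(round(estimate_rating(9000, slope, intercept)))
```

A fit like this is only as good as the engines it was calibrated on, which is why the estimate may drift for engines much stronger or weaker than the calibration set.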
Would be interesting to see the results comparing the old and new versions of your engine.
I have no time at the moment, but I plan to rescore this test suite with Stockfish.
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Any Testsuites in EPD format you can recommend?
Thanks for the input everyone!
Especially the detailed report of the "revised STS" sounds very promising. I'm away from my computer for a week now but I'll give it a try afterwards!
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Any Testsuites in EPD format you can recommend?
Searched 200 positions to depth 11. 709299K nodes visited. Took 984.981 seconds!
Best move found in 194 / 200 positions!
It seems like the tests are a bit on the easy end but maybe that's good because I can lower the depth (or introduce a mode that uses a time budget instead) and get results much quicker. In any case it's good to have another (and hopefully more correct) test suite available! Thanks
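The figures above work out to roughly 720 thousand nodes per second, about 4.9 seconds per position and a 97% solve rate. A small sketch of that arithmetic, assuming "K" means 1000 nodes; the function name is made up for illustration:

```python
def summarize(nodes, seconds, solved, total):
    """Derive speed and solve-rate figures from one test suite run."""
    return {
        "nps": nodes / seconds,
        "seconds_per_position": seconds / total,
        "solved_percent": 100.0 * solved / total,
    }

# 709299K nodes, 984.981 seconds, 194 of 200 solved (K assumed to be 1000).
stats = summarize(nodes=709_299_000, seconds=984.981, solved=194, total=200)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Tracking all three numbers together helps separate a speedup from a change in search quality.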
-
- Posts: 5239
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Any Testsuites in EPD format you can recommend?
lithander wrote: ↑Fri Jul 23, 2021 4:36 pm
Searched 200 positions to depth 11. 709299K nodes visited. Took 984.981 seconds!
Best move found in 194 / 200 positions!
[...]
I didn't know the strength of your engine.
But yes, you can set the level to 1 second or 1 million nodes to get a fast result.
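In UCI terms these limits correspond to the go subcommands depth, movetime (in milliseconds) and nodes. A minimal sketch of building the command string; the function and its "kind" labels are assumptions, and whether a given engine honours go nodes depends on the engine:

```python
def go_command(kind, value):
    """Build a UCI 'go' command for a fixed depth, time or node budget."""
    if kind == "depth":
        return f"go depth {int(value)}"
    if kind == "seconds":
        # UCI movetime is given in milliseconds.
        return f"go movetime {int(round(value * 1000))}"
    if kind == "nodes":
        return f"go nodes {int(value)}"
    raise ValueError(f"unknown limit kind: {kind}")

print(go_command("depth", 11))
print(go_command("seconds", 1))
print(go_command("nodes", 1_000_000))
```

A node budget gives reproducible results across machines, while a time budget is closer to how the engine is used in games.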
-
- Posts: 276
- Joined: Fri Mar 17, 2006 8:01 am
- Location: Russia
- Full name: Vladimir Medvedev
Re: Any Testsuites in EPD format you can recommend?
There was a tool for running test suites automatically in multi-threaded mode on Windows. It was shipped with a large collection of positions (about 10K) and could estimate an engine's rating after solving them at 0.1 sec/position. I found it somewhere on the Rebel/ProDeo site (?), but can't find it anymore. I also can't remember the tool's name (probably it was some short abbreviation, maybe 4 letters). Could anybody please help with identifying this tool?
-
- Posts: 2926
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Any Testsuites in EPD format you can recommend?
MEA by Ferdinand Mosca?
https://github.com/fsmosca/Multiple-move-Epd-Analyzer/
Take a look into his repository, there are some tools present:
https://github.com/fsmosca/
And some epd files:
https://github.com/fsmosca/strategic-te ... ain/static
https://github.com/fsmosca/STS-Rating/tree/master/epd
https://github.com/fsmosca/EPD-Analyzer/tree/master/epd
EPD collection by Chris Whittington:
https://github.com/ChrisWhittington/Chess-EPDs
--
Srdja
-
- Posts: 276
- Joined: Fri Mar 17, 2006 8:01 am
- Location: Russia
- Full name: Vladimir Medvedev
Re: Any Testsuites in EPD format you can recommend?
Thank you, looks interesting, I will try this one. But the tool I am looking for was a standalone executable, not a Python script. There was also a sort of pseudo-graphics GUI for running the test.