Testing with different EPD suites for search vs eval changes

Post by **Michael Sherwin** » Sat Dec 24, 2016 5:56 pm

jdart wrote: I think you are wasting your time if your goal is to maximize engine strength and you are using test suites as a measure. Test suites can provide a rough measure of engine strength, but with a large error bar. They are no good for measuring small changes. I still run test suites, but only once in a while as a sanity check.

--Jon

I don't disagree, given the state of the test suites that I have dealt with so far. For example, out of the 1001 positions in wcsac.epd, only 68 are suitable based on the criteria I superficially selected. If the criteria were tightened up further, as they probably should be, then very few positions would be left. This absolutely means that wcsac.epd is useless for strength testing.

If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.
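A rough sketch of that selection rule in Python, just to make the idea concrete. Everything named here (`EngineResult`, `select_by_rating_class`, the position ids) is hypothetical, and it assumes you already have a rating and a set of solved position ids for each engine in the test group:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EngineResult:
    # Hypothetical record: one engine's rating and the ids of the positions it solved.
    name: str
    rating: int
    solved: Set[str]


def select_by_rating_class(position_ids: List[str],
                           results: List[EngineResult],
                           strong_min: int = 2000,
                           weak_max: int = 1600) -> List[str]:
    """Keep positions solved by every engine rated >= strong_min
    and by no engine rated < weak_max."""
    strong = [r for r in results if r.rating >= strong_min]
    weak = [r for r in results if r.rating < weak_max]
    # Without engines in both classes the rule cannot discriminate anything.
    if not strong or not weak:
        return []
    return [
        pid for pid in position_ids
        if all(pid in r.solved for r in strong)
        and not any(pid in r.solved for r in weak)
    ]
```

Engines rated between the two thresholds are ignored by this rule; a real design would probably need a set per rating band rather than a single cut.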

Post by **Ferdy** » Sun Dec 25, 2016 2:53 am

Michael Sherwin wrote:If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.

This is similar to what the STS suite is capable of. Strong engines would find most of the top 1 or 2 moves, thereby getting higher points, while weaker engines would generally find only a couple of the top 1 or 2 moves and would mostly get the 3rd or 4th choice, or nothing at all.
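As a minimal sketch of that kind of graded scoring, assuming each position carries a "move=points" list (the text format, the function names and the sample values below are only illustrative, not taken from the actual suite files):

```python
from typing import Dict


def parse_move_points(text: str) -> Dict[str, int]:
    """Parse a string like 'f5=10, Be5+=8' into {'f5': 10, 'Be5+': 8}."""
    points = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last '=' so promotion moves such as 'e8=Q=10' still parse.
        move, _, value = item.rpartition("=")
        points[move.strip()] = int(value)
    return points


def score_suite(engine_moves: Dict[str, str],
                suite: Dict[str, Dict[str, int]]) -> int:
    """Sum the points for the move the engine chose in each position.
    A move that is not listed for a position scores 0."""
    total = 0
    for pos_id, move_points in suite.items():
        chosen = engine_moves.get(pos_id)
        total += move_points.get(chosen, 0)
    return total
```

A stronger engine that picks the top-rated move in most positions ends up with a higher total, while a weaker one that mostly lands on the lower-rated alternatives still collects partial credit instead of a flat pass/fail.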

Merry Christmas to all
