jdart wrote: I think you are wasting your time if your goal is to maximize engine strength and you are using test suites as a measure. Test suites can provide a rough measure of engine strength, but with a large error bar. They are no good for measuring small changes. I still run test suites, but only once in a while as a sanity check.
--Jon
I don't disagree, given the state of the test suites that I have dealt with so far. For example, out of the 1001 positions in wcsac.epd, only 68 are suitable based on the criteria I superficially selected. If the criteria were tightened further, as they probably should be, very few positions would be left. This absolutely means that wcsac.epd is useless for strength testing.
If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.
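A rough sketch of how that selection could work, in Python. Everything here is hypothetical: the function name, the shape of the `results` and `ratings` tables, and the 2000/1600 cutoffs as defaults are just taken from the example above, not from any existing tool.

```python
from typing import Dict, List


def select_discriminating(
    results: Dict[str, Dict[str, bool]],
    ratings: Dict[str, int],
    strong: int = 2000,
    weak: int = 1600,
) -> List[str]:
    """Return ids of positions solved by every strong engine and by no weak one.

    results: position id -> {engine name -> solved?} (hypothetical layout)
    ratings: engine name -> rating
    """
    strong_engines = [e for e, r in ratings.items() if r >= strong]
    weak_engines = [e for e, r in ratings.items() if r < weak]
    if not strong_engines or not weak_engines:
        raise ValueError("need at least one engine in each rating class")

    keep = []
    for pos_id, solved in results.items():
        all_strong_solve = all(solved[e] for e in strong_engines)
        no_weak_solves = not any(solved[e] for e in weak_engines)
        if all_strong_solve and no_weak_solves:
            keep.append(pos_id)
    return keep
```

Engines rated between the two cutoffs are ignored when selecting positions; they are the ones the finished set would then be used to measure.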
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin wrote: If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.
This is similar to what the STS suite is capable of. Strong engines would find the top 1 or 2 moves in most positions and so score more points, while weaker engines would generally find only a couple of the top 1 or 2 moves, landing mostly on the 3rd or 4th choice or on no scored move at all.
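To illustrate the idea of graded scoring, here is a minimal sketch in Python. The table layout, the function name, the position ids and the point values are all made up for the example; it does not parse real EPD files or reproduce the suite's actual scoring tables.

```python
from typing import Dict


def score_suite(
    points_table: Dict[str, Dict[str, int]],
    engine_moves: Dict[str, str],
) -> int:
    """Sum the points an engine earns over a set of positions.

    points_table: position id -> {move -> points}, best move worth the most
    engine_moves: position id -> move the engine chose
    A move that is not listed for a position (or a missing answer) scores 0.
    """
    total = 0
    for pos_id, move_points in points_table.items():
        chosen = engine_moves.get(pos_id)
        total += move_points.get(chosen, 0)
    return total


# Hypothetical two-position suite with graded alternatives.
table = {
    "p1": {"Nf5": 10, "Qd2": 7, "h4": 3},
    "p2": {"Rxe6": 10, "Kh1": 5},
}
strong_engine = {"p1": "Nf5", "p2": "Rxe6"}  # finds both top moves
weak_engine = {"p1": "h4", "p2": "a3"}       # one third choice, one unlisted move

print(score_suite(table, strong_engine), score_suite(table, weak_engine))
```

Because partial credit is given for second and third choices, the total spreads engines out more finely than a plain solved/unsolved count would.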