Testing with different EPD suites for search vs eval changes

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Testing with different EPD suites for search vs eval chan

Post by Michael Sherwin »

jdart wrote: I think you are wasting your time if your goal is to maximize engine strength and you are using test suites as a measure. Test suites can provide a rough measure of engine strength, but with a large error bar. They are no good for measuring small changes. I still run test suites, but only once in a while as a sanity check.

--Jon
I don't disagree, given the state of the test suites that I have dealt with so far. For example, out of the 1001 positions in wcsac.epd, only 68 are suitable based on the criteria I superficially selected. If the criteria were tightened up further, as they probably should be, then very few positions would be left. This absolutely means that wcsac.epd is useless for strength testing.

If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.
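A rough sketch of how such a rating-class filter might look in Python. The data layout (a position id mapped to a list of engine rating and solved/unsolved pairs), the function name, and the default thresholds are all assumptions made for illustration, not part of any existing tool.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: position id -> list of (engine_rating, solved) pairs.
Results = Dict[str, List[Tuple[int, bool]]]


def discriminating_positions(
    results: Results, strong_floor: int = 2000, weak_ceiling: int = 1600
) -> List[str]:
    """Return ids of positions solved by every engine rated at or above
    strong_floor and by no engine rated below weak_ceiling."""
    selected = []
    for pos_id, outcomes in results.items():
        strong = [solved for rating, solved in outcomes if rating >= strong_floor]
        weak = [solved for rating, solved in outcomes if rating < weak_ceiling]
        # Require at least one engine in each class, so an empty class
        # cannot make a position pass by default.
        if strong and weak and all(strong) and not any(weak):
            selected.append(pos_id)
    return selected


# Made-up results for three positions and four engines.
sample: Results = {
    "pos.001": [(2450, True), (2100, True), (1550, False), (1400, False)],
    "pos.002": [(2450, True), (2100, False), (1550, False), (1400, False)],
    "pos.003": [(2450, True), (2100, True), (1550, True), (1400, False)],
}
print(discriminating_positions(sample))
```

Engines rated between the two thresholds are ignored here; a fuller design would probably repeat the filter for each pair of adjacent rating classes.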
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Testing with different EPD suites for search vs eval chan

Post by Ferdy »

Michael Sherwin wrote: If I were to design an EPD testing solution I would do it by rating class. For example, all engines rated 2000+ in the test group can solve this set 100%. All engines in the test group rated below 1600 solve them 0%.
This is similar to what the STS suite is capable of. Strong engines would find the top 1 or top 2 move in most positions and thereby score more points, while weaker engines would generally find only a couple of top 1 or top 2 moves, and would mostly pick the 3rd or 4th best move or score nothing at all.
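To illustrate graded scoring of this kind, here is a minimal Python sketch. It assumes each EPD line carries a `c0` comment listing moves with point values, as in `c0 "e4=10, d4=8"`. That format, the function names, and the sample line are assumptions for illustration, so check them against the suite file you actually use.

```python
import re
from typing import Dict, List


def parse_move_points(epd_line: str) -> Dict[str, int]:
    """Extract a move -> points table from an assumed c0 "move=points, ..." field."""
    match = re.search(r'c0\s+"([^"]*)"', epd_line)
    if match is None:
        return {}
    points = {}
    for item in match.group(1).split(","):
        # Split on the last "=" so promotions such as "e8=Q=10" stay intact.
        move, sep, value = item.strip().rpartition("=")
        if sep and value.strip().isdigit():
            points[move.strip()] = int(value.strip())
    return points


def score_suite(epd_lines: List[str], engine_moves: List[str]) -> int:
    """Sum the points earned by the engine's chosen move in each position."""
    total = 0
    for line, move in zip(epd_lines, engine_moves):
        total += parse_move_points(line).get(move, 0)
    return total


# Made-up line in the assumed format (not taken from a real suite).
demo = (
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - "
    'bm e4; id "demo.001"; c0 "e4=10, d4=8, Nf3=5";'
)
print(score_suite([demo, demo, demo], ["e4", "Nf3", "a3"]))
```

With partial credit like this, an engine that often picks the second or third best move still earns points, which is how the total can separate engines of different strength more finely than a pass/fail count.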

Merry Christmas to all :)