Problem suite testing - how to extract a useful number

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Problem suite testing - how to extract a useful number

Post by bob »

bob wrote:
Don, I have tried lots of ways to measure improvements, but I have never found anything that worked, except for relying on playing _lots_ of games against a pool of opponents...

Don wrote:
I do not use problem sets to tune or improve the play of the program.

That's not the issue here. I am building a suite of benchmarking tests of various kinds just to document the characteristics of the program over time as I improve it.

So this serves a similar purpose to "unit tests" except that you don't pass or fail. Before I commit a "release" version I will run a suite of various kinds of tests to document the state of the program, but I won't necessarily try to "interpret" the results; they will simply stand on their own.

It will be most useful if, for instance, the score suddenly drops in a big way. Then I am alerted to a possible problem and I must either justify it or fix it.

bob wrote:
I always have a few sanity tests (WAC at 1 sec/move, etc). But I also use our cluster to "sanity-test" against a suite of opponents as well, so that if the score suddenly drops against one or more, then it requires analysis...

I'm more interested in answering the question "is A' (new version) better than A (previous version)?" which is a tough one to answer with any accuracy...

Don wrote:
You can do this if you are willing to test at fast levels, which amounts to about 1 second per game on average. The very best programs can probably approach master strength at this time control, so it's not that ridiculous. Overnight, you can get tens of thousands of games in at this time control and come within approximately 5 Elo or something like that with high confidence. If you have several CPUs lying around, you can get results relatively quickly. If someone were to pay you a million dollars to build a strong PC program, the first thing you would want to do is go out and buy a bunch of quad cores just for testing.

Of course I realize this is only an approximation because you normally don't play at 1 second per game! But it might work especially well for things like evaluation improvements.
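
To put a rough number on "tens of thousands of games" versus "approximately 5 Elo", here is a minimal sketch of the usual back-of-the-envelope calculation. It assumes the standard logistic Elo model and a normal approximation for the match score; the function names, the 1.96 multiplier (roughly 95% confidence), and the 35/30/35 win/draw/loss split in the demo are illustrative assumptions, not figures from this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match result.

    z = 1.96 corresponds to roughly 95% confidence under a normal approximation.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)

    # Clamp so the logistic conversion stays defined for lopsided results.
    def clamp(s: float) -> float:
        return min(max(s, 1e-9), 1.0 - 1e-9)

    return (
        elo_from_score(clamp(score)),
        elo_from_score(clamp(score - margin)),
        elo_from_score(clamp(score + margin)),
    )


if __name__ == "__main__":
    # Hypothetical even matches with 30% draws, at three sample sizes.
    for games in (400, 4_000, 40_000):
        w = l = games * 35 // 100
        d = games - w - l
        elo, low, high = elo_interval(w, d, l)
        print(f"{games:>6} games: {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

The interval shrinks with the square root of the number of games, which is why a few hundred games say very little and tens of thousands are needed to resolve differences of a few Elo.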

- Don
I can play 260 games at a time, so I don't have to go quite that fast. In fact, I typically play 10-minute games. Three per hour times 260 machines becomes about a thousand per hour, since not all go the full 10 minutes. Doesn't take long to add 'em up. :)
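
The arithmetic here can be sketched as below. This is only an illustration: "three per hour" suggests a game can run up to about 20 minutes (presumably 10 minutes per side, which is an inference, not something stated), and the 15.6-minute average is simply the value that makes 260 concurrent games come out to a thousand per hour. The function names are made up for the example.

```python
def games_per_hour(concurrent_games: int, avg_game_minutes: float) -> float:
    """Games finished per hour when `concurrent_games` run in parallel."""
    return concurrent_games * 60.0 / avg_game_minutes


def hours_to_play(total_games: int, concurrent_games: int, avg_game_minutes: float) -> float:
    """Wall-clock hours needed to finish `total_games`."""
    return total_games / games_per_hour(concurrent_games, avg_game_minutes)


if __name__ == "__main__":
    # Every game running the full length (assumed 20 minutes): 3 per hour per slot.
    print(games_per_hour(260, 20.0))
    # Assumed shorter average, since many games end early.
    print(games_per_hour(260, 15.6))
    # Time to collect a hypothetical 40,000-game sample at that rate.
    print(hours_to_play(40_000, 260, 15.6))
```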
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Problem suite testing - how to extract a useful number

Post by Don »

260 games at a time is great. I think 10-minute games simulate tournament time controls pretty closely without taking too much time.