STS 1.0 test result and nitpicking.

jesper_nielsen · Post by **jesper_nielsen** » Thu Aug 27, 2009 3:11 pm

Strategic Test Suite 1.0: Undermining

Result for Pupsi2 0.07

Fast: 10 seconds
57/100 solved
656/1000 points

Medium: 1 minute
67/100 solved
724/1000 points

Slow: 7 minutes
75/100 solved
809/1000 points

More results to follow...

Nitpicking: the id number for 42 is missing a "0": "42" should be "042" to be consistent.
More nitpicking: Position 86 has both "Rc8=1 Rc8=3" as points for Rc8.
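
Both nitpicks can be checked mechanically. What follows is only a minimal sketch: it assumes the points are written as space- or comma-separated `move=points` tokens, as in the "Rc8=1 Rc8=3" example above, and that id numbers should be three digits wide. The function names are hypothetical and not part of any STS tooling.

```python
import re
from collections import defaultdict

def conflicting_move_points(points_field):
    """Return moves that are listed with more than one point value."""
    seen = defaultdict(set)
    for token in re.split(r"[\s,]+", points_field.strip()):
        # Split on the last "=" so a promotion such as "e8=Q=5" keeps its move text.
        move, sep, pts = token.rpartition("=")
        if sep and pts.isdigit():
            seen[move].add(int(pts))
    return {move: sorted(values) for move, values in seen.items() if len(values) > 1}

def is_padded_id(number_text, width=3):
    """True if the id number is all digits and exactly `width` characters long."""
    return number_text.isdigit() and len(number_text) == width

print(conflicting_move_points("Rc8=1 Rc8=3"))   # {'Rc8': [1, 3]}
print(is_padded_id("42"), is_padded_id("042"))  # False True
```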

And now a question:
How to interpret the results?
Can the change in results with more thinking time be used to conclude the presence or absence of chess knowledge in the engine?
In other words: What is needed to improve the result?

Kind regards,
Jesper

swami · Post by **swami** » Thu Aug 27, 2009 5:54 pm

Hi Jesper,

Thanks for posting the results, and of course many thanks for pointing out the bugs in the file. Feel free to let us know of any more bugs as you test the other suites! This is the only way we can eradicate them.

I don't use the Gradual test to score engines on the 1000-point scale, so I wouldn't have known there was a bug where both 1 and 3 points are awarded for the same move. It's not "nitpicking" as you termed it; it's a reasonable bug report.

I believe this is a reasonable score for an engine rated around 2400. I'm sure you can tune the pawn-structure terms related to undermining, as you have a better understanding of how your engine works... Perhaps Dann will have something to say about this, as he has actual experience working on an engine, Beowulf.

I believe one can tune the engine for the best performance in undermining (as well as the other test suites) by changing the values related to pawn structure, material, mobility, king safety, and so on: adjust the bonuses and values, and test each change to see which one scores highest in the test suites. Just keep tuning. It should be fun!
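
The try-a-value-and-rescore loop described above could look roughly like the following. This is a sketch only: the parameter names, the candidate values, and `fake_suite` are all made up for illustration. A real `run_suite` would have to run the engine over the test positions and return its score.

```python
def tune(params, candidates, run_suite):
    """Greedy search, one parameter at a time.

    run_suite(params) must return a suite score (higher is better).
    """
    best = dict(params)
    best_score = run_suite(best)
    for name, values in candidates.items():
        for value in values:
            trial = dict(best, **{name: value})
            score = run_suite(trial)
            if score > best_score:  # keep a change only if the score improves
                best, best_score = trial, score
    return best, best_score

# Stand-in scorer so the sketch runs without an engine (purely hypothetical).
def fake_suite(p):
    return 1000 - abs(p["bishop_pair"] - 40) - abs(p["isolated_pawn"] + 15)

start = {"bishop_pair": 20, "isolated_pawn": -5}
grid = {"bishop_pair": [30, 40, 50], "isolated_pawn": [-25, -15, -10]}
print(tune(start, grid, fake_suite))
# ({'bishop_pair': 40, 'isolated_pawn': -15}, 1000)
```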

If you can improve the search or the evaluation, or speed everything up a little more, that would be even better. The change in results from one version to the next can be used to measure the improvement in the engine.

Feel free to report its results on the other suites as well, along with your modified EPDs if you find any bugs. I believe Dann is on vacation and will return in a week.

jesper_nielsen · Post by **jesper_nielsen** » Fri Aug 28, 2009 9:16 am

Bugs or nitpicking.

It just seemed like overkill to call it a bug!

As for the interpretation of the results, I have done the fast and medium runs for the rest of the tests, and it looks like the results for STS 5.0 gain very little from spending more time.

One possible interpretation of this is that Pupsi simply does not understand the positions arising from the Knight vs Bishop trades.

So maybe a lack of gain by time indicates a lack of knowledge?!
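
To make "gain by time" concrete, here is a small sketch using only the solved counts posted so far in this thread (STS 1.0 and STS 2.0; no STS 5.0 figures have been posted, so none are shown). The helper name is hypothetical.

```python
def gain_by_time(solved):
    """Increase in solved positions between consecutive time controls."""
    labels = list(solved)  # dicts keep insertion order: fast, medium, slow
    return {f"{a}->{b}": solved[b] - solved[a] for a, b in zip(labels, labels[1:])}

sts1 = {"fast": 57, "medium": 67, "slow": 75}  # STS 1.0, out of 100
sts2 = {"fast": 70, "medium": 81}              # STS 2.0, slow run not finished

print(gain_by_time(sts1))  # {'fast->medium': 10, 'medium->slow': 8}
print(gain_by_time(sts2))  # {'fast->medium': 11}
```

A suite whose gains stay near zero as the time goes up would be the candidate for missing knowledge rather than missing depth, if the hypothesis holds.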

Anyway, Pupsi clearly does best in STS 2.0, with 70/100 and 81/100 for the fast and medium tests respectively. I will give the full results once the slow tests begin to finish.

Thanks for all the effort you have put into making these test sets!

Kind regards,
Jesper

P.S. An idea for a test set: pawn races/pawn endgames. When to exchange into a pawn-only endgame, and when NOT to.

swami · Post by **swami** » Fri Aug 28, 2009 9:38 am

Thanks for the report, and for the idea as well. I will add the pawn race to the list of strategic ideas.

Yes, your observation about the 5th suite is quite true. You might want to tune the values for knights and bishops and see whether Pupsi's choice of trades is justified, or implement new code for this. I believe this bishop/knight knowledge is really important; I've seen knight/bishop trade-offs happen frequently in engine games. I believe Pupsi will improve tremendously if this is done right.

Good to hear that it has good knowledge of open files and diagonals. This is another important test, as these positions come up frequently in games as well.

Best Regards,
Swami
