swami wrote: Thanks for running the tests. Very interesting results. I would think there's a need for 20 more STS suites in order to lessen the error probability.
Though the order 3. Naum, 4. Stockfish, 5. Shredder, 6. Zappa seems to be correct.
Not exactly.
Stockfish seems to be stronger than Naum.
Uri
Perhaps slightly stronger. There's still a need for 20 more suites for a perfect comparison. At least a partial idea can be gotten from these suites. For example, Shredder and Zappa are in the right place, and Amyan is the best of Division 4, which I had organized, etc.
Also, the individual importance of certain test suites plays a greater role than the overall score. Stockfish scored a lot more than Naum in "Square Vacancy", which, by the way, is a more important strategic theme than, say, Knight Outposts or Offer of Simplification.
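A rough sketch of why more suites help: if two engines are scored on the same suites, the uncertainty in their score gap shrinks roughly with the square root of the number of suites. This treats each suite as an independent sample with similar spread, which is a simplifying assumption rather than something established in the thread, and the function names `paired_gap` and `error_shrink` are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_gap(scores_a, scores_b):
    """Mean per-suite score gap (A minus B) and its standard error.

    Both lists must hold the two engines' scores on the same suites,
    in the same order, so each suite acts as one paired sample.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both engines must be scored on the same suites")
    gaps = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(gaps)
    if n < 2:
        raise ValueError("need at least two suites to estimate the spread")
    # Standard error of the mean gap: sample standard deviation / sqrt(n).
    return mean(gaps), stdev(gaps) / math.sqrt(n)


def error_shrink(n_now, n_later):
    """Factor by which the standard error shrinks going from n_now to
    n_later suites, assuming the suite-to-suite spread stays the same."""
    return math.sqrt(n_now / n_later)


# Going from 8 suites to 28 (20 more) shrinks the error bar to a bit
# over half of its current width.
print(round(error_shrink(8, 28), 2))  # 0.53
```

A gap of less than about two standard errors would be hard to call either way; with only a handful of suites, a t-based interval would be wider still.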
Thanks for the strategy test suites; they must have been a lot of work. I am a bit surprised by the scores and am not sure what could account for the final outcome. As for Stockfish 1.6 being stronger than Naum 4, it may seem so, but I think more games are needed. I am running a series of tournaments with no book, and Stockfish is showing great promise.
swami wrote:
Also, the individual importance of certain test suites plays a greater role than the overall score.
Yes, this is a sensible point. If you really want to get an idea of engine strength from test scores, then I would think you need to weight the scores according to their importance.
It should not be difficult to find the weights, because you can use the official rating lists as a reference and adjust the weights until the weighted test results reflect (more or less) the official lists.
It could also be interesting to see which of the various themes matters most in practical play.
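One systematic way to "adjust the weights until the weighted results reflect the rating lists" is an ordinary least-squares fit: each engine's per-suite scores are the inputs and its rating-list value is the target. Least squares is a technique swapped in here for illustration, not the method proposed above, and `fit_weights`, `solve` and `weighted_score` are hypothetical names. It is written in plain Python so it needs no extra libraries.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few engines or redundant suites")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_weights(suite_scores, ratings, ridge=1e-6):
    """Least-squares fit of rating ~ b + sum_j w[j] * score[j].

    suite_scores: one row per engine, one column per test suite.
    ratings: each engine's rating-list value, in the same order.
    ridge: small penalty on the weights (not the intercept) that keeps
           the solve stable when there are few engines.
    Returns (intercept b, list of weights w).
    """
    X = [[1.0] + [float(s) for s in row] for row in suite_scores]
    k = len(X[0])
    # Normal equations: (X^T X + ridge * I) beta = X^T y
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    for p in range(1, k):  # leave the intercept unpenalised
        A[p][p] += ridge
    rhs = [sum(r[p] * y for r, y in zip(X, ratings)) for p in range(k)]
    beta = solve(A, rhs)
    return beta[0], beta[1:]


def weighted_score(b, w, scores):
    """Predicted rating for one engine from its per-suite scores."""
    return b + sum(wj * sj for wj, sj in zip(w, scores))
```

If all suites are scored on the same scale, the relative size of the fitted weights hints at which themes matter most. With only a few engines and eight suites the fit is barely determined, though, so the weights would be fragile until more engines and suites are added.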
I am running this test again. I would like to see how consistent the results are. The only change is hash size from 64 MB to 128 MB. I should have the results in a few hours.
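One simple way to put a number on "how consistent the results are" between the 64 MB and 128 MB runs is a rank correlation over the engines' total scores. The sketch below uses Spearman's rank correlation, a choice made here rather than anything used in the thread; `ranks` and `spearman` are hypothetical names. A value of 1.0 means both runs rank the engines identically, and values near 0 mean the two orderings are essentially unrelated.

```python
import math


def ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(run_a, run_b):
    """Spearman rank correlation between two runs over the same engines:
    the Pearson correlation of their rank lists (1.0 = identical order)."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must cover the same engines")
    ra, rb = ranks(run_a), ranks(run_b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("all scores tied in one run; correlation undefined")
    return cov / math.sqrt(var_a * var_b)
```

Usage would be something like `spearman(totals_64mb, totals_128mb)`, where the two hypothetical lists hold each engine's total score from each run, in the same engine order.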
kingliveson wrote: I am running this test again. I would like to see how consistent the results are. The only change is hash size from 64 MB to 128 MB. I should have the results in a few hours.
Thank you for the test. The design is interesting.
The rank order remains the same with more hash, but we learn that Naum tends to play better with more hash. Relative to the other engines, it is the one most affected by hash size.
Other points to consider:
Eight tests obviously won't give a clear picture, though they could give a rough idea of the engines' relative strength.
More hash only makes a difference at longer time controls. At something like 10 seconds, a smaller hash works fine, as it makes the search faster.
These tests are obviously strategic, so perhaps Stockfish's edge over Naum lies in tactics rather than in strategy.