jwes wrote:Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.
That would be difficult unless there's some automatic tool to do it.
If somebody knows a shortcut for extracting that information, it would be a great help.
EDIT: Whoops, the log files have been overwritten.
Do the ChessGUI debug files that are saved provide this information? Just a thought. Might be totally irrelevant for what you're doing though.
Yes, but these tests were done in Arena. I clicked "overwrite" a long time ago, back when I was running the tests, because Arena's logfile output was too big and it took a lot of time to scroll through it to find an engine's results on a given test suite. I should have saved the log before overwriting it.
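For the per-position statistics jwes asks about, here is a minimal sketch of one way to do it once such a file exists. Everything in it is an assumption for illustration: the engine names, ratings and solved/unsolved values are placeholders rather than real STS results, and the layout (one 0/1 entry per engine per position) is hypothetical. It computes the Pearson correlation between "engine solved this position" and engine rating, which is one simple measure of how well a position tracks strength.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, e.g. a position
    that every engine solved (it cannot discriminate between engines).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

def position_correlations(results, ratings):
    """Correlate each position's 0/1 solved flags with engine ratings."""
    engines = sorted(ratings)
    rating_list = [ratings[e] for e in engines]
    return {
        pos: pearson([solved[e] for e in engines], rating_list)
        for pos, solved in results.items()
    }

# Placeholder data only: hypothetical engines, ratings and results.
ratings = {"engine_a": 3000, "engine_b": 2700, "engine_c": 2550,
           "engine_d": 2450, "engine_e": 2200}
results = {
    "pos1": {"engine_a": 1, "engine_b": 1, "engine_c": 1, "engine_d": 0, "engine_e": 0},
    "pos2": {"engine_a": 1, "engine_b": 1, "engine_c": 1, "engine_d": 1, "engine_e": 1},
    "pos3": {"engine_a": 0, "engine_b": 1, "engine_c": 0, "engine_d": 1, "engine_e": 1},
}

corr = position_correlations(results, ratings)
for pos, r in corr.items():
    print(pos, "no variance" if r is None else round(r, 3))
```

A position with a strongly positive value is solved mostly by the higher-rated engines; a value near zero or below suggests the position says little about strength, at least for this set of engines.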
swami wrote:I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.
In the CCRL 40/40 ratings, Bison 9.11 has a rating of 2828 after 285 games, whereas Bison 9.8 has a rating of 2721 after 281 games.
Cheers,
Graham.
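To put the gap between those two ratings in perspective, the standard Elo expected-score formula can be applied to the difference. This is only a sketch: the function name is made up for illustration, and the result is what the Elo model predicts from the two quoted ratings, not a measured head-to-head result.

```python
def expected_score(elo_diff):
    """Expected score (between 0 and 1) for the higher-rated side,
    using the standard Elo logistic formula."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

# Ratings quoted above from the CCRL 40/40 list.
bison_9_11 = 2828
bison_9_8 = 2721

diff = bison_9_11 - bison_9_8  # 107 Elo
print(f"Elo difference: {diff}")
print(f"Expected score for Bison 9.11: {expected_score(diff):.3f}")
```

With a 107-point difference the model gives Bison 9.11 an expected score of roughly 65% against Bison 9.8.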
Oh, I see. I used the CCRL 40/4 list as the reference here. I tested 9.6a, which I hope is weaker than Bison 9.8; otherwise that wouldn't explain the lower STS scores.
Jouni wrote:I have only tested top engines. One problem with STS: Naum4 scores clearly better (20-30 more) than Stockfish 1.6! So I guess the reason is that the positions are checked only(?) with R3 and N4, so you cannot use the suite to test the 2 very best engines, which is a pity...
Yes, there may be cases like this. Naum and Stockfish are only 50 Elo or so apart, so close in strength that either of them can score better on a given positional theme.
The problem is that 8 test suites can't tell you which engine is better. I do hope that with more test suites (20 or more...) the strength difference can be assessed.
There may be cases where Stockfish is better at "Tactics" than Naum. STS tests only strategy. Sacrifices and tactics are beyond the scope of this test.
As of now, with the help of STS, you can only get an idea of the "rough" strength of chess engines, not their "exact" strength.
Such as:
Naum/Stockfish play at 3000+
Crafty plays at 2700
Goliath plays at 2550
Romi plays at 2450
Lime plays at 2200