STS - List the Order of Importance

swami · Post by **swami** » Sat Jan 16, 2010 9:44 am

jwes wrote:Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.

That would be difficult unless there's some automatic tool to do it.

If somebody knows a shortcut for fetching such information, that would be a great help.

EDIT: Whoops, the log files have been overwritten.
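A minimal sketch of the per-position statistic jwes asks for. It assumes the results have already been collected into a table where `results[engine][i]` is 1 if the engine solved position i and 0 otherwise, and `ratings[engine]` is the engine's list rating. The function names and the data layout are hypothetical, not an existing tool. With a 0/1 column, the Pearson coefficient computed here is the point-biserial correlation between solving a position and engine strength.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 vector xs this is the point-biserial correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Every engine solved (or every engine missed) the position: no information.
        return float("nan")
    return cov / math.sqrt(vx * vy)


def position_correlations(results, ratings):
    """Return one correlation per position between solved/not-solved and rating.

    results: dict engine -> list of 0/1 outcomes, one per position (hypothetical layout)
    ratings: dict engine -> rating
    """
    engines = sorted(results)
    n_positions = len(results[engines[0]])
    elo = [ratings[e] for e in engines]
    return [pearson([results[e][i] for e in engines], elo) for i in range(n_positions)]
```

Positions with a strongly positive value separate strong engines from weak ones; values near zero or negative mark positions that say little about strength, and `nan` marks positions that every engine solved or every engine missed.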

Graham Banks · Post by **Graham Banks** » Sat Jan 16, 2010 9:47 am

swami wrote:
jwes wrote:Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.
That would be difficult unless there's some automatic tool to do it.

If somebody knows a shortcut for fetching such information, that would be a great help.

EDIT: Whoops, the log files have been overwritten.

Do the ChessGUI debug files that are saved provide this information? Just a thought. Might be totally irrelevant for what you're doing though.

Cheers,
Graham.

swami · Post by **swami** » Sat Jan 16, 2010 9:50 am

Carlos777 wrote:
swami wrote: I forgot to update Bison to 9.8.

I used Bison 9.6a for this test.

Now will test Bison 9.8 and see what the results are.
Hi Swami,

Why not Bison 9.11?

Best,
Carlos

I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.

Bison 9.8
Ivan Bonkin, Russia.

Strategic Test Suite Conditions:

Core 2 Quad Q6600, 32-bit, 2 GB RAM, 2.4 GHz
10 seconds per position
800 positions
Engine uses 131 MB hash.
Arena GUI

Subject-wise Scores:

STS (v1.0) - Undermining:
77/100, Grade: A

STS (v2.1) - Open Files and Diagonals:
71/100, Grade: A-

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
62/100, Grade: B

STS (v4.1) - Square Vacancy:
68/100, Grade: B+

STS (v5.0) - Bishop vs Knight:
68/100, Grade: B+

STS (v6.0) - Re-Capturing:
72/100, Grade: A-

STS (v7.0) - Offer of Simplification:
63/100, Grade: B

STS (v8.1) - Advancement of f/g/h Pawns:
52/100, Grade: C

Overall Performance:

Total Score: 533/800

Overall Average: 66.625 %

Grade: B+
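The overall figures follow directly from the eight subject-wise scores. A small sketch of that arithmetic, using the scores posted above (solved positions out of 100 per suite); the letter-grade thresholds are not given in the post, so the sketch does not try to reproduce the grades.

```python
# Bison 9.8 subject-wise scores from the post (solved positions out of 100 per suite).
scores = {
    "STS 1 Undermining": 77,
    "STS 2 Open Files and Diagonals": 71,
    "STS 3 Knight Outposts/Centralization/Repositioning": 62,
    "STS 4 Square Vacancy": 68,
    "STS 5 Bishop vs Knight": 68,
    "STS 6 Re-Capturing": 72,
    "STS 7 Offer of Simplification": 63,
    "STS 8 Advancement of f/g/h Pawns": 52,
}

total = sum(scores.values())
max_total = 100 * len(scores)      # 100 positions per suite
average = 100 * total / max_total  # percentage of positions solved

print(f"Total Score: {total}/{max_total}")
print(f"Overall Average: {average:.3f} %")
```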

Regards,
Swami

swami · Post by **swami** » Sat Jan 16, 2010 9:52 am

Yes, but these tests were done in Arena. I clicked "overwrite" long ago, while I was running the tests, because Arena's log file output was too big and it took a lot of time to scroll through it to find an engine's results on certain test suites. I should have saved the log before overwriting it.

Graham Banks · Post by **Graham Banks** » Sat Jan 16, 2010 9:55 am

swami wrote:I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.

In the CCRL 40/40 ratings, Bison 9.11 has a rating of 2828 after 285 games, whereas Bison 9.8 has a rating of 2721 after 281 games.
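To put the 9.11 vs 9.8 gap in perspective, a rating difference can be translated into an expected score with the standard logistic Elo formula. This is only a rough sketch: rating lists such as CCRL compute their ratings with their own tools, so the result is an approximation, and the function name is illustrative.

```python
def expected_score(rating_a, rating_b):
    """Standard logistic Elo expectation for A against B (a draw counts as half a point)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# CCRL 40/40 figures quoted above: Bison 9.11 = 2828, Bison 9.8 = 2721.
diff = 2828 - 2721
print(diff, round(expected_score(2828, 2721), 3))
```

A gap of this size corresponds to roughly a 65% expected score for 9.11 against 9.8 under that model.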

Cheers,
Graham.

swami · Post by **swami** » Sat Jan 16, 2010 9:56 am

Edmund wrote: I uploaded the diagrams to: http://yfrog.com/2m20297269gx

Here the formulas:
STS 5: y = 0.0410x - 33.542
STS 8: y = 0.0407x - 58.458
STS 6: y = 0.0365x - 29.026
STS 4: y = 0.0357x - 28.025
STS 7: y = 0.0302x - 17.95
STS 1: y = 0.0239x + 1.7925
STS 3: y = 0.0236x + 2.6652
STS 2: y = 0.0155x + 28.274

Now that the scores of Bison 9.8 have improved to match its rating, would the values above change if this were done again?

Perhaps I will now start testing engines in 2200 - 2500 range and update the excel file to make way for more information.

Thanks for your research and especially the graph here. Very interesting.
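Edmund's formulas are straight lines of the form y = slope * x + intercept. Assuming, as the numbers suggest, that x is an engine's rating and y its score out of 100 on one suite, lines like these can be obtained with an ordinary least-squares fit over the (rating, score) pairs of all tested engines. The sketch below is hypothetical and is not Edmund's actual spreadsheet procedure; Excel's trendline performs the same kind of linear least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all ratings are identical; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predicted_score(rating, slope, intercept):
    """Evaluate a fitted line: expected suite score (out of 100) at a given rating."""
    return slope * rating + intercept


# Edmund's first STS 5 line, y = 0.0410x - 33.542, evaluated at a hypothetical 2700 rating.
print(round(predicted_score(2700, 0.0410, -33.542), 2))
```

Replacing one engine's scores (as with the Bison 9.8 retest) changes the input pairs, so every refitted slope and intercept can shift.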

swami · Post by **swami** » Sat Jan 16, 2010 9:57 am

Graham Banks wrote:
swami wrote:I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.
In the CCRL 40/40 ratings, Bison 9.11 has a rating of 2828 after 285 games, whereas Bison 9.8 has a rating of 2721 after 281 games.

Cheers,
Graham.

Oh, I see. I used the CCRL 40/4 list as the reference here. I tested 9.6a, which I hope is weaker than Bison 9.8; otherwise the lower STS scores would be hard to explain.

swami · Post by **swami** » Sat Jan 16, 2010 10:02 am

Carlos777 wrote:I just noticed this. I hope you can see it.

Carlos.

Very nice. Thanks!

Edmund · Post by **Edmund** » Sat Jan 16, 2010 10:13 am

swami wrote:
Edmund wrote: I uploaded the diagrams to: http://yfrog.com/2m20297269gx

Here the formulas:
STS 5: y = 0.0410x - 33.542
STS 8: y = 0.0407x - 58.458
STS 6: y = 0.0365x - 29.026
STS 4: y = 0.0357x - 28.025
STS 7: y = 0.0302x - 17.95
STS 1: y = 0.0239x + 1.7925
STS 3: y = 0.0236x + 2.6652
STS 2: y = 0.0155x + 28.274
Now that the scores of Bison 9.8 have improved to match its rating, would the values above change if this were done again?

Perhaps I will now start testing engines in 2200 - 2500 range and update the excel file to make way for more information.

Thanks for your research and especially the graph here. Very interesting.

The slope increases on all graphs, and STS 5 and 8 swap places:


STS 8: y = 0.0431x - 64.417
STS 5: y = 0.0415x - 34.708
STS 6: y = 0.0385x - 34.010
STS 4: y = 0.0381x - 34.066
STS 7: y = 0.0322x - 22.965
STS 1: y = 0.0283x - 8.9554
STS 3: y = 0.0257x - 2.6632
STS 2: y = 0.0160x + 27.087

regards,
Edmund
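The two sets of formulas can be compared mechanically. The sketch below takes only the slopes from both lists and checks that every slope increased and how the order changes. It assumes, as the way the lists are sorted suggests, that "order of importance" means ranking the suites by slope, that is, by how strongly the suite score rises with rating.

```python
# Slopes from Edmund's first and second sets of formulas (suite number -> slope).
old = {5: 0.0410, 8: 0.0407, 6: 0.0365, 4: 0.0357, 7: 0.0302, 1: 0.0239, 3: 0.0236, 2: 0.0155}
new = {8: 0.0431, 5: 0.0415, 6: 0.0385, 4: 0.0381, 7: 0.0322, 1: 0.0283, 3: 0.0257, 2: 0.0160}


def order(slopes):
    """Suite numbers ranked from steepest to flattest slope."""
    return sorted(slopes, key=slopes.get, reverse=True)


all_steeper = all(new[s] > old[s] for s in old)
print("every slope increased:", all_steeper)
print("old order:", ["STS %d" % s for s in order(old)])
print("new order:", ["STS %d" % s for s in order(new)])
```

Only the first two places differ between the two rankings, which matches the STS 5 / STS 8 swap.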

swami · Post by **swami** » Sat Jan 16, 2010 10:22 am

Jouni wrote:I have only tested top engines. One problem with STS: Naum4 scores clearly better (20-30 more) than Stockfish 1.6! I guess the reason is that the positions were checked only(?) with R3 and N4, so you cannot use the suite to test the 2 very best engines, which is a pity...

Yes, there may be cases like this. Naum and Stockfish are only about 50 Elo apart; they are so close in strength that either of them can score better on a given positional theme.

The problem is that 8 test suites can't tell you which engine is better. I do hope that with more test suites (20 or more...) the strength difference can be assessed.
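One way to see why a 20-30 position gap on 800 positions is not conclusive is to estimate the statistical noise in a score. The sketch below is a rough approximation: it treats every position as an independent trial with the same success chance (a binomial model), and it ignores that two engines face the same positions, so it only gives an order of magnitude. Bison 9.8's 533/800 is used purely as an illustrative input.

```python
import math


def score_standard_error(solved, total):
    """Binomial standard error of a solved count, measured in positions.

    Assumes independent positions with a common success probability (a simplification).
    """
    p = solved / total
    return math.sqrt(total * p * (1 - p))


se = score_standard_error(533, 800)
# For two independently noisy scores, the SE of their difference is about sqrt(2) * se.
se_diff = math.sqrt(2) * se
print(f"SE of one score: {se:.1f} positions; SE of a difference: {se_diff:.1f} positions")
```

Under this model a difference of 20-30 positions is only about one to one and a half standard errors, so more positions (more suites) would be needed to separate two engines this close.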

There may be cases where Stockfish is better at "Tactics" than Naum. STS tests only strategy. Sacrifices and tactics are beyond the scope of this test.

As of now, with the help of STS, you can only get an idea of the "rough" strength of chess engines, not their "exact" strength.

Such as....

Naum/Stockfish play at 3000+.
Crafty plays at 2700
Goliath plays at 2550
Romi plays at 2450
Lime plays at 2200

and so on.
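A rough strength estimate of this kind can be read off Edmund's lines by inverting them: solving y = slope * x + intercept for x gives the rating at which a suite score would be expected. The sketch below does this for Bison 9.8's posted scores with Edmund's updated formulas and averages the eight estimates. Averaging per-suite estimates is a crude heuristic chosen for illustration, not a method from this thread, and the resulting number is on whatever rating list the lines were fitted against.

```python
# Edmund's updated formulas: suite -> (slope, intercept), x = rating, y = score out of 100.
formulas = {
    8: (0.0431, -64.417), 5: (0.0415, -34.708), 6: (0.0385, -34.010), 4: (0.0381, -34.066),
    7: (0.0322, -22.965), 1: (0.0283, -8.9554), 3: (0.0257, -2.6632), 2: (0.0160, 27.087),
}
# Bison 9.8 subject-wise scores posted earlier in the thread.
bison = {1: 77, 2: 71, 3: 62, 4: 68, 5: 68, 6: 72, 7: 63, 8: 52}


def rating_from_score(score, slope, intercept):
    """Invert y = slope * x + intercept to get the rating x for a suite score y."""
    return (score - intercept) / slope


estimates = {s: rating_from_score(bison[s], *formulas[s]) for s in sorted(bison)}
for suite, rating in estimates.items():
    print(f"STS {suite}: {rating:.0f}")
rough = sum(estimates.values()) / len(estimates)
print(f"rough rating estimate: {rough:.0f}")
```

The single-suite estimates spread over several hundred points, which is another way of saying that STS gives a rough strength class rather than an exact rating.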
