STS - List the Order of Importance

swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

jwes wrote:Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.
That would be difficult unless there's some automatic tool to do it. :)

If somebody knows a shortcut for fetching such information, that would be a great help.

EDIT: whoops, Logfiles have been overwritten. :?
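
For anyone who does still have per-position logs, the statistic jwes describes could be sketched roughly as below. This is only an illustration under assumptions: the position names, the 0/1 "solved" flags and the ratings are invented placeholders, not results from any STS run, and a plain Pearson correlation is just one possible choice of statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # e.g. a position every engine solves carries no signal
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER DATA, invented for illustration only (not real test results).
# One rating per hypothetical engine, weakest to strongest.
ratings = [2200, 2450, 2550, 2700, 3000]

# For each hypothetical position: 1 if that engine found the best move, else 0.
solved = {
    "pos_001": [0, 0, 1, 1, 1],  # stronger engines solve it
    "pos_002": [1, 1, 1, 1, 1],  # everyone solves it
    "pos_003": [1, 0, 1, 0, 0],  # weaker engines solve it more often
}

per_position = {name: pearson(ratings, hits) for name, hits in solved.items()}

for name, r in per_position.items():
    print(name, "no variance" if r is None else round(r, 3))
```

Positions with a correlation near zero or below zero would be the ones to look at first, since they do not separate stronger engines from weaker ones.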
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: STS - List the Order of Importance

Post by Graham Banks »

swami wrote:
jwes wrote:Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.
That would be difficult unless there's some automatic tool to do it. :)

If somebody knows a shortcut for fetching such information, that would be a great help.

EDIT: whoops, Logfiles have been overwritten. :?
Do the ChessGUI debug files that are saved provide this information? Just a thought. Might be totally irrelevant for what you're doing though.

Cheers,
Graham.
gbanksnz at gmail.com
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Carlos777 wrote:
swami wrote: I forgot to update Bison to 9.8.

I used Bison 9.6a for this test.

Now will test Bison 9.8 and see what the results are.
Hi Swami,

Why not Bison 9.11?

Best,
Carlos
I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.

Bison 9.8
Ivan Bonkin, Russia.

Strategic Test Suite Conditions:
Core 2 Quad Q6600 (2.4 GHz), 32-bit, 2 GB RAM
10 seconds per position
800 positions
Engine uses 131 MB hash
Arena GUI

Subject-wise Scores:
STS (v1.0) - Undermining:
77/100, Grade: A

STS (v2.1) - Open Files and Diagonals:
71/100, Grade: A-

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
62/100, Grade: B

STS (v4.1) - Square Vacancy:
68/100, Grade: B+

STS (v5.0) - Bishop vs Knight:
68/100, Grade: B+

STS (v6.0) - Re-Capturing:
72/100, Grade: A-

STS (v7.0) - Offer of Simplification:
63/100, Grade: B

STS (v8.1) - Advancement of f/g/h Pawns:
52/100, Grade: C

Overall Performance:
Total Score: 533/800

Overall Average: 66.625 %

Grade: B+

Regards,
Swami
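
As a quick cross-check of the totals above, the arithmetic can be reproduced in a few lines. The per-suite scores are copied from the post; the variable names are just illustrative, and the grade letters are not computed because the grade boundaries are not given in this thread.

```python
# Per-suite scores for Bison 9.8, copied from the post (each out of 100).
suite_scores = {
    "STS 1 Undermining": 77,
    "STS 2 Open Files and Diagonals": 71,
    "STS 3 Knight Outposts/Centralization/Repositioning": 62,
    "STS 4 Square Vacancy": 68,
    "STS 5 Bishop vs Knight": 68,
    "STS 6 Re-Capturing": 72,
    "STS 7 Offer of Simplification": 63,
    "STS 8 Advancement of f/g/h Pawns": 52,
}

total = sum(suite_scores.values())
max_total = 100 * len(suite_scores)  # 8 suites x 100 positions
average_pct = 100 * total / max_total

print(f"Total Score: {total}/{max_total}")
print(f"Overall Average: {average_pct} %")
```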
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Yes, but these tests were done in Arena. I clicked "overwrite" a long time ago while running the tests, because Arena's logfile output was too big and it took a lot of time to scroll through it to find an engine's results on a given test suite. I should have saved the log before overwriting it.
Graham Banks
Posts: 41455
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: STS - List the Order of Importance

Post by Graham Banks »

swami wrote:I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.
In the CCRL 40/40 ratings, Bison 9.11 has a rating of 2828 after 285 games, whereas Bison 9.8 has a rating of 2721 after 281 games.

Cheers,
Graham.
gbanksnz at gmail.com
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Edmund wrote: I uploaded the diagrams to: http://yfrog.com/2m20297269gx


Here are the formulas:
STS 5: y = 0.0410x - 33.542
STS 8: y = 0.0407x - 58.458
STS 6: y = 0.0365x - 29.026
STS 4: y = 0.0357x - 28.025
STS 7: y = 0.0302x - 17.95
STS 1: y = 0.0239x + 1.7925
STS 3: y = 0.0236x + 2.6652
STS 2: y = 0.0155x + 28.274
Now that the scores of Bison 9.8 have improved to match its rating, would the values above change much if this were done again?

Perhaps I will now start testing engines in the 2200 - 2500 range and update the Excel file to make way for more information.

Thanks for your research and especially the graph here. Very Interesting to me.
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Graham Banks wrote:
swami wrote:I believe 9.8 is hundreds of Elo stronger than 9.11? Besides, there's not much rating information for 9.11 on the testing sites.
In the CCRL 40/40 ratings, Bison 9.11 has a rating of 2828 after 285 games, whereas Bison 9.8 has a rating of 2721 after 281 games.

Cheers,
Graham.
Oh, I see. I used the CCRL 40/4 list as the reference here. I tested 9.6a, which I hope is weaker than Bison 9.8; otherwise it wouldn't explain the lower STS scores.
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Carlos777 wrote:I just noticed this. I hope you can see it.

Image

Carlos.
Very Nice. Thanks! :D
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: STS - List the Order of Importance

Post by Edmund »

swami wrote:
Edmund wrote: I uploaded the diagrams to: http://yfrog.com/2m20297269gx


Here are the formulas:
STS 5: y = 0.0410x - 33.542
STS 8: y = 0.0407x - 58.458
STS 6: y = 0.0365x - 29.026
STS 4: y = 0.0357x - 28.025
STS 7: y = 0.0302x - 17.95
STS 1: y = 0.0239x + 1.7925
STS 3: y = 0.0236x + 2.6652
STS 2: y = 0.0155x + 28.274
Now that the scores of Bison 9.8 have improved to match its rating, would the values above change much if this were done again?

Perhaps I will now start testing engines in the 2200 - 2500 range and update the Excel file to make way for more information.

Thanks for your research and especially the graph here. Very Interesting to me.
The slope increases on all graphs, and STS 5 and STS 8 swap places:

STS 8: y = 0.0431x - 64.417
STS 5: y = 0.0415x - 34.708
STS 6: y = 0.0385x - 34.010
STS 4: y = 0.0381x - 34.066
STS 7: y = 0.0322x - 22.965
STS 1: y = 0.0283x - 8.9554
STS 3: y = 0.0257x - 2.6632
STS 2: y = 0.0160x + 27.087
regards,
Edmund
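
The comparison between the two sets of formulas can be checked mechanically. The sketch below copies the (slope, intercept) pairs from the two posts; the names first_fit, second_fit and by_slope are made up for illustration, and nothing here re-runs the regression itself.

```python
# (slope, intercept) per STS suite, copied from the earlier formulas.
first_fit = {
    5: (0.0410, -33.542), 8: (0.0407, -58.458), 6: (0.0365, -29.026),
    4: (0.0357, -28.025), 7: (0.0302, -17.95), 1: (0.0239, 1.7925),
    3: (0.0236, 2.6652), 2: (0.0155, 28.274),
}

# (slope, intercept) per STS suite, copied from the updated formulas.
second_fit = {
    8: (0.0431, -64.417), 5: (0.0415, -34.708), 6: (0.0385, -34.010),
    4: (0.0381, -34.066), 7: (0.0322, -22.965), 1: (0.0283, -8.9554),
    3: (0.0257, -2.6632), 2: (0.0160, 27.087),
}


def by_slope(fit):
    """Suite numbers ordered from steepest to flattest slope."""
    return sorted(fit, key=lambda suite: fit[suite][0], reverse=True)


# True only if every suite's slope went up between the two fits.
all_steeper = all(second_fit[s][0] > first_fit[s][0] for s in first_fit)

order_before = by_slope(first_fit)
order_after = by_slope(second_fit)

print("every slope increased:", all_steeper)
print("order before:", order_before)
print("order after: ", order_after)
```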
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Jouni wrote:I have only tested top engines. One problem with STS: Naum4 scores clearly better (20-30 more) than Stockfish 1.6! So I guess the reason is that positions are checked only(?) with R3 and N4, so you cannot use the suite to test the 2 very best engines, which is a pity...
Yes, there may be cases like this. Naum and Stockfish are only about 50 Elo apart; they are so close in strength that either of them can score better on a given positional theme.

The problem is that 8 test suites can't tell you which engine is better. I do hope with more test suites (20 or more...) the strength difference can be assessed.

There may be cases where Stockfish is better at "Tactics" than Naum. STS tests only strategy. Sacrifices and tactics are beyond the scope of this test.

As of now, with the help of STS, you can only get an idea of the "rough" strength of chess engines, not their "exact" strength.

For example:

Naum/Stockfish play at 3000+
Crafty plays at 2700
Goliath plays at 2550
Romi plays at 2450
Lime plays at 2200

and so on.
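
To see how rough such an estimate is, one can invert Edmund's updated lines and back out a rating from each suite score. This is only a sketch under an assumption the thread does not spell out: that x in the formulas is the engine's rating and y is its score out of 100 on that suite. The scores are the Bison 9.8 results posted earlier; estimate_rating is a made-up helper name.

```python
# Updated (slope, intercept) per STS suite, copied from Edmund's post.
# ASSUMPTION: y = slope * x + intercept with x = rating, y = suite score.
fit = {
    1: (0.0283, -8.9554), 2: (0.0160, 27.087), 3: (0.0257, -2.6632),
    4: (0.0381, -34.066), 5: (0.0415, -34.708), 6: (0.0385, -34.010),
    7: (0.0322, -22.965), 8: (0.0431, -64.417),
}

# Bison 9.8 suite scores (out of 100), copied from swami's post.
bison_98 = {1: 77, 2: 71, 3: 62, 4: 68, 5: 68, 6: 72, 7: 63, 8: 52}


def estimate_rating(suite, score):
    """Solve y = slope * x + intercept for x."""
    slope, intercept = fit[suite]
    return (score - intercept) / slope


estimates = {s: estimate_rating(s, score) for s, score in bison_98.items()}
spread = max(estimates.values()) - min(estimates.values())

for suite in sorted(estimates):
    print(f"STS {suite}: ~{estimates[suite]:.0f}")
print(f"spread between suites: ~{spread:.0f}")
```

Each suite gives a different figure, which is one way to see why a single suite (or even eight) only pins down the rough strength.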