Edmund wrote:
Code:
sts60 | -2.137339 2.555934 -0.84 0.409 -7.321013 3.046336
mcostalba wrote: I think the coefficients should be constrained to be positive, otherwise it means that the higher the score an engine gets in STS 6, the weaker it is.
If we assume that value to be positive, then STS 6.0 would rank as the 6th most important test suite, with STS 3.0 being the least important, as I had nearly guessed.
STS - List the Order of Importance
Moderators: hgm, Rebel, chrisw
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
-
- Posts: 364
- Joined: Sun Oct 04, 2009 1:27 pm
- Location: Italy
Re: STS - List the Order of Importance
The problem is that this is a point estimate. If you look at the confidence interval, you see no reason to believe that the true value is not 0, or some small positive value.
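To make the point about the interval concrete, here is a minimal sketch that uses only the numbers from the sts60 row quoted in this thread. It backs out the critical t value implied by the reported bounds and checks whether 0 and a small positive value lie inside the interval. The helper name `interval_contains` and the test value `1.0` are just illustrative choices, not anything from the original regression.

```python
# Values copied from the sts60 row of the regression output quoted in this thread.
coef = -2.137339
std_err = 2.555934
ci_low, ci_high = -7.321013, 3.046336

# A 95% interval has the form coef +/- t_crit * std_err, so the reported
# bounds let us recover the critical t value that was used.
t_crit = (ci_high - ci_low) / (2 * std_err)

# t statistic for the null hypothesis "the true coefficient is 0".
t_stat = coef / std_err


def interval_contains(low, high, value):
    """Return True if value lies inside the closed interval [low, high]."""
    return low <= value <= high


print(f"implied critical t: {t_crit:.3f}")
print(f"t statistic:        {t_stat:.2f}")
print("0 inside interval:  ", interval_contains(ci_low, ci_high, 0.0))
print("1.0 inside interval:", interval_contains(ci_low, ci_high, 1.0))
```

Since |t statistic| is well below the critical value, the data cannot distinguish the sts60 coefficient from zero, so its negative sign should not be read as "doing well in STS 6 makes an engine weaker".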
let's see on this site 15-16 hours from now
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: STS - List the Order of Importance
Another approach, this time some quick work with Excel.
I uploaded the diagrams to: http://yfrog.com/2m20297269gx
For each test suite I plotted a graph with a point for each score/Elo pair.
Then I added a linear trendline.
Fortunately, all slopes of the trendlines were positive (my last results were definitely more misleading in that sense).
The slope of each trendline should give an approximate rating of how well a given test suite reflects an engine's strength.
Here are the formulas:
STS 5: y = 0.0410x - 33.542
STS 8: y = 0.0407x - 58.458
STS 6: y = 0.0365x - 29.026
STS 4: y = 0.0357x - 28.025
STS 7: y = 0.0302x - 17.95
STS 1: y = 0.0239x + 1.7925
STS 3: y = 0.0236x + 2.6652
STS 2: y = 0.0155x + 28.274
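For anyone who wants to reproduce this kind of trendline without Excel, here is a minimal sketch of an ordinary least-squares line fit. It assumes x is the Elo rating and y is the test suite score (the size of the intercepts above suggests that orientation, but the post does not say so explicitly). The function name `fit_trendline` and the five data points are made up for illustration and are not taken from the spreadsheet.

```python
def fit_trendline(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example data: Elo ratings and scores on one test suite.
elo = [2400, 2500, 2600, 2700, 2800]
score = [60, 66, 68, 74, 78]

slope, intercept = fit_trendline(elo, score)
print(f"y = {slope:.4f}x {intercept:+.3f}")
```

Note that a slope depends on the scale of the scores, so comparing slopes between suites only makes sense if all suites are scored on the same scale.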
-
- Posts: 10307
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: STS - List the Order of Importance
swami wrote: This is the Excel file with the results of 50+ engines, in case anyone else is interested:
http://sites.google.com/site/strategict ... STS1-8.xls
PS: To Carlos, can you sort it by total score, make a GIF image of it, and post it here? Thanks!
Looking at the results, I see that Pharaon scored better than Bison in every one of the tests, even though Bison clearly has the higher rating (almost 100 Elo difference). So your task should be to develop a test, not based on games, in which Bison scores better than Pharaon (because I do not believe that Bison is better than Pharaon only because of factors like better time management).
Uri
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
Uri Blass wrote:
swami wrote: This is the Excel file with the results of 50+ engines, in case anyone else is interested:
http://sites.google.com/site/strategict ... STS1-8.xls
PS: To Carlos, can you sort it by total score, make a GIF image of it, and post it here? Thanks!
Looking at the results, I see that Pharaon scored better than Bison in every one of the tests, even though Bison clearly has the higher rating (almost 100 Elo difference). So your task should be to develop a test, not based on games, in which Bison scores better than Pharaon (because I do not believe that Bison is better than Pharaon only because of factors like better time management).
Uri
This is one of the rare exceptions. I don't think Bison is very well suited to this time control. It may be that it constantly changes its move while searching within the 10-second span for a given position, or that it sticks with the same move for too long without looking for an alternative and evaluating better scores for it.
As for the other engines, the scores in STS roughly match their rating levels. Of course, it's difficult to differentiate between a set of 10 engines when they're so close in strength. I hope that with the help of more test suites it will be easier.
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
Uri Blass wrote:
swami wrote: This is the Excel file with the results of 50+ engines, in case anyone else is interested:
http://sites.google.com/site/strategict ... STS1-8.xls
PS: To Carlos, can you sort it by total score, make a GIF image of it, and post it here? Thanks!
Looking at the results, I see that Pharaon scored better than Bison in every one of the tests, even though Bison clearly has the higher rating (almost 100 Elo difference). So your task should be to develop a test, not based on games, in which Bison scores better than Pharaon (because I do not believe that Bison is better than Pharaon only because of factors like better time management).
Uri
I forgot to update Bison to 9.8.
I used Bison 9.6a for this test.
Now will test Bison 9.8 and see what the results are.
-
- Posts: 1737
- Joined: Sun Dec 13, 2009 6:09 pm
Re: STS - List the Order of Importance
swami wrote: I forgot to update Bison to 9.8.
I used Bison 9.6a for this test.
Now will test Bison 9.8 and see what the results are.
Hi Swami,
Why not Bison 9.11?
Best,
Carlos
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: STS - List the Order of Importance
Edmund wrote: That's what I am getting after a linear regression:
r² = 0.6577
sqrt(mse) = 68.912
sts10-80 are the coefficients
cons is the constant
the result is the elo in the CEGT scale
Code:
------------------------------------------------------------------------------
         elo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sts10 |   2.272232    2.96261     0.77   0.448    -3.736218    8.280683
       sts20 |    3.20341    3.01614     1.06   0.295    -2.913606    9.320425
       sts30 |   .4575349   2.331547     0.20   0.846    -4.271063    5.186132
       sts40 |   2.358127   2.152502     1.10   0.281     -2.00735    6.723604
       sts50 |   7.583491   3.162585     2.40   0.022      1.16947    13.99751
       sts60 |  -2.137339   2.555934    -0.84   0.409    -7.321013    3.046336
       sts70 |   .8115352   2.652622     0.31   0.761    -4.568232    6.191302
       sts80 |   4.939367    2.18094     2.26   0.030      .516216    9.362519
       _cons |   1342.862   209.7266     6.40   0.000     917.5169    1768.207
------------------------------------------------------------------------------
swami wrote: So, according to this piece of information, one could conclude that STS 6.0 is the least important of all the test suites, since it even has a negative coefficient, and that it is better not to do well in it? That looks a little confusing and is probably not true.
Also, the highest coefficient values, for STS 5.0 followed by STS 8.0 (?!), indicate that they are the two most important according to this regression.
So this data gives the ranks for the order of importance, something like:
STS 5
STS 8
STS 2
STS 4
STS 1
STS 7
STS 3
STS 6
Not a bad try at all, since I expect all the middle ranks to be in the right place except STS 6 and 8.
I ran another simple test. I took the correlations between the STS tests and the various computer rankings (I could not find a good way to combine the ratings, though I did not try very hard) and got these correlation coefficients:
Code:
STS     1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   Total
WBEC    0.51  0.31  0.33  0.65  0.73  0.56  0.63  0.71  0.75
CEGT    0.53  0.40  0.51  0.45  0.60  0.48  0.47  0.59  0.72
CCRL    0.42  0.26  0.32  0.58  0.72  0.51  0.54  0.59  0.72
STS 5.0
STS 8.0
STS 4.0
STS 7.0
STS 6.0
STS 1.0
STS 3.0
STS 2.0
with STS 3.0 and STS 2.0 significantly less correlated than the others.
Would it be possible for you to upload a file that has the results for each position in each test for each engine? Then I could run some statistics to see how well each position is correlated.
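As a rough illustration of the correlation approach, the sketch below defines a plain Pearson correlation and then ranks the suites using the coefficients from the table above. Averaging the three rating lists is an assumption made here only to get a single ordering; the post itself says no good way of combining the ratings was found. The names `pearson`, `correlations`, and `ranking` are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Correlation coefficients copied from the table in this post
# (columns STS 1.0 to STS 8.0, the Total column is left out).
correlations = {
    "WBEC": [0.51, 0.31, 0.33, 0.65, 0.73, 0.56, 0.63, 0.71],
    "CEGT": [0.53, 0.40, 0.51, 0.45, 0.60, 0.48, 0.47, 0.59],
    "CCRL": [0.42, 0.26, 0.32, 0.58, 0.72, 0.51, 0.54, 0.59],
}

# Assumption: combine the three lists by a simple unweighted mean.
suites = [f"STS {i}.0" for i in range(1, 9)]
mean_corr = {
    suite: sum(row[i] for row in correlations.values()) / len(correlations)
    for i, suite in enumerate(suites)
}

# Highest mean correlation first.
ranking = sorted(suites, key=lambda s: mean_corr[s], reverse=True)
for suite in ranking:
    print(f"{suite}: {mean_corr[suite]:.3f}")
```

With per-position results for each engine, the same `pearson` function could be applied to each position's score column against the rating list to see which individual positions track engine strength.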
-
- Posts: 1737
- Joined: Sun Dec 13, 2009 6:09 pm
Re: STS - List the Order of Importance
swami wrote: This is the Excel file with the results of 50+ engines, in case anyone else is interested:
http://sites.google.com/site/strategict ... STS1-8.xls
PS: To Carlos, can you sort it by total score, make a GIF image of it, and post it here? Thanks!
I just noticed this. I hope you can see it.
Carlos.
-
- Posts: 3293
- Joined: Wed Mar 08, 2006 8:15 pm
Re: STS - List the Order of Importance
I have only tested top engines. One problem with STS: Naum4 scores clearly better (20-30 more) than Stockfish 1.6! So I guess the reason is that the positions were checked only(?) with R3 and N4, so you cannot use the suite
to test the two very best engines, which is a pity...
Jouni