Stockfish 1.7.1 in STS suite

Jouni
Posts: 3666
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Stockfish 1.7.1 in STS suite

Post by Jouni »

On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Stockfish 1.7.1 in STS suite

Post by Dann Corbit »

Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
BrandonSi

Re: Stockfish 1.7.1 in STS suite

Post by BrandonSi »

Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
Uri Blass
Posts: 10900
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 1.7.1 in STS suite

Post by Uri Blass »

BrandonSi wrote:
Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
I read the following:
"If difference in Rybka's analysis line from first best to second best is < 0.20 after about 1 minute of analysis then I'd not add it to the test suite."

In other words, problems that Rybka cannot clearly solve in 1 minute are not in the test suite.
I am not surprised that Rybka gets better results under these conditions, because positions that Rybka does not understand in 1 minute are not in the test suite.
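
As a rough illustration of the selection rule quoted above, here is a minimal Python sketch of such a filter. The function name, the centipawn units, and the sample evaluations are invented for illustration; only the 0.20 threshold comes from the quoted rule, and this is not the actual STS tooling.

```python
# Hypothetical sketch of the quoted selection rule; not the real STS tooling.
MIN_GAP_CP = 20  # the quoted 0.20 threshold, expressed in centipawns


def qualifies(best_cp: int, second_cp: int, min_gap_cp: int = MIN_GAP_CP) -> bool:
    """Keep a position only if the best move clearly beats the second best."""
    return best_cp - second_cp >= min_gap_cp


# Made-up (position id, best eval, second-best eval) triples, in centipawns.
candidates = [
    ("pos-a", 45, 10),   # gap of 35: kept
    ("pos-b", 30, 15),   # gap of 15: dropped
    ("pos-c", -5, -60),  # gap of 55: kept
]

kept = [name for name, best, second in candidates if qualifies(best, second)]
print(kept)
```

Positions where the filtering engine sees no clear best move never reach the suite, so the surviving set is one that engine already tends to handle well.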

Uri
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Stockfish 1.7.1 in STS suite

Post by Dann Corbit »

BrandonSi wrote:
Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
It does not measure general engine performance. There is no EPD test set that can do that.

It measures the ability to recognize certain strategic themes.

However, I guess that if Stockfish could instantly recognize these themes, it would perform somewhat better in general.

In any case, it seems to me to be low-hanging fruit worthy of examination.
BrandonSi

Re: Stockfish 1.7.1 in STS suite

Post by BrandonSi »

Dann Corbit wrote:
BrandonSi wrote:
Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
It does not measure general engine performance. There is no EPD test set that can do that.

It measures the ability to recognize certain strategic themes.

However, I guess that if Stockfish could instantly recognize these themes, it would perform somewhat better in general.

In any case, it seems to me to be low-hanging fruit worthy of examination.
I agree with you, Dann. I'm just unsure how to address the discrepancy between what seems to be Stockfish 1.7's large jump in performance relative to other engines and what appears to be a lack of any performance increase when measured by STS.

Perhaps STS is not testing the specific themes where Stockfish excels in engine testing. I don't know, and I have no solution to propose. I'm just pointing out that one result is not like the other, and I'm curious about the reasons for that discrepancy. :)
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Stockfish 1.7.1 in STS suite

Post by Dann Corbit »

BrandonSi wrote:
Dann Corbit wrote:
BrandonSi wrote:
Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
It does not measure general engine performance. There is no EPD test set that can do that.

It measures the ability to recognize certain strategic themes.

However, I guess that if Stockfish could instantly recognize these themes, it would perform somewhat better in general.

In any case, it seems to me to be low-hanging fruit worthy of examination.
I agree with you, Dann. I'm just unsure how to address the discrepancy between what seems to be Stockfish 1.7's large jump in performance relative to other engines and what appears to be a lack of any performance increase when measured by STS.
I guess that Stockfish has not created evaluation terms related to the specific themes in the STS test set.
That does not imply an inferior or superior evaluation. For instance, if the cost of the extra terms slows down the evaluation enough in the general case, adding them could decrease engine strength. However, it should be a simple matter to test.
Perhaps STS is not testing the specific themes where Stockfish excels in engine testing. I don't know, and I have no solution to propose. I'm just pointing out that one result is not like the other, and I'm curious about the reasons for that discrepancy. :)
STS is designed by theme and not by engine.
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: Stockfish 1.7.1 in STS suite

Post by swami »

Uri Blass wrote:
BrandonSi wrote:
Dann Corbit wrote:
Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
Sounds like an opportunity for improvement, then.
Or perhaps it indicates that STS needs more work before it can accurately measure general engine performance. :)
I read the following:
"If difference in Rybka's analysis line from first best to second best is < 0.20 after about 1 minute of analysis then I'd not add it to the test suite."

In other words, problems that Rybka cannot clearly solve in 1 minute are not in the test suite.
I am not surprised that Rybka gets better results under these conditions, because positions that Rybka does not understand in 1 minute are not in the test suite.

Uri
I used Rybka for a quick preliminary test to see whether a particular position qualified for the list (of 200) to be sent to Dann, who in turn would use multiple engines (Zappa, Naum, Ivanhoe, Stockfish, Rybka, etc.) to see how many positions from the list of 200 qualified for the final list of 100.

So it's not solely Rybka's analysis that contributes to this suite.
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: Stockfish 1.7.1 in STS suite

Post by swami »

Jouni wrote: On a dual Pentium 2.6 GHz (10 s per position), it scored 840/1000, which is almost the same as 1.6.3. Rybka 3 scored 901, a big difference!

Jouni
In my tests, Stockfish 1.7.1's STS score has improved, consistent with a gain of about 50 Elo:

Here are the results I have reported:

Stockfish 1.6.3 - 695/900
http://www.talkchess.com/forum/viewtopi ... =&start=30

Stockfish 1.7.1 - 796/1000
http://www.talkchess.com/forum/viewtopi ... 2&start=40

If you discount the STS 10 score from Stockfish 1.7.1's total, it still has 706/900.
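
For anyone who wants to check the arithmetic, here is a small Python sketch that uses only the three scores listed above. The variable names are mine, and it assumes STS 10 is exactly the 100 positions that separate the 900-position and 1000-position runs.

```python
# Scores as (solved, total), taken from the results listed above.
sf163_sts1_9 = (695, 900)    # Stockfish 1.6.3, STS 1-9
sf171_sts1_10 = (796, 1000)  # Stockfish 1.7.1, STS 1-10
sf171_sts1_9 = (706, 900)    # Stockfish 1.7.1 with STS 10 discounted

# What STS 10 alone must have contributed to the 1.7.1 total.
sts10_solved = sf171_sts1_10[0] - sf171_sts1_9[0]
sts10_total = sf171_sts1_10[1] - sf171_sts1_9[1]

# Like-for-like gain on the shared STS 1-9 positions.
gain_sts1_9 = sf171_sts1_9[0] - sf163_sts1_9[0]


def rate(score):
    """Fraction of positions solved."""
    solved, total = score
    return solved / total


print(f"STS 10 alone: {sts10_solved}/{sts10_total}")
print(f"Gain on STS 1-9: {gain_sts1_9} positions")
print(f"1.6.3 rate on STS 1-9: {rate(sf163_sts1_9):.3f}")
print(f"1.7.1 rate on STS 1-9: {rate(sf171_sts1_9):.3f}")
```

Comparing 695/900 with 706/900 keeps the position set identical, which is why the STS 10 score has to be taken out before the two versions are compared.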
Bob Yellow

Re: Stockfish 1.7.1 in STS suite

Post by Bob Yellow »

My STS 1-9 test (February)

1 CPU - 10 sec per position - STS1 to STS9

Rybka 3 Human              809/900 Grade:S    Total 8382/9000 Grade:S
Rybka 3 Dynamic            795/900 Grade:S    Total 8270/9000 Grade:S
Rybka 3 960                790/900 Grade:S    Total 8270/9000 Grade:S
Rybka 3                    789/900 Grade:S    Total 8274/9000 Grade:S
Rybka 2.2n2                763/900 Grade:A+   Total 8076/9000 Grade:S
Naum 4.1                   751/900 Grade:A+   Total 8045/9000 Grade:S
Stockfish 1.6.2 S.James    732/900 Grade:A+   Total 7811/9000 Grade:S
Naum 4                     722/900 Grade:A+   Total 7822/9000 Grade:S
Stockfish 1.6.2            715/900 Grade:A    Total 7728/9000 Grade:S
Stockfish 1.6.2s DC        715/900 Grade:A    Total 7695/9000 Grade:S
Komodo 1.0                 697/900 Grade:A    Total 7593/9000 Grade:A+
Deep Shredder 12           682/900 Grade:A    Total 7435/9000 Grade:A+
Spark 0.3a                 678/900 Grade:A    Total 7443/9000 Grade:A+
Zappa Mexico II XIII       674/900 Grade:A-   Total 7474/9000 Grade:A+
Junior 2010                672/900 Grade:A-   Total 7378/9000 Grade:A+
Twisted Logic 20100131     672/900 Grade:A-   Total 7357/9000 Grade:A+
Deep Sjeng WC2008          664/900 Grade:A-   Total 7339/9000 Grade:A+
Critter 0.52 x64           663/900 Grade:A-   Total 7330/9000 Grade:A+
Ktulu 9                    656/900 Grade:A-   Total 7299/9000 Grade:A+
Zappa Mexico II            655/900 Grade:A-   Total 7396/9000 Grade:A+
Onno 1.1.1                 650/900 Grade:A-   Total 7073/9000 Grade:A
Tornado 3.42a              585/900 Grade:B+   Total 6663/9000 Grade:A-
The Rybka 2.2n2 result is very good.