Stockfish 1.7.1 in STS suite
Posted by Jouni Uski
On a dual Pentium 2.6 GHz (10 s per position) it scored 840/1000, which is almost the same as Stockfish 1.6.3. Rybka 3 got 901, a big difference!
Jouni
Re: Stockfish 1.7.1 in STS suite
Posted by Dann Corbit (Redmond, WA USA)
Sounds like an opportunity for improvement, then.
Re: Stockfish 1.7.1 in STS suite
Posted by BrandonSi
Or perhaps it indicates that the STS needs more work before it can accurately measure general engine performance.

Re: Stockfish 1.7.1 in STS suite
Posted by Uri Blass (Tel-Aviv, Israel)
I read the following:
"If the difference in Rybka's analysis between the best and second-best lines is < 0.20 after about 1 minute of analysis, then I would not add the position to the test suite."
In other words, problems that Rybka cannot clearly solve in 1 minute are not in the test suite.
I am not surprised that Rybka gets better results under these conditions, because positions that Rybka does not understand in 1 minute are not in the test suite.
Uri
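The selection rule Uri quotes is easy to state as a filter, which makes his point concrete: a position survives only if one particular engine already separates its best line from the runner-up. A minimal sketch, assuming each candidate carries that engine's scores for its best and second-best lines after about a minute of MultiPV analysis; the record layout, names and use of integer centipawns are hypothetical, not the tooling actually used to build STS.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record for one candidate position: the engine's scores for its
# best and second-best lines after about one minute of analysis, in integer
# centipawns (0.20 pawns == 20 cp) to avoid floating-point edge cases.
@dataclass
class Candidate:
    pos_id: str
    best_cp: int
    second_cp: int


# Threshold from the quoted rule: drop the position if the gap is < 0.20.
MIN_GAP_CP = 20


def qualifies(c: Candidate, min_gap_cp: int = MIN_GAP_CP) -> bool:
    """Keep a position only if the best line beats the runner-up by at least min_gap_cp."""
    return c.best_cp - c.second_cp >= min_gap_cp


def filter_candidates(cands: list[Candidate]) -> list[Candidate]:
    """Return the candidates that pass the gap test, preserving their order."""
    return [c for c in cands if qualifies(c)]


if __name__ == "__main__":
    pool = [Candidate("a", 85, 40), Candidate("b", 30, 20), Candidate("c", 50, 30)]
    print([c.pos_id for c in filter_candidates(pool)])  # ['a', 'c']
```

Because the gap is measured with a single engine's evaluations, any position that engine misjudges or cannot separate is excluded up front, which is the selection effect Uri describes.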
Re: Stockfish 1.7.1 in STS suite
Posted by Dann Corbit (Redmond, WA USA)
BrandonSi wrote: "Or perhaps it indicates that the STS needs more work before it can accurately measure general engine performance."
It does not measure general engine performance. No EPD test set can do that.
It measures the ability to recognize certain strategic themes.
However, I guess that if Stockfish could instantly recognize these themes, it would perform somewhat better in general.
In any case, it seems to me to be low-hanging fruit worthy of examination.
Re: Stockfish 1.7.1 in STS suite
Posted by BrandonSi
I agree with you, Dann. I'm just unsure how to reconcile what seems to be Stockfish 1.7's large jump in performance relative to other engines with what appears to be a lack of any performance increase when measured by STS.
Perhaps STS is not testing the specific themes in which Stockfish excels in engine testing. I don't know, and I have no solution to propose. I'm just pointing out that one result is not like the other, and I'm curious about the reasons for that discrepancy.

Re: Stockfish 1.7.1 in STS suite
Posted by Dann Corbit (Redmond, WA USA)
I guess that Stockfish's authors have not created evaluation terms related to the specific themes in the STS test set.
That does not mean its evaluation is inferior or superior. For instance, if the cost of evaluating such terms slows down eval enough in the general case, adding them could decrease engine strength. However, it should be a simple matter to test it.
BrandonSi wrote: "Perhaps STS is not testing the specific themes in which Stockfish excels in engine testing."
STS is designed by theme and not by engine.
Re: Stockfish 1.7.1 in STS suite
Uri Blass wrote: "In other words, problems that Rybka cannot clearly solve in 1 minute are not in the test suite."
I used to use Rybka for a quick preliminary test of whether a particular position qualified for the list of 200 candidates sent to Dann, who in turn used multiple engines (Zappa, Naum, Ivanhoe, Stockfish, Rybka, etc.) to see how many of those 200 positions qualified for the final list of 100.
So it's not solely Rybka's analysis that contributes to this suite.
Re: Stockfish 1.7.1 in STS suite
Jouni wrote: "On a dual Pentium 2.6 GHz (10 s per position) it scored 840/1000, which is almost the same as Stockfish 1.6.3. Rybka 3 got 901, a big difference!"
In my tests, Stockfish 1.7.1 has improved enough to reflect its roughly 50 Elo gain.
Here are the results I have reported:
Stockfish 1.6.3 - 695/900
http://www.talkchess.com/forum/viewtopi ... =&start=30
Stockfish 1.7.1 - 796/1000
http://www.talkchess.com/forum/viewtopi ... 2&start=40
If you discount the STS 10 score from Stockfish 1.7.1's total, it still has 706/900.
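The two runs above cover different suite sets (STS 1-9 out of 900 versus STS 1-10 out of 1000), so the like-for-like comparison is on the nine common suites. A small sketch of that arithmetic using only the figures in the post; the STS 10 score is not reported separately and is merely inferred here by subtraction.

```python
# Figures reported in the post (score / maximum for the run).
SF163_STS1_9 = 695    # Stockfish 1.6.3, STS 1-9, out of 900
SF171_STS1_10 = 796   # Stockfish 1.7.1, STS 1-10, out of 1000
SF171_STS1_9 = 706    # Stockfish 1.7.1 with STS 10 discounted, out of 900


def pct(score: int, out_of: int) -> float:
    """Express a suite score as a percentage of the maximum."""
    return 100.0 * score / out_of


# STS 10 is not reported on its own; this value is inferred by subtraction.
implied_sts10 = SF171_STS1_10 - SF171_STS1_9   # out of 100
# Like-for-like gain on the nine suites both versions were run on.
gain_sts1_9 = SF171_STS1_9 - SF163_STS1_9      # out of 900

print(f"Implied STS 10 score for 1.7.1: {implied_sts10}/100")
print(f"STS 1-9: 1.6.3 {pct(SF163_STS1_9, 900):.1f}% vs 1.7.1 {pct(SF171_STS1_9, 900):.1f}% (+{gain_sts1_9})")
# Implied STS 10 score for 1.7.1: 90/100
# STS 1-9: 1.6.3 77.2% vs 1.7.1 78.4% (+11)
```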
Re: Stockfish 1.7.1 in STS suite
My STS 1-9 test (February):
Rybka 2.2n2's result is very good.
1 CPU, 10 s per position, STS1 - STS9
Engine                   Score    Grade  Total      Grade
Rybka 3 Human            809/900  S      8382/9000  S
Rybka 3 Dynamic          795/900  S      8270/9000  S
Rybka 3 960              790/900  S      8270/9000  S
Rybka 3                  789/900  S      8274/9000  S
Rybka 2.2n2              763/900  A+     8076/9000  S
Naum 4.1                 751/900  A+     8045/9000  S
Stockfish 1.6.2 S.James  732/900  A+     7811/9000  S
Naum 4                   722/900  A+     7822/9000  S
Stockfish 1.6.2          715/900  A      7728/9000  S
Stockfish 1.6.2s DC      715/900  A      7695/9000  S
Komodo 1.0               697/900  A      7593/9000  A+
Deep Shredder 12         682/900  A      7435/9000  A+
Spark 0.3a               678/900  A      7443/9000  A+
Zappa Mexico II XIII     674/900  A-     7474/9000  A+
Junior 2010              672/900  A-     7378/9000  A+
Twisted Logic 20100131   672/900  A-     7357/9000  A+
Deep Sjeng WC2008        664/900  A-     7339/9000  A+
Critter 0.52 x64         663/900  A-     7330/9000  A+
Ktulu 9                  656/900  A-     7299/9000  A+
Zappa Mexico II          655/900  A-     7396/9000  A+
Onno 1.1.1               650/900  A-     7073/9000  A
Tornado 3.42a            585/900  B+     6663/9000  A-