swami wrote:
Stockfish 1.6.2 was tested when the STS suites still had partial-credit moves, and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a later thread.
Stockfish 1.6.3 was tested with the no-partial-credit STS suites, which consist of best moves only.
Hi Swami,
I have reconsidered this statement and I think there is something that I don't understand.
You said that Arena previously granted some extra points to SF 1.6.2, but you also say that SF 1.6.3 has better test results. That cannot be right: since 1.6.2 received some extra points as a gift from Arena, it should have scored better than 1.6.3, given that the functionality is the same between 1.6.2 and 1.6.3. I still believe your testing procedure is not 100% reproducible and that different runs on the _same_ engine yield different results.
swami wrote:
Stockfish 1.6.2 was tested when the STS suites still had partial-credit moves, and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a later thread.
Stockfish 1.6.3 was tested with the no-partial-credit STS suites, which consist of best moves only.
Hi Swami,
Thanks for the explanation. It would be interesting, as a verification, if you could rerun SF 1.6.2 with the latest no-partial-credit STS setup, to check that your test shows no difference from 1.6.3.
Thanks
Marco
Hi Marco,
I ran the same test again with the same version and noticed a small difference. It may be due to the fact that I had Ivanhoe running on 2 cores in the background during the first two tests. Since Stockfish was using only 1 CPU, I thought running Ivanhoe to analyse games in another window wouldn't affect the results.
Now I'm going to run 1.6.2 and 1.6.3 (again) without any other programs in the background. I will let you know the results.
Stockfish 1.6.3 JA
by Marco Costalba, Tord Romstad, Joona Kiiski, Europe.
Strategic Test Suite Conditions:
Core 2 Quad Q6600, 2.4 GHz, 2 GB RAM, 32-bit
10 seconds per position
900 positions
Engine uses 156 MB hash
Single CPU
Arena GUI
Overall Performance:
Total Score: 695/900
Average: 77.22%
Grade: A
Total Rated Time: 41.37/150 minutes [2482 seconds/9000 seconds]
Subject-wise Scores:
STS (v1.0) - Undermining:
82/100, Grade: A+
STS (v2.1) - Open Files and Diagonals:
80/100, Grade: A+
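For anyone who wants to double-check the overall figures in a report like the one above, here is a minimal sketch that recomputes the percentage and the rated time from the raw numbers quoted in the post (695 points, 900 positions, 2482 seconds, 10 seconds per position). The function name `summarize` and the dictionary keys are made up for illustration; they are not part of Arena or STS.

```python
def summarize(score, positions, rated_seconds, seconds_per_position):
    """Recompute the summary figures of an STS report from its raw numbers.

    All names here are hypothetical; this only redoes the arithmetic.
    """
    budget_seconds = positions * seconds_per_position
    return {
        "percent": round(100 * score / positions, 2),
        "rated_minutes": round(rated_seconds / 60, 2),
        "budget_seconds": budget_seconds,
        "budget_minutes": budget_seconds / 60,
    }


# Raw numbers taken from the Stockfish 1.6.3 JA report in this thread.
summary = summarize(score=695, positions=900,
                    rated_seconds=2482, seconds_per_position=10)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The recomputed values agree with the report: 77.22%, 41.37 minutes out of a 150-minute (9000-second) budget.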
swami wrote:
Stockfish 1.6.2 was tested when the STS suites still had partial-credit moves, and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a later thread.
Stockfish 1.6.3 was tested with the no-partial-credit STS suites, which consist of best moves only.
Hi Swami,
I have reconsidered this statement and I think there is something that I don't understand.
You said that Arena previously granted some extra points to SF 1.6.2, but you also say that SF 1.6.3 has better test results. That cannot be right: since 1.6.2 received some extra points as a gift from Arena, it should have scored better than 1.6.3, given that the functionality is the same between 1.6.2 and 1.6.3. I still believe your testing procedure is not 100% reproducible and that different runs on the _same_ engine yield different results.
What do you think?
Yes, Arena did award points for the newly introduced moves in some of the tests (usually within a 3-point range).
Not all tests were affected by the erroneous moves in Arena. It was usually STS 2 and 3.
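To make the effect concrete, here is a rough sketch of how the same engine output can score differently under the two schemes. It assumes (this is only a guess at the mechanism, not something confirmed in the thread) that a GUI counts a position as solved whenever the played move appears anywhere in the list of credited moves, whereas the best-move-only suites accept a single move. The positions, moves and credit values below are invented placeholders, not real STS entries.

```python
# Hypothetical positions: "credited" maps each listed move to its partial
# credit, "best" is the single best move, "played" is the engine's choice.
POSITIONS = [
    {"best": "f5", "credited": {"f5": 10, "Be5+": 8}, "played": "f5"},
    {"best": "Rd1", "credited": {"Rd1": 10, "Qe2": 6}, "played": "Qe2"},
    {"best": "Nc4", "credited": {"Nc4": 10}, "played": "h3"},
]


def best_only_score(positions):
    """One point per position, only when the single best move is played."""
    return sum(1 for p in positions if p["played"] == p["best"])


def any_listed_score(positions):
    """One point per position when the played move is listed at all.

    This models the assumed GUI behaviour: a partial-credit move is
    counted as a full solution.
    """
    return sum(1 for p in positions if p["played"] in p["credited"])


strict = best_only_score(POSITIONS)
lenient = any_listed_score(POSITIONS)
print(f"best-only: {strict}, any-listed: {lenient}, inflation: {lenient - strict}")
```

Under this assumption the lenient count can never be lower than the strict one, which is why a score obtained with the older suites is not directly comparable with one from the best-move-only suites.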
I will have to try different GUIs and see whether the results are consistent. It would be a good experiment.
I will try the test with Stockfish as the baseline in GUIs such as:
Chessbase
ChessGUI
Arena
Gradual Test
We will then see which one comes closest to giving the same result on every run.
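One simple way to compare the GUIs once the runs are done is to repeat the same suite several times per GUI and look at the spread of the scores. The sketch below uses the range (max minus min) as the measure; that choice, the function names and the numbers are all assumptions for illustration. The scores are placeholders, not measurements, and the GUI labels are deliberately generic.

```python
def score_range(scores):
    """Spread of repeated runs: difference between best and worst score."""
    return max(scores) - min(scores)


def most_stable(runs_by_gui):
    """Return the GUI label whose repeated runs vary the least."""
    return min(runs_by_gui, key=lambda gui: score_range(runs_by_gui[gui]))


# Placeholder scores (out of 100) for three runs of one suite per GUI.
# These are made-up values, not results from any real test.
PLACEHOLDER_RUNS = {
    "GUI A": [80, 82, 81],
    "GUI B": [80, 80, 80],
    "GUI C": [78, 83, 80],
}

for gui, scores in PLACEHOLDER_RUNS.items():
    print(f"{gui}: runs={scores}, range={score_range(scores)}")
print("most stable:", most_stable(PLACEHOLDER_RUNS))
```

A GUI with a range of 0 would be fully reproducible for that suite; any nonzero range gives a rough idea of how large a difference between two engine versions has to be before it means anything.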
As I have said a few times, STS 8 seems to give counterintuitive results in my rough model.
I wonder why. With the new results I'll try a more exotic technique; let's see.