Stockfish 1.6.3 JA update available

mcostalba · Post by **mcostalba** » Wed Feb 03, 2010 8:08 pm

swami wrote: Stockfish 1.6.2 was tested when the STS suites had partial-credit moves and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a thread later on.

Stockfish 1.6.3 was tested with the no-partial-scoring STS suites, which consist of only best moves.

Hi Swami,

I considered this statement again and I think there is something that I don't understand.

You said that Arena previously granted some extra points to SF 1.6.2, but you also say that SF 1.6.3 has better test results. This cannot be: 1.6.2 should have had better results than 1.6.3, because it received some extra points as a gift from Arena and the functionality is the same between 1.6.2 and 1.6.3. I still believe your testing procedure is not 100% reproducible and that different runs of the _same_ engine yield different results.

What do you think?

swami · Post by **swami** » Thu Feb 04, 2010 1:46 am

mcostalba wrote:
swami wrote: Stockfish 1.6.2 was tested when the STS suites had partial-credit moves and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a thread later on.

Stockfish 1.6.3 was tested with the no-partial-scoring STS suites, which consist of only best moves.
Hi Swami,

Thanks for the explanation. It would be interesting, as a verification, if you could rerun SF 1.6.2 with the latest no-partial-scoring STS setup, to verify that your test gives no difference from 1.6.3.

Thanks
Marco

Hi Marco,

I ran the same test again with the same version and noticed a small difference. It may be due to the fact that I had Ivanhoe running on 2 cores in the background during the first two tests. Since Stockfish was using only 1 CPU, I thought running Ivanhoe to analyse games in another window wouldn't affect the results.

Now, I'm going to run 1.6.2 and 1.6.3 (again) without any other programs in the background. Will let you know about the results.

Stockfish 1.6.3 JA
by Marco Costalba, Tord Romstad, Joona Kiiski, Europe.

Strategic Test Suite Conditions:

Core2Quad Q6600, 32-bit, 2 GB RAM, 2.4 GHz
10 seconds per position
900 positions
Engine uses 156 MB Hash.
Single CPU
Arena GUI

Overall Performance:

Total Score: 695/900 [.....] Average : 77.22% [.....] Grade: A [.....] Total Rated Time: 41.37/150 minutes [2482 Seconds/9000 Seconds]
Subject-wise Scores:

STS (v1.0) - Undermining:
82/100, Grade: A+

STS (v2.1) - Open Files and Diagonals:
80/100, Grade: A+

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
81/100, Grade: A+

STS (v4.1) - Square Vacancy:
83/100, Grade: S

STS (v5.0) - Bishop vs Knight:
79/100, Grade: A

STS (v6.0) - Re-Capturing:
78/100, Grade: A

STS (v7.0) - Offer of Simplification:
74/100, Grade: A-

STS (v8.1) - Advancement of f/g/h Pawns:
63/100, Grade: B

STS (v9.0) - Advancement of a/b/c Pawns:
75/100, Grade: A

Best Wishes,
Swami
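The summary line can be re-derived from the subject-wise scores. Below is a minimal sketch in Python, assuming 100 positions per suite and 10 seconds per position as stated in the test conditions. The function name `summarize` and the short suite labels are only illustrative and are not part of Arena or STS.

```python
# Subject-wise scores for Stockfish 1.6.3 JA, copied from the post above.
scores_163 = {
    "STS1": 82, "STS2": 80, "STS3": 81,
    "STS4": 83, "STS5": 79, "STS6": 78,
    "STS7": 74, "STS8": 63, "STS9": 75,
}

POSITIONS_PER_SUITE = 100   # assumption: every suite has 100 positions
SECONDS_PER_POSITION = 10   # from the stated test conditions


def summarize(scores, rated_seconds):
    """Return (total, maximum, percent, rated minutes, time budget in seconds)."""
    total = sum(scores.values())
    maximum = POSITIONS_PER_SUITE * len(scores)
    percent = round(100 * total / maximum, 2)
    rated_minutes = round(rated_seconds / 60, 2)
    budget_seconds = maximum * SECONDS_PER_POSITION
    return total, maximum, percent, rated_minutes, budget_seconds


print(summarize(scores_163, 2482))
# (695, 900, 77.22, 41.37, 9000)
```

The numbers agree with the reported 695/900, 77.22%, and 41.37 of 150 minutes.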

swami · Post by **swami** » Thu Feb 04, 2010 1:50 am

mcostalba wrote:
swami wrote: Stockfish 1.6.2 was tested when the STS suites had partial-credit moves and Arena erroneously awarded points for certain moves. I wasn't aware of this bug until Wesley pointed it out in a thread later on.

Stockfish 1.6.3 was tested with the no-partial-scoring STS suites, which consist of only best moves.
Hi Swami,

I considered this statement again and I think there is something that I don't understand.

You said that Arena previously granted some extra points to SF 1.6.2, but you also say that SF 1.6.3 has better test results. This cannot be: 1.6.2 should have had better results than 1.6.3, because it received some extra points as a gift from Arena and the functionality is the same between 1.6.2 and 1.6.3. I still believe your testing procedure is not 100% reproducible and that different runs of the _same_ engine yield different results.

What do you think?

Yes, Arena did award marks for the newly introduced moves in some of the tests (usually within a 3-point range).

Not all tests were affected by Arena's erroneous scoring; it was usually STS 2 and 3.

I will have to try different GUIs and see if the result is constant. It would be a good experiment.

I will try the test with Stockfish as the base in GUIs such as:

Chessbase
ChessGUI
Arena
Gradual Test

We will then see which one comes closest to giving static results.

Dann Corbit · Post by **Dann Corbit** » Thu Feb 04, 2010 1:59 am

Since we have a GradualTest converter, maybe it is best just to use GradualTest to perform the analysis.

noctiferus · Post by **noctiferus** » Thu Feb 04, 2010 2:53 am

As I said a few times, STS 8 seems to give counterintuitive results in my rough model.
I wonder why. With the new results I'll try a more exotic technique; let's see.

swami · Post by **swami** » Thu Feb 04, 2010 7:03 pm

Now with Stockfish 1.6.2

I didn't use Gradual Test for testing this as I didn't know how to limit the number of cores used.

Very little difference in both the scores and Total Rated time. So it nearly matches in every case.

Stockfish 1.6.2 JA
by Marco Costalba, Tord Romstad, Joona Kiiski, Europe.

Strategic Test Suite Conditions:

Core2Quad Q6600, 32-bit, 2 GB RAM, 2.4 GHz
10 seconds per position
900 positions
Engine uses 156 MB Hash.
Single CPU
Arena GUI

Overall Performance:

Total Score: 699/900 [.....] Average : 77.67% [.....] Grade: A [.....] Total Rated Time: 40.98/150 minutes [2459 Seconds/9000 Seconds]
Subject-wise Scores:

STS (v1.0) - Undermining:
82/100, Grade: A+

STS (v2.1) - Open Files and Diagonals:
80/100, Grade: A+

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
81/100, Grade: A+

STS (v4.1) - Square Vacancy:
84/100, Grade: S

STS (v5.0) - Bishop vs Knight:
79/100, Grade: A

STS (v6.0) - Re-Capturing:
78/100, Grade: A

STS (v7.0) - Offer of Simplification:
76/100, Grade: A-

STS (v8.1) - Advancement of f/g/h Pawns:
64/100, Grade: B

STS (v9.0) - Advancement of a/b/c Pawns:
75/100, Grade: A

Best Wishes,
Swami
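To see exactly where the 1.6.2 and 1.6.3 runs differ, the two score lists posted in this thread can be compared suite by suite. This is only a rough sketch in Python; the helper name `per_suite_delta` and the short suite labels are made up for illustration.

```python
SUITES = ["STS1", "STS2", "STS3", "STS4", "STS5", "STS6", "STS7", "STS8", "STS9"]

# Subject-wise scores copied from the two result posts in this thread.
run_162 = [82, 80, 81, 84, 79, 78, 76, 64, 75]
run_163 = [82, 80, 81, 83, 79, 78, 74, 63, 75]


def per_suite_delta(first, second):
    """Return {suite: first - second} for the suites where the scores differ."""
    return {
        name: a - b
        for name, a, b in zip(SUITES, first, second)
        if a != b
    }


print(per_suite_delta(run_162, run_163))
# {'STS4': 1, 'STS7': 2, 'STS8': 1}
print(sum(run_162) - sum(run_163))
# 4
```

So the whole 4-point gap between 699 and 695 comes from three suites (STS 4, 7 and 8); the other six scores are identical.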

maxchgr · Post by **maxchgr** » Thu Feb 04, 2010 8:34 pm

What does the square vacancy test?

swami · Post by **swami** » Thu Feb 04, 2010 8:56 pm

maxchgr wrote: What does the square vacancy test?

Not sure what you just asked. Anyway, if you want to know what it means or what it does, please take a look here:

http://sites.google.com/site/strategict ... re-vacancy

David Dahlem · Post by **David Dahlem** » Thu Feb 04, 2010 9:16 pm

swami wrote: I didn't use Gradual Test for testing this as I didn't know how to limit the number of cores used.

Hi Swami

Using the GradualTest "/s" switch, I think this will work:

/s "setoption name Hash value 512\nsetoption name Ponder value false\nsetoption name Threads value 4"
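A string like this can also be generated rather than typed by hand. Here is a small Python sketch, assuming GradualTest accepts UCI `setoption` commands separated by a literal `\n` exactly as in the example above (I have not verified this against GradualTest itself). The helper name `build_uci_setup` is hypothetical, and the values used (156 for Hash, 1 for Threads) are chosen only to match the single-CPU conditions Swami listed.

```python
def build_uci_setup(options):
    """Join UCI 'setoption' commands with a literal backslash-n separator."""
    commands = [
        f"setoption name {name} value {value}"
        for name, value in options.items()
    ]
    # "\\n" is the two characters backslash + n, not a real newline.
    return "\\n".join(commands)


setup = build_uci_setup({"Hash": 156, "Ponder": "false", "Threads": 1})
print(f'/s "{setup}"')
```

Setting `Threads` to 1 here is what would keep the engine on a single core for the test.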

Frank Quisinsky · Post by **Frank Quisinsky** » Fri Feb 05, 2010 2:48 am

Hi Swami,

I haven't tried Arena in the last 3 1/2 years. But for test suites you should try the Fritz GUI. I think it offers the best possibilities for automatic test suites.

The Arena versions I know did nothing of their own with the engines, so the results should be the same.

It could be easy to test: run the same test with Stockfish 1.6.3 again under Arena.

Best
Frank
