Stockfish 070114 Vs Stockfish 070114 8 logical cores vs 4

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Stockfish 070114 Vs Stockfish 070114 8 logical cores vs 4

Post by mwyoung »

I am testing Stockfish vs. Stockfish in the hope that we can answer the question: what is the best setting for Stockfish? Do 8 logical cores really outperform 4 real cores when Stockfish is playing Stockfish, as they do against other programs? Is Stockfish better because of better MP?

I will run this test for an unknown length, until one setting gets outside the error bars of the other setting at 99.7% confidence and shows clear superiority.
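
For readers who want to apply the same stopping rule to their own match, here is a minimal sketch of one way to check it. It assumes a normal approximation on the per-game scores (win = 1, draw = 0.5, loss = 0) and a 3-sigma band for 99.7%; this is not necessarily the method the GUI uses, and the function name `should_stop` is made up for illustration.

```python
import math

def should_stop(wins, draws, losses, z=3.0):
    """Return True when the z-sigma band around the score excludes 50%.

    Assumption: normal approximation on the mean score per game.
    """
    games = wins + draws + losses
    if games == 0:
        return False
    mean = (wins + 0.5 * draws) / games
    # Variance of a single game's score around the observed mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / games
    sd_of_mean = math.sqrt(var / games)
    return mean - z * sd_of_mean > 0.5 or mean + z * sd_of_mean < 0.5

# Hypothetical running totals (wins, draws, losses), not results from this test.
for totals in [(6, 19, 2), (60, 20, 20)]:
    print(totals, should_stop(*totals))
```

Note that checking after every update and stopping at the first crossing makes a false "clear superiority" more likely than the nominal 99.7% suggests, so the result is best read as indicative.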

i7 840 cpu 4 cores
512MB hash
5 stone TB
GM book to 8 moves, same book for both Stockfish versions.
Stockfish 070114 (a) 8CPU Idle sleeping threads enabled.
Stockfish 070114 4CPU Idle sleeping threads enabled.

First results update:

Code:

Blitz, Blitz 1m+1s  0

                                        
1   Stockfish 070114 64 SSE4.2 a   +52  +6/=19/-2 57.41%   15.5/27
2   Stockfish 070114 64 SSE4.2     -52  +2/=19/-6 42.59%   11.5/27

"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Update after 151 Games Error bar 99.7% [-6, +107]

Post by mwyoung »

Code:


Blitz, Blitz 1m+1s  0

                                        
1   Stockfish 070114 64 SSE4.2 a   +25  +25/=112/-14 53.64%   81.0/151
2   Stockfish 070114 64 SSE4.2     -25  +14/=112/-25 46.36%   70.0/151

ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Update after 151 Games Error bar 99.7% [-6, +107]

Post by ernest »

Whether you like it or not, I find this test interesting!
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Update after 151 Games Error bar 99.7% [-6, +107]

Post by mwyoung »

ernest wrote:Whether you like it or not, I find this test interesting!
That is ok, I find it interesting also.
There is no problem between us.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Update: after 228 Games 99.7%-> [-1 , +91] TPR +24 Elo

Post by mwyoung »

Code:

Blitz, Blitz 1m+1s  0

                                        
1   Stockfish 070114 64 SSE4.2 a   +24  +37/=170/-21 53.51%  122.0/228
2   Stockfish 070114 64 SSE4.2     -24  +21/=170/-37 46.49%  106.0/228

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Final update 99.7%->[ +6, +83] TPR +27

Post by mwyoung »

Best setting for Stockfish on an i7 4-core system: all of my testing indicates that Stockfish can use logical cores successfully, up to 8 threads, with measurable gains.

Code:

Blitz, Blitz 1m+1s  0

                                        
1   Stockfish 070114 64 SSE4.2 a   +27  +55/=235/-30 53.91%  172.5/320
2   Stockfish 070114 64 SSE4.2     -27  +30/=235/-55 46.09%  147.5/320

ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Final update 99.7%->[ +6, +83] TPR +27

Post by ernest »

mwyoung wrote: 99.7%->[ +6, +83] TPR +27
Hi,

I have never understood how the Fritz GUI arrives at such figures!

This 99.7%->[ +6, +83], or 3SD error bar, is completely skewed with respect to its center, which is +27.

Actually, my calculation
from +55/=235/-30 53.91% 172.5/320
is:
3.91 x 7 = +27 Elo indeed
and SD = [sqrt (55+30)]/2/320 = 1.44% or 10 Elo

So for me, the 3SD error bar is [-3, +57], of course symmetric with respect to +27.

Am I wrong?

Note: the approximations used in my calculation are valid because the score is not far from 50%
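
The back-of-the-envelope calculation above can be reproduced with a short sketch. It uses the same two approximations as the post: about 7 Elo per percentage point near a 50% score (the slope of the logistic Elo curve at 50% is roughly 6.95 Elo per point) and SD = sqrt(wins + losses) / 2 / games. The function name is made up for illustration.

```python
import math

ELO_PER_PERCENT = 7.0  # rule of thumb, only reasonable close to a 50% score

def approx_elo_and_sd(wins, draws, losses):
    """Approximate Elo difference and 1-SD error from a match result."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    elo = (score - 0.5) * 100.0 * ELO_PER_PERCENT
    # Draws add no spread around 50%; each decisive game deviates by 0.5.
    sd_score = math.sqrt(wins + losses) / 2.0 / games
    sd_elo = sd_score * 100.0 * ELO_PER_PERCENT
    return elo, sd_score, sd_elo

elo, sd_score, sd_elo = approx_elo_and_sd(55, 235, 30)
low, high = elo - 3.0 * sd_elo, elo + 3.0 * sd_elo
print(f"Elo {elo:+.1f}, SD {sd_score * 100:.2f}% ({sd_elo:.1f} Elo), "
      f"3SD interval [{low:+.1f}, {high:+.1f}]")
```

Because nothing is rounded along the way, the bounds can differ by about one Elo from figures built on the rounded +27 and 10.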
Ajedrecista
Posts: 2179
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Final update 99.7%->[ +6, +83] TPR +27.

Post by Ajedrecista »

Hi Ernest:
I have also never understood how the ChessBase GUI reaches those results... it probably does not use a normal distribution but some other one. I get the following result with my own tool:

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012-2013.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

55

Write down the number of loses (up to 1825361100):

30

Write down the number of draws (up to 2147483562):

235

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

99.73

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 99.73 % confidence:

Elo rating difference:     27.20 Elo

Lower rating difference:   -2.59 Elo
Upper rating difference:   57.39 Elo

Lower bound uncertainty:  -29.78 Elo
Upper bound uncertainty:   30.19 Elo
Average error:        +/-  29.99 Elo

K = (average error)*[sqrt(n)] =  536.43

Elo interval: ]  -2.59,   57.39[
---------------------------------------

Number of games of the match:       320
Score: 53.91 %
Elo rating difference:   27.20 Elo
Draw ratio: 73.44 %

************************************************************************
        Sample standard deviation:  1.4261 % of the points of the match.
3.0000 sample standard deviations:  4.2784 % of the points of the match.

                 (Corresponding to 99.73 % confidence).
************************************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:  99.69 % (taking into account draws).
LOS:  99.67 % (not taking into account draws).
LOS:  99.68 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   97 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
That is, circa (+27.2 ± 30) Elo at 3-sigma confidence. I get a little less than your 1.44% for sigma, surely because the score is near 54%-46% and not 50%-50%. But I agree with your result: if I round my bounds to the nearest integers, our bounds match perfectly (-3 and +57).

Regards from Spain.

Ajedrecista.
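
The tool's source is not shown in the thread, so the following is only a sketch of one calculation that yields figures close to those in the output above. It assumes the sample standard deviation (n - 1 divisor) of the per-game scores, a normal 3-sigma band on the score, and the usual logistic conversion from score to Elo. The function names are hypothetical.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def mean_and_sample_sd(wins, draws, losses):
    """Mean score and its standard error using the n - 1 (sample) divisor."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    sum_sq = (wins * (1.0 - mean) ** 2
              + draws * (0.5 - mean) ** 2
              + losses * (0.0 - mean) ** 2)
    return mean, math.sqrt(sum_sq / (n - 1) / n)

mean, sd = mean_and_sample_sd(55, 235, 30)
z = 3.0  # treated as the 99.73% two-sided level
center = score_to_elo(mean)
lower = score_to_elo(mean - z * sd)
upper = score_to_elo(mean + z * sd)
print(f"SD {sd * 100:.4f}%  Elo {center:.2f}  interval ]{lower:.2f}, {upper:.2f}[")
```

Applying the band to the score and only then converting to Elo is what makes the lower and upper uncertainties slightly unequal, since the logistic curve is not a straight line.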
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Final update 99.7%->[ +6, +83] TPR +27.

Post by ernest »

Thanks, Jesus !!! :)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Final update 99.7%->[ +6, +83] TPR +27.

Post by Milos »

Ajedrecista wrote:
        Sample standard deviation:  1.4261 % of the points of the match.
[...]
LOS:  99.69 % (taking into account draws).
LOS:  99.67 % (not taking into account draws).
I noticed you have different LOS values with and without draws. Draws don't affect LOS at all, so your calculation with draws is probably wrong.
The exact value of 1SD is 1.423907%, so you also have a small error in its calculation.
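
Both points can be checked numerically. The sketch below assumes the common draw-independent LOS formula, Phi((wins - losses) / sqrt(wins + losses)), and compares the standard deviation of the mean score computed with an n divisor against an n - 1 divisor. Attributing the 1.4239% vs 1.4261% gap to that divisor is an inference from the numbers, not something stated in the thread.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def los_ignoring_draws(wins, losses):
    """Likelihood of superiority from decisive games only."""
    return normal_cdf((wins - losses) / math.sqrt(wins + losses))

def sd_of_mean(wins, draws, losses, ddof):
    """SD of the mean score; ddof=0 divides by n, ddof=1 by n - 1."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    sum_sq = (wins * (1.0 - mean) ** 2
              + draws * (0.5 - mean) ** 2
              + losses * (0.0 - mean) ** 2)
    return math.sqrt(sum_sq / (n - ddof) / n)

los = los_ignoring_draws(55, 30)
sd_n = sd_of_mean(55, 235, 30, ddof=0)
sd_n_minus_1 = sd_of_mean(55, 235, 30, ddof=1)
print(f"LOS {los * 100:.2f}%  SD(n) {sd_n * 100:.6f}%  SD(n-1) {sd_n_minus_1 * 100:.6f}%")
```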