Stockfish 070114 Vs Stockfish 070114 8 logical cores vs 4

Ajedrecista · Post by **Ajedrecista** » Mon Jan 13, 2014 8:06 am

Hello Milos:

Milos wrote:
Ajedrecista wrote:
Hi Ernest:

ernest wrote:
mwyoung wrote: 99.7%->[ +6, +83] TPR +27
Hi,

I have never understood how the Fritz GUI arrives at such figures!

This 99.7%->[ +6, +83] or 3SD error bar is completely skewed with respect to its center, which is +27.

Actually, my calculation
from +55/=235/-30 (53.91%, 172.5/320)
is:
3.91 x 7 = +27 Elo indeed (near 50%, each percentage point of score is worth about 7 Elo)
and SD = [sqrt(55+30)]/2/320 = 1.44%, or 10 Elo.

So for me, the 3SD error bar is [-3, +57], which is of course symmetric with respect to +27.

Am I wrong?

Note: the approximations used in my calculation are valid because the score is not far from 50%.
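ernest's back-of-the-envelope method can be sketched in Python as follows. This is a minimal sketch, assuming his factor of 7 is the rounded slope of the logistic Elo curve at a 50% score (1600/ln 10 ≈ 694.9 Elo per unit of score, i.e. about 6.95 Elo per percentage point); the function name `approx_interval` is hypothetical.

```python
import math

# Slope of Elo(s) = 400*log10(s/(1-s)) at s = 0.5:
# d(Elo)/ds = 400 / (ln(10) * s * (1 - s)) = 1600 / ln(10), about 694.9 Elo per unit score.
ELO_PER_UNIT_SCORE = 1600 / math.log(10)


def approx_interval(wins: int, draws: int, losses: int, k_sd: float = 3.0):
    """Linearised Elo estimate and +/- k_sd interval; only valid near a 50% score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Near 50%, the single-game variance is about (wins + losses) / (4 n),
    # so the SD of the mean score is sqrt(wins + losses) / 2 / n.
    sd = math.sqrt(wins + losses) / 2 / n
    elo = (score - 0.5) * ELO_PER_UNIT_SCORE
    sd_elo = sd * ELO_PER_UNIT_SCORE
    return score, sd, elo, elo - k_sd * sd_elo, elo + k_sd * sd_elo


score, sd, elo, lo, hi = approx_interval(55, 235, 30)
print(f"score {score:.2%}, SD {sd:.2%}, Elo {elo:+.1f}, 3SD [{lo:+.1f}, {hi:+.1f}]")
```

Rounded to integers, the interval is [-3, +57], the same as in the post; using the exact slope instead of 7 only moves the centre from about 27.3 to 27.1 Elo.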
I have never understood how the ChessBase GUI reaches those results either... it probably does not use a normal distribution but some other one. I get the following result with my own tool:
LOS_and_Elo_uncertainties_calculator, ® 2012-2013.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

55

Write down the number of loses (up to 1825361100):

30

Write down the number of draws (up to 2147483562):

235

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

99.73

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 99.73 % confidence:

Elo rating difference:     27.20 Elo

Lower rating difference:   -2.59 Elo
Upper rating difference:   57.39 Elo

Lower bound uncertainty:  -29.78 Elo
Upper bound uncertainty:   30.19 Elo
Average error:        +/-  29.99 Elo

K = (average error)*[sqrt(n)] =  536.43

Elo interval: ]  -2.59,   57.39[
---------------------------------------

Number of games of the match:       320
Score: 53.91 %
Elo rating difference:   27.20 Elo
Draw ratio: 73.44 %

************************************************************************
        Sample standard deviation:  1.4261 % of the points of the match.
3.0000 sample standard deviations:  4.2784 % of the points of the match.

                 (Corresponding to 99.73 % confidence).
************************************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:  99.69 % (taking into account draws).
LOS:  99.67 % (not taking into account draws).
LOS:  99.68 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   97 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
That is, roughly +27.2 ± 30 Elo at 3-sigma confidence. My sigma is a little smaller than your 1.44%, surely because the score is near 54%-46% rather than 50%-50%. But I agree with your result: if I round my bounds to the nearest integers, our bounds match perfectly (-3 and +57).
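The tool's interval can be reproduced with a short Python sketch. It assumes, from the printed output rather than from the tool's source, that the tool divides by (n - 1), takes the two-sided normal quantile for the chosen confidence, and maps the score bounds through the logistic Elo formula 400*log10(s/(1 - s)); the names `elo_from_score` and `elo_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def elo_from_score(s: float) -> float:
    """Logistic Elo difference for a score fraction 0 < s < 1."""
    return 400 * math.log10(s / (1 - s))


def elo_interval(wins: int, draws: int, losses: int, confidence: float = 0.9973):
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n      # score fraction
    d = draws / n                      # draw ratio
    # Sample SD of the mean score (divides by n - 1, as the tool's output suggests).
    sigma = math.sqrt((mu * (1 - mu) - d / 4) / (n - 1))
    # Two-sided normal quantile: about 3.0 for 99.73% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Map the score bounds, not the centre +/- a fixed Elo width, through the Elo curve.
    lower = elo_from_score(mu - z * sigma)
    upper = elo_from_score(mu + z * sigma)
    average_error = (upper - lower) / 2
    k = average_error * math.sqrt(n)   # K = (average error) * sqrt(n)
    return elo_from_score(mu), lower, upper, sigma, k


elo, lower, upper, sigma, k = elo_interval(55, 235, 30)
print(f"Elo {elo:.2f}, interval ]{lower:.2f}, {upper:.2f}[, sigma {sigma:.4%}, K {k:.2f}")
```

With 55 wins, 235 draws and 30 losses this gives about 27.20, -2.59 and 57.39 Elo, a sigma of about 1.4261% and K of about 536.4, in line with the printed output. Mapping the score bounds through the logistic curve is also why the interval is slightly asymmetric around 27.20.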

Regards from Spain.

Ajedrecista.
I noticed you get different LOS values with and without draws. Draws don't affect LOS at all, so your calculation with draws is probably wrong.
The exact value of 1 SD is 1.423907%, so you also have a small error in its calculation.

My tool divides by (n - 1) games instead of n games, which is why I label it 'sample standard deviation' instead of 'standard deviation'. I do not have a calculator at hand, but I am sure that the ratio between our SDs is sqrt(319/320).
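A minimal Python check of the two normalisations, using the single-game variance mu*(1 - mu) - D/4 with mu the score fraction and D the draw ratio (variable names are illustrative):

```python
import math

wins, draws, losses = 55, 235, 30
n = wins + draws + losses
mu = (wins + 0.5 * draws) / n                 # score fraction
d = draws / n                                 # draw ratio
single_game_var = mu * (1 - mu) - d / 4

sd_n = math.sqrt(single_game_var / n)               # divide by n
sd_n_minus_1 = math.sqrt(single_game_var / (n - 1)) # divide by n - 1 ("sample" SD)

print(f"1/n: {sd_n:.6%}   1/(n-1): {sd_n_minus_1:.4%}")
print(f"ratio {sd_n / sd_n_minus_1:.6f}   sqrt(319/320) {math.sqrt(319 / 320):.6f}")
```

Dividing by n gives the 1.423907% quoted by Milos, dividing by n - 1 gives the tool's 1.4261%, and their ratio is exactly sqrt(319/320).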

Regarding LOS: I use two different methods to calculate it. The first one computes the z-score (mu - 0.5)/sigma and evaluates it with a normal distribution, using sigma = sqrt{[mu*(1 - mu) - D/4]/(n - 1)}, where mu is the score and D the draw ratio. I do not remember right now exactly how I implemented the second one, but it should be similar to WhoIsBest by Rémi Coulom or something like that. You can browse the source code through the link in my signature.
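The first LOS method can be sketched as a z-score of the score against 50%, evaluated with the standard normal CDF. This assumes the z-score is (mu - 0.5)/sigma with the sigma given above; `los_with_draws` is a hypothetical name, and `statistics.NormalDist` needs Python 3.8 or later.

```python
import math
from statistics import NormalDist


def los_with_draws(wins: int, draws: int, losses: int) -> float:
    """LOS from a normal approximation whose sigma accounts for the draw ratio."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n      # score fraction
    d = draws / n                      # draw ratio
    sigma = math.sqrt((mu * (1 - mu) - d / 4) / (n - 1))
    z = (mu - 0.5) / sigma             # distance from an even score, in sigmas
    return NormalDist().cdf(z)         # one-sided probability


los = los_with_draws(55, 235, 30)
print(f"LOS (taking into account draws): {los:.2%}")
```

For the match above this gives about 0.9969, matching the tool's 99.69% (taking into account draws).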

Both values converge for a large number of games (a few thousand are enough).

Regards from Spain.

Ajedrecista.

Ajedrecista · Post by **Ajedrecista** » Mon Jan 13, 2014 3:40 pm

Hello again:

What I call LOS without draws is found in the following post by Rémi:

Re: Likelihood of superiority

I use the last equation in this image.
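The LOS without draws can be sketched with the closed form commonly quoted from Rémi's post, LOS = 1/2 [1 + erf((W - L)/sqrt(2W + 2L))]. Whether this is exactly the equation in the image is an assumption, but with W = 55 and L = 30 it gives 99.67%, the same value as the tool's 'not taking into account draws' line. The function name is hypothetical.

```python
import math


def los_without_draws(wins: int, losses: int) -> float:
    """LOS from decisive games only: 1/2 * (1 + erf((W - L) / sqrt(2W + 2L)))."""
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))


los_nd = los_without_draws(55, 30)
print(f"LOS (not taking into account draws): {los_nd:.2%}")
```

Only wins and losses enter this formula, which is why the number of draws does not change this LOS.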

------------

I have just checked the ratio of our SDs: it is sqrt(319/320) indeed (or its inverse, whatever you prefer). Thanks for your interest in my post.

Regards from Spain.

Ajedrecista.

Milos · Post by **Milos** » Mon Jan 13, 2014 4:24 pm


I see, now it is clearer. Rémi's formula is the exact one, and the other one is based on the normal distribution, which is quite a good estimate of the exact one (except for a very low number of games). So I was just confused by the terminology, since I was also getting the same value as your "LOS without draws".
For the SD, I realized you use the same formula for the single-game variance, (w+d/2)(1-w-d/2)-d/4, so the only difference is 1/n vs. 1/(n-1), which in the end is really negligible.
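A quick Python check that the single-game variance, writing the score as w + d/2 (the mu of Ajedrecista's sigma formula) with w and d the win and draw ratios, equals the plain variance of the individual game results (1, 1/2 or 0 points):

```python
from statistics import pvariance

wins, draws, losses = 55, 235, 30
results = [1.0] * wins + [0.5] * draws + [0.0] * losses   # one entry per game
n = len(results)

w, d = wins / n, draws / n
# Single-game variance written with the score w + d/2.
formula_var = (w + d / 2) * (1 - w - d / 2) - d / 4
# Population variance of the 320 individual game results.
direct_var = pvariance(results)

print(formula_var, direct_var)
```

Both values are 1063/16384, about 0.06488, for this match.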
