My tool divides by (n - 1) games instead of n games, so I would call its result the 'sample standard deviation' rather than the 'standard deviation'. I do not have a calculator at hand, but I am sure that the ratio between our SDs is sqrt(319/320).
Milos wrote:
I noticed you have different LOS values with and without draws. Draws don't affect LOS at all, so your calculation with draws is probably wrong.
Ajedrecista wrote:
Hi Ernest:
I have also never understood how the ChessBase GUI reaches those results... it probably uses some distribution other than the normal one. I get the following result with my own tool:
ernest wrote:
Hi,
mwyoung wrote:
99.7%->[ +6, +83] TPR +27
I have never understood how the Fritz GUI arrives at such figures!
This 99.7%->[ +6, +83], or 3SD error bar, is completely skewed with respect to its center, which is +27.
Actually, my calculation
from +55/=235/-30 53.91% 172.5/320
is:
3.91 x 7 = +27 Elo indeed
and SD = [sqrt (55+30)]/2/320 = 1.44% or 10 Elo
So for me, the 3SD error-bar is: [-3, +57] of course symmetric with respect to +27
Am I wrong?
Note: the approximations used in my calculation are valid because the score is not far from 50%
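ernest's back-of-the-envelope calculation can be sketched in Python as follows. The function name `quick_elo_estimate` and the parameter `elo_per_percent` are made up for illustration; the factor of 7 Elo per percentage point is the near-50% approximation used in the post, not an exact conversion.

```python
import math

def quick_elo_estimate(wins, draws, losses, elo_per_percent=7.0):
    """Near-50% approximation: return (Elo difference, 1-SD error in Elo)."""
    n = wins + draws + losses
    score_percent = 100.0 * (wins + 0.5 * draws) / n
    # Roughly 7 Elo per percentage point when the score is close to 50%.
    elo = (score_percent - 50.0) * elo_per_percent
    # SD of the score, assuming a 50% score: sqrt(wins + losses) / 2 / n.
    sd_percent = 100.0 * math.sqrt(wins + losses) / 2.0 / n
    sd_elo = sd_percent * elo_per_percent
    return elo, sd_elo

elo, sd_elo = quick_elo_estimate(55, 235, 30)
low, high = elo - 3 * sd_elo, elo + 3 * sd_elo
print(f"{elo:+.1f} Elo, 3SD interval [{low:+.1f}, {high:+.1f}]")
```

Because no intermediate value is rounded here, the bounds can differ by up to about one Elo from the hand-rounded [-3, +57].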
My tool gives circa (+27.2 ± 30) Elo for 3-sigma confidence. I get a little less than your 1.44% for sigma, surely because the score is near 54%-46% rather than 50%-50%. But I agree with your result: if I round my bounds to the closest integers, our bounds match perfectly (-3 and +57).
Code:
LOS_and_Elo_uncertainties_calculator, ® 2012-2013.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Maximum number of games supported: 2147483647.
Write down the number of wins (up to 1825361100):
55
Write down the number of loses (up to 1825361100):
30
Write down the number of draws (up to 2147483562):
235
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
99.73
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 99.73 % confidence:
Elo rating difference: 27.20 Elo
Lower rating difference: -2.59 Elo
Upper rating difference: 57.39 Elo
Lower bound uncertainty: -29.78 Elo
Upper bound uncertainty: 30.19 Elo
Average error: +/- 29.99 Elo
K = (average error)*[sqrt(n)] = 536.43
Elo interval: ] -2.59, 57.39[
---------------------------------------
Number of games of the match: 320
Score: 53.91 %
Elo rating difference: 27.20 Elo
Draw ratio: 73.44 %
************************************************************************
Sample standard deviation: 1.4261 % of the points of the match.
3.0000 sample standard deviations: 4.2784 % of the points of the match.
(Corresponding to 99.73 % confidence).
************************************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 99.69 % (taking into account draws).
LOS: 99.67 % (not taking into account draws).
LOS: 99.68 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 97 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
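The Elo interval in this output can be approximated with a short Python sketch. It is not the tool's source code: it assumes the sigma formula quoted in this thread (division by n - 1), takes mu as the score fraction and D as the draw ratio, and converts scores to Elo with the usual logistic formula 400*log10(p/(1 - p)). The function names `elo_from_score` and `elo_interval` are hypothetical.

```python
import math

def elo_from_score(p):
    """Logistic Elo model (assumed): rating difference for a score p, 0 < p < 1."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, draws, losses, z=3.0):
    """Return (Elo difference, lower Elo bound, upper Elo bound, sigma as a score fraction)."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n          # score fraction
    d = draws / n                          # draw ratio
    # Sample standard deviation of the mean score (divides by n - 1).
    sigma = math.sqrt((mu * (1.0 - mu) - d / 4.0) / (n - 1))
    # mu -/+ z*sigma must stay strictly between 0 and 1 for the conversion to work.
    return (elo_from_score(mu),
            elo_from_score(mu - z * sigma),
            elo_from_score(mu + z * sigma),
            sigma)

center, lower, upper, sigma = elo_interval(55, 235, 30, z=3.0)
print(f"Elo: {center:.2f}  interval: ]{lower:.2f}, {upper:.2f}[  sigma: {100 * sigma:.4f} %")
```

The interval is asymmetric around the center because the symmetric bounds on the score are mapped through a non-linear score-to-Elo conversion.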
Regards from Spain.
Ajedrecista.
Exact value of 1SD is 1.423907% and you also have a small error in its calculation.
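The two figures in dispute (1.4261% and 1.423907%) can be compared with a minimal Python check. It assumes the per-game variance mu*(1 - mu) - D/4 from the sigma formula quoted in this thread, and that the only difference between the two values is dividing by n - 1 versus n; the variable names are illustrative.

```python
import math

wins, draws, losses = 55, 235, 30
n = wins + draws + losses
mu = (wins + 0.5 * draws) / n                        # score fraction
per_game_variance = mu * (1.0 - mu) - (draws / n) / 4.0

sd_population = math.sqrt(per_game_variance / n)     # divides by n
sd_sample = math.sqrt(per_game_variance / (n - 1))   # divides by n - 1
ratio = sd_population / sd_sample                    # equals sqrt((n - 1) / n)

print(f"divide by n:     {100 * sd_population:.6f} %")
print(f"divide by n - 1: {100 * sd_sample:.6f} %")
print(f"ratio: {ratio:.6f}  sqrt(319/320): {math.sqrt(319 / 320):.6f}")
```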
Regarding LOS: I use two different methods to calculate it. The first one uses the z-score of a normal distribution with sigma = sqrt{[mu*(1 - mu) - D/4]/(n - 1)}, where mu is the score, D is the draw ratio and n is the number of games. I do not remember right now exactly how I implemented the calculation in the second case, but it should be similar to WhoIsBest by Rémi Coulom or something like that. You can browse the source code through the link in my signature.
Both values converge for a large number of games (a few thousand are enough).
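The first method can be sketched in Python as below, assuming LOS is the normal CDF evaluated at z = (mu - 0.5)/sigma with the sigma given above. The second function is only the widely used draw-free approximation based on (wins - losses)/sqrt(wins + losses), included for comparison; it is not claimed to be the tool's second method. All function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def los_with_draws(wins, draws, losses):
    """LOS from the z-score of the mean score (assumed reading of the first method)."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    d = draws / n
    sigma = math.sqrt((mu * (1.0 - mu) - d / 4.0) / (n - 1))
    return normal_cdf((mu - 0.5) / sigma)

def los_without_draws(wins, losses):
    """Common draw-free approximation, shown for comparison only."""
    return normal_cdf((wins - losses) / math.sqrt(wins + losses))

print(f"LOS with draws:    {100 * los_with_draws(55, 235, 30):.2f} %")
print(f"LOS without draws: {100 * los_without_draws(55, 30):.2f} %")
```

For +55/=235/-30 the two functions differ only in the second decimal of the percentage, in line with the small gap the tool reports.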
Regards from Spain.
Ajedrecista.

