Error margin


Rebel
Posts: 7023
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Error margin

Post by Rebel

I did some research into how Elostat and Bayeselo calculate the Elo error margin. They use roughly the same formula. Some snippets:

Code:

GAMES  ERROR MARGIN (Elo)
1000     17
2000     12
4000      9
5000      8
10000     5
15000     4
20000     4
25000     3
30000     3
50000     3
100000    2
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Error margin

Post by Laskos

Rebel wrote: I did some research into how Elostat and Bayeselo calculate the Elo error margin. They use roughly the same formula. Some snippets.

Just use 560/sqrt(N_games) for 2SD Elo margins. Or, if you want to be more precise, 700*sqrt(4*win_ratio*(1-win_ratio) - draw_ratio)/sqrt(N_games).
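A minimal Python sketch of the two rules of thumb above, assuming that win_ratio means the score fraction (wins + draws/2)/N. The function names are illustrative only. With a 50% score and 36% draws the two forms agree exactly, because 700*sqrt(1 - 0.36) = 560.

```python
import math

def elo_margin_simple(n_games: int) -> float:
    """Quick 2-SD Elo margin: 560 / sqrt(N)."""
    return 560.0 / math.sqrt(n_games)

def elo_margin_precise(n_games: int, score_ratio: float, draw_ratio: float) -> float:
    """2-SD Elo margin: 700 * sqrt(4*s*(1-s) - d) / sqrt(N).

    score_ratio s = (wins + draws/2) / N, draw_ratio d = draws / N.
    """
    spread = 4.0 * score_ratio * (1.0 - score_ratio) - draw_ratio
    return 700.0 * math.sqrt(spread) / math.sqrt(n_games)

if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        print(n, round(elo_margin_simple(n), 1), round(elo_margin_precise(n, 0.5, 0.32), 1))
```

With 32% draws, for example, the draw-aware form gives a slightly wider margin than 560/sqrt(N), since 700*sqrt(0.68) is about 577.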

Kai
Ajedrecista
Posts: 1972
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Error margin

Post by Ajedrecista

Hi Ed:

Kai's approximation is very good. I calculated error bars with my own method, but with two simplifications: I fixed the draw ratio at 32% and assumed a score of 50% for each engine. I wrote this Fortran 95 code to calculate error bars from 1000 up to 200000 games in steps of 100 games. I ran it for several confidence levels (95%, 98%, 99%, 99.5%, 99.8% and 99.9%), and each run, including writing the results to a text file, took around 125 ms. Here is the code:

Code:

program Error_bars

implicit none

integer, parameter :: dp = selected_real_kind(15)  ! Double precision.
integer, parameter :: parts = 2000, iterations = 60
integer :: n, i, j
real(dp) :: sigma, S1, S2, x, t0, t1
real(dp) :: error, three_sqrt_of_two_pi, confidence
real(dp) :: a, b, h_a, h2_a, h_b, h2_b, z, h_z, h2_z, S_a, S_b, S_z, S1z, S2z, function_a, function_b, function_z

write(*,*)
write(*,*) 'Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded to 0.01%):'
write(*,*)
read(*,*) confidence  ! Confidence for a two-sided test.
write(*,*)

call cpu_time(t0)  ! Standard Fortran CPU timer, in seconds.

confidence = 1d-2*nint(1d2*confidence)  ! Rounded to the nearest 0.01%.

if ((confidence < 6.5d1) .or. (confidence > 9.99001d1)) then
  write(*,'(A)') 'Error_bars will not work with a confidence level outside a range of 65% - 99.9%'
  write(*,*)
  write(*,'(A)') 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

! Narrow bracket [a, b] around the expected z, so that regula falsi needs fewer iterations.
if (confidence < 7d1) then
  a = 9.345d-1; b = 1.0365d0
else if ((confidence >= 7d1) .and. (confidence < 7.5d1)) then
  a = 1.0364d0; b = 1.1504d0
else if ((confidence >= 7.5d1) .and. (confidence < 8d1)) then
  a = 1.1503d0; b = 1.2816d0
else if ((confidence >= 8d1) .and. (confidence < 8.5d1)) then
  a = 1.2815d0; b = 1.4396d0
else if ((confidence >= 8.5d1) .and. (confidence < 9d1)) then
  a = 1.4395d0; b = 1.6449d0
else if ((confidence >= 9d1) .and. (confidence < 9.25d1)) then
  a = 1.6448d0; b = 1.7805d0
else if ((confidence >= 9.25d1) .and. (confidence < 9.5d1)) then
  a = 1.7804d0; b = 1.96d0
else if ((confidence >= 9.5d1) .and. (confidence < 9.75d1)) then
  a = 1.9599d0; b = 2.2415d0
else if ((confidence >= 9.75d1) .and. (confidence < 9.9d1)) then
  a = 2.2414d0; b = 2.5759d0
else if ((confidence >= 9.9d1) .and. (confidence < 9.95d1)) then
  a = 2.5758d0; b = 2.8071d0
else if ((confidence >= 9.95d1) .and. (confidence < 9.975d1)) then
  a = 2.807d0; b = 3.0234d0
else if ((confidence >= 9.975d1) .and. (confidence < 9.984d1)) then
  a = 3.0233d0; b = 3.156d0
else if (confidence >= 9.984d1) then
  a = 3.1559d0; b = 3.2906d0
end if

three_sqrt_of_two_pi = 3d0*sqrt(2d0*acos(-1d0))

! Composite Simpson's rule on [0, a]: S1 sums the odd nodes h, 3h, ..., (parts-1)h,
! S2 sums the even interior nodes 2h, 4h, ..., (parts-2)h.
S_a = 0d0
h_a = a/parts
h2_a = h_a + h_a

x = -h_a
S1 = 0d0
do i = 1, parts-1, 2
  x = x + h2_a
  S1 = S1 + exp(-5d-1*x*x)
end do

x = 0d0
S2 = 0d0
do i = 2, parts-2, 2
  x = x + h2_a
  S2 = S2 + exp(-5d-1*x*x)
end do

S_a = h2_a*(1d0 + 4d0*S1 + 2d0*S2 + exp(-5d-1*a*a))/three_sqrt_of_two_pi  ! Two-sided: P(-a < Z < a).
function_a = S_a - 1d-2*confidence

S_b = 0d0
h_b = b/parts
h2_b = h_b + h_b

x = -h_b
S1 = 0d0
do i = 1, parts-1, 2
  x = x + h2_b
  S1 = S1 + exp(-5d-1*x*x)
end do

x = 0d0
S2 = 0d0
do i = 2, parts-2, 2
  x = x + h2_b
  S2 = S2 + exp(-5d-1*x*x)
end do

S_b = h2_b*(1d0 + 4d0*S1 + 2d0*S2 + exp(-5d-1*b*b))/three_sqrt_of_two_pi  ! Two-sided: P(-b < Z < b).
function_b = S_b - 1d-2*confidence

do j = 1, iterations  ! Solve for z with the regula falsi method:
  z = a - (b - a)*function_a/(function_b - function_a)
  ! The usual form is:
  ! z = a + (b - a)*abs(function_a)/(abs(function_a) + abs(function_b))
  ! Since function_a < 0 and function_b > 0, the form above needs fewer operations, so it is faster.
  S_z = 0d0
  h_z = z/parts
  h2_z = h_z + h_z

  x = -h_z
  S1z = 0d0
  do i = 1, parts-1, 2
    x = x + h2_z
    S1z = S1z + exp(-5d-1*x*x)
  end do

  x = 0d0
  S2z = 0d0
  do i = 2, parts-2, 2
    x = x + h2_z
    S2z = S2z + exp(-5d-1*x*x)
  end do

  S_z = h2_z*(1d0 + 4d0*S1z + 2d0*S2z + exp(-5d-1*z*z))/three_sqrt_of_two_pi  ! Two-sided: P(-z < Z < z).
  function_z = S_z - 1d-2*confidence

  if (function_a*function_z < 0d0) then
    b = z
    function_b = function_z
  else if (function_b*function_z < 0d0) then
    a = z
    function_a = function_z
  end if
end do

open(unit=111, file='log.txt', status='unknown', action='write')
write(111,'(A,F5.2,A)') 'Confidence level: ', confidence, '% confidence.'
write(111,'(A)') 'Draw ratio: 32% (fixed).'
write(111,'(A)') 'Simplification: score = 50%.'
write(111,*)
write(111,'(A)') 'Games:      Error bar:'
write(111,*)

do n = 1000, 200000, 100
  sigma = sqrt(1.7d-1/n)  ! Per-game variance 0.25 - 0.32/4 = 0.17 (draw ratio = 32%; score = 50%).
  ! The formula for calculating sigma was taken from this thread (first seen in post #22):
  ! http://immortalchess.net/forum/showthread.php?t=2237

  error = 4d2*log10((5d-1 + z*sigma)/(5d-1 - z*sigma))
  error = 1d-2*nint(1d2*error)  ! Rounded to 0.01 Elo.

  write(111,'(I6,A,F5.2,A)') n, '     ± ', error, ' Elo'
end do

close(111)

call cpu_time(t1)

write(*,'(A,I6,A)') 'End of the calculations. Time: ', nint(1d3*(t1-t0)), ' ms.'
write(*,*)

end program Error_bars
I took a lot of code from another programme of mine (LOS_and_Elo_uncertainties_calculator) that I wrote some months ago.

My results are very similar to yours and also to Kai's results (I rounded them to 0.01 Elo):

Error_bars_with_simplifications (0.02 MB)
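For readers without a Fortran compiler, here is a rough Python sketch of the same table under the same simplifications (32% draws, 50% score, so sigma = sqrt(0.17/n)). It is not a port of the Fortran program: instead of Simpson's rule plus regula falsi, it takes z from the standard library's `statistics.NormalDist().inv_cdf`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z(confidence_pct: float) -> float:
    """z with P(-z < Z < z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)

def error_bar(n_games: int, confidence_pct: float = 95.0, draw_ratio: float = 0.32) -> float:
    """Elo half-width at a 50% score; per-game variance is 0.25 - draw_ratio/4."""
    sigma = math.sqrt((0.25 - draw_ratio / 4.0) / n_games)
    z = two_sided_z(confidence_pct)
    return 400.0 * math.log10((0.5 + z * sigma) / (0.5 - z * sigma))

if __name__ == "__main__":
    print("Games:      Error bar:")
    for n in (1000, 2000, 5000, 10000, 50000, 100000, 200000):
        print(f"{n:6d}     +/- {error_bar(n):5.2f} Elo")
```

Because the Elo curve is nearly linear near 50%, quadrupling the number of games roughly halves the error bar.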

Regards from Spain.

Ajedrecista.
Rebel
Posts: 7023
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Error margin

Post by Rebel

Thanks for the help. I also think Kai's formula is very good, at least good enough for its purpose, which is the MATCH utility I am working on.

There is a new version that, besides the LOS, now also displays the Elo error margin and an estimated Elo performance.

Example:

Code:

 2879-2452-2668 (7999)  match score 4105.0 - 3894.0 (51.3%)

 Won-loss 2879-2668 = 211 (7999 games) draws 30.7%

 LOS = 99.8%  Elo Error Margin +6 -6

 Engine MICKEY (elo 2500) vs Engine MOUSE (elo 2500) estimated TPR 2508 (+8)
Download with source code at: http://www.top-5000.nl/match.htm
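As a cross-check of the example above, a small hypothetical helper that rebuilds the summary figures from the raw counts. It assumes the first line is wins-draws-losses (the only order consistent with 4105.0 points out of 7999), and it assumes the simple 560/sqrt(N) rule for the margin; whether MATCH uses that form or the draw-aware one is an assumption, and the TPR estimate is not reproduced.

```python
import math

def match_summary(wins: int, draws: int, losses: int) -> dict:
    """Rebuild score, won-loss difference, draw ratio and a 560/sqrt(N) Elo margin."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return {
        "games": games,
        "points": points,
        "opponent_points": games - points,
        "score_pct": 100.0 * points / games,
        "won_minus_lost": wins - losses,
        "draw_pct": 100.0 * draws / games,
        "margin_elo": 560.0 / math.sqrt(games),  # assumed rule, roughly 2 SD
    }

if __name__ == "__main__":
    s = match_summary(2879, 2452, 2668)
    print(f"{s['points']:.1f} - {s['opponent_points']:.1f} ({s['score_pct']:.1f}%), "
          f"won-loss {s['won_minus_lost']}, draws {s['draw_pct']:.1f}%, margin +/-{s['margin_elo']:.0f}")
```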
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Error margin

Post by lucasart

Rebel wrote: Thanks for the help. I also think Kai's formula is very good, at least good enough for its purpose, which is the MATCH utility I am working on.

There is a new version that, besides the LOS, now also displays the Elo error margin and an estimated Elo performance.

Example:

Code:

 2879-2452-2668 (7999)  match score 4105.0 - 3894.0 (51.3%)

 Won-loss 2879-2668 = 211 (7999 games) draws 30.7%

 LOS = 99.8%  Elo Error Margin +6 -6

 Engine MICKEY (elo 2500) vs Engine MOUSE (elo 2500) estimated TPR 2508 (+8)
Download with source code at: http://www.top-5000.nl/match.htm
I don't agree with your calculation. From these results, the (unbiased) empirical mean and stdev are:
* mu = E_hat(Xi) = (X1+...+Xn)/n = (#win + #draw/2) / n = 0.52669083635454
* V_hat(Xi) = [#win.(1-mu)^2 + #loss.(0-mu)^2 + #draw.(.5-mu)^2] / (n-1) = 0.16592291903455
* sigma = stdev(E_hat(Xi)) = sqrt(V_hat(Xi) / n) = 0.00455444373651
* LOS = P(N(mu,sigma) > .5) = P(N(0,1) > (.5-mu)/sigma) = P(N(0,1) < (mu-.5)/sigma) = 0.99999999769115. Don't trust too many decimals there; I used the function normsdist(x) = P(N(0,1) < x) from the Gnumeric spreadsheet program. But it is still more than 99.8%.

Note that I'm using the Gaussian approximation of the real distribution (which is a rescaled trinomial). I haven't calculated the exact multinomial, but I doubt it would be very different.
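The Gaussian computation above can be reproduced with a short Python sketch. The function name is hypothetical, and `statistics.NormalDist().cdf` plays the role of the spreadsheet's normal CDF. The count order matters: reading the example as +2879 -2452 =2668 gives the numbers above, while +2879 -2668 =2452 (the order implied by the 4105 points) gives an LOS close to the 99.8% in the example.

```python
import math
from statistics import NormalDist

def gaussian_los(wins: int, losses: int, draws: int):
    """Empirical mean, unbiased per-game variance, SD of the mean and LOS = P(mean > 0.5)."""
    n = wins + losses + draws
    mu = (wins + draws / 2.0) / n
    var = (wins * (1.0 - mu) ** 2 + losses * mu ** 2 + draws * (0.5 - mu) ** 2) / (n - 1)
    sigma = math.sqrt(var / n)
    los = NormalDist().cdf((mu - 0.5) / sigma)
    return mu, var, sigma, los

if __name__ == "__main__":
    print(gaussian_los(2879, 2452, 2668))  # reading used above: +2879 -2452 =2668
    print(gaussian_los(2879, 2668, 2452))  # reading implied by 4105 points: +2879 -2668 =2452
```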
Ajedrecista
Posts: 1972
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Error margin.

Post by Ajedrecista

Hello Ed and Lucas:
lucasart wrote:
I don't agree with your calculation. From these results, the (unbiased) empirical mean and stdev are:
* mu = E_hat(Xi) = (X1+...+Xn)/n = (#win + #draw/2) / n = 0.52669083635454
* V_hat(Xi) = [#win.(1-mu)^2 + #loss.(0-mu)^2 + #draw.(.5-mu)^2] / (n-1) = 0.16592291903455
* sigma = stdev(E_hat(Xi)) = sqrt(V_hat(Xi) / n) = 0.00455444373651
* LOS = P(N(mu,sigma) > .5) = P(N(0,1) > (.5-mu)/sigma) = P(N(0,1) < (mu-.5)/sigma) = 0.99999999769115. Don't trust too many decimals there; I used the function normsdist(x) = P(N(0,1) < x) from the Gnumeric spreadsheet program. But it is still more than 99.8%.

Note that I'm using the Gaussian approximation of the real distribution (which is a rescaled trinomial). I haven't calculated the exact multinomial, but I doubt it would be very different.
Please note that Ed wrote wins, draws and losses, not wins, losses and draws as you assumed. I know this from the extra information of 4105 points out of 7999: draws = 2*(4105 - 2879) = 2452. Furthermore, Ed wrote:

Code:

Won-loss 2879-2668 = 211 (7999 games) draws 30.7%
Running LOS_and_Elo_uncertainties_calculator with 95% confidence:

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

2879

Write down the number of loses (up to 1825361100):

2668

Write down the number of draws (up to 2147478100):

2452

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:      9.17 Elo

Lower rating difference:    2.83 Elo
Upper rating difference:   15.51 Elo

Lower bound uncertainty:   -6.34 Elo
Upper bound uncertainty:    6.35 Elo
Average error:        +/-   6.34 Elo

K = (average error)*[sqrt(n)] =  567.24

Elo interval: ]   2.83,   15.51[
---------------------------------------

Number of games of the match:      7999
Score: 51.32 %
Elo rating difference:    9.17 Elo
Draw ratio: 30.65 %

*********************************************************
Standard deviation:  0.9120 % of the points of the match.
*********************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:  99.77 % (taking into account draws).
LOS:  99.77 % (not taking into account draws).
LOS:  99.77 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   85 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
Your calculations are right for the other case (+2879 -2452 =2668), which is not the case Ed presented (+2879 -2668 =2452). Please note that when my programme prints 'standard deviation', it refers to z*sigma, in this case roughly 1.96*sigma.
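A Python sketch that reproduces the 95% figures printed above (9.17 Elo, ]2.83, 15.51[, +/-6.34, K of about 567, z*sigma of about 0.912%). It assumes a normal approximation with per-game variance s(1-s) - d/4 and maps the score bounds to Elo with the logistic formula. This is a reconstruction, not necessarily how LOS_and_Elo_uncertainties_calculator works internally, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def elo(score: float) -> float:
    """Logistic Elo difference for an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins: int, losses: int, draws: int, confidence_pct: float = 95.0) -> dict:
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    d = draws / n
    sigma = math.sqrt((s * (1.0 - s) - d / 4.0) / n)  # SD of the match score fraction
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)  # two-sided
    lower, upper = elo(s - z * sigma), elo(s + z * sigma)
    average = (upper - lower) / 2.0
    return {
        "elo": elo(s),
        "lower": lower,
        "upper": upper,
        "average_error": average,
        "K": average * math.sqrt(n),
        "z_sigma_pct": 100.0 * z * sigma,
    }

if __name__ == "__main__":
    for key, value in elo_interval(2879, 2668, 2452).items():
        print(f"{key}: {value:.2f}")
```

Note that the Elo interval is slightly asymmetric (-6.34 / +6.35) because the score bounds are symmetric but the logistic curve is not linear.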

I think that Ed's calculations are good.

Regards from Spain.

Ajedrecista.
Rebel
Posts: 7023
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Error margin

Post by Rebel

lucasart wrote:
I don't agree with your calculation. From these results, the (unbiased) empirical mean and stdev are:
* mu = E_hat(Xi) = (X1+...+Xn)/n = (#win + #draw/2) / n = 0.52669083635454
* V_hat(Xi) = [#win.(1-mu)^2 + #loss.(0-mu)^2 + #draw.(.5-mu)^2] / (n-1) = 0.16592291903455
* sigma = stdev(E_hat(Xi)) = sqrt(V_hat(Xi) / n) = 0.00455444373651
* LOS = P(N(mu,sigma) > .5) = P(N(0,1) > (.5-mu)/sigma) = P(N(0,1) < (mu-.5)/sigma) = 0.99999999769115. Don't trust too many decimals there; I used the function normsdist(x) = P(N(0,1) < x) from the Gnumeric spreadsheet program. But it is still more than 99.8%.

Note that I'm using the Gaussian approximation of the real distribution (which is a rescaled trinomial). I haven't calculated the exact multinomial, but I doubt it would be very different.
Note that the LOS is calculated without draws. This may not work well after 10 games, or 100 games for that matter, but the same is true of the match score. In the end, volume is supposed to weed out most of the randomness, and this also applies to the draw ratio, which becomes almost irrelevant.
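The claim that draws barely matter for the LOS can be checked numerically. The sketch below compares a common draw-free LOS formula, Phi((W-L)/sqrt(W+L)) (the normal approximation for the decisive games only), with a Gaussian LOS that includes draws. Both are assumptions about what "with" and "without draws" mean here, not code from the MATCH utility, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def los_without_draws(wins: int, losses: int) -> float:
    """LOS from decisive games only: Phi((W - L) / sqrt(W + L))."""
    return NormalDist().cdf((wins - losses) / math.sqrt(wins + losses))

def los_with_draws(wins: int, losses: int, draws: int) -> float:
    """LOS from the Gaussian approximation of the mean score, draws included."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = s * (1.0 - s) - (draws / n) / 4.0  # per-game variance
    return NormalDist().cdf((s - 0.5) / math.sqrt(var / n))

if __name__ == "__main__":
    print(f"{los_without_draws(2879, 2668):.4f}", f"{los_with_draws(2879, 2668, 2452):.4f}")
```

Algebraically, the draw-aware z equals (W-L)/sqrt(W+L-(W-L)^2/N), so the two versions differ only by the small (W-L)^2/N term when the engines are close in strength; with these counts both give about 99.77%.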
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Error margin

Post by ernest

Laskos wrote: for 2SD Elo margins...
700*sqrt(4*win_ratio*(1-win_ratio) - draw_ratio)/sqrt(N_games).
Hi Kai,

I prefer the term "score_ratio" to win_ratio (which could be ambiguous).

Note that when the engines are of similar strength (score_ratio close to 0.5, or say 40% to 60%),
4*score_ratio*(1-score_ratio) is close to 1 and you get:

2SD_ratio = sqrt(1 - draw_ratio)/sqrt(N_games)
and of course
2SD Elo margin = 700*sqrt(1 - draw_ratio)/sqrt(N_games)

...hence your 560/sqrt(N_games) corresponds to draw_ratios close to 36% 8-)
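Both constants in this exchange can be checked with a short derivation under the standard logistic Elo model, where w is the win ratio, d the draw ratio, s = w + d/2 the score ratio and N the number of games:

```latex
% Per-game score X in {0, 1/2, 1}: E[X] = s, E[X^2] = w + d/4, with w = s - d/2.
\operatorname{Var}(X) = w + \frac{d}{4} - s^2 = s(1-s) - \frac{d}{4},
\qquad
2\,\mathrm{SD}(\bar{X}) = \frac{\sqrt{4s(1-s) - d}}{\sqrt{N}}

% Logistic Elo E(s) = 400 \log_{10}\frac{s}{1-s}; slope at an even score:
\frac{dE}{ds} = \frac{400}{\ln 10}\,\frac{1}{s(1-s)}
\quad\Longrightarrow\quad
\left.\frac{dE}{ds}\right|_{s=1/2} = \frac{1600}{\ln 10} \approx 694.9 \approx 700

% Similar strengths, 4s(1-s) \approx 1:
700\sqrt{1-d} = 560 \iff \sqrt{1-d} = 0.8 \iff d = 0.36
```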
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Error margin

Post by hgm

Yes, that is a good approximation. I always use it too, in the form SD = 40%/sqrt(N), which amounts to the same thing if you realize that 1% ≈ 7 Elo and that the 95% confidence interval is ±2*SD (40*2*7 = 560). I usually think in terms of percentages rather than Elo, because the percentage is what I see directly in the match results, and I usually want to know how many points of that match result are uncertainty (0.4*sqrt(N)).
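The percentage form can be sketched in a few lines of Python. The 7 Elo per percentage point is the rounded slope near a 50% score (1600/ln 10 per unit score, about 6.95 Elo per 1%), and the helper names are hypothetical.

```python
import math

ELO_PER_PERCENT = 7.0  # rounded slope of the Elo curve near a 50% score

def sd_percent(n_games: int) -> float:
    """Standard deviation of the match score, in percent: 40% / sqrt(N)."""
    return 40.0 / math.sqrt(n_games)

def sd_points(n_games: int) -> float:
    """Same uncertainty expressed in match points: 0.4 * sqrt(N)."""
    return 0.4 * math.sqrt(n_games)

def margin_elo_95(n_games: int) -> float:
    """95% margin: 2 SD converted to Elo, i.e. 40*2*7 / sqrt(N) = 560 / sqrt(N)."""
    return 2.0 * sd_percent(n_games) * ELO_PER_PERCENT

if __name__ == "__main__":
    n = 400
    print(sd_percent(n), sd_points(n), margin_elo_95(n))  # 2.0 8.0 28.0
```

So a 400-game match has an SD of 2%, or 8 points, and a 95% margin of about 28 Elo.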