Elo difference and statistical confidence for matches

Uri Blass · Post by **Uri Blass** » Fri Feb 08, 2008 5:16 pm

omid_dt wrote: Assuming a normal distribution may not be totally accurate, but it is widely used in many fields (and has led to some horrible results, e.g., the economic model of the LTCM hedge fund, which lost billions).

But anyway, in our case, is there a better alternative?

The binomial distribution (or, in this case, the trinomial one, with 3 probabilities instead of 2) is a better alternative that you can use at least when n is small. It seems possible to me to get the exact value at least for up to 200 games, and I believe that even 1000 games is possible if it does not need to be done in one second and you are willing to wait some hours.

The main problem is that the probabilities need hundreds of digits of precision (or even thousands of digits in the case of 1000 games).

I do not know how to calculate numbers like 200!/(100!*50!*50!) in the C language, and calculating numbers like 0.4^100*0.31^50*0.29^50 accurately may also be a problem. But if you solve this problem, you only need to sum at most about a million probabilities like 0.4^(200-W-D)*0.31^D*0.29^W (each multiplied by its coefficient 200!/((200-W-D)!*D!*W!)) for
all the values 2W+D=0, 2W+D=1, 2W+D=2, ... (2W+D is the score counted in half points). That gives the probability distribution of 2W+D, and from that distribution you can find a better confidence interval.

I think the main technical problem is writing a function that calculates all of this.
I do not know how long each calculation takes, and I assume it is slower than regular calculation on integers, but I believe that something on the order of a million calculations is practically not a problem (it may become one if n is in the thousands).
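In a language with arbitrary-precision integers the big-number problem largely disappears. A minimal sketch in Python, assuming independent games with fixed win/draw/loss probabilities (the function name `exact_score_distribution` is hypothetical): `math.comb` gives the multinomial coefficient exactly and `fractions.Fraction` keeps every probability exact, so the result is the exact distribution of the half-point score 2W+D.

```python
from fractions import Fraction
from math import comb


def exact_score_distribution(n, p_win, p_draw, p_loss):
    """Exact probability of every half-point score 2W+D after n games.

    Assumes independent games with fixed per-game probabilities, passed as
    Fractions so that nothing is rounded.
    """
    dist = [Fraction(0)] * (2 * n + 1)
    for w in range(n + 1):
        for d in range(n - w + 1):
            l = n - w - d
            # n!/(w! d! l!) as an exact (arbitrarily large) integer
            coeff = comb(n, w) * comb(n - w, d)
            dist[2 * w + d] += coeff * p_win**w * p_draw**d * p_loss**l
    return dist


# 200!/(100!*50!*50!) computed exactly
print(comb(200, 100) * comb(100, 50))

dist = exact_score_distribution(200, Fraction(29, 100), Fraction(31, 100), Fraction(40, 100))
print(sum(dist) == 1)  # True: exact arithmetic, no rounding error
```

For 200 games this is about 20,000 terms with exact rationals, which takes on the order of a second rather than hours; individual entries can be turned into ordinary numbers with `float()` at the very end.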

I know that people tend to assume the normal distribution is good enough for large N*P, but I think it is always better to get the exact value rather than an approximation when the exact value can be obtained.

Uri

Uri Blass · Post by **Uri Blass** » Fri Feb 08, 2008 5:26 pm

hgm wrote: No, but if I had to do it, I would simply use the binomial formula

P(W wins out of N) = N!/(W! * L!) * P_win^W * P_loss^L

to calculate the probability of W wins and L losses (W+L=N) for each W, with P_win and P_loss derived from the actual result, and use those numbers to tabulate the cumulative distribution and interpolate it linearly. Then find the 16% and 84% points (or 2.5% and 97.5%, or ...) on the W axis.
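hgm's recipe for the win/loss case could look roughly like this. A sketch only, assuming ordinary floating point is precise enough (it is for a few hundred games; `comb(n, w)` no longer fits in a float at around a thousand games); `binomial_quantiles` is a hypothetical name.

```python
from math import comb


def binomial_quantiles(n, p_win, levels=(0.16, 0.84)):
    """Points on the W axis where the linearly interpolated cumulative
    distribution of the number of wins reaches each level (no draws)."""
    p_loss = 1.0 - p_win
    # P(W = w) = N!/(W! L!) * P_win^W * P_loss^L
    pmf = [comb(n, w) * p_win**w * p_loss**(n - w) for w in range(n + 1)]
    cdf, total = [], 0.0
    for p in pmf:
        total += p
        cdf.append(total)          # cdf[w] = P(W <= w)
    points = []
    for level in levels:
        w = next((i for i, c in enumerate(cdf) if c >= level), None)
        if w is None:              # level above the rounded total probability
            points.append(float(n))
        elif w == 0:
            points.append(0.0)
        else:
            lo, hi = cdf[w - 1], cdf[w]
            # straight line between (w-1, lo) and (w, hi)
            points.append(w - 1 + (level - lo) / (hi - lo))
    return points


# 16% and 84% points for 100 games with an observed 60% win rate
print(binomial_quantiles(100, 0.6))
```

Here P(W <= w) is placed at w itself, which puts the interpolated points roughly half a win low; attaching it to w + 0.5 instead would act as a continuity correction.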

Or, to allow for draws, use

P(W,D,L) = N!/(W!*D!*L!) * P_win^W * P_draw^D * P_loss^L

for all possible combinations of W, D, L with W+D+L=N, add up the combinations that result in the same score S = W+D/2, and calculate the cumulative probability as a function of S.

If the number of games is not extremely small, linear interpolation would be good enough.

I can add that it is logical to have some a priori distribution for P_win, P_draw and P_loss and to use this information, because if you get one draw it is not logical to set P_draw=100%.

A possible simple alternative is to add 1 to the number of wins, the number of draws and the number of losses (and 3 to N), so you never get 100% and even for small N you get a nonzero interval.

For small N this is certainly better than a method that gives an interval of width 0 after 2 draws between the engines because it assumes a 100% probability of a draw, and when N is big it does not change much.

If you have a priori knowledge that one of the engines is better, it may be better to start with something like adding 3 to W but only 1 to D and 1 to L (and 5 to N).
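The pseudo-count idea is easy to try. A minimal sketch with hypothetical function names, using exact fractions: `prior=(1, 1, 1)` is the add-one rule and `prior=(3, 1, 1)` the variant for a favoured first engine. The per-game score variance shows why it matters, since a zero variance means a zero-width interval.

```python
from fractions import Fraction


def smoothed_probabilities(wins, draws, losses, prior=(1, 1, 1)):
    """Win/draw/loss probabilities after adding pseudo-counts to the result.

    prior=(1, 1, 1) adds one win, one draw and one loss (3 to N);
    prior=(3, 1, 1) encodes a belief that the first engine is stronger (5 to N);
    prior=(0, 0, 0) gives the raw frequencies.
    """
    add_w, add_d, add_l = prior
    n = wins + draws + losses + add_w + add_d + add_l
    return (Fraction(wins + add_w, n),
            Fraction(draws + add_d, n),
            Fraction(losses + add_l, n))


def score_variance_per_game(p_win, p_draw, p_loss):
    """Variance of a single game's score (1, 1/2 or 0 points)."""
    mean = p_win + p_draw / 2
    return (p_win * (1 - mean) ** 2
            + p_draw * (Fraction(1, 2) - mean) ** 2
            + p_loss * mean ** 2)


# two games, both drawn
for prior in [(0, 0, 0), (1, 1, 1), (3, 1, 1)]:
    probs = smoothed_probabilities(0, 2, 0, prior)
    print(prior, probs, score_variance_per_game(*probs))
```

After two draws the raw estimate has variance 0, the add-one prior gives probabilities (1/5, 3/5, 1/5) with variance 1/10, and the (3, 1, 1) prior gives (3/7, 3/7, 1/7) with variance 6/49.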

Uri

omid_dt · Post by **omid_dt** » Fri Feb 08, 2008 5:44 pm

I don't care about giving confidence intervals when N is very low. That is nonsense anyway.

All I want to make sure of is that for a large number of games (N > 100), the calculation is valid.

hgm · Post by **hgm** » Fri Feb 08, 2008 6:07 pm

For N>100 games, the calculation is valid, provided the score is not extreme. If the result were 190-3, the calculation would not be valid. A rating based on a 190-3 result (193 games!) would actually be less accurate (i.e. have a wider confidence interval) than one based on a 3-3 result, even without making any approximations. So the condition is that at least two of the quantities W, L and D are large (say > 40).
