Estimating ELO difference

mihaiv

Estimating ELO difference

Post by mihaiv »

I would like to know the formula, as commonly used by chess engine authors, for the error in an estimated ELO difference at 95% confidence.
If I play 10 games between 2 engines and get a 200 ELO difference, what is the likely error? And what about 100 games?
uaf
Posts: 98
Joined: Sat Jul 31, 2010 8:48 pm
Full name: Ubaldo Andrea Farina

Re: Estimating ELO difference

Post by uaf »

hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Estimating ELO difference

Post by hgm »

Roughly it is 560/sqrt(N), if N is the total number of games, and the result is in the 25-75% range.

For extreme scores, it becomes dependent on the exact Elo model you use, but the above formula can still be used as a rough estimate when you take for N the number of non-wins or non-losses (whichever is smaller). So if an engine scores 3 out of 100, the error is as if you played about 3 games.
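
As a rough illustration of this rule of thumb, here is a minimal Python sketch. The function name `rough_elo_error` and the win/draw/loss interface are my own assumptions, not something from the post; the 25-75% cutoff and the "smaller of non-wins and non-losses" substitution follow the description above.

```python
import math

def rough_elo_error(wins, draws, losses):
    """Rule-of-thumb 95% error in Elo: 560/sqrt(N).

    Hypothetical helper: for scores outside 25-75% it replaces N by the
    smaller of the non-win and non-loss counts, as described in the post.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    if 0.25 <= score <= 0.75:
        effective_n = n
    else:
        # Extreme score: use the smaller of non-wins and non-losses.
        effective_n = min(draws + losses, wins + draws)
    if effective_n == 0:
        return math.inf  # a clean sweep gives no usable estimate
    return 560.0 / math.sqrt(effective_n)

print(rough_elo_error(50, 0, 50))  # balanced 100-game match
print(rough_elo_error(3, 0, 97))   # 3 out of 100: treated like about 3 games
```

With a balanced result the estimate shrinks as 1/sqrt(N), so going from 10 to 100 games narrows the error by a factor of about 3.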
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Estimating ELO difference

Post by AlvaroBegue »

This is how I would think about it. ELO difference is a number D such that E:=1/(1+10^(-D/400)) is the number of points one player is expected to get when playing the other.

We can concentrate on computing E instead of D. We are trying to estimate E, using the results of some games as evidence. Bayes's formula is the right thing to use, and this requires that we have a prior distribution for E. If we don't know anything else, a uniform distribution in [0,1] is a natural prior to use.

It turns out that after W wins and L losses, the posterior distribution for E follows a beta distribution with parameters W+1 and L+1.

The mean of a beta(W+1,L+1) distribution is mu:=(W+1)/(W+L+2) and its standard deviation is sigma:=sqrt((W+1)*(L+1)/((W+L+2)^2*(W+L+3))). You can try to convert the mean to an ELO score like this:

D = -400*log(1/mu - 1)/log(10)

What to do about the standard deviation is trickier, but if you use a linear approximation to this formula (which should work well if the standard deviation is small), you just have to multiply sigma by the derivative of D as a function of mu. I get

sigma * 400/((1/mu - 1)*mu^2*log(10))

In order to get a simpler formula, you can assume W and L are similar and large, and then I get

(1600/log(10))/sqrt(N) ~= 695/sqrt(N)

where N is the total number of games played.

It's not too far from what hgm posted, and it is likely that I made a mistake somewhere. Does anyone have a derivation of the 560/sqrt(N) formula?
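
The Bayesian recipe in this post can be sketched in a few lines of Python. This is only an illustration under the post's own assumptions (uniform prior, draws ignored, linear approximation); the function name `beta_posterior_elo` is hypothetical, and the Elo conversion is written as the inverse of E = 1/(1+10^(-D/400)). It returns one standard deviation, so doubling it gives a roughly 95% interval.

```python
import math

def beta_posterior_elo(wins, losses):
    """Posterior mean and 1-sigma error in Elo from a beta(W+1, L+1) posterior."""
    a, b = wins + 1, losses + 1
    mu = a / (a + b)                                   # posterior mean of E
    sigma = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # posterior sd of E
    # Invert E = 1/(1 + 10^(-D/400)) at E = mu.
    elo = -400.0 * math.log10(1.0 / mu - 1.0)
    # Linear approximation: |dD/dmu| = 400 / (ln(10) * mu * (1 - mu)).
    slope = 400.0 / (math.log(10) * mu * (1.0 - mu))
    return elo, sigma * slope

elo, err = beta_posterior_elo(50, 50)
print(elo, err, 2 * err)  # estimate, 1-sigma error, 2-sigma error
```

For equal and large W and L, the 1-sigma value approaches (800/log(10))/sqrt(N), so the 2-sigma value approaches (1600/log(10))/sqrt(N).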
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Estimating ELO difference

Post by AlvaroBegue »

I actually see I made a mistake, and now I get a different result, where the final approximation ends up being (800/log(10))/sqrt(N), i.e. half of what I computed earlier. I'll redo things more carefully tonight and post what I get.
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Estimating ELO difference

Post by hgm »

I did not really calculate it from any model, I just used the rule of thumb that an excess 1% score corresponds to 7 Elo, and that with a (quite typical) draw rate of 32% the standard deviation is 40%/sqrt(N). And that a 95% interval is about 2 sigma wide. So the 560 came about as 40 x 2 x 7. So it's all very coarse estimates.
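
The three ingredients of this estimate can be checked numerically. The sketch below is my own illustration, with hypothetical function names: it assumes the per-game score variance is score*(1-score) - DrawFraction/4 and takes the Elo slope from the logistic curve E = 1/(1+10^(-D/400)) at a 50% score.

```python
import math

def score_sd_per_game(score, draw_fraction):
    """Standard deviation of a single game's result (win=1, draw=0.5, loss=0)."""
    return math.sqrt(score * (1.0 - score) - draw_fraction / 4.0)

def elo_per_percent(score=0.5):
    """Elo gained per 1% of extra score, from the slope of the logistic curve."""
    return 400.0 / (math.log(10) * score * (1.0 - score)) / 100.0

sd = score_sd_per_game(0.5, 0.32)   # close to the quoted 40%
slope = elo_per_percent()           # close to the quoted 7 Elo per 1%
rough = 100.0 * sd * 2.0 * slope    # unrounded version of 40 x 2 x 7
print(sd, slope, rough)
```

The unrounded product comes out slightly above 560, which fits the description of these as coarse estimates.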
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Estimating ELO difference

Post by Laskos »

AlvaroBegue wrote:
(1600/log(10))/sqrt(N) ~= 695/sqrt(N)

where N is the total number of games played.

It's not too far from what hgm posted, and it is likely that I made a mistake somewhere. Does anyone have a derivation of the 560/sqrt(N) formula?

It's very close. For 95.45% confidence (2 standard deviations), one has

(1600/ln(10))/sqrt(N) ~= 695/sqrt(N)

times

sqrt(4*score*(1-score) - DrawFraction)

where

score = number of points / N
DrawFraction = number of draws / N

The final formula is

Error (2SD) in Elo points = 695 * sqrt(4*score*(1-score) - DrawFraction) / sqrt(N)
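
A minimal Python sketch of this final formula follows. The function name `elo_error_2sd` and the win/draw/loss interface are assumptions for illustration. Note that the constant 1600/ln(10) is the Elo slope at a 50% score, so this is best read as an approximation for reasonably balanced matches.

```python
import math

def elo_error_2sd(wins, draws, losses):
    """2-SD error in Elo: 695 * sqrt(4*s*(1-s) - DrawFraction) / sqrt(N)."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    draw_fraction = draws / n
    # Guard against tiny negative values from floating-point rounding.
    inner = max(4.0 * score * (1.0 - score) - draw_fraction, 0.0)
    return (1600.0 / math.log(10)) * math.sqrt(inner) / math.sqrt(n)

print(elo_error_2sd(100, 100, 100))  # 300 games, 1/3 draws
print(elo_error_2sd(50, 200, 50))    # 300 games, 2/3 draws
```

At a fixed number of games and a 50% score, a higher draw fraction gives a smaller error, since draws contribute less variance than decisive games.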
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Estimating ELO difference

Post by hgm »

And that is very close to what I had, as for Chess I assume a draw fraction of 1/3. So there would be a multiplier sqrt(0.66) ~ 0.8, which times 695 is about 560.
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Estimating ELO difference

Post by ernest »

hgm wrote: as for Chess I assume a draw fraction of 1/3.
That's the major error term of your formula: in engine-engine tests you can often see 2/3 draws.
Then you have to divide your number by sqrt(2).
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Estimating ELO difference

Post by hgm »

I have never seen such a high draw fraction in any engine-engine testing I did. For Chess. (In Shogi the draw fraction is of course close to 0%.)