Rule of thumb error formula

zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Rule of thumb error formula

Post by zamar »

I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice, for scores in the 65%-35% range. (This is for the 1-sigma or 84% confidence level; for 95% confidence, double it.)
H.G. Muller posted this very important formula in one thread and I just want to make sure I got it right. So let's take just one example.

Match Result:
A - B: 460 - 440

Score percentage for A: 460 / 900 = 51.1%.

Error margin: 40% / sqrt(900) = 1.3%. Now where should I apply this error margin? Does it apply directly to the score percentage?

So the correct result is 51.1% +- 1.3% (with 84% confidence). Did I get this right?
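
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `rule_of_thumb_error` is made up for illustration, and it applies the 40%/sqrt(N) rule exactly as quoted.

```python
import math

def rule_of_thumb_error(num_games: int) -> float:
    """1-sigma error of the score fraction, using the 40%/sqrt(N) rule of thumb."""
    return 0.40 / math.sqrt(num_games)

# Match result from the example: A - B: 460 - 440
points_a, points_b = 460, 440
games = points_a + points_b          # 900 games
score = points_a / games             # score fraction for A
error = rule_of_thumb_error(games)   # 1-sigma error on that fraction

print(f"score = {score:.1%} +/- {error:.1%}")  # score = 51.1% +/- 1.3%
```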


Now if we are improving the engine through self-play, the truly interesting question is
"Given the match result 460-440, what is the confidence level that the correct score percentage is >=50%?". I know there can't be such an easy rule-of-thumb formula here, but if someone has already figured out the more complicated one, please post it here :)
Joona Kiiski
hgm
Posts: 28360
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Rule of thumb error formula

Post by hgm »

Correct, the error applies to the score percentage.

Beware of the confidence level; the 84% that I quoted for 1-sigma error bars was actually the one-sided confidence. So there is an 84% likelihood that the actual score percentage of the engine (i.e. the one you would get after infinitely many games) is between 51.1%-1.3% and 100% (or between 0% and 51.1%+1.3%). The confidence you can have that the true score percentage will be between 51.1%-1.3% and 51.1%+1.3% (the two-sided confidence) is only 68%. So 68% of the results are normally between -sigma and +sigma, with 16% of the results on either side beyond sigma.

For 2-sigma (1.96-sigma, for the purists) the two-sided confidence is 95%, the one-sided confidence 97.5%. (I was a bit sloppy on this in my earlier remarks.)

The >=50% question asks for a one-sided confidence: you want to know how likely it is that the true score is between 50% and 100%. To know the exact confidence, you would need a table of the 'error function'.
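
In place of a printed table, the error function is available as `math.erf` in Python's standard library. The sketch below assumes the normal approximation and the 40%/sqrt(N) rule for sigma; the helper name `confidence_at_least` is hypothetical.

```python
import math

def confidence_at_least(score: float, threshold: float, sigma: float) -> float:
    """One-sided confidence that the true score is >= threshold.

    Assumes the measured score is normally distributed around the
    true score with standard deviation sigma.
    """
    z = (score - threshold) / sigma
    # Standard normal CDF expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

games = 900
score = 460 / games                 # 51.1% from the 460-440 example
sigma = 0.40 / math.sqrt(games)     # rule-of-thumb 1-sigma error

conf = confidence_at_least(score, 0.50, sigma)
print(f"confidence that true score >= 50%: {conf:.0%}")
```

For the 460-440 result the measured score sits about 0.83 sigma above 50%, so this gives a confidence of about 80%, somewhat below the 84% that a full 1-sigma margin would give.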
ernest
Posts: 2048
Joined: Wed Mar 08, 2006 8:30 pm

Re: Rule of thumb error formula

Post by ernest »

I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice
Nice approximation!
Exact value means replacing the 40% by 41% if the draw ratio is 1/3 and by 35% if the draw ratio is 1/2, so 40% is good enough, especially for an error formula :)

Just multiply by 7, and you have the error in Elo points...
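
These numbers can be reproduced with a short Python sketch. It assumes two equally strong engines (wins and losses equally likely) and the usual logistic Elo formula Elo = 400*log10(p/(1-p)); neither assumption is spelled out above, and the function names are made up for illustration.

```python
import math

def sigma_per_game(draw_ratio: float) -> float:
    """Standard deviation of one game result (1, 0.5 or 0 points),
    assuming equal opponents so that wins and losses are equally likely."""
    win = loss = (1.0 - draw_ratio) / 2.0
    mean = win * 1.0 + draw_ratio * 0.5          # always 0.5 here
    variance = (win * (1.0 - mean) ** 2
                + draw_ratio * (0.5 - mean) ** 2
                + loss * (0.0 - mean) ** 2)
    return math.sqrt(variance)

def elo_per_percent_near_even() -> float:
    """Slope of the logistic Elo curve at a 50% score, in Elo per
    percentage point: d/dp [400*log10(p/(1-p))] at p = 0.5, divided by 100."""
    return 400.0 / (math.log(10) * 0.25) / 100.0

for d in (1 / 3, 1 / 2):
    print(f"draw ratio {d:.2f}: coefficient {sigma_per_game(d):.1%}")
print(f"Elo per percentage point: {elo_per_percent_near_even():.2f}")
```

The coefficients come out at roughly 40.8% and 35.4%, and the slope at roughly 6.95 Elo per percentage point, matching the 41%, 35% and factor 7 quoted above. The factor 7 only holds for scores close to 50%.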
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Rule of thumb error formula

Post by bob »

ernest wrote:
I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice
Nice approximation!
Exact value means replacing the 40% by 41% if the draw ratio is 1/3 and by 35% if the draw ratio is 1/2, so 40% is good enough, especially for an error formula :)

Just multiply by 7, and you have the error in Elo points...
For a reference, I am seeing drawing rates just under 30% in my longer-game cluster testing. For very fast games it drops to 22-23%.
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Rule of thumb error formula

Post by MattieShoes »

Since we're assuming a normal curve, the 1-sigma would be ~1 standard deviation and 95% confidence interval is ~2 standard deviations, yes?

This stuff is so cool! :-)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Rule of thumb error formula

Post by Laskos »

hgm wrote:Correct, the error applies to the score percentage.

Beware of the confidence level; the 84% that I quoted for 1-sigma error bars was actually the one-sided confidence. So there is an 84% likelihood that the actual score percentage of the engine (i.e. the one you would get after infinitely many games) is between 51.1%-1.3% and 100% (or between 0% and 51.1%+1.3%). The confidence you can have that the true score percentage will be between 51.1%-1.3% and 51.1%+1.3% (the two-sided confidence) is only 68%. So 68% of the results are normally between -sigma and +sigma, with 16% of the results on either side beyond sigma.

For 2-sigma (1.96-sigma, for the purists) the two-sided confidence is 95%, the one-sided confidence 97.5%. (I was a bit sloppy on this in my earlier remarks.)

The >=50% question asks for a one-sided confidence: you want to know how likely it is that the true score is between 50% and 100%. To know the exact confidence, you would need a table of the 'error function'.
There is an additional subtlety if the score is very one sided, say 90:10. The 68% confidence intervals will be +X% -Y%, with X<Y, and 95% confidence intervals will deviate from +2X% -2Y%.
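
One way to see this asymmetry is the Wilson score interval, which is a swapped-in technique for illustration and not necessarily what anyone in this thread uses. The sketch assumes 100 decisive games (no draws) with a 90:10 result, so that the score is a plain binomial proportion; the function name is hypothetical.

```python
import math

def wilson_interval(p: float, n: int, z: float) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion p observed over n games,
    at z standard deviations (z=1 is about 68%, z=1.96 about 95% two-sided)."""
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

p, n = 0.90, 100                      # assumed 90:10 result, decisive games only
lo1, hi1 = wilson_interval(p, n, 1.0)
lo2, hi2 = wilson_interval(p, n, 1.96)

x1, y1 = hi1 - p, p - lo1             # +X and -Y at about 68%
x2, y2 = hi2 - p, p - lo2             # upper and lower margins at about 95%

print(f"68%: +{x1:.1%} -{y1:.1%}")
print(f"95%: +{x2:.1%} -{y2:.1%}  (compare 2X = {2 * x1:.1%}, 2Y = {2 * y1:.1%})")
```

Under these assumptions the upper margin comes out smaller than the lower one (X<Y), and the 95% margins are not simply twice the 68% ones.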

Kai
ernest
Posts: 2048
Joined: Wed Mar 08, 2006 8:30 pm

Re: Rule of thumb error formula

Post by ernest »

Bob wrote:I am seeing drawing rates just under 30% in my longer-game cluster testing
Here 42% is the exact value in the formula, instead of 40%
Bob wrote:For very fast games it drops to 22-23%.
Here 44% is the exact value in the formula, instead of 40%