Rule of thumb error formula

zamar · Post by **zamar** » Wed Apr 08, 2009 7:15 pm

I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice, for scores in the 65%-35% range. (This is for the 1-sigma or 84% confidence level; for 95% confidence, double it.)

H.G. Muller posted this very important formula in one thread and I just want to make sure I got it right. So let's take just one example.

Match Result:
A - B: 460 - 440

Score percentage for A: 460 / 900 = 51.1%.

Error margin: 40% / sqrt(900) = 1.3%. Now where should I apply this error margin? Is it calculated directly for score percentage?

So the correct result is 51.1% +- 1.3% (with 84% confidence). Did I get this right?
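As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `rule_of_thumb_error` and the variable names are made up for illustration, and the 460 - 440 result is treated as points scored out of 900 games.

```python
import math

def rule_of_thumb_error(games, coefficient=0.40):
    """1-sigma error on the score fraction: coefficient / sqrt(games)."""
    return coefficient / math.sqrt(games)

# Match result from the post, treated as points for A and B.
points_a, points_b = 460, 440
games = points_a + points_b          # 900
score = points_a / games             # score fraction for A
sigma = rule_of_thumb_error(games)   # 0.40 / 30

print(f"score = {score:.1%} +- {sigma:.1%}")
```

The error is expressed in the same units as the score fraction, so it is added to and subtracted from the score directly.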

Now, if we are improving the engine through self-play, the truly interesting question is:
"Given a match result of 460-440, what is the confidence level that the true score percentage is >= 50%?" I know there can't be such an easy rule-of-thumb formula here, but if someone has already figured out the more complicated one, please post it here.

hgm · Post by **hgm** » Wed Apr 08, 2009 7:39 pm

Correct, the error applies to the score percentage.

Be careful with the confidence level; the 84% that I quoted for 1-sigma error bars was actually the one-sided confidence. So there is an 84% likelihood that the actual score percentage of the engine (i.e. the one you would get after infinitely many games) is between 51.1%-1.3% and 100% (or between 0% and 51.1%+1.3%). The confidence you can have that the true score percentage will be between 51.1%-1.3% and 51.1%+1.3% (the two-sided confidence) is only 68%. So 68% of the results normally fall between -sigma and +sigma, with 16% on either side beyond sigma.

For 2-sigma (1.96-sigma, for the purists) the two-sided confidence is 95%, the one-sided confidence 97.5%. (I was a bit sloppy on this in my earlier remarks.)

The >=50% question asks for a one-sided confidence: you want to know how likely it is that the true score is between 50% and 100%. To know the exact confidence, you would need a table of the 'error function'.
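In place of a printed table, the error function is available in Python's standard library as `math.erf`. The sketch below is only an illustration under stated assumptions: it uses the normal approximation and the 40% rule-of-thumb coefficient, and the helper names `normal_cdf` and `confidence_at_least` are invented here.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_at_least(score, games, threshold=0.5, coefficient=0.40):
    """One-sided confidence that the true score is >= threshold.

    Assumes a normal distribution with sigma = coefficient / sqrt(games).
    """
    sigma = coefficient / math.sqrt(games)
    z = (score - threshold) / sigma   # distance from threshold in sigmas
    return normal_cdf(z)

conf = confidence_at_least(460 / 900, 900)
print(f"confidence that A scores >= 50%: {conf:.1%}")  # roughly 80%
```

For the 460-440 example the score sits about 0.83 sigma above 50%, which is why the one-sided confidence lands near 80% rather than 84% (the 1-sigma value).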

ernest · Post by **ernest** » Wed Apr 08, 2009 10:43 pm

I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice

Nice approximation!
The exact value replaces the 40% with 41% if the draw ratio is 1/3 and with 35% if the draw ratio is 1/2, so 40% is good enough, especially for an error formula.

Just multiply by 7, and you have the error in Elo points...
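Both numbers can be reproduced with a short sketch. It assumes equal-strength opponents (wins and losses equally likely, draws worth half a point), so the per-game variance is (1 - drawRatio)/4, and it assumes the usual logistic Elo curve, Elo = 400 * log10(p / (1 - p)). The function names are hypothetical.

```python
import math

def per_game_sigma(draw_ratio):
    """Std deviation of one game's score for equal opponents.

    Outcomes 1, 0.5, 0 with probabilities (1-d)/2, d, (1-d)/2
    give variance (1 - d) / 4 around a mean of 0.5.
    """
    return math.sqrt((1.0 - draw_ratio) / 4.0)

def elo_per_percent(score=0.5):
    """Slope of the logistic Elo curve, in Elo per percentage point."""
    return 400.0 / (math.log(10.0) * score * (1.0 - score)) / 100.0

for d in (1 / 3, 1 / 2):
    print(f"draw ratio {d:.2f}: coefficient = {per_game_sigma(d):.1%}")

print(f"Elo per 1% of score near 50%: {elo_per_percent():.2f}")
```

The slope grows as the score moves away from 50%, so the factor of 7 is only a good approximation for fairly even matches.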

bob · Post by **bob** » Wed Apr 08, 2009 11:01 pm

ernest wrote:
I think the rule-of-thumb Error = 40%/sqrt(numberOfGames) is accurate enough in practice
Nice approximation!
The exact value replaces the 40% with 41% if the draw ratio is 1/3 and with 35% if the draw ratio is 1/2, so 40% is good enough, especially for an error formula.

Just multiply by 7, and you have the error in Elo points...

For a reference, I am seeing drawing rates just under 30% in my longer-game cluster testing. For very fast games it drops to 22-23%.

MattieShoes · Post by **MattieShoes** » Thu Apr 09, 2009 5:14 am

Since we're assuming a normal curve, the 1-sigma would be ~1 standard deviation and 95% confidence interval is ~2 standard deviations, yes?

This stuff is so cool!

Laskos · Post by **Laskos** » Thu Apr 09, 2009 10:34 am

hgm wrote:Correct, the error applies to the score percentage.

Be careful with the confidence level; the 84% that I quoted for 1-sigma error bars was actually the one-sided confidence. So there is an 84% likelihood that the actual score percentage of the engine (i.e. the one you would get after infinitely many games) is between 51.1%-1.3% and 100% (or between 0% and 51.1%+1.3%). The confidence you can have that the true score percentage will be between 51.1%-1.3% and 51.1%+1.3% (the two-sided confidence) is only 68%. So 68% of the results normally fall between -sigma and +sigma, with 16% on either side beyond sigma.

For 2-sigma (1.96-sigma, for the purists) the two-sided confidence is 95%, the one-sided confidence 97.5%. (I was a bit sloppy on this in my earlier remarks.)

The >=50% question asks for a one-sided confidence: you want to know how likely it is that the true score is between 50% and 100%. To know the exact confidence, you would need a table of the 'error function'.

There is an additional subtlety if the score is very one-sided, say 90:10. The 68% confidence interval will be +X% -Y%, with X<Y, and the 95% confidence interval will deviate from +2X% -2Y%.

Kai
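One way to see this asymmetry numerically is the Wilson score interval. This is a swapped-in textbook technique, not necessarily the calculation Kai has in mind, and it assumes a plain binomial model (no draws). The 100-game match and the function name `wilson_interval` are illustrative assumptions.

```python
import math

def wilson_interval(score, games, z=1.0):
    """Wilson score interval for a binomial proportion (draws ignored)."""
    denom = 1.0 + z * z / games
    center = (score + z * z / (2.0 * games)) / denom
    half = z * math.sqrt(score * (1.0 - score) / games
                         + z * z / (4.0 * games * games)) / denom
    return center - half, center + half

score, games = 0.90, 100   # hypothetical 90:10 result

lo1, hi1 = wilson_interval(score, games, z=1.0)   # about 68% two-sided
lo2, hi2 = wilson_interval(score, games, z=2.0)   # about 95% two-sided

x1, y1 = hi1 - score, score - lo1
x2, y2 = hi2 - score, score - lo2

print(f"1-sigma: +{x1:.1%} -{y1:.1%}")
print(f"2-sigma: +{x2:.1%} -{y2:.1%}")
```

The upper margin comes out smaller than the lower one, and the 2-sigma margins are not simply twice the 1-sigma margins.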

ernest · Post by **ernest** » Thu Apr 09, 2009 2:59 pm

Bob wrote:I am seeing drawing rates just under 30% in my longer-game cluster testing

Here 42% is the exact value in the formula, instead of 40%

Bob wrote:For very fast games it drops to 22-23%.

Here 44% is the exact value in the formula, instead of 40%
