Laskos wrote: 40 games played out of 4000 [...] The result in Cutechess-Cli is 6 +/- 10 ELO points.
Sven Schüle wrote: How would it ever be possible to get +/- 10 after only 40 games?
Doesn't matter in the discussion, it's Fulvio's example.
A remark on rigid testing
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: A remark on rigid testing
-
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:
expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)
Just for the record.
The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance
is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval
One can assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case:
final_confidence ~= actual_confidence / sqrt(total_games / played_games)
In your example
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49
https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja
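The projection above can be checked with a few lines of Python. This is only a minimal sketch, assuming the win/draw/loss ratios stay fixed so that the margin shrinks as 1/sqrt(games); the function name projected_margin is made up for illustration and is not part of Cutechess-Cli.

```python
import math

def projected_margin(current_margin, played_games, total_games):
    """Error margin expected at the end of the test, assuming the
    per-game variance stays what it is now (hypothetical helper)."""
    if not 0 < played_games <= total_games:
        raise ValueError("need 0 < played_games <= total_games")
    return current_margin / math.sqrt(total_games / played_games)

# Example from the post: +/- 10 Elo after 3600 of 4000 games
print(round(projected_margin(10, 3600, 4000), 2))  # 9.49
```

With 40 of 4000 games played, the same call gives a projected margin of about 1 Elo.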
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: A remark on rigid testing
Fulvio wrote:
Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:
expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)
Just for the record.
The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance
is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval
One can assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case:
final_confidence ~= actual_confidence / sqrt(total_games / played_games)
In your example
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49
https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja
That is what Cutechess-Cli will show after 4000/4000 games, and it is trivial. It is what I wrote for your case (40/4000 played): "But it also means that the expected "true" (as shown in Cutechess-Cli) error margins after 4000 games are 10*sqrt(1-F) ~ 1 ELO point." What I was talking about is a different issue: error margins with a finite, rigid number of games. Applying LOS and p-values as a stopping rule can be done if one is "disciplined".
After 4000/4000 games, the error margins I described are 0 ELO points: the test is finished.
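To put the two quantities side by side, here is a small Python sketch. It assumes F is the fraction of games still to be played (so 1 - F is the fraction already played), which is how the formulas in this thread read. Both function names are invented for illustration, and the current margin of 10 is held fixed at every checkpoint only to keep the comparison simple.

```python
import math

def expected_deviation(current_margin, played_games, total_games):
    """Expected spread of the final result around the current result:
    sigma' = sigma * sqrt(F), with F the fraction of games still to play."""
    remaining = 1.0 - played_games / total_games
    return current_margin * math.sqrt(remaining)

def final_margin(current_margin, played_games, total_games):
    """Margin reported once all games are played: sigma * sqrt(1 - F)."""
    return current_margin * math.sqrt(played_games / total_games)

# Compare both quantities at three checkpoints of a 4000-game test
for played in (40, 3600, 4000):
    print(played,
          round(expected_deviation(10, played, 4000), 2),
          round(final_margin(10, played, 4000), 2))
```

For 40/4000 played, the first quantity stays close to the current margin while the second is about 1 Elo. At 4000/4000 the first quantity is exactly 0, since no games remain that could move the result.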