A remark on rigid testing

Laskos · Post by **Laskos** » Sun Jul 02, 2017 9:22 pm

Sven Schüle wrote:
Laskos wrote: 40 games played out of 4000 [...] The result in Cutechess-Cli is 6 +/- 10 ELO points.
How would it ever be possible to get +/- 10 after only 40 games?

It doesn't matter for the discussion; it's Fulvio's example.

Fulvio · Post by **Fulvio** » Sun Jul 02, 2017 9:57 pm

20 wins and 20 losses give +/- 111.159 ELO points.

https://wandbox.org/permlink/vzU1VIIpkpKPXxB3
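
As a rough cross-check of that figure, here is a minimal Python sketch of how an error margin for 20 wins and 20 losses could be computed. It assumes a normal approximation with z = 1.96 and the logistic Elo formula; the function names `elo_diff` and `elo_error_margin` are hypothetical and this is not a copy of the Cutechess-Cli code. The last digits depend on how the normal quantile is approximated, so it should only be expected to land near 111.

```python
import math

def elo_diff(score):
    """Logistic Elo difference for a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_error_margin(wins, losses, draws, z=1.96):
    """Approximate 95% error margin in Elo (normal approximation, assumed z)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    stderr = math.sqrt(var / n)
    lo, hi = mean - z * stderr, mean + z * stderr
    # Half the width of the interval after converting both ends to Elo.
    return (elo_diff(hi) - elo_diff(lo)) / 2.0

print(round(elo_error_margin(20, 20, 0), 1))
```

The sketch breaks down when the interval reaches a score of 0 or 1, which can happen with very few games or very lopsided results.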

Fulvio · Post by **Fulvio** » Mon Jul 03, 2017 2:31 pm

Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:

expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)

Just for the record.

The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance

is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval

It's possible to assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case:
final_confidence ~= actual_confidence / sqrt(total_games / played_games)

In your example
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49

https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja
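
The scaling rule above can be sketched in a few lines of Python. The function name `projected_margin` is hypothetical, and the sketch relies on the same assumption as the text: the win/loss/draw ratios stay as they are for the remaining games.

```python
import math

def projected_margin(current_margin, played_games, total_games):
    """Margin expected at the end of the test if the variance does not change."""
    return current_margin / math.sqrt(total_games / played_games)

# +/- 10 after 3600 of 4000 games
print(round(projected_margin(10.0, 3600, 4000), 2))  # 9.49
```

With +/- 10 after only 40 of 4000 games, the same function gives a final margin of about 1.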

Laskos · Post by **Laskos** » Mon Jul 03, 2017 5:02 pm

Fulvio wrote:
Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:

expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)
Just for the record.

The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance

is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval

It's possible to assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case:
final_confidence ~= actual_confidence / sqrt(total_games / played_games)

In your example
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49

https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja

That is what Cutechess-Cli will show after 4000/4000 games, and it is trivial. It is what I wrote for your case (40/4000 played): "But it also means that the expected "true" (as shown in Cutechess-Cli) error margins after 4000 games are 10*sqrt(1-F) ~ 1 ELO point."

What I was talking about is a different issue: the error margins in a rigid test with a fixed, finite number of games. Using LOS and p-values as a stopping rule can be done if one is "disciplined".

After 4000/4000 games the error margins I described are 0 ELO points: the test is finished.
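
To keep the two quantities apart, here is a small Python sketch that computes both from the formulas quoted in this thread. It reads F as the fraction of games still to be played, which is an inference from the 40/4000 example (10*sqrt(1-F) ~ 1 implies 1-F = 0.01); the function name `rigid_test_margins` is hypothetical.

```python
import math

def rigid_test_margins(current_margin, played_games, total_games):
    """Return (final_margin, drift_margin) for a test with a fixed game count.

    final_margin: margin expected to be displayed once all games are played,
                  current_margin * sqrt(1 - F).
    drift_margin: expected deviation of the final result from the current
                  result, current_margin * sqrt(F).
    F is assumed to be the fraction of games not yet played.
    """
    f_remaining = (total_games - played_games) / total_games
    final_margin = current_margin * math.sqrt(1.0 - f_remaining)
    drift_margin = current_margin * math.sqrt(f_remaining)
    return final_margin, drift_margin

final_margin, drift_margin = rigid_test_margins(10.0, 40, 4000)
print(round(final_margin, 2), round(drift_margin, 2))  # 1.0 9.95
```

When `played_games` equals `total_games`, F is 0 and the second value is 0, which matches the finished-test case above.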
