A remark on rigid testing

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: A remark on rigid testing

Post by Laskos »

Sven Schüle wrote:
Laskos wrote: 40 games played out of 4000 [...] The result in Cutechess-Cli is 6 +/- 10 ELO points.
How would it ever be possible to get +/- 10 after only 40 games?

It doesn't matter for the discussion; it's Fulvio's example.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: A remark on rigid testing

Post by Fulvio »

20 wins and 20 losses give +/- 111.159 ELO points.

https://wandbox.org/permlink/vzU1VIIpkpKPXxB3
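
For anyone who wants to check the order of magnitude without opening the link, here is a minimal Python sketch. It assumes a normal approximation with z = 1.96 and the usual logistic score-to-Elo conversion; the function names are made up for this post, and this is not necessarily the exact formula used by Cutechess-Cli or by the linked snippet, so the last digits can differ.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_error_margin(wins: int, losses: int, draws: int = 0, z: float = 1.96) -> float:
    """Approximate 95% error margin in Elo (hypothetical helper, normal approximation)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_error = math.sqrt(variance / n)
    # Convert both ends of the score interval to Elo and take half the width.
    # This fails if an end of the interval falls outside (0, 1).
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return (high - low) / 2.0


margin = elo_error_margin(wins=20, losses=20)
print(f"+/- {margin:.1f} Elo")  # roughly 111 for 20 wins and 20 losses
```

With these assumptions the margin for 40 decisive games comes out at roughly 111 ELO points, nowhere near +/- 10.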
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: A remark on rigid testing

Post by Fulvio »

Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:

expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)

Just for the record.

The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance

is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval

We can assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case (where "actual" means the current value):
final_confidence ~= actual_confidence / sqrt(total_games / played_games)

In your example:
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49


https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja
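
The same arithmetic as a small Python sketch, in case the link goes stale. The function name `projected_margin` is just an illustration, and it relies on the assumption stated above that the win/loss/draw ratios stay the same for the remaining games.

```python
import math


def projected_margin(current_margin: float, played_games: int, total_games: int) -> float:
    """Projected error margin at the end of the test (hypothetical helper).

    Assumes the per-game variance stays the same, so the margin
    shrinks with the square root of the number of games.
    """
    return current_margin / math.sqrt(total_games / played_games)


print(round(projected_margin(10.0, 3600, 4000), 2))  # 9.49
```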
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: A remark on rigid testing

Post by Laskos »

Fulvio wrote:
Laskos wrote: the expected sigma (or 95% confidence intervals) in deviation from our current result (for 1 - F fraction of games) for a rigid test with fixed number of games is:

expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)
Just for the record.

The sigma (square root of variance)
https://en.wikipedia.org/wiki/Variance

is a different thing from the confidence interval:
https://en.wikipedia.org/wiki/Confidence_interval

We can assume that the variance will not change (the ratios of wins, losses and draws stay the same) and that the remaining games will confirm the current result.
In that case (where "actual" means the current value):
final_confidence ~= actual_confidence / sqrt(total_games / played_games)

In your example:
10 / sqrt(4000 / 3600) = 10 / 1.054 = 9.49


https://wandbox.org/permlink/uvc1pY2fMIPaq3Ja

That is what Cutechess-Cli will show after 4000/4000 games, and it is trivial. It is what I wrote for your case (40/4000 played): "But it also means that the expected "true" (as shown in Cutechess-Cli) error margins after 4000 games are 10*sqrt(1-F) ~ 1 ELO point." What I was talking about is a different issue: the error margins, relative to the current result, in a rigid test with a fixed, finite number of games. Applying LOS and p-values as a stopping rule can be done if one is "disciplined".

After 4000/4000 games, the error margins I described are 0 ELO points: the test is finished.
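
To keep the two quantities apart, here is a rough Python sketch of both formulas as stated in this thread, with F taken as the fraction of games still to be played. The function names are invented for illustration; only the sigma * sqrt(F) and sigma * sqrt(1 - F) expressions come from the posts.

```python
import math


def remaining_fraction(played_games: int, total_games: int) -> float:
    """F: fraction of the fixed total that is still to be played."""
    return 1.0 - played_games / total_games


def expected_deviation_from_current(sigma: float, played_games: int, total_games: int) -> float:
    """sigma' = sigma * sqrt(F): expected deviation of the final result from the current one."""
    return sigma * math.sqrt(remaining_fraction(played_games, total_games))


def displayed_margin_at_completion(sigma: float, played_games: int, total_games: int) -> float:
    """sigma * sqrt(1 - F): the margin shown once all games are played."""
    return sigma * math.sqrt(1.0 - remaining_fraction(played_games, total_games))


# 40 of 4000 games played, current margin 10 ELO points (the example in this thread).
print(round(expected_deviation_from_current(10.0, 40, 4000), 2))   # 9.95
print(round(displayed_margin_at_completion(10.0, 40, 4000), 2))    # 1.0
# Test finished: nothing left to play, so no deviation from the current result.
print(expected_deviation_from_current(10.0, 4000, 4000))           # 0.0
```

The first quantity shrinks to zero as the test completes, while the second is the ordinary margin that the finished test reports.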