Testing opening suites this weekend (trying to optimize them) with a rigid, fixed number of games, say 4000, I am often bothered by the following: at, say, game 3500 the error margins are still large and the result looks bad, yet I have to finish the whole 4000-game test as though I were going to play an infinite number of games. But I will stop at 4000 games anyway. So my gut feeling is that the result will not change much in the remaining 500 games, by a much smaller amount than the error margins shown by Cutechess-Cli.
Let's call F the fraction of games still to be played out of the total planned. We have provisional results in ELO and sigma (95% confidence intervals in the case of Cutechess-Cli), based on the 1 - F fraction of games already played. Then, for a rigid test with a fixed number of games, the expected sigma (or 95% confidence interval) of the deviation of the final result from our current result is:
expected deviation from our current result after completing the test: sigma' = sigma * sqrt(F)
So, if Cutechess-Cli shows a 95% confidence interval of +/- 10 ELO points after 3600/4000 games, then with the same 95% confidence the final result after 4000/4000 games will be within 10*sqrt(0.1) ~ 3.16 ELO points of the current result (after 3600 games). This does not replace the "true" sigma (for an infinite number of games and the "true" value of the difference), but at least it gives a simple criterion for abandoning tests with a rigid, fixed number of games that currently show "bad results". Maybe it would be a useful number to show in Cutechess-Cli, for those who use it in this somewhat artistic way (often playing with confidence intervals and LOS for stopping).
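The rule of thumb can be checked with a short Python sketch. Assumptions not stated in the post: games are independent, the per-game score spread s is the same for played and unplayed games, the normal approximation holds, and the ELO margin is treated as proportional to the score margin (reasonable near a 50% score). The function names are hypothetical. The direct version adds the uncertainty about the true value (variance s^2/n) to the noise of the m unplayed games (variance s^2/m) and scales by F, because the final average moves by F times the gap between the future and the current averages; the result equals sigma * sqrt(F). If the current result were instead taken as the exact true value, only the s^2/m term would remain and the factor would be sqrt(F * (1 - F)), i.e. 0.3 instead of about 0.316 at 3600/4000.

```python
import math


def deviation_margin_direct(margin, played, total):
    """Margin of (final result - current result), built up term by term.

    `margin` is the current error margin at some fixed confidence level.
    The per-game spread s is expressed in the same units via
    margin = s / sqrt(played).
    """
    remaining = total - played
    if remaining == 0:
        return 0.0  # nothing left to play, the result cannot move
    s = margin * math.sqrt(played)
    frac_left = remaining / total  # F
    # Uncertainty about the true value (s^2/n) plus the noise of the
    # m unplayed games around it (s^2/m).
    var_gap = s ** 2 / played + s ** 2 / remaining
    # The final average moves by F * (future average - current average).
    return frac_left * math.sqrt(var_gap)


def deviation_margin_rule(margin, played, total):
    """The rule of thumb: margin * sqrt(F), with F the unplayed fraction."""
    return margin * math.sqrt((total - played) / total)


print(round(deviation_margin_direct(10, 3600, 4000), 2),
      round(deviation_margin_rule(10, 3600, 4000), 2))
```

Both functions give 3.16 for the 3600/4000 example above; the direct form just makes visible where sqrt(F) comes from.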
A remark on rigid testing
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
Interesting, even if I didn't understand the math.
Can you pls explain the formula used by Cutechess to calculate the sigma?
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: A remark on rigid testing
Fulvio wrote: "Interesting, even if I didn't understand the math. Can you pls explain the formula used by Cutechess to calculate the sigma?"
I doubt the ELO error calculation itself is very illuminating. What happened to me can be used as a rule of thumb (if you can compute a square root):
In Cutechess-Cli I had -6 +/- 10 ELO points after 3600 games out of a planned 4000. The fraction of games not yet played is (4000-3600)/4000 = 0.1. So, although the interval still allows a possible 4 ELO point improvement (-6 + 10), the test will in all probability not finish there as far as the central value goes. It will finish at -6 +/- 10*sqrt(0.1), i.e. about -6 +/- 3 ELO points for the central value (with the same 95% confidence as before). As I wanted a clear improvement, and not a worsening or a roughly equal result, this test had failed for me after 3600 games and I abandoned it, because I will not play more than 4000 games anyway.
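The stopping decision described here can be sketched in Python. The helper names and the required_elo threshold (0 ELO here, meaning "any real improvement") are hypothetical; the projected interval uses the sigma * sqrt(F) rule from the first post with F = (total - played) / total.

```python
import math


def projected_final_interval(current_elo, margin, played, total):
    """Interval expected to contain the final (all games played) result,
    at the same confidence as `margin`, using margin * sqrt(F)."""
    frac_left = (total - played) / total  # F
    half_width = margin * math.sqrt(frac_left)
    return current_elo - half_width, current_elo + half_width


def should_abandon(current_elo, margin, played, total, required_elo=0.0):
    """Abandon when even the optimistic end of the projected interval
    cannot exceed the required gain."""
    _, high = projected_final_interval(current_elo, margin, played, total)
    return high <= required_elo


low, high = projected_final_interval(-6, 10, 3600, 4000)
print(f"{low:.2f} .. {high:.2f}", should_abandon(-6, 10, 3600, 4000))
```

With -6 +/- 10 after 3600/4000 games this prints `-9.16 .. -2.84 True`: even the optimistic end of the projected interval stays below zero, so the test is abandoned.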
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
Laskos wrote: "I doubt ELO calculation of error is very illuminating."
I asked because if it's a function of win/draw/loss:
sigma = f(n_win, n_draw, n_loss)
in that case the final sigma is:
sigma_final = f(n_win + dx_win, n_draw + dx_draw, n_loss + dx_loss)
where:
dx_win + dx_draw + dx_loss = games_to_be_played
With that it should be possible to calculate the boundaries of the final result and the ratio sigma/sigma_final_maxmin as a margin of error relative to the current sigma.
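A sketch of the "boundaries of the final result" part of this idea, assuming the usual logistic conversion between score fraction and ELO. The function names and the win/draw/loss counts in the example are hypothetical; the bounds are hard limits (every remaining game lost or every remaining game won), not statistical ones.

```python
import math


def elo_from_score(p):
    """Logistic ELO difference for a score fraction p, with 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def final_score_bounds(wins, draws, losses, remaining):
    """Hard limits on the final score fraction after `remaining` more games:
    lose every one of them (low) or win every one of them (high)."""
    total = wins + draws + losses + remaining
    points = wins + 0.5 * draws
    return points / total, (points + remaining) / total


# Hypothetical counts: an even 3600-game result with 400 games still to play.
low, high = final_score_bounds(wins=1000, draws=1600, losses=1000, remaining=400)
print(f"score {low:.3f}..{high:.3f}, "
      f"ELO {elo_from_score(low):.1f}..{elo_from_score(high):.1f}")
```

This prints `score 0.450..0.550, ELO -34.9..34.9`: in principle even an even result after 3600 games can end anywhere within about +/- 35 ELO, so hard limits like these say little on their own about when to stop.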
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
Laskos wrote: "What it happened to me, and can be used as the rule of thumb (if you can compute square root)"
If you played only 40 games you get (4000-40)/4000 = 0.99.
If you assume that the test will finish at -6 +/- 10*sqrt(0.99), there is no point in playing the games at all.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: A remark on rigid testing
Fulvio wrote: "If you played only 40 games you get (4000-40)/4000 = 0.99. If you assume that the test will finish in -6 +/- 10*sqrt(0.99) there is no point in playing the games at all"
If your current result is 6 +/- 10 ELO points after 40/4000 games, this means that the error margins in this test are already very close to their asymptotic values (the "true" sigma), and the final central value will lie within about 6 +/- 9.95 ELO points with 95% confidence (instead of the shown 6 +/- 10), which is playable.
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
I probably misunderstood the meaning of the calculated confidence interval.
I've found the Cutechess math:
https://github.com/cutechess/cutechess/ ... rc/elo.cpp
and I'll try to figure out what it really means.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: A remark on rigid testing
Fulvio wrote: "I probably misunderstood the meaning of the calculated confidence interval. I've found the Cutechess math: https://github.com/cutechess/cutechess/ ... rc/elo.cpp and I'll try to figure out what it really means."
I don't think this math is needed, but it is instructive anyway. I may have expressed myself clumsily in my first post, so let's take your example:
40 games played out of 4000 (unplayed fraction F = 0.99), and the result in Cutechess-Cli is 6 +/- 10 ELO points. That means that the central value of the full 4000-game test will lie within 10*sqrt(0.99) ~ 9.95 ELO points of 6 (at the same 95% confidence), very close to the shown 10 ELO points. But it also means that the error margins Cutechess-Cli will show after 4000 games (its "true" margins) are expected to be 10*sqrt(1-F) ~ 1 ELO point.
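Both numbers in this example come from splitting the current margin into two pieces, as in this Python sketch. It assumes (not stated in the post) that the reported margin shrinks like 1/sqrt(games) with an unchanged per-game spread, so after all N games it becomes margin * sqrt(n/N) = margin * sqrt(1 - F); the function name is hypothetical. The squares of the two pieces add up to the square of the current margin, since F + (1 - F) = 1.

```python
import math


def split_margin(margin, played, total):
    """Split the current margin into two parts:
    - how far the final result may still move from the current one, margin * sqrt(F)
    - the margin expected to be reported after all games, margin * sqrt(1 - F)
    The squares of the two parts add up to margin ** 2."""
    frac_left = (total - played) / total  # F
    still_to_move = margin * math.sqrt(frac_left)
    final_reported = margin * math.sqrt(1.0 - frac_left)
    return still_to_move, final_reported


move, final = split_margin(10, 40, 4000)
print(f"{move:.2f} {final:.2f}")
```

For the 40/4000 example this prints `9.95 1.00`.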
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: A remark on rigid testing
Laskos wrote: "40 games played out of 4000 [...] The result in Cutechess-Cli is 6 +/- 10 ELO points."
How would it ever be possible to get +/- 10 after only 40 games?
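Sven's objection can be quantified with a rough Python estimate of the 95% ELO margin as a function of the number of games. It assumes an evenly matched pair (score near 50%), independent games, the normal approximation and a hypothetical 30% draw rate; the function name is also hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96


def approx_margin_elo(games, draw_rate, z=Z95):
    """Rough 95% ELO margin for an evenly matched test (score near 50%).

    At a 50% mean, a win or loss deviates from the mean by 0.5 and a draw
    by 0, so the per-game spread is 0.5 * sqrt(1 - draw_rate).
    Near 50% the logistic ELO curve has slope 1600 / ln(10) ELO per unit score.
    """
    per_game_sd = 0.5 * math.sqrt(1.0 - draw_rate)
    score_margin = z * per_game_sd / math.sqrt(games)
    slope = 1600.0 / math.log(10)
    return slope * score_margin


# Hypothetical 30% draw rate.
for n in (40, 3600):
    print(n, round(approx_margin_elo(n, 0.3), 1))
```

Under these assumptions the margin is roughly 90 ELO after 40 games and roughly 9.5 ELO after 3600 games, the same order as the +/- 10 quoted at 3600/4000 earlier in the thread; since the margin scales like 1/sqrt(games), +/- 10 after 40 games is out of reach.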
- Posts: 395
- Joined: Fri Aug 12, 2016 8:43 pm
Re: A remark on rigid testing
The code implements this:
http://onlinestatbook.com/2/estimation/mean.html
tot_points := 1 point for every win + 0.5 points for every draw + 0 points for every loss
mean_value (m_mu) := tot_points / number_of_games
variance := devW + devL + devD, where devW = w*(1 - m_mu)^2, devL = l*(0 - m_mu)^2, devD = d*(0.5 - m_mu)^2 and w, d, l are the fractions of wins, draws and losses
standard_error_of_the_mean (m_stdev) := sqrt(variance) / sqrt(number_of_games)
phiInv(0.025) := -1.95716
phiInv(0.975) := +1.95716
confidence_interval = mean_value +/- 1.95716 * standard_error_of_the_mean
Now, if I understood the math correctly, we can assume 3 interesting cases and calculate the corresponding final confidence_interval:
we lose all the remaining games -> lower bound of the final confidence_interval
the mean_value does not change -> estimated final confidence_interval
we win all the remaining games -> upper bound of the final confidence_interval
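A runnable Python version of the steps above plus the three cases, offered as a sketch rather than a copy of elo.cpp. Assumptions: devW, devL and devD are read as each outcome's fraction times its squared distance from the mean score (so their sum is the per-game variance); the score interval is turned into ELO with the logistic formula -400*log10(1/p - 1), and the margin is taken as half the width of that ELO interval; the exact normal quantile from Python's statistics module (about 1.95996) is used instead of the 1.95716 quoted above, so results may differ slightly in the last digits. Function names and the win/draw/loss counts are hypothetical. For very few games the score interval can leave (0, 1) and the conversion then fails.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.95996


def elo_from_score(p):
    """Logistic ELO difference for a score fraction 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_with_margin(wins, draws, losses, z=Z95):
    """ELO difference and 95% margin from W/D/L, following the steps above."""
    n = wins + draws + losses
    w, d, l = wins / n, draws / n, losses / n
    mu = w + d / 2.0                              # mean score per game
    variance = (w * (1.0 - mu) ** 2               # devW
                + l * (0.0 - mu) ** 2             # devL
                + d * (0.5 - mu) ** 2)            # devD
    stderr = math.sqrt(variance) / math.sqrt(n)   # standard error of the mean
    lo, hi = mu - z * stderr, mu + z * stderr     # interval on the score scale
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0  # assumed ELO step
    return elo_from_score(mu), margin


def final_scenarios(wins, draws, losses, remaining):
    """The three cases above: lose all, keep the current proportions, win all."""
    n = wins + draws + losses
    scale = (n + remaining) / n  # same proportions, hence same mean score
    return {
        "lose_all": elo_with_margin(wins, draws, losses + remaining),
        "same_mean": elo_with_margin(wins * scale, draws * scale, losses * scale),
        "win_all": elo_with_margin(wins + remaining, draws, losses),
    }


# Hypothetical counts: an even result after 3600 games, 400 games still to play.
elo, margin = elo_with_margin(1000, 1600, 1000)
print(f"now: {elo:+.1f} +/- {margin:.1f}")
for case, (elo, margin) in final_scenarios(1000, 1600, 1000, 400).items():
    print(f"{case}: {elo:+.1f} +/- {margin:.1f}")
```

With these hypothetical counts the current margin is about 8.5 ELO, and the lose-all and win-all cases sit near -35 and +35 ELO, while the sigma * sqrt(F) rule puts the final result within about 8.5 * sqrt(0.1) ~ 2.7 ELO of the current value at 95% confidence; the extreme cases are hard limits rather than likely outcomes.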