A remark on rigid testing

Post by **Laskos** » Sun Jul 02, 2017 3:34 pm

Testing opening suites this weekend (trying to optimize them) with a rigid, fixed number of games, say 4000, I am often bothered by the fact that at, say, game 3500 the error margins are still large. Although the result looks bad, I have to finish the whole 4000-game test as though I were going to play an infinite number of games. But I will stop at 4000 games anyway. So my gut feeling is that the result will change only a little in the remaining 500 games, by a much smaller amount than the error margins shown by Cutechess-Cli.

Let's call F the fraction of the total planned games that are still to be played. We have provisional results in Elo and sigma (95% confidence intervals in the case of Cutechess-Cli), based on the 1 - F fraction of games already played. Then, for a rigid test with a fixed number of games, the expected sigma (or 95% confidence interval) of the deviation of the final result from our current result is:

expected deviation (sigma') from our current result after the completion of the test = sigma * sqrt(F)
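The rule is stated here without derivation. One way to arrive at it, assuming independent games with a common per-game score standard deviation s, a flat prior on the true score (so the uncertainty of the true score given the games so far is counted too), and an Elo scale close to linear in the score over the range of interest (none of these assumptions is spelled out in the post). Here N is the planned number of games, n the number played, x̄_n the current mean score, x̄_r the mean score of the FN remaining games and x̄_N the final mean score:

```latex
\begin{aligned}
n &= (1-F)\,N, \qquad \sigma = s/\sqrt{n} \\
\bar{x}_N &= (1-F)\,\bar{x}_n + F\,\bar{x}_r
  \quad\Rightarrow\quad \bar{x}_N - \bar{x}_n = F\,(\bar{x}_r - \bar{x}_n) \\
\operatorname{Var}(\bar{x}_N - \bar{x}_n)
  &= F^2\left(\frac{s^2}{FN} + \frac{s^2}{(1-F)\,N}\right)
   = \frac{F\,s^2}{(1-F)\,N} = F\,\sigma^2 \\
\sigma' &= \sigma\sqrt{F}
\end{aligned}
```

If the true score were known exactly, the s^2/n term would drop out and the factor would be sqrt(F(1-F)) instead; for F = 0.1 that is 0.30 rather than 0.32.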

So, if Cutechess-Cli shows a 95% confidence interval of 10 Elo points after 3600/4000 games, then with 95% confidence the final result after 4000/4000 games will be within 10*sqrt(0.1) ~ 3.16 Elo points of the current result (after 3600 games). This does not replace the "true" sigma (for an infinite number of games and the "true" value of the difference), but at least it gives me a simple criterion for abandoning rigid, fixed-length tests whose current results are bad. It might be a useful number for Cutechess-Cli to display, for those who use it in this somewhat artistic way (often stopping based on confidence intervals and LOS).
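A minimal Python sketch of this rule of thumb. `projected_margin` is a hypothetical helper, not something Cutechess-Cli provides. It computes F from the game counts and scales the currently shown margin by sqrt(F):

```python
import math

def projected_margin(current_margin: float, games_played: int, games_planned: int) -> float:
    """Margin (same confidence level as current_margin) for the deviation of
    the final result from the current one, using the sigma * sqrt(F) rule.

    F is the fraction of the planned games that has not been played yet.
    """
    if not 0 < games_played <= games_planned:
        raise ValueError("need 0 < games_played <= games_planned")
    unplayed_fraction = (games_planned - games_played) / games_planned  # F
    return current_margin * math.sqrt(unplayed_fraction)

# The example from the post: a 95% margin of 10 Elo after 3600 of 4000 games.
print(round(projected_margin(10.0, 3600, 4000), 2))  # 3.16
```

When all planned games have been played (F = 0), the projected deviation is zero, as it should be.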

Post by **Fulvio** » Sun Jul 02, 2017 4:48 pm

Interesting, even if I didn't understand the math.

Can you please explain the formula used by Cutechess to calculate the sigma?

Post by **Laskos** » Sun Jul 02, 2017 5:29 pm

Fulvio wrote: Interesting, even if I didn't understand the math.

Can you please explain the formula used by Cutechess to calculate the sigma?

I doubt the Elo calculation of the error is very illuminating. Here is what happened to me, and it can be used as a rule of thumb (if you can compute a square root):

In Cutechess-Cli I had -6 +/- 10 Elo points after 3600 of the 4000 planned games. The fraction of games not yet played is (4000-3600)/4000 = 0.1. So, although the current interval still allows an improvement of up to 4 Elo points, in all probability the test will not finish with a central value anywhere near that. It will finish at -6 +/- 10*sqrt(0.1), i.e. about -6 +/- 3 Elo points for the central value (at the same 95% confidence as before). As I wanted a clear improvement, and not a worsening or a roughly equal result, this test had failed for me after 3600 games, and I abandoned it, since I will not play more than 4000 games anyway.
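The same decision can be written as a few lines of Python. The helper name and the stopping rule below are hypothetical illustrations of the reasoning in this post. The rule abandons the test when even the optimistic end of the projected final range shows no improvement.

```python
import math

def projected_final_interval(elo: float, margin: float, played: int, planned: int):
    """Range in which the final central value is expected to land, at the
    same confidence level as `margin` (sigma * sqrt(F) rule)."""
    unplayed_fraction = (planned - played) / planned  # F
    half_width = margin * math.sqrt(unplayed_fraction)
    return elo - half_width, elo + half_width

# The numbers from the post: -6 +/- 10 Elo after 3600 of 4000 games.
low, high = projected_final_interval(-6.0, 10.0, 3600, 4000)
print(f"final central value expected in [{low:.1f}, {high:.1f}] Elo")
# -> final central value expected in [-9.2, -2.8] Elo

# Hypothetical stopping rule for someone who wants a clear improvement:
# abandon when even the optimistic end of the projection is not positive.
if high <= 0.0:
    print("abandon the test")
```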

Post by **Fulvio** » Sun Jul 02, 2017 6:11 pm

Laskos wrote: I doubt the Elo calculation of the error is very illuminating.

I asked because if it's a function of win/draw/loss:
sigma = f(n_win, n_draw, n_loss)

in that case the final sigma is:
sigma_final = f(n_win + dx_win, n_draw + dx_draw, n_loss + dx_loss)

where:
dx_win + dx_draw + dx_loss = games_to_be_played

With that it should be possible to calculate the boundaries of the final result and the ratio sigma/sigma_final_maxmin as a margin of error from the current sigma.

Post by **Fulvio** » Sun Jul 02, 2017 6:50 pm

Laskos wrote: Here is what happened to me, and it can be used as a rule of thumb (if you can compute a square root)

If you played only 40 games you get (4000-40)/4000 = 0.99
If you assume that the test will finish at -6 +/- 10*sqrt(0.99), there is no point in playing the games at all.

Post by **Laskos** » Sun Jul 02, 2017 7:25 pm

Fulvio wrote:
Laskos wrote: Here is what happened to me, and it can be used as a rule of thumb (if you can compute a square root)
If you played only 40 games you get (4000-40)/4000 = 0.99
If you assume that the test will finish at -6 +/- 10*sqrt(0.99), there is no point in playing the games at all.

If your current result is 6 +/- 10 Elo points after 40/4000 games, then almost all of the uncertainty is still ahead of you, and the projected margin is very close to the asymptotic one (the "true" sigma): the final central value will be about 6 +/- 9.95 Elo points with 95% confidence (instead of the shown 6 +/- 10), so the test is still worth playing.

Post by **Fulvio** » Sun Jul 02, 2017 8:02 pm

I probably misunderstood the meaning of the calculated confidence interval.
I've found the Cutechess math:
https://github.com/cutechess/cutechess/ ... rc/elo.cpp

and I'll try to figure out what it really means.

Post by **Laskos** » Sun Jul 02, 2017 8:14 pm

Fulvio wrote: I probably misunderstood the meaning of the calculated confidence interval.
I've found the Cutechess math:
https://github.com/cutechess/cutechess/ ... rc/elo.cpp

and I'll try to figure out what it really means.

I don't think this math is needed, but it is instructive anyway. Maybe I expressed myself clumsily in my first post, so let's take your example:

40 games played out of 4000 (unplayed fraction F=0.99). The result in Cutechess-Cli is 6 +/- 10 Elo points. That means that the deviation of the central value of the 4000-game test from the current one is 10*sqrt(0.99) ~ 9.95 Elo points (at the same 95% confidence), very close to the shown 10 Elo points. But it also means that the expected "true" error margins (as shown in Cutechess-Cli) after 4000 games are 10*sqrt(1-F) = 1 Elo point.
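The two numbers in this example are complementary: since F + (1 - F) = 1, the deviation margin (current margin times sqrt(F)) and the expected final margin (current margin times sqrt(1 - F)) add in quadrature to the current margin. Loosely, the current uncertainty splits into "how much the central value will still move" and "how much uncertainty remains after all games". A small check for the 40/4000 example; `split_margin` is a hypothetical helper:

```python
import math

def split_margin(current_margin: float, games_played: int, games_planned: int):
    """Split the current margin into the two parts discussed above."""
    f = (games_planned - games_played) / games_planned  # unplayed fraction F
    deviation = current_margin * math.sqrt(f)       # final vs. current central value
    final = current_margin * math.sqrt(1.0 - f)     # margin expected after all games
    return deviation, final

dev, fin = split_margin(10.0, 40, 4000)
print(f"deviation ~ {dev:.2f} Elo, final margin ~ {fin:.2f} Elo")
# -> deviation ~ 9.95 Elo, final margin ~ 1.00 Elo

# deviation^2 + final^2 equals current^2, up to floating-point rounding.
assert math.isclose(dev**2 + fin**2, 10.0**2)
```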

Post by **Sven** » Sun Jul 02, 2017 9:15 pm

Laskos wrote: 40 games played out of 4000 [...] The result in Cutechess-Cli is 6 +/- 10 Elo points.

How would it ever be possible to get +/- 10 after only 40 games?
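Sven's objection can be quantified with the usual rule that error margins shrink roughly as 1/sqrt(number of games). This is an approximation: the Elo scale is not exactly linear in the score far from 50%. If +/- 10 Elo is the margin after 4000 games, the same match would show a margin about ten times wider after 40 games. `scale_margin` is a hypothetical helper:

```python
import math

def scale_margin(margin: float, games_from: int, games_to: int) -> float:
    """Rescale an error margin from one game count to another.

    Assumes margins shrink roughly as 1/sqrt(games); only approximate
    on the Elo scale when the score is far from 50%.
    """
    return margin * math.sqrt(games_from / games_to)

print(round(scale_margin(10.0, 4000, 40)))  # 100
```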

Post by **Fulvio** » Sun Jul 02, 2017 9:19 pm

The code implements this:
http://onlinestatbook.com/2/estimation/mean.html

tot_points:= 1 point for every win + 0.5 points for every draw + 0 points for every loss
mean_value(m_mu):= (tot_points)/(number_of_games)
variance:= devW + devL + devD;
standard_error_of_the_mean(m_stdev):= sqrt(variance) / sqrt(number_of_games)
phiInv(0.025):= -1.95716 (the exact normal quantile is -1.95996)
phiInv(0.975):= +1.95716 (the exact normal quantile is +1.95996)
confidence_interval = mean_value +/- 1.95716 * standard_error_of_the_mean
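A runnable Python sketch of the steps listed above. It interprets devW, devD and devL as each result's squared deviation from the mean score, weighted by that result's frequency, so that their sum is the per-game score variance. That interpretation is an assumption, and the code is not a transcription of cutechess's elo.cpp. It uses the exact normal quantile (about 1.95996) from Python's `statistics.NormalDist`. Converting the interval endpoints to Elo with the logistic formula and reporting half their difference as the margin is also an assumption about how the margin is displayed.

```python
import math
from statistics import NormalDist  # Python 3.8+

def score_to_elo(p: float) -> float:
    """Logistic Elo difference corresponding to an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, confidence: float = 0.95):
    """Return (mean score, standard error, Elo estimate, Elo margin)."""
    n = wins + draws + losses
    w, d, l = wins / n, draws / n, losses / n
    mu = w + 0.5 * d  # mean score per game (m_mu)
    # devW + devD + devL: squared deviation of each result from the mean,
    # weighted by how often that result occurred (per-game score variance).
    variance = w * (1.0 - mu) ** 2 + d * (0.5 - mu) ** 2 + l * (0.0 - mu) ** 2
    stderr = math.sqrt(variance) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # ~1.95996 for 95%
    lo, hi = mu - z * stderr, mu + z * stderr
    if not (0.0 < lo and hi < 1.0):
        raise ValueError("score interval reaches 0 or 1; Elo margin undefined")
    margin = (score_to_elo(hi) - score_to_elo(lo)) / 2.0
    return mu, stderr, score_to_elo(mu), margin

# Hypothetical record of 4000 games: 1100 wins, 2000 draws, 900 losses.
mu, stderr, elo, margin = elo_with_margin(1100, 2000, 900)
print(f"{elo:.1f} +/- {margin:.1f} Elo")
```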

If I understood the math correctly, we can consider 3 interesting cases and calculate the corresponding final confidence_interval:
we lose all the remaining games -> lower bound of the final confidence_interval
the mean_value does not change -> estimated final confidence_interval
we win all the remaining games -> upper bound of the final confidence_interval
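A sketch of these three cases in Python. It assumes the per-game score variance is computed from the win/draw/loss frequencies as in the steps above, and it uses the exact 1.95996 quantile. The W/D/L record and the helper names are hypothetical. In the "mean unchanged" case, the remaining games are assumed to keep the current win/draw/loss proportions, which is one possible reading of "the mean_value does not change".

```python
import math
from statistics import NormalDist  # Python 3.8+

Z95 = NormalDist().inv_cdf(0.975)  # ~1.95996, exact 95% two-sided quantile

def score_to_elo(p: float) -> float:
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(wins: float, draws: float, losses: float):
    """(low, central, high) Elo at 95% confidence for a win/draw/loss record."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - mu) ** 2 + draws * (0.5 - mu) ** 2 + losses * mu ** 2) / n
    half = Z95 * math.sqrt(variance / n)  # half-width in score units
    return score_to_elo(mu - half), score_to_elo(mu), score_to_elo(mu + half)

def final_scenarios(wins: int, draws: int, losses: int, planned: int):
    """The three cases for the games still to be played (hypothetical helper)."""
    played = wins + draws + losses
    left = planned - played
    scale = planned / played  # "mean unchanged": keep current W/D/L proportions
    return {
        "lose all remaining": elo_interval(wins, draws, losses + left),
        "mean unchanged": elo_interval(wins * scale, draws * scale, losses * scale),
        "win all remaining": elo_interval(wins + left, draws, losses),
    }

# Hypothetical record after 3600 of 4000 planned games.
for case, (low, mid, high) in final_scenarios(950, 1700, 950, 4000).items():
    print(f"{case:>18}: {mid:+6.1f} Elo  [{low:+6.1f}, {high:+6.1f}]")
```

With the proportions held fixed, the "mean unchanged" interval is approximately the current one shrunk by sqrt(3600/4000) ~ 0.95, which matches the sqrt(1 - F) factor for the final margin mentioned earlier in the thread.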
