Properties of unbalanced openings using Bayeselo model
Posted: Sat Aug 27, 2016 10:31 am
At the risk of boring people, I will post this, just to have it somewhere. With the Bayeselo model, which defines an Elo model, a draw model, and a model for the bias (unbalancedness) of the openings:
f(Delta) = 1 / (1 + 10^(Delta/400)),
PW = f(-eloDelta - eloBias + eloDraw),
PB = f(eloDelta + eloBias + eloDraw),
PD = 1 - PW - PB,
and using the 5-nomial variance for a side-and-reversed pair of games on an opening position, one can derive the important properties of the t-value (directly related to the p-value or LOS), defined as (w-l)/sqrt(Variance), where w and l are the win and loss ratios in games between two closely matched engines (eloDelta is assumed small, and the limit eloDelta -> 0 is taken at the end). The 5-nomial variance is computed "naively", ignoring the correlations between side-and-reversed games, which an Elo model cannot account for. The results are:
1/ For eloDraw above 222.4 Elo points, unbalanced openings with eloBias of the order of eloDraw give a better t-value (sensitivity or resolution) than balanced ones for the same number of games.
2/ The eloBias that maximizes the sensitivity converges towards eloDraw for large values of eloDraw.
3/ With increasing eloDraw, the sensitivity in the unbalanced case converges to a constant, while in the balanced case it decreases drastically as eloDraw grows. The plot is here:
For the unbalanced case we have asymptotically:
t-value = sqrt(Number of Games)*(Elo Difference)*log(10)/(400*sqrt(3))
or
Number of Games = 3*[400*(t-value)/{(Elo Difference)*(log(10))}]^2
Here log is the natural logarithm, and the result is independent of eloDraw. For example, for a t-value of 2 (LOS of about 97.7%) and an Elo difference of 3 Bayeselo points, the expected number of games needed is ~40,000.
For the balanced case, the sensitivity or t-value decreases with increasing eloDraw, to the point that one cannot measure a difference of 3 Bayeselo points with balanced openings for eloDraw above, say, 1000. This will become relevant in the future, as draw rates increase. I marked on the plot where SF testing and the TCEC final stand in terms of eloDraw. It is apparent that SF testing could already be shortened by a factor of 2, and the TCEC match by a factor of 4, for the same resolution (t-value or sensitivity) by using unbalanced openings.
4/ In the balanced case, the draw rate tends to 100% with increasing eloDraw. In the unbalanced case with eloBias = eloDraw and both large, the draw rate tends to 50%.
5/ Including pairwise correlations between the openings in the 5-nomial variance only seems to accentuate the effect: unbalanced openings are favored slightly more for large eloDraw.
6/ It would be useful if someone checked the results; this was done maybe too quickly.
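As a partial check of the closed-form pieces above, here is a minimal Python sketch. It only evaluates the model probabilities as written, the draw-rate limits from point 4, and the asymptotic game count from point 3. It does not recompute the 5-nomial variance or the 222.4 threshold. The function names (f, outcome_probs, games_needed) are my own, chosen for illustration, and log is taken as the natural logarithm.

```python
import math


def f(delta):
    """Logistic used above: f(Delta) = 1 / (1 + 10^(Delta/400))."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))


def outcome_probs(elo_delta, elo_bias, elo_draw):
    """Return (PW, PB, PD) for one game, following the formulas in the post."""
    pw = f(-elo_delta - elo_bias + elo_draw)
    pb = f(elo_delta + elo_bias + elo_draw)
    pd = 1.0 - pw - pb
    return pw, pb, pd


def games_needed(t_value, elo_diff):
    """Asymptotic unbalanced-case estimate:
    N = 3 * [400 * t / (EloDiff * ln(10))]^2
    """
    return 3.0 * (400.0 * t_value / (elo_diff * math.log(10.0))) ** 2


if __name__ == "__main__":
    # Draw rate at eloDelta = 0: balanced (eloBias = 0) vs. eloBias = eloDraw.
    for elo_draw in (200.0, 600.0, 1000.0, 2000.0):
        pd_balanced = outcome_probs(0.0, 0.0, elo_draw)[2]
        pd_unbalanced = outcome_probs(0.0, elo_draw, elo_draw)[2]
        print(f"eloDraw={elo_draw:6.0f}  "
              f"draw rate balanced={pd_balanced:.4f}  "
              f"unbalanced={pd_unbalanced:.4f}")

    # Games needed for t-value 2 and a difference of 3 Bayeselo points.
    print(f"games needed: {games_needed(2.0, 3.0):.0f}")
```

Run as a script, the balanced draw rate climbs towards 1 and the unbalanced one settles near 0.5 as eloDraw grows, and the game count comes out just above 40,000, in line with the figures quoted above.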