That is 95% confidence from bayeselo default.

Laskos wrote:
Thanks, so it seems that there is a real Elo benefit from egbb in late middlegames. Your error margins are 1SD, right? 15 Elo points is a lot; even if it shrinks to 5 Elo points with more games, it's highly significant. It would be even better to wait until the SPRT stops, to have clear uncertainties.

Ferdy wrote:
I started testing the latest released Deuterium with egbb at tc 40moves/60sec, using different positions with colors reversed.

Yes, it seems a Houdini problem. Is the Elo benefit from egbb measurable in Deuterium?
(1) From late-middlegame positions with mixed pieces, but with more than 5 men. Target is 1600 games.
Bayeselo:
Rank Name                          Elo    Diff    +     -     Games  Score   Oppo.  Draws   Win     W-L-D
   1 Deuterium-113-egbb             7.85    0.00  7.06  7.06  1091   52.25%  -7.85  57.93%  23.28%  254-205-632
   2 Deuterium-v13.1.31.113-64bit  -7.85  -15.69  7.06  7.06  1091   47.75%   7.85  57.93%  18.79%  205-254-632

SPRT:
Engine: Deuterium-113-egbb
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +1.25278 (-2.94444, +2.94444)
T = +1091, W = +254, L = +205, D = +632, WNet = +49

Next will be the following, positions generated from Ed's protools:
(2) Rook and pawn endings, but with more than 5 men. Target is 200 games.
(3) Pawn endings, but with more than 5 men. Target is 200 games.
(4) Queen and pawn endings, but with more than 5 men. Target is 200 games.
(5) Bishop, knight and pawn endings, but with more than 5 men. Target is 200 games.
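On the 1SD vs 95% question: here is a minimal sketch that estimates the Elo difference and a confidence interval from the W-L-D of run (1), using the plain logistic Elo formula and a normal approximation of the per-game score. This is not Bayeselo's method (Bayeselo fits a model with a draw Elo and a prior), so the numbers will differ somewhat from the table. The helper names `elo_from_score` and `elo_interval` are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Elo difference plus an approximate interval from W/L/D counts.

    Each game scores 1, 0.5 or 0; the mean score is treated as normal.
    z = 1.96 gives roughly a 95% interval, z = 1.0 a 1-SD interval.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score: E[x^2] - E[x]^2
    mean_sq = (wins + 0.25 * draws) / n
    variance = mean_sq - score * score
    stderr = math.sqrt(variance / n)
    lo_score, hi_score = score - z * stderr, score + z * stderr
    return elo_from_score(score), elo_from_score(lo_score), elo_from_score(hi_score)


# Run (1): Deuterium-113-egbb scored 254-205-632
elo, lo, hi = elo_interval(254, 205, 632)
print(f"Elo {elo:+.1f}, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

For 254-205-632 this gives roughly +15.6 Elo with a 95% interval of about +2 to +29. That is in the same range as the table's 15.69 difference: with only two engines Bayeselo centres the ratings on zero, so each engine's ±7.06 corresponds to about ±14 on the difference. Passing `z=1.0` gives the narrower 1-SD interval.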
Update:
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +1.42909 (-2.94444, +2.94444)
T = +1448, W = +305, L = +250, D = +893, WNet = +55
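The SPRT lines can be approximated with a sketch like the one below. It is built on assumptions about how the tool works, not on its source: a BayesElo-style trinomial model where the draw Elo is estimated from the observed win and loss rates, elo0 and elo1 are used directly as BayesElo differences, and the LLR is the sum of per-result log-likelihood ratios. The bounds are the standard Wald thresholds ln(b/(1-a)) and ln((1-b)/a), which give the ±2.94444 shown for a = b = 0.05. The function names are hypothetical.

```python
import math


def bayeselo_probs(elo: float, draw_elo: float):
    """Win, loss and draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((draw_elo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((draw_elo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def sprt(wins, losses, draws, elo0, elo1, alpha=0.05, beta=0.05):
    """Trinomial SPRT LLR and decision (assumed model, see lead-in).

    Needs at least one win and one loss so the draw Elo estimate is finite.
    """
    n = wins + losses + draws
    w, l = wins / n, losses / n
    # Draw Elo estimated from the sample's win and loss rates
    draw_elo = 200.0 * math.log10((1 - l) / l * (1 - w) / w)
    pw0, pl0, pd0 = bayeselo_probs(elo0, draw_elo)  # under H0
    pw1, pl1, pd1 = bayeselo_probs(elo1, draw_elo)  # under H1
    llr = (wins * math.log(pw1 / pw0)
           + losses * math.log(pl1 / pl0)
           + draws * math.log(pd1 / pd0))
    # Wald stopping bounds
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    if llr >= upper:
        decision = "accept H1"
    elif llr <= lower:
        decision = "accept H0"
    else:
        decision = "continue"
    return llr, (lower, upper), decision


llr, bounds, decision = sprt(305, 250, 893, elo0=-1.5, elo1=4.5)
print(f"LLR = {llr:+.5f} ({bounds[0]:+.5f}, {bounds[1]:+.5f}) -> {decision}")
```

With W = 305, L = 250, D = 893 this gives an LLR of about 1.43 (and about 1.25 for the earlier 1091-game result), close to the logged values, which suggests the assumed model is plausible. Both values are still inside the bounds, so the test continues.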