Hello Alexandru:
brtzsnr wrote: I was working to add SPRT to my evaluation framework and noticed a strange difference between how LLR is computed in cutechess-cli and fishtest.
[...]
print SPRT({'wins': 716, 'losses': 591, 'draws': 2163}, 0, 0.05, 6, 0.05, 200)
fishtests prints LLR as 2.9948445563125237
while cutechess prints LLR as 4.373536
Which one is correct? Does cutechess's test allow one to run fewer tests?
I am late to the thread, but here is my explanation:
Games: 3470
Wins: 716 (20.63 %).
Loses: 591 (17.03 %).
Draws: 2163 (62.33 %).
bayeselo: 20.5207
drawelo: 254.5410
Bayeselo and drawelo are estimated from the sample of 3470 games in the following way:
games = wins + draws + losses
W = wins/games
D = draws/games
L = losses/games
bayeselo = 200*log10{W*(1 - L)/[L*(1 - W)]}
drawelo = 200*log10[(1 - L)*(1 - W)/(L*W)]
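The two estimates above can be reproduced with a few lines of Python. This is only a sketch of the formulas as written in this post; the function name estimate_bayeselo is hypothetical and is not taken from Fishtest or cutechess-cli.

```python
import math

def estimate_bayeselo(wins, losses, draws):
    """Return (bayeselo, drawelo) estimated from raw game counts."""
    games = wins + losses + draws
    W = wins / games    # observed win frequency
    L = losses / games  # observed loss frequency
    bayeselo = 200 * math.log10(W * (1 - L) / (L * (1 - W)))
    drawelo = 200 * math.log10((1 - L) * (1 - W) / (L * W))
    return bayeselo, drawelo

# Sample from the quoted question: +716 -591 =2163.
bayeselo, drawelo = estimate_bayeselo(716, 591, 2163)
print(f"bayeselo: {bayeselo:.4f}")
print(f"drawelo: {drawelo:.4f}")
```

For the sample in this thread the printed values should land close to the 20.5207 and 254.5410 quoted above.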
The conversion between logistic Elo and Bayeselo is (at least, this is the one I use):
x = 10^(drawelo/400)
K = 4x/(1 + x)²
Bayeselo = (logistic Elo)/K
Please correct me if there are typos.
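A small Python sketch of this conversion, assuming the drawelo estimate of 254.5410 given earlier in this post. The helper names bayeselo_scale and elo_to_bayeselo are made up for illustration.

```python
def bayeselo_scale(drawelo):
    """Scale factor K between logistic Elo and Bayeselo for a given drawelo."""
    x = 10 ** (drawelo / 400)
    return 4 * x / (1 + x) ** 2

def elo_to_bayeselo(elo, drawelo):
    """Convert a logistic Elo difference to Bayeselo: Bayeselo = Elo / K."""
    return elo / bayeselo_scale(drawelo)

drawelo = 254.5410  # estimate from the 3470-game sample in this post
K = bayeselo_scale(drawelo)
print(f"K ~ {K:.4f}")
print(f"6 Elo ~ {elo_to_bayeselo(6, drawelo):.4f} Bayeselo")
```

Since K depends on drawelo, the same logistic Elo bound maps to a different Bayeselo bound whenever the draw rate changes.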
I use alpha = 1/20 and beta = alpha in this case. Then, for SPRT(0, 6) (0 to 6 Bayeselo):
LLR(wins): 20.0233
LLR(loses): -16.6351
LLR(draws): -0.3934
LLR: 2.9948
Of course, LLR = LLR(wins) + LLR(loses) + LLR(draws). So Fishtest agrees with my numbers... but cutechess-cli has not said its last word yet. As Michel and Lucas suggested, the bounds of SPRT could be written in logistic Elo. Using the parameter K that I wrote above with drawelo ~ 254.541:
I keep all the digits of a Casio calculator and only round to 1e-4 when writing:
+716 -591 =2163 (3470 games).
bayeselo: 20.5207
drawelo: 254.5410
x ~ 4.3287
K ~ 0.6098
Bounds:
0 Elo = 0/K = 0 Bayeselo.
6 Elo = 6/K ~ 9.8395 Bayeselo.
I ran my tool again. This time SPRT(0, 6) (0 to 6 logistic Elo) ~ SPRT(0, 9.8395) (0 to 9.8395 Bayeselo). Results:
LLR(wins): 32.7669
LLR(loses): -27.3355
LLR(draws): -1.0579
LLR: 4.3735
This agrees with the cutechess-cli output.
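Both checks can be sketched in Python as follows. Two things here are assumptions that are not spelled out in this post: the win/loss/draw probabilities use the usual BayesElo model form, and the stopping thresholds are Wald's standard approximation for the SPRT. The function names are hypothetical.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Win, loss, draw probabilities under the BayesElo model (assumed form)."""
    p_win = 1 / (1 + 10 ** ((drawelo - elo) / 400))
    p_loss = 1 / (1 + 10 ** ((drawelo + elo) / 400))
    return p_win, p_loss, 1 - p_win - p_loss

def llr_terms(wins, losses, draws, elo0, elo1):
    """Return ((LLR wins, LLR losses, LLR draws), drawelo); bounds in Bayeselo."""
    games = wins + losses + draws
    W, L = wins / games, losses / games
    # drawelo is estimated from the sample, as earlier in this post.
    drawelo = 200 * math.log10((1 - L) * (1 - W) / (L * W))
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    counts = (wins, losses, draws)
    terms = tuple(n * math.log(a / b) for n, a, b in zip(counts, p1, p0))
    return terms, drawelo

wins, losses, draws = 716, 591, 2163
alpha = beta = 0.05

# Bounds read directly as Bayeselo (the Fishtest interpretation).
terms_bayes, drawelo = llr_terms(wins, losses, draws, 0.0, 6.0)
llr_bayes = sum(terms_bayes)

# Bounds read as logistic Elo, rescaled by 1/K (the cutechess-cli interpretation).
x = 10 ** (drawelo / 400)
K = 4 * x / (1 + x) ** 2
terms_logistic, _ = llr_terms(wins, losses, draws, 0.0 / K, 6.0 / K)
llr_logistic = sum(terms_logistic)

# Wald's approximate thresholds: accept H0 below `lower`, accept H1 above `upper`.
lower = math.log(beta / (1 - alpha))
upper = math.log((1 - beta) / alpha)

print(f"LLR, Bayeselo bounds:     {llr_bayes:.4f}")
print(f"LLR, logistic Elo bounds: {llr_logistic:.4f}")
print(f"thresholds: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the two totals should come out near the 2.9948 and 4.3735 reported above, so the difference between the tools is only the unit in which the bounds are given.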
Basically I agree with Michel and Lucas. I hope that my numerical check will be useful to you, Alexandru.
Summary:
· Fishtest: SPRT(Bayeselo_0, Bayeselo_1).
· cutechess-cli: SPRT(Elo_0, Elo_1).
Regards from Spain.
Ajedrecista.