Hello Alexandru:

brtzsnr wrote: I was working to add SPRT to my evaluation framework and noticed a strange difference between how LLR is computed in cutechess-cli and fishtest.

[...]

print SPRT({'wins': 716, 'losses': 591, 'draws': 2163}, 0, 0.05, 6, 0.05, 200)

fishtest prints LLR as 2.9948445563125237

while cutechess prints LLR as 4.373536

Which one is correct? Does cutechess's test allow one to run fewer tests?

I am late to the thread, but here is my explanation:

```
Games: 3470
Wins: 716 (20.63 %).
Losses: 591 (17.03 %).
Draws: 2163 (62.33 %).
bayeselo: 20.5207
drawelo: 254.5410
```

Bayeselo and drawelo are estimated from the sample of 3470 games in the following way:

```
games = wins + draws + losses
W = wins/games
D = draws/games
L = losses/games
bayeselo = 200*log10(W*(1 - L)/(L*(1 - W)))
drawelo = 200*log10((1 - L)*(1 - W)/(L*W))
```
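For anyone wondering where these two expressions come from, here is a short derivation sketch. It assumes the usual BayesElo model for the win and loss probabilities, which is not spelled out in this post; b and d are only my shorthand for bayeselo and drawelo.

```latex
% Assumed BayesElo model (b = bayeselo, d = drawelo):
\begin{align*}
W &= \frac{1}{1 + 10^{(d - b)/400}} \\
L &= \frac{1}{1 + 10^{(d + b)/400}} \\
D &= 1 - W - L
\end{align*}
% Rearranging the first two lines:
\begin{align*}
\frac{1 - W}{W} &= 10^{(d - b)/400} \\
\frac{1 - L}{L} &= 10^{(d + b)/400}
\end{align*}
% Dividing the second by the first isolates b; multiplying them isolates d:
\begin{align*}
\frac{W\,(1 - L)}{L\,(1 - W)} = 10^{2b/400}
  \quad &\Longrightarrow \quad b = 200 \log_{10} \frac{W\,(1 - L)}{L\,(1 - W)} \\
\frac{(1 - L)(1 - W)}{L\,W} = 10^{2d/400}
  \quad &\Longrightarrow \quad d = 200 \log_{10} \frac{(1 - L)(1 - W)}{L\,W}
\end{align*}
```

Under that assumption the estimates are simply the values of b and d that reproduce the observed win and loss frequencies exactly.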

The conversion between logistic Elo and Bayeselo that I use is:

```
x = 10^(drawelo/400)
K = 4x/(1 + x)²
Bayeselo = (logistic Elo)/K
```
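As a quick numerical check of this conversion, here is a minimal Python sketch. The function names `elo_scale` and `logistic_to_bayeselo` are my own (hypothetical, not taken from Fishtest or cutechess-cli), and the drawelo value is the one estimated from the sample above.

```python
def elo_scale(drawelo):
    """Return K = 4x/(1 + x)^2 with x = 10^(drawelo/400)."""
    x = 10.0 ** (drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def logistic_to_bayeselo(elo, drawelo):
    """Convert a logistic Elo difference to Bayeselo: Bayeselo = Elo / K."""
    return elo / elo_scale(drawelo)


if __name__ == "__main__":
    drawelo = 254.5410  # estimated from +716 -591 =2163
    print(f"K = {elo_scale(drawelo):.4f}")
    print(f"6 Elo = {logistic_to_bayeselo(6.0, drawelo):.4f} Bayeselo")
```

Note that K = 1 when drawelo = 0, so the two scales only coincide when there are no draws in the model; the more draws, the smaller K and the larger the Bayeselo figure for the same logistic Elo.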

Please correct me if there are typos.

I use alpha = 1/20 and beta = alpha in this case. Then, for SPRT(0, 6) (0 to 6 Bayeselo):

```
LLR(wins): 20.0233
LLR(losses): -16.6351
LLR(draws): -0.3934
LLR: 2.9948
```

Of course, LLR = LLR(wins) + LLR(losses) + LLR(draws). So Fishtest agrees with my numbers... but cutechess-cli has not said its last word yet. As Michel and Lucas suggested, the SPRT bounds could instead be expressed in logistic Elo. Using the parameter K that I wrote above with drawelo ~ 254.541:

```
I keep all the digits of a Casio calculator and only round to 1e-4 when writing:
+716 -591 =2163 (3470 games).
bayeselo: 20.5207
drawelo: 254.5410
x ~ 4.3287
K ~ 0.6098
Bounds:
0 Elo = 0/K = 0 Bayeselo.
6 Elo = 6/K ~ 9.8395 Bayeselo.
```

I ran my tool again, this time with SPRT(0, 6) in logistic Elo, which is approximately SPRT(0, 9.8395) in Bayeselo. Results:

```
LLR(wins): 32.7669
LLR(losses): -27.3355
LLR(draws): -1.0579
LLR: 4.3735
```

This agrees with the cutechess-cli output.
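The whole check can be reproduced with a few lines of Python. This is only a sketch of my understanding, not the actual Fishtest or cutechess-cli source: the helper names and the `logistic_bounds` flag are hypothetical, and the win/loss probabilities assume the usual BayesElo model. All three counts must be positive, otherwise the logarithms are undefined.

```python
import math


def estimate_bayeselo(wins, losses, draws):
    """Estimate (bayeselo, drawelo) from the sample frequencies."""
    games = wins + losses + draws
    w, l = wins / games, losses / games
    bayeselo = 200.0 * math.log10(w * (1.0 - l) / (l * (1.0 - w)))
    drawelo = 200.0 * math.log10((1.0 - l) * (1.0 - w) / (l * w))
    return bayeselo, drawelo


def probabilities(bayeselo, drawelo):
    """Win, loss and draw probabilities under the assumed BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - bayeselo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + bayeselo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def llr_terms(wins, losses, draws, elo0, elo1, logistic_bounds=False):
    """Return (LLR(wins), LLR(losses), LLR(draws)) for SPRT(elo0, elo1)."""
    _, drawelo = estimate_bayeselo(wins, losses, draws)
    if logistic_bounds:
        # Bounds given in logistic Elo: convert to Bayeselo with Elo / K.
        x = 10.0 ** (drawelo / 400.0)
        k = 4.0 * x / (1.0 + x) ** 2
        elo0, elo1 = elo0 / k, elo1 / k
    p0 = probabilities(elo0, drawelo)
    p1 = probabilities(elo1, drawelo)
    counts = (wins, losses, draws)
    # Each outcome contributes count * ln(P1 / P0).
    return tuple(n * math.log(b / a) for n, a, b in zip(counts, p0, p1))


if __name__ == "__main__":
    for logistic in (False, True):
        terms = llr_terms(716, 591, 2163, 0.0, 6.0, logistic_bounds=logistic)
        label = "logistic Elo" if logistic else "Bayeselo"
        print(f"SPRT(0, 6) in {label}: LLR = {sum(terms):.4f}")
```

The only difference between the two runs is the scale in which the bounds 0 and 6 are interpreted; the drawelo estimate and the counts are identical in both.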

Basically I agree with Michel and Lucas. I hope that my numerical check will be useful to you, Alexandru.

Summary:

- **Fishtest:** SPRT(Bayeselo_0, Bayeselo_1).

- **cutechess-cli:** SPRT(Elo_0, Elo_1).

Regards from Spain.

Ajedrecista.