sprt tourney manager

Michel · Post by **Michel** » Tue Feb 07, 2017 8:00 am

There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
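For readers who want to check a 3-nomial LLR by hand, here is a minimal sketch using the common normal (GSPRT-style) approximation on the per-game score. It is not necessarily the formula used by either program discussed here. The hypothesis bounds `elo0=0` and `elo1=5`, the logistic Elo model, and the function names are all assumptions for illustration; the thread does not state the bounds used in this experiment.

```python
import math

def elo_to_score(elo):
    """Expected score under the logistic Elo model (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def llr_trinomial(wins, draws, losses, elo0=0.0, elo1=5.0):
    """Approximate LLR of H1 (elo1) against H0 (elo0) from w/d/l counts.

    elo0 and elo1 are assumed defaults, not values taken from the thread.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n              # average score per game
    var = (wins + 0.25 * draws) / n - mean ** 2  # variance of the per-game score
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    # Normal approximation of the log-likelihood ratio
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

llr = llr_trinomial(231, 352, 187)
print(f"LLR = {llr:.3f}")
```

A different Elo model or different bounds will shift the result, so small discrepancies between tools are worth checking against these choices first.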

Laskos · Post by **Laskos** » Tue Feb 07, 2017 8:08 am

Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.

It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops only when both the 3-nomial and the 5-nomial LLR reach the desired value), but I wanted to see what the 3-nomial LLR is at the point where the 5-nomial test stops. The two other tests stopped regularly (they were accepted), so they should be OK down to the fine details.

Laskos · Post by **Laskos** » Tue Feb 07, 2017 8:36 am

Laskos wrote:
Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops only when both the 3-nomial and the 5-nomial LLR reach the desired value), but I wanted to see what the 3-nomial LLR is at the point where the 5-nomial test stops. The two other tests stopped regularly (they were accepted), so they should be OK down to the fine details.

By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
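The threshold ln(19) ≈ 2.944 is what Wald's SPRT bounds give when both error rates are 5%. A small sketch, assuming alpha = beta = 0.05 (inferred from ln(19), not stated explicitly in the thread); the function name is hypothetical:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping bounds on the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))   # at or below this: accept H0
    upper = math.log((1.0 - beta) / alpha)   # at or above this: accept H1
    return lower, upper

lower, upper = sprt_bounds()
print(f"[{lower:.3f}, {upper:.3f}]")  # [-2.944, 2.944]
```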

Michel · Post by **Michel** » Tue Feb 07, 2017 11:43 am

Laskos wrote:
Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops only when both the 3-nomial and the 5-nomial LLR reach the desired value), but I wanted to see what the 3-nomial LLR is at the point where the 5-nomial test stops. The two other tests stopped regularly (they were accepted), so they should be OK down to the fine details.

Thanks for the explanation. I put a link for the script computing LLR's here.

http://hardy.uhasselt.be/Toga/computeLLR.py

Now that there is an implementation in the Amoeba tourney manager, perhaps cutechess-cli can follow? The code is really trivial, much simpler than the current code in cutechess-cli.

abulmo2 · Post by **abulmo2** » Tue Feb 07, 2017 12:29 pm

Laskos wrote:By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.

Several possibilities:

- I requested to do at least 100 pairs of games before stopping.

- By default the tournament stops only when the LLRs from both the 3-nomial and the 5-nomial distributions are > ln(19). Thus, you may see the 5-nomial test succeed before the 3-nomial test. It is possible to use only the 3-nomial or only the 5-nomial test with the argument '-V 3nomial' or '-V 5nomial'.

- When concurrency is used, all games already started are finished before stopping. So a few more games than necessary may be played, and the stopping rule may even no longer be satisfied after the last game.
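The three conditions above can be sketched as a single stopping check. This is only an illustration in Python, not the D code of the tourney manager: the function name, the parameter names, and the symmetric treatment of rejection (both LLRs below -ln(19)) are assumptions.

```python
import math

BOUND = math.log(19)  # about 2.944

def should_stop(pairs_played, llr3, llr5, mode="both", min_pairs=100):
    """Return True when the (hypothetical) stopping rule is satisfied.

    mode: "both", "3nomial" or "5nomial", mirroring the -V argument.
    """
    if pairs_played < min_pairs:       # at least 100 pairs of games first
        return False
    if mode == "3nomial":
        llrs = [llr3]
    elif mode == "5nomial":
        llrs = [llr5]
    else:                              # default: both tests must agree
        llrs = [llr3, llr5]
    accepted = all(llr > BOUND for llr in llrs)
    rejected = all(llr < -BOUND for llr in llrs)  # assumed symmetric rule
    return accepted or rejected

print(should_stop(200, 1.1, 3.0))                  # False: 3-nomial not there yet
print(should_stop(200, 1.1, 3.0, mode="5nomial"))  # True
```

With concurrency, such a check would only prevent new games from starting; games already in progress still finish, so the final LLR can differ from the value at which the rule first fired.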

abulmo2 · Post by **abulmo2** » Tue Feb 07, 2017 12:43 pm

Michel wrote:Now that there is an implementation in ... /tourney.d
The SPRT part is between lines 175 and 300.

It is written in the D language, which is not very popular yet. Nevertheless, for people used to C-like languages, I hope my code is quite readable. (I like and abuse UTF-8 characters, so you need a UTF-8 compatible editor, though.)

abulmo2 · Post by **abulmo2** » Tue Feb 07, 2017 1:11 pm

Laskos wrote:Excellent, with all the stats in the output, I love it and I saw no flaws. The setting of H0 and H1 doesn't work for me, I just use defaults, maybe I am doing something wrong, it gives:
std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'

Thank you for reporting this. I probably misunderstood how the command-line argument parser works, and I forgot to test this part thoroughly. I think the long option names --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.

Laskos · Post by **Laskos** » Tue Feb 07, 2017 2:40 pm

abulmo2 wrote:
Laskos wrote:Excellent, with all the stats in the output, I love it and I saw no flaws. The setting of H0 and H1 doesn't work for me, I just use defaults, maybe I am doing something wrong, it gives:
std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'
Thank you for reporting this. I probably misunderstood how the command-line argument parser works, and I forgot to test this part thoroughly. I think the long option names --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.

Great, --elo0 and --elo1 work!

Laskos · Post by **Laskos** » Tue Feb 07, 2017 2:47 pm

Michel wrote:
Laskos wrote:
Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops only when both the 3-nomial and the 5-nomial LLR reach the desired value), but I wanted to see what the 3-nomial LLR is at the point where the 5-nomial test stops. The two other tests stopped regularly (they were accepted), so they should be OK down to the fine details.
Thanks for the explanation. I put a link for the script computing LLR's here.

http://hardy.uhasselt.be/Toga/computeLLR.py

Now that there is an implementation in the Amoeba tourney manager, perhaps cutechess-cli can follow? The code is really trivial, much simpler than the current code in cutechess-cli.

Thanks for the script! Yes, Richard's implementation is the most straightforward use of SPRT for 3- and 5-nomials; maybe cutechess-cli will follow. I tried to compute the 3-nomial expression for the LLR in closed form without substitutions, hoping it would simplify a bit, but the best I get is still very long, and the 5-nomial would be even longer. Just for fun, LLR for 3-nomial:

Laskos · Post by **Laskos** » Tue Feb 07, 2017 5:50 pm

abulmo2 wrote:
Laskos wrote:By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
Several possibilities:

- I requested to do at least 100 pairs of games before stopping.

- By default the tournament stops only when the LLRs from both the 3-nomial and the 5-nomial distributions are > ln(19). Thus, you may see the 5-nomial test succeed before the 3-nomial test. It is possible to use only the 3-nomial or only the 5-nomial test with the argument '-V 3nomial' or '-V 5nomial'.

- When concurrency is used, all games already started are finished before stopping. So a few more games than necessary may be played, and the stopping rule may even no longer be satisfied after the last game.

It seems that I have cases where none of that applies, but the stop is still late:

Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 428 games
wdl:    w: 47, d: 356, l: 25
pair:   0: 0, 0.5: 15, 1: 163, 1.5: 35, 2: 1
Using variance of the pentanomial distribution of game pairs:
Elo: 17.9 [13.0, 22.8]
LOS: 99.89 %
LLR: 6.756 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 17.9 [12.2, 23.6]
LOS: 99.55 %
LLR: 4.933 [-2.944, 2.944]
test accepted
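The Elo and LOS lines of this output can be cross-checked from the raw counts. The sketch below assumes the logistic Elo model and takes LOS as the normal approximation of P(score > 0.5); the function name is hypothetical and this is not the tourney manager's own code. Pair results are rescaled to a per-game score (0, 0.25, 0.5, 0.75, 1). The LLR is not recomputed because the elo0/elo1 values of this run are not given in the post.

```python
import math

def elo_and_los(values, counts):
    """Elo estimate and likelihood of superiority from an outcome histogram."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    var = sum(v * v * c for v, c in zip(values, counts)) / n - mean ** 2
    elo = 400.0 * math.log10(mean / (1.0 - mean))      # logistic model
    z = (mean - 0.5) / math.sqrt(var / n)              # standardized advantage
    los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # normal CDF at z
    return elo, los

# Trinomial: single games (win, draw, loss)
elo3, los3 = elo_and_los([1.0, 0.5, 0.0], [47, 356, 25])
# Pentanomial: game pairs scoring 0, 0.5, 1, 1.5, 2, rescaled per game
elo5, los5 = elo_and_los([0.0, 0.25, 0.5, 0.75, 1.0], [0, 15, 163, 35, 1])

print(f"3-nomial: Elo {elo3:.1f}, LOS {100 * los3:.2f} %")
print(f"5-nomial: Elo {elo5:.1f}, LOS {100 * los5:.2f} %")
```

Both histograms give the same mean score, hence the same Elo; the pentanomial variance per game is smaller here, which is why its LOS and LLR are higher.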
