There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get
wdl: w: 231, d: 352, l: 187
LLR=1.02
whereas the reported LLR is 1.1.
The 5-nomial case is ok.
sprt tourney manager
-
- Posts: 2273
- Joined: Mon Sep 29, 2008 1:50 am
Re: sprt tourney manager
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get
wdl: w: 231, d: 352, l: 187
LLR=1.02
whereas the reported LLR is 1.1.
The 5-nomial case is ok.

It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine. The test was still running (it stops when both the 3- and 5-nomial LLRs reach the desired value), but I wanted to see what the 3-nomial LLR is when the 5-nomial test stops. The 2 others were stopped regularly (the tests were accepted) and should be correct down to the fine details.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Laskos wrote:It is probable that it's my fault. This stop (where test is not yet accepted) was my ad-hoc stop, the test was still running (it stops when both 3- and 5-nomial reach desired LLR), but I wanted to see what 3-nomial LLR is when 5-nomial stops. 2 others were stopped regularly (tests were accepted), should be ok to fine details.

By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
-
- Posts: 2273
- Joined: Mon Sep 29, 2008 1:50 am
Re: sprt tourney manager
Laskos wrote:It is probable that it's my fault. This stop (where test is not yet accepted) was my ad-hoc stop, the test was still running (it stops when both 3- and 5-nomial reach desired LLR), but I wanted to see what 3-nomial LLR is when 5-nomial stops. 2 others were stopped regularly (tests were accepted), should be ok to fine details.

Thanks for the explanation. I put a link to the script computing the LLRs here:
http://hardy.uhasselt.be/Toga/computeLLR.py
Now that there is an implementation in the Amoeba tourney manager, perhaps cutechess-cli can follow? The code is really trivial, much simpler than the current code in cutechess-cli.
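To illustrate how short such a computation can be, here is a minimal sketch of a 3-nomial LLR. It is not the linked computeLLR.py and not Amoeba's code: it uses the common normal approximation to the generalized SPRT rather than an exact maximum-likelihood computation. The function names are hypothetical, and elo0=0 and elo1=5 are assumed values, since the thread does not say which bounds were used.

```python
import math


def elo_to_score(elo):
    """Expected score for an Elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr_normal_approx(counts, scores, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0).

    counts[i] is the number of outcomes worth scores[i] points.
    Uses the normal approximation, not an exact MLE computation.
    """
    n = sum(counts)
    mean = sum(c * s for c, s in zip(counts, scores)) / n
    var = sum(c * (s - mean) ** 2 for c, s in zip(counts, scores)) / n
    if var == 0.0:
        return 0.0  # no information when all outcomes are identical
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


# Counts from the post above, ordered win, draw, loss.
# elo0 and elo1 are assumptions, not values stated in the thread.
llr = llr_normal_approx([231, 352, 187], [1.0, 0.5, 0.0], elo0=0.0, elo1=5.0)
print(f"LLR = {llr:.2f}")
```

Under these assumed bounds the result comes out a little above 1, in the same range as the two values discussed above. The same function accepts five counts with scores 0, 0.25, 0.5, 0.75 and 1 for game pairs.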
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 434
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: sprt tourney manager
Laskos wrote:By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.

Several possibilities:
- I required at least 100 pairs of games to be played before stopping.
- By default the tournament stops only when the LLRs from both the 3-nomial and the 5-nomial distributions exceed ln(19). Thus, you may see the 5-nomial test succeed before the 3-nomial test. It is possible to use only the 3-nomial or the 5-nomial test with the argument '-V 3nomial' or '-V 5nomial'.
- When concurrency is used, all games already started are finished before stopping. So a few more games than necessary may be played, and the stopping rule may even no longer hold after the last game.
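The first two points can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code from tourney.d: the names `sprt_bounds` and `sprt_verdict` are hypothetical, alpha = beta = 0.05 is assumed because it yields the ln(19) bound, and the symmetric rejection rule (both LLRs below the lower bound) is an assumption as well.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT bounds; alpha = beta = 0.05 gives -ln(19) and +ln(19)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def sprt_verdict(llr3, llr5, pairs_played, min_pairs=100, variant="both"):
    """Return "accepted", "rejected", or None if the test must continue.

    variant mirrors the '-V' choice described above: "both" requires the
    3-nomial and 5-nomial LLRs to agree, "3nomial" or "5nomial" uses one.
    """
    if pairs_played < min_pairs:
        return None  # minimum number of game pairs not reached yet
    lower, upper = sprt_bounds()
    if variant == "3nomial":
        llrs = [llr3]
    elif variant == "5nomial":
        llrs = [llr5]
    else:
        llrs = [llr3, llr5]
    if all(x > upper for x in llrs):
        return "accepted"
    if all(x < lower for x in llrs):  # assumed mirror image of acceptance
        return "rejected"
    return None


print(sprt_bounds())
print(sprt_verdict(llr3=1.1, llr5=3.0, pairs_played=385))
print(sprt_verdict(llr3=1.1, llr5=3.0, pairs_played=385, variant="5nomial"))
```

With "both", a run whose 5-nomial LLR has already crossed the bound keeps going until the 3-nomial LLR crosses it too, which is one way a test can appear to stop late. The third point (games already in flight under concurrency) is not modelled here.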
Richard Delorme
-
- Posts: 434
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: sprt tourney manager
Michel wrote:Now that there is an implementation in ... /tourney.d
The SPRT part is between lines 175 and 300.
It is written in the D language, which is not very popular yet. Nevertheless, for people used to C-like languages, I hope my code is quite readable. (I like and abuse UTF-8 characters, so you need a UTF-8 compatible editor, though.)
Richard Delorme
-
- Posts: 434
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: sprt tourney manager
Laskos wrote:Excellent, with all the stats in the output, I love it and I saw no flaws. The setting of H0 and H1 doesn't work for me, I just use defaults, maybe I am doing something wrong, it gives:
Code:
std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'

Thank you for reporting this. I probably misunderstood how the command-line argument parser works, and I forgot to test this part thoroughly. I think the long option names --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.
Richard Delorme
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
abulmo2 wrote:Thank you for reporting this. I probably misunderstood how the argument parser on the command line works, and I forgot to test this part thoroughly. I think the long option name --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.

Great, --elo0 and --elo1 work!
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Michel wrote:Thanks for the explanation. I put a link for the script computing LLR's here.
http://hardy.uhasselt.be/Toga/computeLLR.py
Now that there is an implementation in the Amoeba tourney manager perhaps cutechess-cli can follow? The code is really trivial. Much easier than the current code in cutechess-cli.

Thanks for the script! Yes, Richard's implementation is the most straightforward use of SPRT for 3- and 5-nomials; maybe cutechess-cli will follow. I tried to compute the 3-nomial expression for the LLR in closed form without substitutions. I hoped it would simplify a bit, but the best I get is still very long, and the 5-nomial one would be even longer. Just for fun, LLR for 3-nomial:
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
abulmo2 wrote:Several possibilities:
- I requested to do at least 100 pairs of games before stopping.
- By default the tournament stops when both LLR from 3-nomial and LLR from 5-nomial distributions are > ln(19). Thus, you may see the 5-nomial test succeeded before the 3-nomial test. It is possible to use only 3-nomial or 5-nomial test by using the argument '-V 3nomial' or '-V 5nomial'.
- When concurrency is used, it will finish all games already started before stopping. So it is possible to continue playing a few more games than necessary and even that the stopping rule is no more verified in the last game.

It seems that I have cases where none of that applies, but the stop is still late:
Code:
Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 428 games
wdl: w: 47, d: 356, l: 25
pair: 0: 0, 0.5: 15, 1: 163, 1.5: 35, 2: 1
Using variance of the pentanomial distribution of game pairs:
Elo: 17.9 [13.0, 22.8]
LOS: 99.89 %
LLR: 6.756 [-2.944, 2.944]
test accepted
Using variance of the trinomial distribution of single games:
Elo: 17.9 [12.2, 23.6]
LOS: 99.55 %
LLR: 4.933 [-2.944, 2.944]
test accepted
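The Elo and LOS figures in this output appear consistent with the usual normal approximation, applied once to single games and once to game pairs. The sketch below recomputes them from the counts. It is an illustration under that assumption, not Amoeba's code, and the function name `elo_and_los` is hypothetical. The confidence intervals and the LLR values are not recomputed, since the thread does not give the settings they depend on.

```python
import math


def elo_and_los(counts, scores):
    """Elo estimate and likelihood of superiority from outcome counts.

    counts[i] is the number of outcomes worth scores[i] points, where the
    scores are per-game values in [0, 1].
    """
    n = sum(counts)
    mean = sum(c * s for c, s in zip(counts, scores)) / n
    var = sum(c * (s - mean) ** 2 for c, s in zip(counts, scores)) / n
    # Logistic Elo model inverted at the observed mean score.
    elo = -400.0 * math.log10(1.0 / mean - 1.0)
    # LOS: probability that the true score exceeds 0.5 (normal approximation).
    z = (mean - 0.5) / math.sqrt(var / n)
    los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return elo, los


# Single games, ordered win, draw, loss.
elo3, los3 = elo_and_los([47, 356, 25], [1.0, 0.5, 0.0])
# Game pairs: pair results 0, 0.5, 1, 1.5, 2 rescaled to per-game scores.
elo5, los5 = elo_and_los([0, 15, 163, 35, 1], [0.0, 0.25, 0.5, 0.75, 1.0])

print(f"3-nomial: Elo {elo3:.1f}, LOS {100 * los3:.2f} %")
print(f"5-nomial: Elo {elo5:.1f}, LOS {100 * los5:.2f} %")
```

Both calls give the same Elo estimate because the mean score is identical. Only the variance differs, and it is smaller for pairs, which is why the pentanomial LOS is higher.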