sprt tourney manager

abulmo2 · Post by **abulmo2** » Wed Jan 25, 2017 12:23 am

Hello,

With the latest release of amoeba (version 2.1), I published the code of my own tourney manager, which I use to validate (or reject) amoeba changes.
It uses the SPRT approach, taking into account the remarks discussed here:
http://talkchess.com/forum/viewtopic.php?t=57465
and here:
http://www.talkchess.com/forum/viewtopic.php?t=61105

It is still very basic: game duration can only be set as a fixed time per move. It does have some nice features, though:
- written in the D language (simple & efficient)
- can use various openings (from pgn files)
- can play several games in parallel
- can set the H0 & H1 hypotheses for SPRT
- can limit the test to a fixed number of games (SPRT does not always converge)
- can save the played games to a pgn file
- works with unbalanced openings
- no bayeselo assumptions

Laskos · Post by **Laskos** » Wed Jan 25, 2017 10:35 am

Great! Does it use 5-nomial for paired color-reversed games, potentially using unbalanced openings? 5-nomial is what was missing from Cutechess-Cli. Can you compile this thing for Win64? I would like to try it a bit. Can you implement EPD openings too? Almost all of my opening files, of all sorts, are EPD. Thanks!

abulmo2 · Post by **abulmo2** » Sat Feb 04, 2017 9:14 pm

Laskos wrote:Great! Does it use 5-nomial for paired color reversed games, potentially using unbalanced openings?

Yes.

Laskos wrote:5-nomial is what was missing from Cutechess-Cli. Can you compile this thing for Win64, I would try it a bit? Can you implement EPD openings too? My files of all sorts of openings are almost all EPD. Thanks!

Done here: https://github.com/abulmo/amoeba/releas ... ba-win.zip

EPD files are now supported for openings. PGN and EPD files are distinguished by their file extension: ".pgn" for PGN files, ".epd" or ".fen" for EPD files. Only the FEN string is read from EPD files. Several opening files can be used at once: -b file1.pgn -b file2.epd -b file3.pgn, etc.
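
The extension rule described above can be sketched in Python. This is only an illustration, not tourney's actual D code: the helper names `opening_format` and `fen_from_epd` are hypothetical, matching the extension case-insensitively is an assumption, and so is filling in "0 1" for the move counters that plain EPD lines lack.

```python
import os

def opening_format(path):
    """Classify an opening file by its extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()  # assumption: case-insensitive match
    if ext == ".pgn":
        return "pgn"
    if ext in (".epd", ".fen"):
        return "epd"
    raise ValueError("unknown opening file type: " + path)

def fen_from_epd(line):
    """Keep only the position part of an EPD/FEN line and drop any EPD opcodes."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("not an EPD/FEN line: " + line)
    # A full FEN already carries halfmove clock and move number; keep them.
    if len(fields) >= 6 and fields[4].isdigit() and fields[5].isdigit():
        counters = fields[4:6]
    else:
        counters = ["0", "1"]  # assumption: default counters for plain EPD
    return " ".join(fields[:4] + counters)

if __name__ == "__main__":
    for name in ("file1.pgn", "file2.epd", "file3.fen"):
        print(name, "->", opening_format(name))
    epd = 'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 bm e5; id "demo";'
    print(fen_from_epd(epd))
```

Dispatching on the extension keeps the command line simple: the same -b option can be repeated for any mix of file types.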

Laskos · Post by **Laskos** » Sun Feb 05, 2017 7:37 am

Thanks, I will play with it!

Laskos · Post by **Laskos** » Sun Feb 05, 2017 5:43 pm

Richard, I can't find a help or examples file describing the usage of tourney.exe. Can you give an example command line for running 2 engines with the SPRT stop? Thanks!

Adam Hair · Post by **Adam Hair** » Sun Feb 05, 2017 6:11 pm

Until Richard replies, here is what I have found:

C:\Users\Adam\Downloads\amoeba-win>tourney-win64.exe -h

Run a tournament between two UCI engines using Sequential Probability Ratio Test as stopping condition.

tourney --engine|-e <cmd> --engine|-e <cmd>  [optional settings]
    --engine|-e <cmd>        launch an engine with <cmd>. 2 engines should be loaded
    --time|-t <movetime>     time (in seconds) to play a move (default 0.1s)
    --book|-b <pgn|epd file> opening book
    --output|-o <pgn file>   save the played games
    --games|-g <games>       max number of game pairs to play (default 30000)
    --cpu|-n <cpu>           number of games to play in parallel (default 1)
    --elo0|-H0 <elo>         H0 hypothesis (default = 0)
    --elo1|-H1 <elo>         H1 hypothesis (default = 5)
    --alpha|-╬▒ <alpha>       type I error (default = 0.05)
    --beta|-╬▓ <beta>         type II error (default = 0.05)
    --variance|-V <type>     3nomial|5nomial|all (default=all)
    --help|-h                display this help
    --version|-v             show version number

For example:
$ tourney -e amoeba-2.1 -e amoeba-2.0 -g 30000 -b opening.pgn -t 0.1 -n 3 -o game.pgn
[...]
Amoeba 2.1-l64p vs Amoeba 2.0.l64p
results: 3524 games
wdl:    w: 1058, d: 1545, l: 921
pair:   0: 112, 0.5: 395, 1: 651, 1.5: 452, 2: 152
Elo: 13.5 [9.9, 17.1]
LOS: 99.92 %
LLR: 2.992 [-2.944, 2.944]
test accepted
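
The interval printed next to the LLR, [-2.944, 2.944], matches Wald's usual SPRT stopping bounds for alpha = beta = 0.05. A minimal Python sketch of that rule follows. The function names are made up for illustration, and that tourney computes its bounds exactly this way is an inference from the printed numbers, not something confirmed from its source.

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's approximate SPRT bounds on the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Return the SPRT verdict for the current LLR value."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

if __name__ == "__main__":
    lower, upper = sprt_bounds()
    print("[%.3f, %.3f]" % (lower, upper))  # [-2.944, 2.944]
    print(sprt_decision(2.992))             # accept H1
```

With the LLR of 2.992 from the example run, the upper bound is crossed, which is consistent with the "test accepted" line.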

Laskos · Post by **Laskos** » Sun Feb 05, 2017 6:38 pm

Thanks, that helps very much.

abulmo2 · Post by **abulmo2** » Mon Feb 06, 2017 7:25 am

Thank you for answering before I could.
Note that you can get rid of this garbage in the output:

    --alpha|-╬▒ <alpha>       type I error (default = 0.05)
    --beta|-╬▓ <beta>         type II error (default = 0.05)

by typing chcp 65001 before launching the program, which gives:

    --alpha|-α <alpha>       type I error (default = 0.05)
    --beta|-β <beta>         type II error (default = 0.05)

Michel · Post by **Michel** » Mon Feb 06, 2017 12:11 pm

Very nice!

I checked that the 5-nomial LLR is correct. The trinomial and 5-nomial LLR are respectively

2.87
2.99

So the 5-nomial one is a bit bigger, allowing the test to stop earlier.

Here is some quick and dirty code for computing these LLRs:

import math

bb = math.log(10) / 400

def L(x):
    """Logistic expected score for an Elo difference x."""
    return 1 / (1 + math.exp(-bb * x))

def LLR(elo0, elo1, results):
    """
    Compute the generalized log-likelihood ratio for "results", which
    should be a list of length 3 or 5.
    """
    score0 = L(elo0)
    score1 = L(elo1)
    N = sum(results)
    if N <= 1:
        return 0
    l = len(results)
    l1 = l - 1
    # Outcome i of the list is worth i/l1 points (0, 1/2, 1 or 0, 1/4, ..., 1).
    score = sum(results[i] * (i / l1) for i in range(l)) / N
    var = sum(results[i] * ((i / l1) - score) ** 2 for i in range(l)) / (N - 1)
    var_score = var / N

    # With at most one non-empty outcome class the variance estimate is degenerate.
    if sum(1 for x in results if x > 0) <= 1:
        return 0
    return (score1 - score0) * (2 * score - score0 - score1) / var_score / 2

if __name__ == '__main__':
    print(LLR(0, 5, [921, 1545, 1058]))          # trinomial: losses, draws, wins
    print(LLR(0, 5, [112, 395, 651, 452, 152]))  # 5-nomial: pair scores 0 to 2
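
The Elo and LOS lines of the example run can be checked in the same spirit. The sketch below assumes the logistic Elo model and a normal approximation for LOS with the sample variance of the pair scores. These are assumptions about what tourney does, not taken from its source, and the function names are hypothetical. The confidence interval is left out on purpose, since the thread does not say how it is computed.

```python
import math

def score_stats(results):
    """Mean score and standard error of the mean for outcome counts."""
    n = sum(results)
    k = len(results) - 1  # outcome i is worth i/k points
    mean = sum(c * i / k for i, c in enumerate(results)) / n
    var = sum(c * (i / k - mean) ** 2 for i, c in enumerate(results)) / (n - 1)
    return mean, math.sqrt(var / n)

def elo(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400 * math.log10(1 / score - 1)

def los(results):
    """Likelihood of superiority: P(true score > 0.5), normal approximation."""
    mean, sd = score_stats(results)
    return 0.5 * (1 + math.erf((mean - 0.5) / (sd * math.sqrt(2))))

if __name__ == "__main__":
    pairs = [112, 395, 651, 452, 152]  # pair counts from the example run
    mean, sd = score_stats(pairs)
    print("Elo: %.1f" % elo(mean))
    print("LOS: %.2f %%" % (100 * los(pairs)))
```

On the pair counts from the example run this gives values that agree with the printed Elo of 13.5 and LOS of 99.92 %.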

Laskos · Post by **Laskos** » Tue Feb 07, 2017 7:20 am

Excellent, with all the stats in the output, I love it and I saw no flaws. Setting H0 and H1 doesn't work for me, though, so I just use the defaults. Maybe I am doing something wrong, but it gives:

std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'

Some experiments:

First, to get a feeling for the scaling.
N = number of games.

LLR ~ N
standard deviation ~ confidence interval ~ 1/sqrt(N)
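
These two scaling rules can be illustrated numerically by multiplying a set of outcome counts by a constant, which keeps the outcome proportions fixed. The sketch reuses the LLR approximation from Michel's post. The function names are hypothetical, and the counts are the pair results of the example run shown earlier in the thread.

```python
import math

def stats(results):
    """Mean score and variance of the mean for outcome counts."""
    n = sum(results)
    k = len(results) - 1
    mean = sum(c * i / k for i, c in enumerate(results)) / n
    var = sum(c * (i / k - mean) ** 2 for i, c in enumerate(results)) / (n - 1)
    return mean, var / n

def llr(elo0, elo1, results):
    """Approximate generalized LLR, same formula as in Michel's post."""
    s0 = 1 / (1 + 10 ** (-elo0 / 400))
    s1 = 1 / (1 + 10 ** (-elo1 / 400))
    mean, var_mean = stats(results)
    return (s1 - s0) * (2 * mean - s0 - s1) / var_mean / 2

def scaling(results, factor):
    """Ratios of LLR and of standard error after multiplying all counts."""
    scaled = [c * factor for c in results]
    llr_ratio = llr(0, 5, scaled) / llr(0, 5, results)
    sd_ratio = math.sqrt(stats(scaled)[1] / stats(results)[1])
    return llr_ratio, sd_ratio

if __name__ == "__main__":
    llr_ratio, sd_ratio = scaling([112, 395, 651, 452, 152], 4)
    print("LLR ratio: %.3f" % llr_ratio)  # close to 4
    print("sd ratio:  %.3f" % sd_ratio)   # close to 0.5
```

Four times as many games at the same proportions gives about four times the LLR but only halves the width of the confidence interval.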

1/ Low draw rate (around 40%):
===============================================
2moves_v1 (balanced):

Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 3192 games
wdl: w: 1015, d: 1309, l: 868
pair: 0: 104, 0.5: 336, 1: 607, 1.5: 407, 2: 142
Using variance of the pentanomial distribution of game pairs:
Elo: 16.0 [12.2, 19.8]
LOS: 99.98 %
LLR: 3.325 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 16.0 [12.1, 20.0]
LOS: 99.97 %
LLR: 3.037 [-2.944, 2.944]
test accepted

5-nomial still helps a bit, giving an LLR some 10% higher even for balanced openings.

Open_07_09 (imbalance 70-90cp)

Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 2730 games
wdl: w: 944, d: 957, l: 829
pair: 0: 70, 0.5: 266, 1: 608, 1.5: 321, 2: 100
Using variance of the pentanomial distribution of game pairs:
Elo: 14.6 [10.9, 18.4]
LOS: 99.94 %
LLR: 2.979 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 14.6 [9.3, 20.0]
LOS: 99.68 %
LLR: 2.020 [-2.944, 2.944]

Unbalanced openings help only moderately or not at all. 5-nomial helps a lot in the unbalanced case.
===============================================

2/ High draw rate (around 90%)
===============================================
Endgame (balanced)

Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 2516 games
wdl: w: 126, d: 2300, l: 90
pair: 0: 0, 0.5: 66, 1: 1095, 1.5: 92, 2: 5
Using variance of the pentanomial distribution of game pairs:
Elo: 5.0 [3.4, 6.5]
LOS: 99.66 %
LLR: 3.662 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 5.0 [3.3, 6.7]
LOS: 99.29 %
LLR: 3.007 [-2.944, 2.944]
test accepted

5-nomial still helps even for balanced positions, giving an LLR 20% higher.

Endgame_10_14 (imbalance 100-140cp)

Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 770 games
wdl: w: 231, d: 352, l: 187
pair: 0: 1, 0.5: 46, 1: 249, 1.5: 86, 2: 3
Using variance of the pentanomial distribution of game pairs:
Elo: 19.9 [15.4, 24.4]
LOS: 99.99 %
LLR: 2.981 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 19.9 [10.7, 29.1]
LOS: 98.43 %
LLR: 1.103 [-2.944, 2.944]

Unbalanced openings & 5-nomial help very much, shortening the matches to SPRT stop by factors of 3-4, as predicted by the models.
===============================================
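
The speed-up can be roughly estimated from the numbers above, using the LLR ~ N scaling to extrapolate how many games each setup would need to reach the upper bound of 2.944. This is a crude sketch: it extrapolates linearly from a single run per setup, and the function name is hypothetical.

```python
import math

UPPER = math.log(0.95 / 0.05)  # SPRT upper bound for alpha = beta = 0.05

def projected_games(games, llr, bound=UPPER):
    """Games needed to reach the bound if LLR keeps growing linearly."""
    return games * bound / llr

# (games played, LLR) taken from the high draw rate runs above
runs = {
    "balanced, trinomial":   (2516, 3.007),
    "balanced, 5-nomial":    (2516, 3.662),
    "unbalanced, trinomial": (770, 1.103),
    "unbalanced, 5-nomial":  (770, 2.981),
}

if __name__ == "__main__":
    needed = {name: projected_games(g, l) for name, (g, l) in runs.items()}
    for name, n in needed.items():
        print("%-22s %6.0f games" % (name, n))
    speedup = needed["balanced, trinomial"] / needed["unbalanced, 5-nomial"]
    print("speed-up: %.1f" % speedup)
```

With these four data points, the unbalanced 5-nomial setup needs roughly a third of the games of the balanced trinomial one, in line with the factor of 3-4 quoted above.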
