The SPRT without draw model, elo model or whatever...

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The SPRT without draw model, Elo model or whatever...

Post by Michel »

I do not know how you got LLR ~ 2.4, Michel.
Hi Jesus. I always enjoy talking to you! The 2.4 was of course computed using SimpleSPRT, which uses a different principle from the SF SPRT.

However, the results should be compatible to a first approximation. And they are!

I converted the input data

Code: Select all

losses=559 
draws=1271 
wins=637 
elo0=-1.5
elo1=4.5
into Bayes Elo parameters

Code: Select all

draw_elo= 198.250953389
Bayes Elo0= -2.04375986542 Bayes Elo1= 6.13127959626
Feeding this into a Bayes Elo based LLR calculator one gets
LLR= 2.40220453233
Almost the same as SimpleSPRT. I claim that the result by SimpleSPRT is actually the correct one, but in practice the difference should not matter.

Below is the code to compute these numbers.

Code: Select all

from __future__ import division, print_function

import math

bb=math.log(10)/400

def L(x):
    # logistic function: expected score for an Elo difference x
    return 1/(1+math.exp(-bb*x))

def Linv(s):
    # inverse of L: Elo difference for an expected score s
    return -math.log(1/s-1)/bb

def de_from_dr(dr):
    # draw_elo for which two equal engines have draw ratio dr
    return Linv((dr+1)/2)

def scale(de):
    # conversion factor: elo = scale(draw_elo) * bayes_elo (for small elo)
    return (4*math.exp(-bb*de))/(1+math.exp(-bb*de))**2

def wdl(de,elo):
    # bayes elo
    w=L(elo-de)
    l=L(-elo-de)
    d=1-w-l
    return (w,d,l)

def LL(de,elo,W,D,L):
    # bayes elo
    (w,d,l)=wdl(de,elo)
    return W*math.log(w)+D*math.log(d)+L*math.log(l)

if __name__=='__main__':
    losses=559
    draws=1271
    wins=637
    elo0=-1.5
    elo1=4.5

    # as posted, this expression is the score (wins+draws/2)/N, not draws/N;
    # it is kept unchanged because it reproduces the numbers printed above
    dr=(wins+draws/2)/(wins+draws+losses)
    de=de_from_dr(dr)
    print("draw_elo=",de)
    sc=scale(de)
    belo0=elo0/sc
    belo1=elo1/sc
    print("Bayes Elo0=",belo0,"Bayes Elo1=",belo1)
    print("LLR=",LL(de,belo1,wins,draws,losses)-LL(de,belo0,wins,draws,losses))

Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Post by Ferdy »

I tried an SPRT session on a game test together with SimpleSPRT.
While the game test was in progress, I took a sample every 30 s.
The game TC is 15s+100ms inc.

Sample output,

Code: Select all

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 136, W: 41, L: 35, D: 60, Score: 52.2%, NetW: +6
Elo: +15, err95: +/-43, LOS: 0.75474
LLR: 0.14, [-2.94, +2.94]
SimpleSPRT status: EMPTY

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 142, W: 41, L: 38, D: 63, Score: 51.1%, NetW: +3
Elo: +7, err95: +/-42, LOS: 0.63219
LLR: 0.06, [-2.94, +2.94]
SimpleSPRT status: EMPTY

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 142, W: 41, L: 38, D: 63, Score: 51.1%, NetW: +3
Elo: +7, err95: +/-42, LOS: 0.63219
LLR: 0.06, [-2.94, +2.94]
SimpleSPRT status: EMPTY
[...]
Then later,

Code: Select all

[...]
TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 1318, W: 445, L: 321, D: 552, Score: 54.7%, NetW: +124
Elo: +32, err95: +/-14, LOS: 1.00000
LLR: 2.93, [-2.94, +2.94]
SimpleSPRT status: H1

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 1318, W: 445, L: 321, D: 552, Score: 54.7%, NetW: +124
Elo: +32, err95: +/-14, LOS: 1.00000
LLR: 2.93, [-2.94, +2.94]
SimpleSPRT status: H1

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 1324, W: 447, L: 322, D: 555, Score: 54.7%, NetW: +125
Elo: +32, err95: +/-14, LOS: 1.00000
LLR: 2.96, [-2.94, +2.94]
SimpleSPRT status: H1
Overall plot,

Image


Zoom-in at later part.

Image

Based on the plot, stopping at the first H1 could have saved me around 100 games. After that the LLR went down, changing the SimpleSPRT status to EMPTY; it later regained H1, went to EMPTY again, and finally stayed at H1 until the SF SPRT LLR rose above LLR_UB.
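
For reference, the [-2.94, +2.94] interval printed next to each LLR follows from alpha and beta through Wald's thresholds, the same expressions used for LA and LB in the SimpleSPRT class. A minimal sketch (the function name sprt_bounds is made up for illustration):

```python
import math

def sprt_bounds(alpha, beta):
    """Return the (lower, upper) LLR stopping bounds for error rates alpha, beta."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

lower, upper = sprt_bounds(0.05, 0.05)
print("[%.2f, %+.2f]" % (lower, upper))  # prints [-2.94, +2.94]
```

The test stops with H0 once the LLR drops below the lower bound and with H1 once it exceeds the upper bound.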

This is the data at first H1.

Code: Select all

TestEngine: D2015.1.241
BaseEngine: D2014.3.130
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 1218, W: 408, L: 303, D: 507, Score: 54.3%, NetW: +105
Elo: +30, err95: +/-14, LOS: 0.99997
LLR: 2.47, [-2.94, +2.94]
SimpleSPRT status: H1
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Post by Ferdy »

I created another SPRT session comparing the stopping rules of GSPRT and SF SPRT. I revised the engine to try to improve its rook eval and matched it against the version without that change. While the match was in progress I sampled the data every 60 s. The match used a game TC of 15s+100ms inc.
Initial data,

Code: Select all

TestEngine: D2015.1.249
BaseEngine: D2015.1.241
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 10, W: 3, L: 3, D: 4, Score: 50.0%, NetW: +0
Elo: +0, err95: +/-181, LOS: 0.50000
LLR: -0.00, [-2.94, +2.94]
SimpleSPRT status: EMPTY

TestEngine: D2015.1.249
BaseEngine: D2015.1.241
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 17, W: 6, L: 5, D: 6, Score: 52.9%, NetW: +1
Elo: +20, err95: +/-139, LOS: 0.61880
LLR: 0.02, [-2.94, +2.94]
SimpleSPRT status: EMPTY
The game test was stopped when the SF SPRT LLR value went outside the bounds. SimpleSPRT is labelled GSPRT in the plot.

Here is the complete progress in a plot.

Image

According to GSPRT I could have saved a lot of games if I had used its stopping rule.

This is the first sample where the GSPRT status is H0.

Code: Select all

TestEngine: D2015.1.249
BaseEngine: D2015.1.241
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 3758, W: 935, L: 1009, D: 1814, Score: 49.0%, NetW: -74
Elo: -6, err95: +/-7, LOS: 0.04643
LLR: -2.21, [-2.94, +2.94]
SimpleSPRT status: H0
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The SPRT without draw model, elo model or whatever...

Post by Michel »

Created again another sprt session comparing stop rule for GSPRT and SF SPRT.
Thank you for your interest!

But it is not clear to me if you are converting the elo inputs. If you don't do this then the comparison is incorrect.

SF SPRT input is in BayesElo and SimpleSPRT input is in standard Elo.

Thus

SF-SPRT(-1.5,4.5) is not equivalent to SimpleSPRT(-1.5,4.5)

In the example you originally posted the scale factor is 0.734 (draw_elo=198).

So

SF-SPRT(-1.5,4.5) is equivalent to SimpleSPRT(-1.101,3.303)
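
The conversion above can be checked with a few lines of Python. This is only a sketch: the helper name bayes_to_elo is made up here, and the scale factor is the same expression as scale() in the code posted earlier in the thread.

```python
import math

BB = math.log(10) / 400

def scale(draw_elo):
    """Factor converting a small BayesElo difference into logistic Elo."""
    x = math.exp(-BB * draw_elo)
    return 4 * x / (1 + x) ** 2

def bayes_to_elo(bayes_elo, draw_elo):
    # hypothetical helper: valid only as a first-order conversion near elo = 0
    return scale(draw_elo) * bayes_elo

draw_elo = 198.250953389  # value printed earlier in the thread
print(round(scale(draw_elo), 3))  # prints 0.734
print(bayes_to_elo(-1.5, draw_elo), bayes_to_elo(4.5, draw_elo))
```

The two converted bounds come out at roughly -1.101 and 3.303.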

Michel
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Post by Ferdy »

I am using the class that you posted.

Code: Select all

class SimpleSPRT: 
    """ 
    This class performs a GSPRT for H0:elo=elo0 versus H1:elo=elo1 
    See here for a description of the GSPRT as well as theoretical (asymptotic) results. 
    
    http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf 
    
    To record the outcome of a game use the method self.record(result) where "result" is one of 0,1,2, 
    corresponding respectively to a loss, a draw and a win. 
""" 

    def __init__(self,alpha=0.05,beta=0.05,elo0=0,elo1=5): 
        self.score0=L(elo0) 
        self.score1=L(elo1) 
        self.LA=math.log(beta/(1-alpha)) 
        self.LB=math.log((1-beta)/alpha) 
        self.results=3*[0] 
        self._status='' 

    def record(self,result): 
        self.results[result]+=1 
        LLR=LL(self.score1,self.results)-LL(self.score0,self.results) 
        if LLR>self.LB: 
            self._status='H1' 
        elif LLR<self.LA:
            self._status='H0'
        else:
            self._status=''

    def status(self):
        return self._status 
But that is fine, I will create a new plot based on that conversion and see at how many games it produces H0.

So what is the formula to get elo0 and elo1 for simpleSPRT?
Ajedrecista
Posts: 1966
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: The SPRT without draw model, Elo model or whatever...

Post by Ajedrecista »

Hello:
Ferdy wrote:
But that is fine, I will create a new plot based on that conversion and see the number of games where it produce H0.

So what is the formula to get elo0 and elo1 for simpleSPRT?

Code: Select all

W = wins/games
D = draws/games
L = losses/games
// W + D + L = 1
drawelo = 200*log10[(1 - L)*(1 - W)/(L*W)] // Estimate of drawelo from the sample of games.
Elo = {4*[10^(drawelo/400)]/[1 + 10^(drawelo/400)]²}*Bayeselo
But drawelo estimated from the sample changes when more games are added, so I am not sure about converting Bayeselo bounds to Elo bounds. I mean, SimpleSPRT bounds will be changing after each game. Is it valid?
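
To illustrate how much the converted bounds move, here is a rough Python sketch of the two formulas above, applied to three W/D/L snapshots from the first test run posted earlier in this thread. The function names are invented for the example, and it only shows the drift; it does not settle whether moving bounds are statistically valid.

```python
import math

def draw_elo_estimate(wins, draws, losses):
    """Sample estimate of drawelo; needs at least one win and one loss."""
    n = wins + draws + losses
    w, l = wins / n, losses / n
    return 200 * math.log10((1 - l) * (1 - w) / (l * w))

def scale(draw_elo):
    """Factor in Elo = scale * Bayeselo for a given drawelo."""
    x = 10 ** (draw_elo / 400)
    return 4 * x / (1 + x) ** 2

# (W, D, L) snapshots copied from the sample output earlier in the thread
for wins, draws, losses in [(41, 60, 35), (408, 507, 303), (445, 552, 321)]:
    de = draw_elo_estimate(wins, draws, losses)
    sc = scale(de)
    print("drawelo=%.1f  elo0=%.3f  elo1=%.3f" % (de, -1.5 * sc, 4.5 * sc))
```

On these snapshots the estimated drawelo moves by several points between the early and late samples, so the converted Elo bounds shift slightly as games are added.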

Regards from Spain.

Ajedrecista.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The SPRT without draw model, Elo model or whatever...

Post by Ferdy »

Ajedrecista wrote:
But drawelo estimated from the sample changes when more games are added, so I am not sure about converting Bayeselo bounds to Elo bounds. I mean, SimpleSPRT bounds will be changing after each game. Is it valid?
I don't know, and I am not going to prove it mathematically :). Michel would know.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Post by Ferdy »

I converted the initial elo0 and elo1 of SimpleSPRT using drawelo = 200 and the conversion formula given by Jesus. The elo values are displayed in the legend of the plot.

Here is the plot.

Image

SimpleSPRT data from the later part of the test run, with the elos converted and the status recalculated.

Code: Select all

[...]
sselo0 -1.10
sselo1 3.29
W 1535, L 1615, D 2955
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1540, L 1619, D 2960
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1543, L 1620, D 2964
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1547, L 1628, D 2967
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1550, L 1629, D 2969
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1552, L 1633, D 2974
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1555, L 1637, D 2981
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1556, L 1640, D 2986
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1561, L 1644, D 2990
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1565, L 1647, D 2995
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1569, L 1650, D 3000
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1573, L 1652, D 3005
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1577, L 1656, D 3011
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1583, L 1659, D 3011
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1587, L 1663, D 3015
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1590, L 1667, D 3019
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1593, L 1672, D 3023
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1596, L 1675, D 3028
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1598, L 1678, D 3034
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1599, L 1678, D 3041
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1603, L 1681, D 3046
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1608, L 1683, D 3050
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1610, L 1687, D 3057
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1610, L 1689, D 3063
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1613, L 1696, D 3068
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1616, L 1700, D 3073
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1618, L 1701, D 3080
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1620, L 1702, D 3084
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1623, L 1705, D 3090
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1628, L 1709, D 3096
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1629, L 1710, D 3100
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1632, L 1712, D 3109
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1634, L 1712, D 3115
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1636, L 1715, D 3127
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1638, L 1719, D 3131
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1639, L 1722, D 3139
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1641, L 1722, D 3146
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1644, L 1726, D 3150
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1644, L 1730, D 3157
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1646, L 1736, D 3159
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1648, L 1739, D 3163
status EMPTY

sselo0 -1.10
sselo1 3.29
W 1651, L 1745, D 3169
status EMPTY
The last two SF SPRT samples.

Code: Select all

TestEngine: D2015.1.249
BaseEngine: D2015.1.241
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 6550, W: 1648, L: 1739, D: 3163, Score: 49.3%, NetW: -91
Elo: -4, err95: +/-6, LOS: 0.05878
LLR: -2.89, [-2.94, +2.94]

TestEngine: D2015.1.249
BaseEngine: D2015.1.241
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 6565, W: 1651, L: 1745, D: 3169, Score: 49.3%, NetW: -94
Elo: -4, err95: +/-6, LOS: 0.05319
LLR: -2.96, [-2.94, +2.94]
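
As a cross-check, the LLR values in these two samples can be approximately reproduced from W/D/L alone. The sketch below assumes the tool treats Elo0/Elo1 as BayesElo bounds and fixes drawelo at its sample estimate; that is my assumption about how the numbers were produced, not something stated in the output, and the function names are invented for the example.

```python
import math

BB = math.log(10) / 400

def logistic(x):
    return 1 / (1 + math.exp(-BB * x))

def bayeselo_probs(elo, draw_elo):
    """Win/draw/loss probabilities under the BayesElo model."""
    w = logistic(elo - draw_elo)
    l = logistic(-elo - draw_elo)
    return w, 1 - w - l, l

def bayeselo_llr(wins, draws, losses, elo0, elo1):
    """LLR of elo=elo1 versus elo=elo0 (BayesElo units), drawelo from the sample."""
    n = wins + draws + losses
    w, l = wins / n, losses / n
    draw_elo = 200 * math.log10((1 - l) * (1 - w) / (l * w))

    def log_likelihood(elo):
        pw, pd, pl = bayeselo_probs(elo, draw_elo)
        return wins * math.log(pw) + draws * math.log(pd) + losses * math.log(pl)

    return log_likelihood(elo1) - log_likelihood(elo0)

print(round(bayeselo_llr(1648, 3163, 1739, -1.5, 4.5), 2))
print(round(bayeselo_llr(1651, 3169, 1745, -1.5, 4.5), 2))
```

Under that assumption both values land close to the reported -2.89 and -2.96.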
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: SF and Simple SPRT plots

Post by Ferdy »

I ran another game test and changed the elo values of SimpleSPRT using drawelo = 200. I revised your class to return the SimpleSPRT LLR and plotted it together with the SF SPRT LLR.

Image

The test passed by SF SPRT. The SimpleSPRT plot looks just the same, except that its LLR is lower.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The SPRT without draw model, elo model or whatever...

Post by Michel »

Sorry for bumping this thread, but it turned out that things are even much easier than I thought. See this thread: http://talkchess.com/forum/viewtopic.php?t=61105 .

Here is some very simple Python code to implement the SPRT a la cutechess-cli or fishtest.

Code: Select all

from __future__ import division

import math

bb=math.log(10)/400

def LL(x):
    return 1/(1+math.exp(-bb*x))

def TrivialLLR(W,D,L,elo0,elo1):
    """
This function computes the log likelihood ratio of H0:elo=elo0 versus
H1:elo=elo1 under the trinomial (w/d/l) logistic model. W/D/L are
respectively the Win/Draw/Loss count.

For details see

http://hardy.uhasselt.be/Toga/GSPRT_approximation.pdf
"""
    # to avoid division by zero
    if W==0 or D==0 or L==0:
        return 0.0
    N=W+D+L
    w,d,l=W/N,D/N,L/N
    s=w+d/2
    m2=w+d/4
    var=m2-s**2
    var_s=var/N
    s0=LL(elo0)
    s1=LL(elo1)
    return (s1-s0)*(2*s-s0-s1)/var_s/2.0

def TrivialSPRT(W,D,L,elo0,elo1,alpha,beta):
    """
This function should be called after each game until it returns either
'H0' or 'H1' in which case the test stops.

alpha, beta are the type I,II error probabilities. The other parameters
are as in the description of the function TrivialLLR().
"""
    LLR=TrivialLLR(W,D,L,elo0,elo1)
    LA=math.log(beta/(1-alpha))
    LB=math.log((1-beta)/alpha)
    if LLR>LB:
        return 'H1'
    elif LLR<LA:
        return 'H0'
    else:
        return ''
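
As a usage sketch, the functions above can be applied to the W/D/L counts from the start of this thread (wins=637, draws=1271, losses=559, bounds -1.5 and 4.5 in ordinary Elo). The block below restates the same formulas under snake_case names, chosen here only so that it runs on its own under Python 3; it is not a different method.

```python
import math

BB = math.log(10) / 400

def logistic(x):
    return 1 / (1 + math.exp(-BB * x))

def trivial_llr(W, D, L, elo0, elo1):
    """Same formula as TrivialLLR: normal approximation to the GSPRT LLR."""
    if W == 0 or D == 0 or L == 0:
        return 0.0
    N = W + D + L
    w, d = W / N, D / N
    s = w + d / 2                      # observed score
    var_s = (w + d / 4 - s ** 2) / N   # variance of the mean score
    s0, s1 = logistic(elo0), logistic(elo1)
    return (s1 - s0) * (2 * s - s0 - s1) / (2 * var_s)

def trivial_sprt(W, D, L, elo0, elo1, alpha, beta):
    """Return 'H0', 'H1', or '' (keep playing), as TrivialSPRT does."""
    llr = trivial_llr(W, D, L, elo0, elo1)
    if llr > math.log((1 - beta) / alpha):
        return 'H1'
    if llr < math.log(beta / (1 - alpha)):
        return 'H0'
    return ''

llr = trivial_llr(637, 1271, 559, -1.5, 4.5)
print(round(llr, 2), repr(trivial_sprt(637, 1271, 559, -1.5, 4.5, 0.05, 0.05)))
```

The LLR comes out near 2.4, in line with the SimpleSPRT value quoted at the top of the thread, and the status is still empty because 2.4 is below the upper bound of about 2.94.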