The SPRT without draw model, elo model or whatever...

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

The SPRT without draw model, elo model or whatever...

A typical implementation of the SPRT (e.g. cutechess) now runs as follows:

(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.

This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.

However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.

After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....

(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Posts: 8049
Joined: Wed Jul 26, 2006 8:21 pm

Re: The SPRT without draw model, elo model or whatever...

Michel wrote:A typical implementation of the SPRT (e.g. cutechess) now runs as follows:

(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.

This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.

However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.

After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....

(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.
Good. It always seemed to me that this Bayeselo draw model for SPRT is a useless pain in the neck of using a draw model when we do know the trinomial distribution. Didn't know about GSPRT, and to avoid the pain, in analytical derivations I simply used draw_elo=0, using binomial only. I also used binomial distribution in order to approximate the trinomial distribution, it fits very well in most cases (draw_elo now is not 0, but I get rid of it).

Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

Re: The SPRT without draw model, elo model or whatever...

Here is some quick code that implements this. I have kept things as simple possible.

Code: Select all

``````from __future__ import division

import math

bb=math.log&#40;10&#41;/400

def L&#40;x&#41;&#58;
return 1/&#40;1+math.exp&#40;-bb*x&#41;)

def DrawRatioMLE&#40;s,results&#41;&#58;
"""
We compute the MLE for the draw ratio, subject to the condition score=s.
This analytically correct formula was computed with Sage. Probably
D/&#40;W+D+L&#41; works just as well.
"""
L=results&#91;0&#93;
D=results&#91;1&#93;
W=results&#91;2&#93;
return &#40;L*s - W*s + D + W - math.sqrt&#40;4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2&#41;)/&#40;W+D+L&#41;

def LL&#40;s,results&#41;&#58;
"""
Compute the log likelihood for score=s.
"""
d=DrawRatioMLE&#40;s,results&#41;
w=s-&#40;1/2&#41;*d
l=1-s-&#40;1/2&#41;*d
L=results&#91;0&#93;
D=results&#91;1&#93;
W=results&#91;2&#93;
# sanity
if W==0 or D==0 or L==0&#58;
return 0
else&#58;
return W*math.log&#40;w&#41;+D*math.log&#40;d&#41;+L*math.log&#40;l&#41;

#####################################################################################################

class SimpleSPRT&#58;
"""
This class performs a GSPRT for H0&#58;elo=elo0 versus H1&#58;elo=elo1
See here for a description of the GSPRT as well as theoretical &#40;asymptotic&#41; results.

http&#58;//stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf

To record the outcome of a game use the method self.record&#40;result&#41; where "result" is one of 0,1,2,
corresponding respectively to a loss, a draw and a win.
"""

def __init__&#40;self,alpha=0.05,beta=0.05,elo0=0,elo1=5&#41;&#58;
self.score0=L&#40;elo0&#41;
self.score1=L&#40;elo1&#41;
self.LA=math.log&#40;beta/&#40;1-alpha&#41;)
self.LB=math.log&#40;&#40;1-beta&#41;/alpha&#41;
self.results=3*&#91;0&#93;
self._status=''

def record&#40;self,result&#41;&#58;
self.results&#91;result&#93;+=1
LLR=LL&#40;self.score1,self.results&#41;-LL&#40;self.score0,self.results&#41;
if LLR>self.LB&#58;
self._status='H1'
elif LLR<self.LA&#58;
self._status='H0'

def status&#40;self&#41;&#58;
return self._status
``````
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Ferdy
Posts: 3607
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Michel wrote:Here is some quick code that implements this. I have kept things as simple possible.

Code: Select all

``````from __future__ import division

import math

bb=math.log&#40;10&#41;/400

def L&#40;x&#41;&#58;
return 1/&#40;1+math.exp&#40;-bb*x&#41;)

def DrawRatioMLE&#40;s,results&#41;&#58;
"""
We compute the MLE for the draw ratio, subject to the condition score=s.
This analytically correct formula was computed with Sage. Probably
D/&#40;W+D+L&#41; works just as well.
"""
L=results&#91;0&#93;
D=results&#91;1&#93;
W=results&#91;2&#93;
return &#40;L*s - W*s + D + W - math.sqrt&#40;4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2&#41;)/&#40;W+D+L&#41;

def LL&#40;s,results&#41;&#58;
"""
Compute the log likelihood for score=s.
"""
d=DrawRatioMLE&#40;s,results&#41;
w=s-&#40;1/2&#41;*d
l=1-s-&#40;1/2&#41;*d
L=results&#91;0&#93;
D=results&#91;1&#93;
W=results&#91;2&#93;
# sanity
if W==0 or D==0 or L==0&#58;
return 0
else&#58;
return W*math.log&#40;w&#41;+D*math.log&#40;d&#41;+L*math.log&#40;l&#41;

#####################################################################################################

class SimpleSPRT&#58;
"""
This class performs a GSPRT for H0&#58;elo=elo0 versus H1&#58;elo=elo1
See here for a description of the GSPRT as well as theoretical &#40;asymptotic&#41; results.

http&#58;//stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf

To record the outcome of a game use the method self.record&#40;result&#41; where "result" is one of 0,1,2,
corresponding respectively to a loss, a draw and a win.
"""

def __init__&#40;self,alpha=0.05,beta=0.05,elo0=0,elo1=5&#41;&#58;
self.score0=L&#40;elo0&#41;
self.score1=L&#40;elo1&#41;
self.LA=math.log&#40;beta/&#40;1-alpha&#41;)
self.LB=math.log&#40;&#40;1-beta&#41;/alpha&#41;
self.results=3*&#91;0&#93;
self._status=''

def record&#40;self,result&#41;&#58;
self.results&#91;result&#93;+=1
LLR=LL&#40;self.score1,self.results&#41;-LL&#40;self.score0,self.results&#41;
if LLR>self.LB&#58;
self._status='H1'
elif LLR<self.LA&#58;
self._status='H0'

def status&#40;self&#41;&#58;
return self._status
``````
Try to make use of your class and input my previous sprt records.

My sprt record, formula based from Sf,

Code: Select all

``````TestEngine&#58; D2015.1.35.239.tuned
BaseEngine&#58; D2015.1.35.239
Elo0&#58; -1.50, Elo1&#58; 4.50, alpha&#58; 0.05, beta&#58; 0.05
T&#58; 2467, W&#58; 637, L&#58; 559, D&#58; 1271, Score&#58; 51.6%, NetW&#58; +78
Elo&#58; +10, err95&#58; +/-9, LOS&#58; 0.98816
LLR&#58; 1.84, &#91;-2.94, +2.94&#93;``````

Code: Select all

``````enter alpha? 0.05
enter beta? 0.05

enter elo0? -1.5
enter elo1? 4.5

enter losses? 559
enter draws? 1271
enter wins? 637

status&#58; H0``````
So how to interpret that?

Here is the interface,

Code: Select all

``````from gsprt import SimpleSPRT
import sys

def main&#40;argv&#41;&#58;

alpha = float&#40;raw_input&#40;'enter alpha? '))
beta = float&#40;raw_input&#40;'enter beta? '))
print
elo0 = float&#40;raw_input&#40;'enter elo0? '))
elo1 = float&#40;raw_input&#40;'enter elo1? '))
print
losses = int&#40;raw_input&#40;'enter losses? '))
draws = int&#40;raw_input&#40;'enter draws? '))
wins = int&#40;raw_input&#40;'enter wins? '))
print

data = SimpleSPRT&#40;alpha, beta, elo0, elo1&#41;

# Losses
for _ in range&#40;losses&#41;&#58;
data.record&#40;0&#41;

# Draws
for _ in range&#40;draws&#41;&#58;
data.record&#40;1&#41;

# Wins
for _ in range&#40;wins&#41;&#58;
data.record&#40;2&#41;

print 'status&#58; %s' % data.status&#40;)

if __name__ == "__main__"&#58;
main&#40;sys.argv&#91;1&#58;&#93;)``````

Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

Re: The SPRT without draw model, elo model or whatever...

So how to interpret that?
I have not yet checked your computation yet but the input to SimpleSPRT is in standard elo (as in cutechess-cli).

The input of the SPRT as used by Stockfish is in Bayes Elo, which is really a historical accident. It makes the interpretation of the paramaters needlessly difficult.

The conversion between Bayes Elo and standard elo can be done using a scale factor given by the following formula

(4*math.exp(-bb*de))/(1+math.exp(-bb*de))**2

where bb=math.log(10)/400 and de is draw_elo.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

Re: The SPRT without draw model, elo model or whatever...

# Losses
for _ in range(losses):
data.record(0)

# Draws
for _ in range(draws):
data.record(1)

# Wins
for _ in range(wins):
data.record(2)
Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.

I thought this was clear from the way the code is written, but it seems not. Sorry about that.

Here is some code that uses the class to do simulations

Code: Select all

``````import random,sys

def pick&#40;w,d,l&#41;&#58;
s=random.random&#40;)
if s<=w&#58;
return 2
elif s<=w+d&#58;
return 1
else&#58;
return 0

def simulate&#40;draw_ratio,epsilon,alpha,elo&#41;&#58;
"""
We simulate the test H0&#58;elo_diff=0 versus H1&#58;elo_diff=epsilon with elo_diff equal
to elo.
"""
sp=SimpleSPRT&#40;alpha=alpha,beta=alpha,elo0=0,elo1=epsilon&#41;
s=L&#40;elo&#41;
d=draw_ratio
w=s-&#40;1/2&#41;*d
l=1-s-&#40;1/2&#41;*d
while True&#58;
r=pick&#40;w,d,l&#41;
sp.record&#40;r&#41;
status=sp.status&#40;)
if status!=''&#58;
return status

if __name__=='__main__'&#58;
epsilon=5
a=&#123;'H0'&#58;0,'H1'&#58;0&#125;
cc=0
while True&#58;
status=simulate&#40;draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0&#41;
a&#91;status&#93;+=1
cc+=1
print a&#91;'H0'&#93;/cc, a&#91;'H1'&#93;/cc
sys.stdout.flush&#40;)``````
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Ferdy
Posts: 3607
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Michel wrote:
# Losses
for _ in range(losses):
data.record(0)

# Draws
for _ in range(draws):
data.record(1)

# Wins
for _ in range(wins):
data.record(2)
Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.

I thought this was clear from the way the code is written, but it seems not. Sorry about that.

Here is some code that uses the class to do simulations

Code: Select all

``````import random,sys

def pick&#40;w,d,l&#41;&#58;
s=random.random&#40;)
if s<=w&#58;
return 2
elif s<=w+d&#58;
return 1
else&#58;
return 0

def simulate&#40;draw_ratio,epsilon,alpha,elo&#41;&#58;
"""
We simulate the test H0&#58;elo_diff=0 versus H1&#58;elo_diff=epsilon with elo_diff equal
to elo.
"""
sp=SimpleSPRT&#40;alpha=alpha,beta=alpha,elo0=0,elo1=epsilon&#41;
s=L&#40;elo&#41;
d=draw_ratio
w=s-&#40;1/2&#41;*d
l=1-s-&#40;1/2&#41;*d
while True&#58;
r=pick&#40;w,d,l&#41;
sp.record&#40;r&#41;
status=sp.status&#40;)
if status!=''&#58;
return status

if __name__=='__main__'&#58;
epsilon=5
a=&#123;'H0'&#58;0,'H1'&#58;0&#125;
cc=0
while True&#58;
status=simulate&#40;draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0&#41;
a&#91;status&#93;+=1
cc+=1
print a&#91;'H0'&#93;/cc, a&#91;'H1'&#93;/cc
sys.stdout.flush&#40;)``````
You are doing simulation, and I am inputing the values. But as I undertand the use of class is the same.
The pick() return 0, 1, 2, and you do sp.record(r).

Code: Select all

``````r=pick&#40;w,d,l&#41;
sp.record&#40;r&#41;``````
Same as what I did, just the L/D/W were accumulated before it was applied to get the status.

Code: Select all

``````# Losses
for _ in range&#40;losses&#41;&#58;
data.record&#40;0&#41;

# Draws
for _ in range&#40;draws&#41;&#58;
data.record&#40;1&#41;

# Wins
for _ in range&#40;wins&#41;&#58;
data.record&#40;2&#41; ``````
If there are 100 loses, I recorded it 100 times by looping thru 0 ... 99, and each time calls the record() too.

I don't know if it was intended for simulation.

But still my input is valid, just consider that my last input is the last call to
self.record(). Then you see I print the status after that given L/D/W stats.

From the sample that I use of L/D/W it returns H0. So that would mean that I should have exited my test, but using SPRT is should continue.

Code: Select all

``````TestEngine&#58; D2015.1.35.239.tuned
BaseEngine&#58; D2015.1.35.239
Elo0&#58; -1.50, Elo1&#58; 4.50, alpha&#58; 0.05, beta&#58; 0.05
T&#58; 2467, W&#58; 637, L&#58; 559, D&#58; 1271, Score&#58; 51.6%, NetW&#58; +78
Elo&#58; +10, err95&#58; +/-9, LOS&#58; 0.98816
LLR&#58; 1.84, &#91;-2.94, +2.94&#93;``````

Code: Select all

``````enter alpha? 0.05
enter beta? 0.05

enter elo0? -1.5
enter elo1? 4.5

enter losses? 559
enter draws? 1271
enter wins? 637

status&#58; H0``````

Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

Re: The SPRT without draw model, elo model or whatever...

I still do not see what your problem is. If you are worried that the log likelihood ratio (LLR) reported by SF is different from the LLR reported by SimpleSPRT then that is simply due to the fact that SF uses _Bayes Elo's_ as input. 4.5 elo for SF is perhaps 2.8 real elo.

=====================

Something else: I do not think your use of the SPRT is theoretically correct. The error probabilities for the SPRT are based on the assumption that the test is stopped when the LLR crosses one of the boundaries. If you truncated it after a fixed amount of time then the computation is different. For one thing, the LLR can be in the neutral region.

I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Ferdy
Posts: 3607
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: The SPRT without draw model, elo model or whatever...

Michel wrote:I still do not see what your problem is.
I don't have a problem, I am just trying to use your posted class (check my first post). Then I get H0 based from the L/D/W that I use, and I let you interpret that.
You were saying my use of class is not really right, and I explain that it is just the same, you are using simulation updating record per run, and I am updating at a given sum of games with L/D/W.
Michel wrote: I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.
Now I see the limitation of this function.
def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
An else should be added to cover the empty status. This is also because
that LLR changes as results changes affecting the status.
def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
else:
self._status=''
Tried again with that change and I got empty.

Code: Select all

``````enter alpha? 0.05
enter beta? 0.05

enter elo0? -1.5
enter elo1? 4.5

enter losses? 559
enter draws? 1271
enter wins? 637

status&#58; ``````
So the sprt that I use agreed with your gsprt, meaning the test continues.

Ajedrecista
Posts: 1364
Joined: Wed Jul 13, 2011 7:04 pm
Contact:

Re: The SPRT without draw model, Elo model or whatever...

Hello:

I wrote a LLR calculator a while ago and I use Bayeselo units in SPRT bounds. My LLR calculator agrees with Ferd's LLR result of 1.84:

Code: Select all

``````Parameters found at LLR_parameters.txt file&#58;

alpha&#58;        0.0500
beta&#58;         0.0500

bayeselo_0&#58;  -1.5000
bayeselo_1&#58;   4.5000

----------------------------

Lower bound for LLR&#58; -2.9444
Upper bound for LLR&#58;  2.9444

----------------------------

Games&#58;       2467

Wins&#58;         637 &#40;25.82 %).
Loses&#58;        559 &#40;22.66 %).
Draws&#58;       1271 &#40;51.52 %).

bayeselo&#58;    14.9710
drawelo&#58;    198.2956

----------------------------

LLR&#40;wins&#41;&#58;       16.6408
LLR&#40;loses&#41;&#58;     -14.6643
LLR&#40;draws&#41;&#58;      -0.1391

LLR&#58;       1.8374``````
I never disagree with SF testing framework. I know that LLR ~ 1.84 means nothing because SPRT is sequential and LLR could cross bounds an even number of times to reach a neutral region where lower_bound < LLR < upper_bound.

I do not know how you got LLR ~ 2.4, Michel. If I do upper_bound/lower_bound = -3 and I use Bayeselo units, I get this LLR ~ 2.3996 with SPRT(-2.0409, 6.1227), using +637 =1271 -559, of course. With my drawelo result of 198.2956, then 1.5/{4*10^(198.2956/400)/[1 + 10^(198.2956/400)]²} ~ 2.044030841, which is similar to 2.0409. With SPRT(-2.0440, 6.1321) I obtain LLR ~ 2.4027.

OTOH I think that I understand you when you said that Ferd inputs loses first. My thought is the following one:

Code: Select all

``````LLR computed after each state i&#41;&#58;

1&#41; +0 =0 -0
2&#41; +0 =0 -1
3&#41; +0 =0 -2

&#91;...&#93;  // LLR could have crossed lower bound somewhere here, hence 'H0' report.
// IIRC, I had problems in the past when wins*draws*loses = 0 &#40;that is, I needed non-zero values&#41;.

559&#41; +0 =0 -558
560&#41; +0 =0 -559
561&#41; +0 =1 -559  // Probably here LLR < lower_bound according with my last comment.
562&#41; +0 =2 -559

&#91;...&#93;

1830&#41; +0 =1270 -559
1831&#41; +0 =1271 -559
1832&#41; +1 =1271 -559
1833&#41; +2 =1271 -559

&#91;...&#93;

2467&#41; +636 =1271 -559
2468&#41; +637 =1271 -559``````
In my LLR calculator I only input the state +637 =1271 -559 without looking at the previous states.

I wonder what would happen if draws or wins are first input in Ferd's code. If wins are the first input, the code will return 'H1'? If draws are the first input, will the report depend on the second input (wins or losses)? The best thing is calculate LLR after each WDL state, just like every SPRT simulator.

I am glad that Ferd solved the issue.

Regards from Spain.

Ajedrecista.