Usage sprt / cutechess-cli

Sery · Post by **Sery** » Fri Sep 04, 2015 8:01 pm

It seems strange that your elo0 param is greater than elo1 in the first setup.

Desperado · Post by **Desperado** » Fri Sep 04, 2015 8:43 pm

Sery wrote:It seems strange that your elo0 param is greater than elo1 in first setup.

I thought about it already in the previous post (just before my last one).
But finally H0 and H1 are independent criteria, so it should not matter.
Even if it matters, the second setup should continue too.

Thanks for the reply, but I still think there is something wrong.

Ferdy · Post by **Ferdy** » Fri Sep 04, 2015 10:38 pm

Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match
Again, the test should continue, because being stronger by 10 Elo or more is still possible.

Summary:
=======

Maybe there is something mixed up or incorrect in the description.
But even without understanding the maths, I do understand "is at least stronger than" and "not stronger than at least by"
(with respect to the given uncertainties), and further that these are the requirements to stop the test!

So, this is simply wrong.

Come on, please tell me I am missing something essential, and please do not tell me that everybody is just happy about a "randomly" shortened test.

Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01

Although I use cutechess-cli, I use sf sprt to verify Elo improvements. If I enter those stats into sf sprt, I get this:

enter alpha? 0.01
enter beta? 0.01

enter elo0? 0
enter elo1? 10

enter losses? 1351
enter draws? 2942
enter wins? 1477

elo: 8, err: +/-6, drawelo: 195.6, LOS: 0.99125
llr: 2.81, [-4.60, 4.60]
status: unclear

The status "unclear" indicates that the test should be continued.
There is probably something wrong with cutechess-cli sprt.
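
For reference, the bounds [-4.60, 4.60] follow from alpha and beta via Wald's formulas. Here is a minimal C++ sketch, assuming the usual SPRT thresholds; `SprtBounds`, `sprtBounds` and `sprtDecision` are illustrative names, not part of sf sprt or cutechess-cli:

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Wald's SPRT thresholds for the log-likelihood ratio (llr).
struct SprtBounds {
    double lower;  // H0 is accepted once llr drops below this
    double upper;  // H1 is accepted once llr rises above this
};

SprtBounds sprtBounds(double alpha, double beta)
{
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    return { std::log(beta / (1.0 - alpha)),
             std::log((1.0 - beta) / alpha) };
}

// Hypothetical helper: maps an llr value to a decision label.
std::string sprtDecision(double llr, const SprtBounds& b)
{
    if (llr > b.upper) return "accept H1";
    if (llr < b.lower) return "accept H0";
    return "continue";
}
```

With alpha = beta = 0.01 both bounds come out at about ±4.595, so an llr of 2.81 lies between them and the test continues, while an llr of -4.71 falls below the lower bound.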

Ferdy · Post by **Ferdy** » Sat Sep 05, 2015 8:21 am

Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match

Convert your elo to bayes elo before applying the sf sprt.
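
The conversion can be sketched as follows. This assumes the BayesElo draw model, with drawelo estimated from the observed results and the logistic Elo divided by a scale factor derived from it; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Draw Elo of the BayesElo model, estimated from observed W/L/D counts.
double estimateDrawElo(int wins, int losses, int draws)
{
    assert(wins > 0 && losses > 0 && draws > 0);
    const double n = wins + losses + draws;
    const double pWin = wins / n;
    const double pLoss = losses / n;
    return 200.0 * std::log10((1.0 - pLoss) / pLoss * (1.0 - pWin) / pWin);
}

// Scale factor linking the two rating scales near zero:
// logistic Elo is roughly scale * bayes Elo.
double eloScale(double drawElo)
{
    const double x = std::pow(10.0, -drawElo / 400.0);
    return 4.0 * x / ((1.0 + x) * (1.0 + x));
}

// Converts a logistic Elo bound into the corresponding bayes Elo bound.
double logisticToBayesElo(double logisticElo, double drawElo)
{
    return logisticElo / eloScale(drawElo);
}
```

For 1477 wins, 1351 losses and 2942 draws this gives a drawelo of about 195.6 and turns a logistic elo1 of 10 into a bayes elo1 of about 13.5, which matches the sf sprt run that follows.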

enter alpha? 0.01
enter beta? 0.01

enter logistic elo0? 0
enter logistic elo1? 10

enter losses? 1351
enter draws? 2942
enter wins? 1477

bayes elo0: 0.0
bayes elo1: 13.5

sf sprt
elo: 8, err: +/-6, drawelo: 195.6, LOS: 0.99125
llr: 2.53, [-4.60, 4.60]
state: "Not rejected and not accepted"

Desperado · Post by **Desperado** » Sat Sep 05, 2015 10:02 am

Ferdy wrote:
Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match
Convert your elo to bayes elo before applying the sf sprt.
enter alpha? 0.01
enter beta? 0.01

enter logistic elo0? 0
enter logistic elo1? 10

enter losses? 1351
enter draws? 2942
enter wins? 1477

bayes elo0: 0.0
bayes elo1: 13.5

sf sprt
elo: 8, err: +/-6, drawelo: 195.6, LOS: 0.99125
llr: 2.53, [-4.60, 4.60]
state: "Not rejected and not accepted"

Hi Ferdy,

so, "not rejected and not accepted" means that the test should have been continued, but it was stopped, which is wrong.

Can you confirm that H0 and H1 are independent tests, and that the order of elo0 and elo1 in the sprt setup is irrelevant?
I conclude this because either H0 or H1 needs to be accepted.
I mean, it really should not matter whether elo0 is greater or smaller than elo1.

According to the description, I expect for H1: stop if (P1 >= P2 + elo0)
According to the description, I expect for H0: stop if (P1 <= P2 + elo1)

Many thanks for your replies.

Ferdy · Post by **Ferdy** » Sat Sep 05, 2015 12:24 pm

Desperado wrote:
Ferdy wrote:
Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match
Convert your elo to bayes elo before applying the sf sprt.
enter alpha? 0.01
enter beta? 0.01

enter logistic elo0? 0
enter logistic elo1? 10

enter losses? 1351
enter draws? 2942
enter wins? 1477

bayes elo0: 0.0
bayes elo1: 13.5

sf sprt
elo: 8, err: +/-6, drawelo: 195.6, LOS: 0.99125
llr: 2.53, [-4.60, 4.60]
state: "Not rejected and not accepted"
Hi Ferdy,

so, "not rejected and not accepted" means that the test should have been continued, but it was stopped, which is wrong.

Yes, the test should have been continued.

Desperado wrote: Can you confirm that H0 and H1 are independent tests, and that the order of elo0 and elo1 in the sprt setup is irrelevant?
I conclude this because either H0 or H1 needs to be accepted.
I mean, it really should not matter whether elo0 is greater or smaller than elo1.

According to the description, I expect for H1: stop if (P1 >= P2 + elo0)
According to the description, I expect for H0: stop if (P1 <= P2 + elo1)

Many thanks for your replies.

According to this source,
https://en.wikipedia.org/wiki/Sequentia ... ratio_test
the value of elo1 is greater than the value of elo0:

The hypotheses are simply H_0: \theta=\theta_0 and H_1: \theta=\theta_1, with \theta_1>\theta_0
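
To see why the order of the two hypotheses matters, it helps to write the test statistic out. This is a sketch that assumes the usual win/draw/loss form of the log-likelihood ratio, with $W, D, L$ the observed counts and $p_W, p_D, p_L$ the model probabilities under each hypothesis (the symbols are chosen here for illustration):

```latex
\Lambda(\theta_0,\theta_1)
  = W \ln\frac{p_W(\theta_1)}{p_W(\theta_0)}
  + D \ln\frac{p_D(\theta_1)}{p_D(\theta_0)}
  + L \ln\frac{p_L(\theta_1)}{p_L(\theta_0)},
\qquad
\Lambda(\theta_1,\theta_0) = -\Lambda(\theta_0,\theta_1)
```

Swapping elo0 and elo1 therefore flips the sign of the llr, so the same results would push the test toward the opposite bound. H0 and H1 enter through one ratio and are not checked independently.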

In cutechess, LLR is calculated like this,

// Log-Likelyhood Ratio
	status.llr = m_wins * std::log(p1.pWin() / p0.pWin()) +
		     m_losses * std::log(p1.pLoss() / p0.pLoss()) +
		     m_draws * std::log(p1.pDraw() / p0.pDraw());

This is similar to sf sprt: both use the ratio P1/P0. The stopping rule depends on this llr value.
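
Putting the pieces together, here is a self-contained sketch of the whole llr calculation. It assumes the BayesElo model for the win/loss/draw probabilities and estimates drawelo from the results; `Probs`, `bayesEloToProbs` and `sprtLlr` are hypothetical names, and this is not the actual cutechess source.

```cpp
#include <cassert>
#include <cmath>

struct Probs { double win, loss, draw; };

// Win/loss/draw probabilities of the BayesElo model.
Probs bayesEloToProbs(double bayesElo, double drawElo)
{
    Probs p;
    p.win  = 1.0 / (1.0 + std::pow(10.0, (drawElo - bayesElo) / 400.0));
    p.loss = 1.0 / (1.0 + std::pow(10.0, (drawElo + bayesElo) / 400.0));
    p.draw = 1.0 - p.win - p.loss;
    return p;
}

// llr of H1 (elo1) against H0 (elo0) for the observed counts.
// elo0/elo1 are read as logistic Elo when convert is true,
// and as bayes Elo otherwise.
double sprtLlr(int wins, int losses, int draws,
               double elo0, double elo1, bool convert)
{
    assert(wins > 0 && losses > 0 && draws > 0);
    const double n = wins + losses + draws;
    const double pWin = wins / n, pLoss = losses / n;
    // drawelo estimated from the sample
    const double drawElo =
        200.0 * std::log10((1.0 - pLoss) / pLoss * (1.0 - pWin) / pWin);
    double scale = 1.0;
    if (convert) {
        const double x = std::pow(10.0, -drawElo / 400.0);
        scale = 4.0 * x / ((1.0 + x) * (1.0 + x));
    }
    const Probs p0 = bayesEloToProbs(elo0 / scale, drawElo);
    const Probs p1 = bayesEloToProbs(elo1 / scale, drawElo);
    return wins   * std::log(p1.win  / p0.win)
         + losses * std::log(p1.loss / p0.loss)
         + draws  * std::log(p1.draw / p0.draw);
}
```

Called with 1477 wins, 1351 losses, 2942 draws, elo0=0 and elo1=10, it should give roughly 2.53 when the bounds are treated as logistic Elo (`convert = true`) and roughly 2.81 when they are taken as bayes elo directly.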

What I am trying to understand is why, in your setup2, cutechess sprt stops the test while sf sprt would continue it.

Ferdy · Post by **Ferdy** » Sat Sep 05, 2015 4:04 pm

Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match

I took the cutechess 0.7.1 sprt code, entered the values above, and got:

llr: 2.52 [-4.60, 4.60]

And that is similar to what I get from sf sprt.

Did the data in setup2 and the sprt results come from cutechess-cli 0.7.1?

Desperado · Post by **Desperado** » Sat Sep 05, 2015 5:32 pm

Ferdy wrote:
Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match
I took the cutechess 0.7.1 sprt code, entered the values above, and got:
llr: 2.52 [-4.60, 4.60]
And that is similar to what I get from sf sprt.

Did the data in setup2 and the sprt results come from cutechess-cli 0.7.1?

I did use cutechess-cli 0.7.1.

ilari · Post by **ilari** » Sat Sep 05, 2015 7:20 pm

Desperado wrote:
Ferdy wrote:
Desperado wrote: Setup2:
=====

games: 35000 (max)
Set eng=%eng% -sprt elo0=0 elo1=10 alpha=0.01 beta=0.01
Score of Omen0003 vs Omen0002: 1477 - 1351 - 2942  [0.511] 5770
ELO difference: 8
SPRT: llr -4.71, lbound -4.6, ubound 4.6 - H0 was accepted
Finished match
I took the cutechess 0.7.1 sprt code, entered the values above, and got:
llr: 2.52 [-4.60, 4.60]
And that is similar to what I get from sf sprt.

Did the data in setup2 and the sprt results come from cutechess-cli 0.7.1?
I did use cutechess-cli 0.7.1.

I can confirm Ferdy's result - cutechess-cli's SPRT does give an llr of 2.52509 when using the parameters and results that you got.
BUT: that doesn't mean that there's not a bug somewhere else in cutechess-cli. I'll try to debug it...

ilari · Post by **ilari** » Sat Sep 05, 2015 8:20 pm

I couldn't reproduce the issue after running some tests with both the Windows and Linux versions.

Michael: Could you please use the "-ratinginterval 10" parameter in your next run so I could see how the llr value progresses throughout the run? And please try cutechess-cli 0.7.2 just in case: https://github.com/cutechess/cutechess
