Michel wrote:
"So that's why I insist on saying that discussing things theoretically is not enough. If you don't experiment, you'll never understand if and how it works... (or doesn't work in this case)"
This is simply not true. Having a theoretical answer you know the outcome of the experiment, so no need to do it (except perhaps for confirmation). The opposite is not true.
In this case you can compute what happens. In fact I predicted the behaviour in 3) in my previous post. To make this manageable you have to truncate the SPRT.
Now to compare two tests you really need to compare the Type I and Type II error probabilities as a function of the parameter P. For the (truncated) SPRT you can compute these. Can you do it for your test?

I read your PDF and the expected stopping time. The problem is that we need more than the expected value: we need quantiles. Here's a practical demonstration of why this idea cannot work:
* I call my program with 3 params: P(win), P(draw) and delta (hence delta/2 type I and delta/2 type II error thresholds). elo0=0 and elo1=+10, so what I'm really testing is whether A beats B by 10 elo or more.
* Sample runs:
lucas@megatron:~/Chess/DoubleCheck/StopAlgo/Release$ ./StopAlgo .4 .2 .05
min = 758 max = 30926 avg = 7353 win = 0.038000
lucas@megatron:~/Chess/DoubleCheck/StopAlgo/Release$ ./StopAlgo .4 .228 .05
min = 796 max = 29506 avg = 6429 win = 0.987000
lucas@megatron:~/Chess/DoubleCheck/StopAlgo/Release$ ./StopAlgo .4 .21 .05
min = 802 max = 76301 avg = 12142 win = 0.337000
* second example: elo=elo1(+epsilon), and SPRT works fine too. Observe the min and max values of the stopping time over 1000 runs.
* third example: right in the danger zone elo0 < elo < elo1. SPRT doesn't make sense here, as discussed. Observe the min and max values of the stopping time over 1000 runs.
* now if you look at the min/max of all these stopping times, you can see that there's far too much overlap for your idea to work. So testing elo=0 vs elo!=0 is really *not* the same as the sequential Wald test. There is a good reason why there is so much theory on the test that I'm looking at, and the paper of Volodymyr (on Empirical Bernstein stopping) doesn't even mention the Wald test...
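
For anyone who wants to try this kind of experiment themselves, here is a minimal Python sketch of a simulation with the same shape of input and output. It is not the StopAlgo program used above, and it will not reproduce those numbers. It assumes a plain Wald SPRT between elo0 and elo1, the logistic Elo formula, and a draw rate held fixed at the P(draw) given on input. All function names (`elo_to_score`, `win_loss_at`, `sprt_run`, `summarize`) are made up for the illustration. Besides min/max/avg it also reports the median and the 95% quantile of the stopping time, plus the fraction of runs that accepted H1, which is the quantity you would compare as a function of P.

```python
import math
import random
import statistics


def elo_to_score(elo):
    """Expected score under the logistic Elo model (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def win_loss_at(elo, p_draw):
    """Win/loss probabilities at `elo`, with the draw rate held fixed (assumption)."""
    p_win = elo_to_score(elo) - p_draw / 2.0
    return p_win, 1.0 - p_win - p_draw


def sprt_run(p_win, p_draw, delta, elo0=0.0, elo1=10.0, rng=random,
             max_games=1_000_000):
    """Simulate one match; return (stopping_time, accepted_h1)."""
    alpha = beta = delta / 2.0            # delta/2 type I, delta/2 type II
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    w0, l0 = win_loss_at(elo0, p_draw)
    w1, l1 = win_loss_at(elo1, p_draw)
    step_win = math.log(w1 / w0)          # LLR increment for a win
    step_loss = math.log(l1 / l0)         # LLR increment for a loss
    llr = 0.0
    for n in range(1, max_games + 1):
        u = rng.random()
        if u < p_win:
            llr += step_win
        elif u >= p_win + p_draw:
            llr += step_loss
        # A draw leaves the LLR unchanged: both hypotheses share p_draw.
        if llr >= upper:
            return n, True
        if llr <= lower:
            return n, False
    return max_games, False               # truncated: counted as "H1 not accepted"


def summarize(p_win, p_draw, delta, runs=1000, seed=1):
    """Stopping-time statistics and H1 acceptance rate over `runs` simulations."""
    rng = random.Random(seed)
    results = [sprt_run(p_win, p_draw, delta, rng=rng) for _ in range(runs)]
    times = sorted(t for t, _ in results)
    cuts = statistics.quantiles(times, n=20)   # 19 cut points, 5% apart
    return {
        "min": times[0], "median": cuts[9], "q95": cuts[18], "max": times[-1],
        "avg": sum(times) / runs,
        "win": sum(ok for _, ok in results) / runs,
    }


if __name__ == "__main__":
    # Same three inputs as the sample runs above, with fewer runs to keep it quick.
    for p_draw in (0.2, 0.228, 0.21):
        print(p_draw, summarize(0.4, p_draw, 0.05, runs=200))
```

The per-game LLR step is constant for each outcome, so the whole test is a random walk between two barriers. Sweeping P(win) and P(draw) through `summarize` gives an empirical picture of both the error rates and the spread of the stopping time.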