I started to run some tests again and just to do something different I decided to use the 2-SPRT instead of the SPRT. Recall that the 2-SPRT optimizes the worst case behaviour of a sequential test which is when the real elo is precisely in the middle of the upper and lower elo bounds. Also unlike the SPRT the 2-SPRT has an absolute upper bound on its running time.
Here is the script I wrote some time ago to calculate the 2-SPRT parameters.
http://hardy.uhasselt.be/Toga/sprt2.py
To validate the results the script can do simulation and it can even compute the operating characteristic (OC) and average sample number (ASN) of the test exactly!!
Here are some notes on the formulas
http://hardy.uhasselt.be/Toga/support_whitehead.pdf
These are mainly notes to myself, so they are neither very rigorous nor very readable. You can see that the formulas used by the script are based on approximations but these hold to a high degree of accuracy for small elo differences.
			
			
									
						
							2-SPRT
Moderator: Ras
- 
				Ajedrecista  
- Posts: 2143
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: 2-SPRT.
Hello Michel:
I see that the point is to minimize the expected length of a test whose patch has an Bayeselo gain of e = (theta_0 + theta_1)/2 using the notation of the paper. I hope 2-SPRT does not harm the average run of other cases such as e = theta_0.
Could you post some examples of the saving of the number of games, please? I remember that it was posted in SF Google Group and I recall a number of 10% less games for the case of e = (theta_0 + theta_1)/2. Is it right? Here are two links that I found:
https://groups.google.com/forum/#!msg/f ... jQl8P_jAsJ
https://groups.google.com/forum/#!msg/f ... 4hXAj7t2IJ
For example, in a normal SPRT(-1.5, 4.5) with e = 1.5, d = 240, alpha = 0.05 = beta and 20000 simulations (446.06 seconds with a speed of 1305193 games/second), I get an average run of circa (29110 ± 326) games with 95% confidence, a median run of 22178 games, a shortest run of 1338 games (+211 -314 =813, fail), a longest run of 242686 games (+48992 -48427 =145267, fail), 10117 passes and 9883 fails (near to the expected 50% - 50%), and circa 99.39% of finished simulations with less than 128000 games (useful info for Fishtest). These values match with yours? If that, in this case I expect something like 26200 games on average using 2-SPRT.
Thanks in advance.
------------------------
On other side, I made a correction here. I think I am right... but I am very, very far from perfection. Any confirmation is welcome.
  Any confirmation is welcome.
Regards from Spain.
Ajedrecista.
			
			
									
						
										
						The paper is very interesting, thanks for share it. I guess that the script is also very valuable, but since I do not have Python and do not know how to run a script (my bad! :O), I can not get info from it.Michel wrote:I started to run some tests again and just to do something different I decided to use the 2-SPRT instead of the SPRT. Recall that the 2-SPRT optimizes the worst case behaviour of a sequential test which is when the real elo is precisely in the middle of the upper and lower elo bounds. Also unlike the SPRT the 2-SPRT has an absolute upper bound on its running time.
Here is the script I wrote some time ago to calculate the 2-SPRT parameters.
http://hardy.uhasselt.be/Toga/sprt2.py
To validate the results the script can do simulation and it can even compute the operating characteristic (OC) and average sample number (ASN) of the test exactly!!
Here are some notes on the formulas
http://hardy.uhasselt.be/Toga/support_whitehead.pdf
These are mainly notes to myself, so they are neither very rigorous nor very readable. You can see that the formulas used by the script are based on approximations but these hold to a high degree of accuracy for small elo differences.
I see that the point is to minimize the expected length of a test whose patch has an Bayeselo gain of e = (theta_0 + theta_1)/2 using the notation of the paper. I hope 2-SPRT does not harm the average run of other cases such as e = theta_0.
Could you post some examples of the saving of the number of games, please? I remember that it was posted in SF Google Group and I recall a number of 10% less games for the case of e = (theta_0 + theta_1)/2. Is it right? Here are two links that I found:
https://groups.google.com/forum/#!msg/f ... jQl8P_jAsJ
https://groups.google.com/forum/#!msg/f ... 4hXAj7t2IJ
For example, in a normal SPRT(-1.5, 4.5) with e = 1.5, d = 240, alpha = 0.05 = beta and 20000 simulations (446.06 seconds with a speed of 1305193 games/second), I get an average run of circa (29110 ± 326) games with 95% confidence, a median run of 22178 games, a shortest run of 1338 games (+211 -314 =813, fail), a longest run of 242686 games (+48992 -48427 =145267, fail), 10117 passes and 9883 fails (near to the expected 50% - 50%), and circa 99.39% of finished simulations with less than 128000 games (useful info for Fishtest). These values match with yours? If that, in this case I expect something like 26200 games on average using 2-SPRT.
Thanks in advance.
------------------------
On other side, I made a correction here. I think I am right... but I am very, very far from perfection.
 Any confirmation is welcome.
  Any confirmation is welcome.Regards from Spain.
Ajedrecista.
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: 2-SPRT.
Yes it hurts a bit at theta0 and theta1. It has to because the SPRT is precisely optimal at these values. The choice between the SPRT and 2-SPRT is mainly psychological. For me improving the worst case behaviour is important since testing basically freezes my workflow.
Here is some sample output
			
			
									
						
							Here is some sample output
Code: Select all
$ python sprt2.py --expected
Test parameters:
alpha=0.05
beta=0.05
elo0=-1.50
elo1=4.50
draw_elo=240
Continuation region:
1.00277097*N-83.412576 < S < 1.00000000*N+83.412576
Worst case expected running time: 25363
Maximal running time: 60205
Approximate expected running time:
 Elo           SPRT       SPRT2     Hoeffding
=====          =====      =====     =========
-4.50           9570      12052           
-3.90          10587      13078           
-3.30          11815      14270           
-2.70          13309      15649           
-2.10          15129      17219           
-1.50(H0)      17322      18953           
-0.90          19891      20768           
-0.30          22714      22516           
 0.30          25459      24000           
 0.90          27544      25011           
 1.50          28336      25363      24448
 2.10          27544      25011           
 2.70          25458      24000           
 3.30          22714      22516           
 3.90          19890      20767           
 4.50(H1)      17322      18952           
 5.10          15128      17219           
 5.70          13309      15649           
 6.30          11814      14270           
 6.90          10586      13078           
 7.50           9570      12052 Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: 2-SPRT.
The best sequential test is actually the "Schwarz" test.
http://hardy.uhasselt.be/Toga/ref_schwarz.pdf
Here is an old .pdf I found
http://hardy.uhasselt.be/Toga/schwarz_results.pdf
One can see that the Schwarz test is almost equivalent to the SPRT at elo0,elo1 but the worst case behaviour is substantially better (similar to the 2-SPRT).
The results in the above .pdf were established by simulation so they took a long time to generate. Meanwhile I know how to compute the parameters of a Schwarz test theoretically and I have a script for that as well. Since I now plan to switch to the Schwarz test for my own testing, I will clean up the script and post it here.
			
			
									
						
							http://hardy.uhasselt.be/Toga/ref_schwarz.pdf
Here is an old .pdf I found
http://hardy.uhasselt.be/Toga/schwarz_results.pdf
One can see that the Schwarz test is almost equivalent to the SPRT at elo0,elo1 but the worst case behaviour is substantially better (similar to the 2-SPRT).
The results in the above .pdf were established by simulation so they took a long time to generate. Meanwhile I know how to compute the parameters of a Schwarz test theoretically and I have a script for that as well. Since I now plan to switch to the Schwarz test for my own testing, I will clean up the script and post it here.
Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Ajedrecista  
- Posts: 2143
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Comparison of our SPRT results.
Hello Michel:
In my case: power(-7.5) > power(-7)... well, bad luck! I see that in general my average number of games is slightly higher than yours... I suppose that we choosed the same value for drawelo parameter. The opposite for the values of power: yours are higher in general.
The sum of my elapsed times is 2283.11 seconds = 0:38:03.11 if I am right. Of course I spent extra time in changing the Bayeselo value each time in the input notepad, but less than an hour for 160000 simulations in SPRT(-3, 3) is not bad in such an old computer IMHO. I do not know the number of simulations you ran, but I would say that we agree in our results in general terms.
Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
Good luck with your Schwarz tests! I stay tuned.
Regards from Spain.
Ajedrecista.
			
			
									
						
										
						Just to check my SPRT simulator, I ran it in my Intel Pentium D930 of year 2006. The programme is 32-bit and single core. I understand the SPRT power as the probability to accept the patch. Here are my results:Michel wrote:The results in the above .pdf were established by simulation so they took a long time to generate.
Code: Select all
alpha:            0.0500
beta:             0.0500
drawelo_fixed:  240.0000
bayeselo_0:      -3.0000
bayeselo_1:       3.0000
Simulations:   10000 (each time)Code: Select all
Comparison of results (M = Michel's; J = Jesús'):
Bayeselo     <Games (M)>     <Games (J)>     Power(M)     Power(J)     Elapse time (J)
========     ===========     ===========     ========     ========     ===============
  -7.5          7709.9           7906         0.0006       0.0008         79.55 sec. 
  -7            8276.9           8470         0.0012       0.0005         83.23 sec.
  -6.5          8894.5           9101         0.0017       0.0011         88.23 sec.
  -6            9604.4           9840         0.0028       0.0012         93.72 sec.
  -5.5         10440.0          10619         0.0045       0.0035        102.09 sec.
  -5           11426.9          11617         0.0074       0.0060        104.10 sec.
  -4.5         12587.8          12787         0.0115       0.0091        112.02 sec.
  -4           13926.1          14122         0.0193       0.0194        121.12 sec.
  -3.5         15532.2          15755         0.0312       0.0294        132.37 sec.
  -3           17409.8          17593         0.0483       0.0476        149.39 sec.
  -2.5         19561.5          19713         0.0798       0.0771        162.78 sec.
  -2           21918.5          22128         0.1240       0.1210        179.91 sec.
  -1.5         24377.8          24903         0.1842       0.1790        196.77 sec.
  -1           26456.1          26916         0.2729       0.2687        214.55 sec.
  -0.5         27987.2          28501         0.3799       0.3791        228.82 sec.
   0           28431.7          28899         0.4996       0.5010        234.46 sec.The sum of my elapsed times is 2283.11 seconds = 0:38:03.11 if I am right. Of course I spent extra time in changing the Bayeselo value each time in the input notepad, but less than an hour for 160000 simulations in SPRT(-3, 3) is not bad in such an old computer IMHO. I do not know the number of simulations you ran, but I would say that we agree in our results in general terms.
Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
Good luck with your Schwarz tests! I stay tuned.
Regards from Spain.
Ajedrecista.
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Comparison of our SPRT results.
Well I don't remember how I generated the SPRT results... This is just an old .pdf I found from the time I was learning about sequential testing.In my case: power(-7.5) > power(-7)... well, bad luck! I see that in general my average number of games is slightly higher than yours... I suppose that we choosed the same value for drawelo parameter. The opposite for the values of power: yours are higher in general.
Sequential testing involves really beautiful math. It is in a sense a solved problem as for a given prior there is a way to generate an optimal solution (using dynamic programming) that minimizes the average duration of the test for that prior, given overall type 1 and type 2 error bounds. Asymptotically things improve and the optimal solution depends only on the support of the prior (places where it is non-zero). The SPRT, 2-SPRT and the Schwarz tests are all asymptotic solutions of this type.
Ah. It is not really trimmed, but really stops at 0 (since the table is symmetric the values >0 can be derived from those <0). But I agree the layout could be much better.Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
I am adding a section to my document on the 2-SPRT about the Schwarz test. To simplify things I am again using Taylor series approximations to the likelihood functions. After that I will implement the formulas in a script.Good luck with your Schwarz tests! I stay tuned.
Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Comparison of our SPRT results.
Addendum to avoid potential confusion: by a lucky mathematical accident the optimality properties of the SPRT are not just asymptotic. They hold always!The SPRT, 2-SPRT and the Schwarz tests are all asymptotic solutions of this type.
Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Ajedrecista  
- Posts: 2143
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Comparison of our SPRT results.
Hello again:
I agree: the layout could be better just writing the value of drawelo for this example (probably 240) and fixing the issue of the right margin, as well as the values of alpha and beta (I know that they could be derived easily with power(elo0) if we suppose that alpha = beta, and it looks like alpha = 1/20 = beta this time) and the number of simulations... but if this info is quite old and you do not remember the details, there is nothing to be done. Otherwise it is perfect.
Once again, good luck with your new tests.
Regards from Spain.
Ajedrecista.
			
			
									
						
										
						I was not talking about the lack of values for Bayeselo = {0.5, 1, ...}. Since elo0 + elo1 = 0, I already understood that you said that the values for elo_i are the same (in number of games) to the values of -elo_i, while power(elo_i) = 1 - power(-elo_i) in this special case of elo0 = -elo1. What I wanted to say is that for 0 Bayeselo results I only see {28431, 0.499; 25349, 0.499} instead of {28431.7, 0.4996; 25349.3, 0.4993}, that is, the last number of each result is missing due to the margin, but I managed to get the full results just copying and pasting the full text of the PDF into a Notepad.Michel wrote:Ah. It is not really trimmed, but really stops at 0 (since the table is symmetric the values >0 can be derived from those <0). But I agree the layout could be much better.Ajedrecista wrote:Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
I agree: the layout could be better just writing the value of drawelo for this example (probably 240) and fixing the issue of the right margin, as well as the values of alpha and beta (I know that they could be derived easily with power(elo0) if we suppose that alpha = beta, and it looks like alpha = 1/20 = beta this time) and the number of simulations... but if this info is quite old and you do not remember the details, there is nothing to be done. Otherwise it is perfect.
Once again, good luck with your new tests.
Regards from Spain.
Ajedrecista.