Blitz 10s per game, ponder ON

1   Critter 1.6 64-bit           +2   +54/=93/-53    50.25%   100.5/200
2   Critter 1.4a 64-bit SSE4     -2   +53/=93/-54    49.75%    99.5/200

Blitz 10s per game, ponder OFF

1   Critter 1.6 64-bit          +23   +56/=101/-43   53.25%   106.5/200
2   Critter 1.4a 64-bit SSE4    -23   +43/=101/-56   46.75%    93.5/200
Games played at chess960
Fritz 13 GUI
1 core used
i7 980x
windows 7
Best Regards
			
			
									
						
							Critter 1.6 - Critter 1.4a ponder ON/OFF
Moderator: Ras
- 
				ernest
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: Critter 1.6 - Critter 1.4a ponder ON/OFF
MM wrote:
1 Critter 1.6 64-bit       +23 +56/=101/-43 53.25% 106.5/200
2 Critter 1.4a 64-bit SSE4 -23 +43/=101/-56 46.75% 93.5/200

95% error bar is ±34 Elo

- 
				MM
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: Critter 1.6 - Critter 1.4a ponder ON/OFF
Hi all, 
just for the record.
utente-PC, Blitz 1m ponder ON, 1 core, 3.33 GHz, no tablebases.
	
1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00
2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00
Best Regards
			
			
									
						
MM
			
						- 
				MM
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: Critter 1.6 - Critter 1.4a ponder ON/OFF
MM wrote:
Hi all,
just for the record.
utente-PC, Blitz 1m ponder ON, 1 core, 3.33 GHz, no tablebases.
1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00
2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00
Best Regards

utente-PC, Blitz 1m ponder OFF
1 Critter 1.6 64-bit       +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
MM
			
						- 
				ernest
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: Critter 1.6 - Critter 1.4a ponder ON/OFF
MM wrote:
1 Critter 1.6 64-bit       +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500

95% error bar is now ±21 Elo, which means that there is more than 95% probability that with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4.
However, it cannot yet be said, with 95% probability, that the (ponder OFF) and (ponder ON) distributions are distinct, the global result (summation ON+OFF) being:
1 Critter 1.6 64-bit       +15 +235/=563/-202 51.65% 516.5/1000
2 Critter 1.4a 64-bit SSE4 -15 +202/=563/-235 48.35% 483.5/1000
- 
				Ajedrecista  
- Posts: 2141
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.
Hello Ernest:
It seems that my programme more or less agrees with BayesElo in the first match, which is an achievement! However, in the second match, the rating difference is ~ 11.5 Elo, not 15 Elo... which is more likely the error bar. I notice that you are adding two 500-game matches, so I do not know whether I can simply treat +235 =563 -202 as one 1000-game match or not.
ernest wrote:
MM wrote:
1 Critter 1.6 64-bit       +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
95% error bar is now ±21 Elo, which means that there is more than 95% probability that with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4
However, it cannot yet be said that the (ponder OFF) and (ponder ON) distributions are distinct, with 95% probability, the global result (summation ON+OFF) being:
1 Critter 1.6 64-bit       +15 +235/=563/-202 51.65% 516.5/1000
2 Critter 1.4a 64-bit SSE4 -15 +202/=563/-235 48.35% 483.5/1000
This means that it cannot be said (with 95% probability), as of yet, that compared to Critter 1.4a SSE4, Critter 1.6 performs better with ponder OFF than with Ponder ON.

I suppose that all these error bars were obtained with the great BayesElo. I ran my own small programme, just to compare my results:

LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
125
Write down the number of loses:
92
Write down the number of draws:
283
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference:     22.96 Elo
Lower rating difference:   12.75 Elo
Upper rating difference:   33.22 Elo
Lower bound uncertainty:  -10.21 Elo
Upper bound uncertainty:   10.25 Elo
Average error:        +/-  10.23 Elo
K = (average error)*[sqrt(n)] =  228.80
Elo interval: ]  12.75,   33.22[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference:     22.96 Elo
Lower rating difference:    2.56 Elo
Upper rating difference:   43.53 Elo
Lower bound uncertainty:  -20.40 Elo
Upper bound uncertainty:   20.56 Elo
Average error:        +/-  20.48 Elo
K = (average error)*[sqrt(n)] =  458.00
Elo interval: ]   2.56,   43.53[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference:     22.96 Elo
Lower rating difference:   -7.62 Elo
Upper rating difference:   53.91 Elo
Lower bound uncertainty:  -30.59 Elo
Upper bound uncertainty:   30.95 Elo
Average error:        +/-  30.77 Elo
K = (average error)*[sqrt(n)] =  688.01
Elo interval: ]  -7.62,   53.91[
---------------------------------------
Number of games of the match:                500
Score: 53.30 %
Elo rating difference:   22.96 Elo
Draw ratio: 56.60 %
**********************************************
1 sigma:  1.4657 % of the points of the match.
2 sigma:  2.9314 % of the points of the match.
3 sigma:  4.3970 % of the points of the match.
**********************************************
 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS:  98.78 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time:  50 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
235
Write down the number of loses:
202
Write down the number of draws:
563
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference:     11.47 Elo
Lower rating difference:    4.21 Elo
Upper rating difference:   18.74 Elo
Lower bound uncertainty:   -7.26 Elo
Upper bound uncertainty:    7.27 Elo
Average error:        +/-   7.26 Elo
K = (average error)*[sqrt(n)] =  229.67
Elo interval: ]   4.21,   18.74[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference:     11.47 Elo
Lower rating difference:   -3.04 Elo
Upper rating difference:   26.02 Elo
Lower bound uncertainty:  -14.51 Elo
Upper bound uncertainty:   14.55 Elo
Average error:        +/-  14.53 Elo
K = (average error)*[sqrt(n)] =  459.55
Elo interval: ]  -3.04,   26.02[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference:     11.47 Elo
Lower rating difference:  -10.30 Elo
Upper rating difference:   33.33 Elo
Lower bound uncertainty:  -21.77 Elo
Upper bound uncertainty:   21.86 Elo
Average error:        +/-  21.81 Elo
K = (average error)*[sqrt(n)] =  689.83
Elo interval: ] -10.30,   33.33[
---------------------------------------
Number of games of the match:               1000
Score: 51.65 %
Elo rating difference:   11.47 Elo
Draw ratio: 56.30 %
**********************************************
1 sigma:  1.0439 % of the points of the match.
2 sigma:  2.0878 % of the points of the match.
3 sigma:  3.1318 % of the points of the match.
**********************************************
 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS:  94.30 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time:  47 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
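The headline numbers in the two outputs above can be reproduced with a minimal Python sketch. It assumes the standard deviation σ = sqrt{[µ·(1 - µ) - D/4]/n} and the logistic Elo formula 400·log10[µ/(1 - µ)]. The LOS line uses the normal CDF of (µ - 0.5)/σ, which is an assumption that happens to match the printed 98.78 % and 94.30 %; it is not necessarily what the Fortran programme does. The names `elo` and `match_stats` are hypothetical.

```python
import math

def elo(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_stats(wins, losses, draws, k=2.0):
    """Elo difference, k-sigma Elo interval, sigma and LOS for one match."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n           # score fraction of the first engine
    d = draws / n                           # draw ratio
    sigma = math.sqrt((mu * (1.0 - mu) - d / 4.0) / n)
    lower, upper = elo(mu - k * sigma), elo(mu + k * sigma)
    # Assumed LOS definition: normal CDF evaluated at (mu - 0.5) / sigma.
    los = 0.5 * (1.0 + math.erf((mu - 0.5) / (sigma * math.sqrt(2.0))))
    return {"elo": elo(mu), "lower": lower, "upper": upper,
            "sigma": sigma, "los": los}

for w, l, d in [(125, 92, 283), (235, 202, 563)]:
    s = match_stats(w, l, d)
    print(f"+{w} ={d} -{l}: {s['elo']:.2f} Elo, "
          f"2-sigma interval ]{s['lower']:.2f}, {s['upper']:.2f}[, "
          f"sigma {100 * s['sigma']:.4f} %, LOS {100 * s['los']:.2f} %")
```

Passing `k=1.0` or `k=3.0` gives the other two intervals of the output in the same way.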
Regards from Spain.
Ajedrecista.
- 
				ernest
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.
Ajedrecista wrote:
I suppose that all these error bars were obtained with the great BayesElo.

Hi Jesus,
Not at all, I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.
Here we have a trinomial distribution (win-loss-draw) and when the result is close to 50%, you have SD(sigma)=[sqrt(W+L)]/2N (formula is a little more complicated if the result is not close to 50%)
So I find the 2SD error bar (95% probability) of
P OFF
1 Critter 1.6 64-bit       +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
and multiplying that % by 7 (valid for low %) you get the Elo error bar = 20.6 rounded to 21.
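The hand calculation described here can be sketched in Python. The function name `hand_error_bar` is hypothetical; the formula SD = sqrt(W+L)/(2N) and the factor 7 are the ones quoted in this post and are only meant for results close to 50%.

```python
import math

def hand_error_bar(wins, losses, draws, n_sd=2.0, elo_per_percent=7.0):
    """Approximate Elo error bar for a near-50% result.

    The SD of the score fraction is taken as sqrt(W + L) / (2N); it is
    converted to percent and multiplied by roughly 7 Elo per percent.
    """
    n = wins + losses + draws
    sd = math.sqrt(wins + losses) / (2.0 * n)    # SD as a score fraction
    return n_sd * sd * 100.0 * elo_per_percent   # error bar in Elo

# Ponder OFF, 500 games: +125 =283 -92
print(round(hand_error_bar(125, 92, 283), 1))    # → 20.6
```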
Now if we want to see if Ponder OFF or ON makes a significant difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution
P ON+OFF
1 Critter 1.6 64-bit       +12 +235/=563/-202 51.65% 516.5/1000
2 Critter 1.4a 64-bit SSE4 -12 +202/=563/-235 48.35% 483.5/1000
(you were perfectly right with your "However, in the second match, rating difference is ~ 11.5 Elo, not 15 Elo... which is more likely the error bar").
If from this 1000-game distribution you pick a 500-game sample, you expect that sample to have a mean of 51.65% (or +12 Elo) and a SD of sqrt(1000/500)*15/2 = 11 Elo.
Since the actual P ON sample (50%, 0 Elo) is 12 Elo away from that mean, SD being 11 Elo, that P ON sample does not distinguish itself enough from the P ON+OFF distribution.
Same reasoning for the actual P OFF sample (53.3%, 23 Elo).
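The comparison can be sketched numerically in Python. This follows the reasoning of this post (the SD of a 500-game sample taken as sqrt(1000/500) times the 1-SD error of the pooled 1000 games); it is not a formal two-sample test. The function names are hypothetical, and the near-50% shortcut SD = sqrt(W+L)/(2N) with 7 Elo per percent is assumed throughout.

```python
import math

ELO_PER_PERCENT = 7.0  # rule-of-thumb conversion, only valid near 50%

def sd_elo(wins, losses, draws):
    """1-SD error in Elo using SD = sqrt(W + L) / (2N)."""
    n = wins + losses + draws
    return math.sqrt(wins + losses) / (2.0 * n) * 100.0 * ELO_PER_PERCENT

def elo_diff(wins, losses, draws):
    """Elo difference from the score fraction (logistic model)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    return 400.0 * math.log10(mu / (1.0 - mu))

pooled = (235, 202, 563)                       # ponder ON + OFF, 1000 games
samples = {"ON": (110, 110, 280), "OFF": (125, 92, 283)}

pooled_elo = elo_diff(*pooled)
# SD expected for a 500-game sample drawn from the pooled distribution.
sample_sd = math.sqrt(1000 / 500) * sd_elo(*pooled)

for name, result in samples.items():
    z = (elo_diff(*result) - pooled_elo) / sample_sd
    print(f"ponder {name}: {z:+.2f} SD from the pooled mean")
```

Both samples land roughly one SD from the pooled mean, well inside 2 SD, which is the sense in which the ON/OFF difference is not yet significant.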
- 
				Ajedrecista  
- Posts: 2141
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.
Hi again!
ernest wrote:
I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.

I see. I also used to calculate them by hand with only the help of a hand calculator, until I wrote a programme in Fortran. I use this standard deviation:
n = wins + draws + losses
µ = (wins + draws/2)/n
D = draws/n
σ = sqrt{[µ·(1 - µ) - D/4]/n}
I took this formula from the 22nd post of this thread. I posted two messages in January that might be useful: #1 and #2.

ernest wrote:
Here we have a trinomial distribution (win-loss-draw) and when the result is close to 50%, you have SD(sigma)=[sqrt(W+L)]/2N (formula is a little more complicated if the result is not close to 50%)

I did not know that, when µ ~ 0.5, σ ~ sqrt(wins + losses)/2n in this trinomial distribution. It is interesting, so thank you for sharing it. Rewriting your standard deviation using the draw ratio D: σ = sqrt[n·(1 - D)]/2n = (1/2)·sqrt[(1 - D)/n]. If I compare our nσ², I obtain:
(Yours): nσ² = (1 - D)/4
(Mine): nσ² = µ·(1 - µ) - D/4; (mine with µ = 0.5): nσ² = (1 - D)/4
Which are exactly the same with µ = 1/2. Your nσ² does not depend on µ, while mine does... although the expression of σ that I use is not good for µ (or 1 - µ) > 0.85 or 0.9, to give a rough figure. For your info: µ must be in the interval [0.15, 0.85] in my programme, otherwise it does not calculate anything. The farther µ is from 1/2, the less accurate the value of σ is; it also has a problem with the extreme case of D = 1 (100% of draws), when σ = 0. But it is just a model that works reasonably well in real cases.
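A quick numerical check of the two nσ² expressions, as a Python sketch with hypothetical function names: they coincide at µ = 0.5 and drift apart as the score moves away from 50%.

```python
def n_sigma2_simple(d):
    """n·σ² from σ = sqrt(W + L) / (2N): depends only on the draw ratio D."""
    return (1.0 - d) / 4.0

def n_sigma2_full(mu, d):
    """n·σ² from σ = sqrt{[µ·(1 - µ) - D/4] / n}."""
    return mu * (1.0 - mu) - d / 4.0

d = 0.566  # draw ratio of the 500-game ponder OFF match
for mu in (0.50, 0.533, 0.60, 0.70):
    simple, full = n_sigma2_simple(d), n_sigma2_full(mu, d)
    print(f"mu = {mu:.3f}: simple = {simple:.5f}, full = {full:.5f}, "
          f"ratio = {full / simple:.4f}")
```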
You will find a value for the average error |<e>| in the post I called #2. This is:
|<e>| = 200·log{(µ + kσ)(1 - µ + kσ)/[(µ - kσ)(1 - µ - kσ)]}
Where k denotes the number of standard deviations for the chosen confidence level (k = 1.96 for ~ 95% confidence, k = 2 for ~ 95.45% confidence, etc.). If I replace µ = 1/2 in that equation:
|<e>| = 200·log{(0.5 + kσ)(0.5 + kσ)/[(0.5 - kσ)(0.5 - kσ)]} = 400·log[(0.5 + kσ)/(0.5 - kσ)] = [400/ln(10)]·[ln(1 + 2kσ) - ln(1 - 2kσ)]
With kσ > 0 and kσ << 1 (lots of games): ln(1 + 2kσ) ~ 2kσ; ln(1 - 2kσ) ~ -2kσ
|<e>| ~ 400·4kσ/ln(10) = [1600/ln(10)]·kσ
Here, σ is not in percentage; if you want σ in percentage, then the constant that multiplies kσ is 16/ln(10) ~ 6.9487, which is almost your seven. This could be valid in my approximation with a normal distribution, although it should be valid only when µ = 0.5 and kσ (or σ, for reasonable confidence levels, where k is finite) tends to zero, because it is a rough approximation with those assumptions.
If you had not posted the trick of multiplying by seven, I would never have noticed this number 16/ln(10), so today I have learnt something. Thanks!
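The size of the approximation can be checked with a short Python sketch that compares the exact average error with the linearised form [1600/ln(10)]·kσ. The function names are hypothetical, and the σ used is the 1.4657 % reported for the 500-game match earlier in the thread.

```python
import math

def avg_error_exact(mu, k_sigma):
    """|<e>| = 200·log10{(µ + kσ)(1 - µ + kσ) / [(µ - kσ)(1 - µ - kσ)]}."""
    num = (mu + k_sigma) * (1.0 - mu + k_sigma)
    den = (mu - k_sigma) * (1.0 - mu - k_sigma)
    return 200.0 * math.log10(num / den)

def avg_error_linear(k_sigma):
    """Small-kσ approximation at µ = 0.5: [1600 / ln(10)]·kσ."""
    return 1600.0 / math.log(10.0) * k_sigma

sigma = 0.014657  # 1 sigma as a fraction (1.4657 %)
for k in (1, 2, 3):
    exact_half = avg_error_exact(0.5, k * sigma)
    exact_real = avg_error_exact(0.533, k * sigma)  # actual score of the match
    print(f"k = {k}: linear {avg_error_linear(k * sigma):.2f}, "
          f"exact at 0.5 {exact_half:.2f}, exact at 0.533 {exact_real:.2f}")

print(f"16/ln(10) = {16.0 / math.log(10.0):.4f}")
```

The linear form stays close to the exact value for the small kσ seen in these matches, and the gap grows with k and with the distance of µ from 0.5.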
@Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.
Regards from Spain.
Ajedrecista.
- 
				MM
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.
Ajedrecista wrote:
@Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.
Regards from Spain.
Ajedrecista.

Hi, I'm only glad of this interest. And I'm interested too. Thanks
MM
			
						- 
				ernest
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.
Ajedrecista wrote:
Hi again!

Hi Jesus,
Thanks for this detailed post, I will study it carefully!
Of course, your program gives more accurate numbers, I only get (not too bad) approximations.
Do you have a comment on my section starting with
Now if we want to see if Ponder OFF or ON makes a significant difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution
which shows that so far (i.e. with only those 500+500 games) the difference is NOT significant?