Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!
18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a
Have a nice day yo false magicians. Your days are counted.
18 days from SF4 release and about ~30+ ELO gain!
Moderator: Ras
-
- Posts: 5297
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: 18 days from SF4 release and about ~30+ ELO gain!
Now, 23 days but it's not easy to know the rating of the latest SF-dev against other engines ...
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 19 days from SF 4 release and about ~30 Elo gain!
Ajedrecista wrote:
Hello Kai:
Laskos wrote:
Ajedrecista wrote: At first approximation, I would say for a score of 64.5% and a draw ratio of 24.5%: ±800*1.96*sqrt[1/(4*0.645*0.355) - 0.245]/[ln(10)*sqrt(30080)] ~ ± 3.61 Elo.
Regards from Spain.
Ajedrecista.
I don't understand your formula, isn't it 800/log(10) * 1.96*sqrt(4*0.645*0.355-0.245)/sqrt(30080) ~ 3.22 Elo points?
[800/ln(10)]*1.96*sqrt(4*0.645*0.355 - 0.245)/sqrt(30080) ~ 3.22 indeed. But please note that it is not what I wrote. Your typo comes in sqrt(4*0.645*0.355 - 0.245) ~ 0.819, while I wrote sqrt[1/(4*0.645*0.355) - 0.245] ~ 0.9202. Of course: (0.9202/0.819)*3.21 ~ 3.61 (my estimate). Thanks for your interest.
Regards from Spain.
Ajedrecista.

Ok, I have now derived the formula to calculate the Elo error in a match of two engines. The percentage error is easy:
Code: Select all
Error in % is = 100% *SD*Sqrt[s*(1-s) - d/4]/Sqrt[n]
where SD is the desired number of standard deviations, s is the score, d is the draw ratio, and n is the number of games.
Transferring percentages to Elo gives
Code: Select all
Elo difference = 400*Log[(1 - s)/s]/Log[10]
Its derivative in s:
400/((1 - s)*s*Log[10])
Elo error = 200*Sqrt[4*(1 - s)*s - d] * SD/(Sqrt[n]*(1 - s)*s*Log[10])
and in Robert Hyatt's case it's 3.51 Elo points, as derived by you with your "own model".
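For readers who want the intermediate steps, the formulas in this post can be read as a first-order error propagation. The sketch below assumes independent games scored 1, 1/2 and 0 with probabilities w, d and l (w and l are symbols introduced here for illustration, not taken from the post), and uses the sign convention in which a score above 50% gives a positive Elo difference.

```latex
\begin{align*}
s &= w + \tfrac{d}{2}, \qquad
\operatorname{Var}(X) = w + \tfrac{d}{4} - s^{2} = s(1-s) - \tfrac{d}{4},\\
\sigma_{s} &= \frac{\sqrt{s(1-s) - d/4}}{\sqrt{n}},\\
E(s) &= -\frac{400}{\ln 10}\,\ln\frac{1-s}{s}, \qquad
\frac{dE}{ds} = \frac{400}{\ln 10 \; s(1-s)},\\
\Delta E &\approx \mathrm{SD}\cdot\frac{dE}{ds}\cdot\sigma_{s}
 = \frac{200\,\mathrm{SD}\,\sqrt{4s(1-s) - d}}{\ln 10\;\sqrt{n}\;s(1-s)}.
\end{align*}
```

The last line is only a linearisation around the observed score, so it should be most trustworthy when the interval in s is narrow.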
-
- Posts: 2121
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: 19 days from SF 4 release and about ~30 Elo gain!
Hello again:
Laskos wrote:
Ok, I have now derived the formula to calculate Elo points error in a match of two engines. Percentage error is easy
Code: Select all
Error in % is = 100% *SD*Sqrt[s*(1-s) - d/4]/Sqrt[n]
where SD is the desired standard deviations, s is the score, d is the draw ratio, n is the number of games.
Transferring percentages to Elo gives
Code: Select all
Elo difference = 400*Log[(1 - s)/s]/Log[10]
Its derivative in s:
400/((1 - s)*s*Log[10])
Elo error = 200*Sqrt[4*(1 - s)*s - d] * SD/(Sqrt[n]*(1 - s)*s*Log[10])
and in Robert Hyatt's case it's 3.51 Elo points, as derived by you by your "own model".

Good job! Thanks for sharing it. Just as a side note: Elo difference = -400*ln[(1 - µ)/µ]/ln(10). The minus sign is important: with your formula, µ > 1/2 gives negative Elo differences! The derivative is fine (it would be negative if you differentiated the wrong formula, but it is positive with the correct formula). I am also satisfied with your last formula, the one you named 'Elo error'.

Regards from Spain.
Ajedrecista.
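The sign remark is easy to check numerically. The following is a small sketch, not code from anyone in this thread; the function names `elo_difference` and `elo_difference_unsigned` are made up for illustration.

```python
import math


def elo_difference(mu):
    """Elo difference for an expected score mu (0 < mu < 1), with the leading minus sign."""
    return -400.0 * math.log10((1.0 - mu) / mu)


def elo_difference_unsigned(mu):
    """Same expression without the leading minus sign, as first posted."""
    return 400.0 * math.log10((1.0 - mu) / mu)


# Compare both conventions below, at, and above a 50% score.
for mu in (0.355, 0.5, 0.645):
    print(mu, round(elo_difference(mu), 2), round(elo_difference_unsigned(mu), 2))
```

With the minus sign, a score above 50% maps to a positive Elo difference; without it, the same score maps to a negative one.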
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 19 days from SF 4 release and about ~30 Elo gain!
Ajedrecista wrote:
Hello again:
Good job! Thanks for sharing it. Just as a side note: Elo difference = -400*ln[(1 - µ)/µ]/ln(10). The minus sign is important: with your formula, µ > 1/2 gives negative Elo differences! The derivative is fine (it would be negative if you differentiated the wrong formula, but it is positive with the correct formula). I am also satisfied with your last formula, the one you named 'Elo error'.
Regards from Spain.
Ajedrecista.

I just did it quickly to have a positive derivative and a positive error. The minus sign from the beginning disappeared along the way.

-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 19 days from SF 4 release and about ~30 Elo gain!
And just as a rule of thumb in a match of two engines, for 95% confidence interval:
Elo error = 170*Sqrt[4*(1 - s)*s - d] / (s*(1 - s)*Sqrt[N])
Where N is the number of games, s is the score, i.e. (W+D/2)/N, d is the draw ratio.
For Robert Hyatt match:
s=0.645
d=0.245
N=30080
Error: +/- 3.51 Elo points 95% confidence interval.
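As a quick check of the rule of thumb, here is a minimal Python sketch. The function name `elo_error` and its `sd` parameter are illustrative choices; the constant 170 in the formula is treated here as a rounding of 200*1.96/ln(10), which is an assumption based on the earlier 'Elo error' formula.

```python
import math


def elo_error(score, draw_ratio, games, sd=1.96):
    """Approximate Elo error margin for a two-engine match.

    score      -- (W + D/2) / N
    draw_ratio -- D / N
    games      -- N
    sd         -- number of standard deviations (1.96 for roughly 95%)
    """
    spread = 4.0 * score * (1.0 - score) - draw_ratio
    if spread < 0.0:
        raise ValueError("draw ratio is too large for this score")
    return (200.0 * sd * math.sqrt(spread)
            / (math.log(10) * score * (1.0 - score) * math.sqrt(games)))


# The match quoted above: s = 0.645, d = 0.245, N = 30080.
print(f"+/- {elo_error(0.645, 0.245, 30080):.2f} Elo")
```

The guard on `spread` is there because a draw ratio above 4*s*(1 - s) cannot occur for a real win/loss/draw record with that score.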
-
- Posts: 3241
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: 19 days from SF 4 release and about ~30 Elo gain!
gladius wrote: I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming 60% draw rate, and 10 elo advantage. It gives this:
ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000
However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%
I would tend to trust stat_util.py more, but I'm not honestly sure.

Discussions about how to calculate a p-value, and a confidence interval, really never end in this forum!
Don't listen to charlatans: stat_util.py is correct.
The code is really self-explanatory:
Code: Select all
def get_elo(WLD):
    # win/loss/draw ratio
    N = sum(WLD)
    w = float(WLD[0])/N
    l = float(WLD[1])/N
    d = float(WLD[2])/N
    # mu is the empirical mean of the variables (Xi), assumed i.i.d.
    mu = w + d/2
    # stdev is the empirical standard deviation of the random variable (X1+...+X_N)/N
    stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)
    # 95% confidence interval for mu
    mu_min = mu + phi_inv(0.025) * stdev
    mu_max = mu + phi_inv(0.975) * stdev
    el = elo(mu)
    elo95 = (elo(mu_max) - elo(mu_min)) / 2
    los = phi((mu-0.5) / stdev)
    return el, elo95, los
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
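The snippet above relies on the helpers `phi`, `phi_inv` and `elo`, which are not shown in the post. Below is a self-contained sketch of the same calculation for anyone who wants to run it. It is not the fishtest source: the helpers are replaced with Python's `statistics.NormalDist`, and `elo` is assumed to be the usual logistic conversion -400*log10(1/x - 1), so the margin may differ slightly in the last digits from what stat_util.py itself prints.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal; stands in for phi / phi_inv (assumption)


def elo(x):
    """Assumed logistic score-to-Elo conversion, for 0 < x < 1."""
    return -400.0 * math.log10(1.0 / x - 1.0)


def get_elo(WLD):
    """Return (Elo estimate, 95% half-width in Elo, LOS) from (wins, losses, draws)."""
    N = sum(WLD)
    w, l, d = (count / N for count in WLD)
    # mu is the empirical mean score per game
    mu = w + d / 2
    # standard deviation of the mean score over N games
    stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)
    # 95% confidence interval for mu, then mapped to Elo at both ends
    mu_min = mu + _NORMAL.inv_cdf(0.025) * stdev
    mu_max = mu + _NORMAL.inv_cdf(0.975) * stdev
    elo95 = (elo(mu_max) - elo(mu_min)) / 2
    # likelihood of superiority: probability that the true mean exceeds 50%
    los = _NORMAL.cdf((mu - 0.5) / stdev)
    return elo(mu), elo95, los


el, elo95, los = get_elo((1600, 1400, 4000))
print(f"Elo: {el:.2f} +- {elo95:.2f}, LOS: {los:.2%}")
```

The sketch does no input validation: a record with a 0% or 100% score, or with no decisive spread at all, would divide by zero.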
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 19 days from SF 4 release and about ~30 Elo gain!
lucasart wrote: Discussions about how to calculate a p-value, and a confidence interval, really never end in this forum!
Don't listen to charlatans: stat_util.py is correct.

Sorry, I didn't follow the discussion, but for Wins: 1600 Losses: 1400 Draws: 4000, what's the matter with 11.56 Elo points difference, while it's 9.93?
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: 19 days from SF 4 release and about ~30 Elo gain!
Laskos wrote: Sorry, I didn't follow the discussion, but for Wins: 1600 Losses: 1400 Draws: 4000, what's the matter with 11.56 Elo points difference, while it's 9.93?

It's because I can't type! I entered draws as 3000 when testing stat_util.py. When using the correct 1600,1400,4000, I get:
Elo: 9.9294334900128192
+-: 5.3203271189105408,
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 19 days from SF 4 release and about ~30 Elo gain!
gladius wrote: It's because I can't type! I entered draws as 3000 when testing stat_util.py. When using the correct 1600,1400,4000, I get:
Elo: 9.9294334900128192
+-: 5.3203271189105408,

These numbers are correct. 9.93 and 5.32. The last one you can derive using Elo error = 170*Sqrt[4*(1 - s)*s - d] / (s*(1 - s)*Sqrt[N]), but it's all the same O(1) formula, so use the Lucas way.
-
- Posts: 2121
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: 19 days from SF 4 release and about ~30 Elo gain!
Hello:
I see that some patches fail SPRT with a score > 50% from time to time in the SF testing framework, and some people there want to give these patches an additional try. My question is: how often does this scenario occur? I wrote my own SPRT simulator, inspired by this post by Lucas. Here are my results for stages I and II:
Code: Select all
Stage I --> SPRT (-1.5, 4.5):
alpha = beta = 0.05 (5%); 'a priori' drawelo = 240.
10000 simulations each time.
Bayeselo  Passes  Fails  <Games>  Fails with score > 50%  Fails with score = 50%
     0.5    2702   7298    26316                    1493                      27
       1    3782   6218    28037                    1423                      25
     1.5    4995   5005    28599                    1211                      21
       2    6196   3804    27908                     850                      13
     2.5    7249   2751    26427                     552                      12
       3    8127   1873    24447                     345                       7
=============================================================================================================
Stage II --> SPRT (0, 6):
alpha = beta = 0.05 (5%); 'a priori' drawelo = 270.
10000 simulations each time.
Bayeselo  Passes  Fails  <Games>  Fails with score > 50%  Fails with score = 50%
     0.5     791   9209    20872                    3477                      43
       1    1190   8810    23704                    3830                      34
     1.5    1808   8192    26478                    3950                      32
       2    2737   7263    28387                    3733                      26
     2.5    3796   6204    30394                    3358                      29
       3    5008   4992    30719                    2741                      23
The probabilities of passing both stages are:
Code: Select all
Bayeselo = 0.5: P ~ 0.2702*0.0791 ~ 2.14%
Bayeselo = 1: P ~ 0.3782*0.1190 ~ 4.50%
Bayeselo = 1.5: P ~ 0.4995*0.1808 ~ 9.03%
Bayeselo = 2: P ~ 0.6196*0.2737 ~ 16.96%
Bayeselo = 2.5: P ~ 0.7249*0.3796 ~ 27.52%
Bayeselo = 3: P ~ 0.8127*0.5008 ~ 40.70%
A patch that passes Stage I then plays Stage II. The average number of games until the SPRT stops, counting a Stage II game as about four times longer than a Stage I game:
Code: Select all
Bayeselo = 0.5: <games> ~ 26316 + 4*20872 = 109804 @ 15+0.05 ~ 27451 @ 60+0.05
Bayeselo = 1: <games> ~ 28037 + 4*23704 = 122853 @ 15+0.05 ~ 30713 @ 60+0.05
Bayeselo = 1.5: <games> ~ 28599 + 4*26478 = 134511 @ 15+0.05 ~ 33628 @ 60+0.05
Bayeselo = 2: <games> ~ 27908 + 4*28387 = 141456 @ 15+0.05 ~ 35364 @ 60+0.05
Bayeselo = 2.5: <games> ~ 26427 + 4*30394 = 148003 @ 15+0.05 ~ 37001 @ 60+0.05
Bayeselo = 3: <games> ~ 24447 + 4*30719 = 147323 @ 15+0.05 ~ 36831 @ 60+0.05
I hope that someone can confirm my results. Needless to say, my simulator is far from perfect.
Regards from Spain.
Ajedrecista.
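For anyone who wants to re-derive the combined figures from the two stage tables, the arithmetic can be sketched as follows. This only recombines the simulation counts posted above; it is not the SPRT simulator itself, and the names `RESULTS`, `STAGE2_COST` and `combine` are made up for illustration.

```python
# (Bayeselo, Stage I passes, Stage I mean games, Stage II passes, Stage II mean games),
# copied from the tables in the post above.
RESULTS = [
    (0.5, 2702, 26316, 791, 20872),
    (1.0, 3782, 28037, 1190, 23704),
    (1.5, 4995, 28599, 1808, 26478),
    (2.0, 6196, 27908, 2737, 28387),
    (2.5, 7249, 26427, 3796, 30394),
    (3.0, 8127, 24447, 5008, 30719),
]
SIMULATIONS = 10000  # runs per row, as stated in the post
STAGE2_COST = 4      # one Stage II game counted as four Stage I games, as stated in the post


def combine(passes1, games1, passes2, games2):
    """Return (P(pass both stages), games in Stage I units, games in Stage II units)."""
    p_both = (passes1 / SIMULATIONS) * (passes2 / SIMULATIONS)
    short_games = games1 + STAGE2_COST * games2
    return p_both, short_games, short_games / STAGE2_COST


for bayeselo, p1, g1, p2, g2 in RESULTS:
    p_both, short_games, long_games = combine(p1, g1, p2, g2)
    print(f"Bayeselo = {bayeselo}: P ~ {p_both:.2%}, "
          f"<games> ~ {short_games} @ 15+0.05 ~ {long_games:.0f} @ 60+0.05")
```

Multiplying the two pass rates treats the stages as independent runs at the same true strength, which is how the figures in the post appear to have been combined.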