18 days from SF4 release and about ~30+ ELO gain!

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Vinvin
Posts: 5297
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: 18 days from SF4 release and about ~30+ ELO gain!

Post by Vinvin »

Now it's 23 days, but it's not easy to know the rating of the latest SF dev build against other engines ...

Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day, you false magicians. Your days are numbered.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Laskos »

Ajedrecista wrote:Hello Kai:
Laskos wrote:
Ajedrecista wrote:
At first approximation, I would say for a score of 64.5% and a draw ratio of 24.5%: ±800*1.96*sqrt[1/(4*0.645*0.355) - 0.245]/[ln(10)*sqrt(30080)] ~ ± 3.61 Elo.
Regards from Spain.

Ajedrecista.
I don't understand your formula, isn't it 800/log(10) * 1.96*sqrt(4*0.645*0.355-0.245)/sqrt(30080) ~ 3.22 Elo points?
[800/ln(10)]*1.96*sqrt(4*0.645*0.355 - 0.245)/sqrt(30080) ~ 3.22 indeed. But please note that it is not what I wrote. Your typo comes in sqrt(4*0.645*0.355 - 0.245) ~ 0.819, while I wrote sqrt[1/(4*0.645*0.355) - 0.245] ~ 0.9202. Of course: (0.9202/0.819)*3.21 ~ 3.61 (my estimate). Thanks for your interest.

Regards from Spain.

Ajedrecista.
Ok, I have now derived the formula to calculate Elo points error in a match of two engines. Percentage error is easy

Code: Select all

Error in % = 100% * SD * Sqrt[s*(1 - s) - d/4]/Sqrt[n]
where SD is the desired number of standard deviations, s is the score, d is the draw ratio, n is the number of games.
Transferring percentages to Elo gives

Code: Select all

Elo difference = 400*Log[(1 - s)/s]/Log[10]

Its derivative in s:
400/((1 - s)*s*Log[10])

Elo error = 200*Sqrt[4*(1 - s)*s - d] * SD/(Sqrt[n]*(1 - s)*s*Log[10])
and in Robert Hyatt's case it's 3.51 Elo points, as derived by you in your "own model".
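The formula above can be checked numerically; here is a minimal Python sketch, using the match figures quoted in this thread (s = 0.645, d = 0.245, n = 30080):

```python
import math

def elo_error(s, d, n, sd=1.96):
    """Elo error obtained by propagating the score error through the
    derivative of the Elo curve at score s, as in the derivation above;
    sd is the desired number of standard deviations."""
    return 200 * math.sqrt(4 * (1 - s) * s - d) * sd / (
        math.sqrt(n) * (1 - s) * s * math.log(10))

# Robert Hyatt's match: score 64.5%, draw ratio 24.5%, 30080 games
print(round(elo_error(0.645, 0.245, 30080), 2))  # ~3.51 Elo points
```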
User avatar
Ajedrecista
Posts: 2121
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Ajedrecista »

Hello again:
Laskos wrote:
Ok, I have now derived the formula to calculate Elo points error in a match of two engines. Percentage error is easy

Code: Select all

Error in % = 100% * SD * Sqrt[s*(1 - s) - d/4]/Sqrt[n]
where SD is the desired number of standard deviations, s is the score, d is the draw ratio, n is the number of games.
Transferring percentages to Elo gives

Code: Select all

Elo difference = 400*Log[(1 - s)/s]/Log[10]

Its derivative in s:
400/((1 - s)*s*Log[10])

Elo error = 200*Sqrt[4*(1 - s)*s - d] * SD/(Sqrt[n]*(1 - s)*s*Log[10])
and in Robert Hyatt's case it's 3.51 Elo points, as derived by you in your "own model".
Good job! Thanks for sharing it. Just as a side note: Elo difference = -400*ln[(1 - µ)/µ]/ln(10). The minus sign is important: with your formula, µ > 1/2 gives negative Elo differences! The derivative is fine (it would be negative if you differentiated the wrong formula, but it is positive with the correct one). I am also satisfied with your last formula that you named 'Elo error'. :)

Regards from Spain.

Ajedrecista.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Laskos »

Ajedrecista wrote:Hello again:
Laskos wrote:
Ok, I have now derived the formula to calculate Elo points error in a match of two engines. Percentage error is easy

Code: Select all

Error in % = 100% * SD * Sqrt[s*(1 - s) - d/4]/Sqrt[n]
where SD is the desired number of standard deviations, s is the score, d is the draw ratio, n is the number of games.
Transferring percentages to Elo gives

Code: Select all

Elo difference = 400*Log[(1 - s)/s]/Log[10]

Its derivative in s:
400/((1 - s)*s*Log[10])

Elo error = 200*Sqrt[4*(1 - s)*s - d] * SD/(Sqrt[n]*(1 - s)*s*Log[10])
and in Robert Hyatt's case it's 3.51 Elo points, as derived by you in your "own model".
Good job! Thanks for sharing it. Just as a side note: Elo difference = -400*ln[(1 - µ)/µ]/ln(10). The minus sign is important: with your formula, µ > 1/2 gives negative Elo differences! The derivative is fine (it would be negative if you differentiated the wrong formula, but it is positive with the correct one). I am also satisfied with your last formula that you named 'Elo error'. :)

Regards from Spain.

Ajedrecista.
I just did it quickly to get a positive derivative and a positive error. The minus sign from the beginning disappeared along the way :)
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Laskos »

And just as a rule of thumb in a match of two engines, for 95% confidence interval:

Elo error = 170*Sqrt[4*(1 - s)*s - d] / (s*(1 - s)*Sqrt[N])

where N is the number of games, s is the score, i.e. (W + D/2)/N, and d is the draw ratio.

For Robert Hyatt's match:
s=0.645
d=0.245
N=30080

Error: +/- 3.51 Elo points 95% confidence interval.
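The constant 170 here is just 200 * 1.96 / ln(10) ≈ 170.2 rounded, i.e. the formula derived earlier with the 95% factor folded into the constant. A quick Python check against the figures above:

```python
import math

def elo_error_rot(s, d, n):
    """Rule-of-thumb 95% Elo error; the 170 is ~ 200 * 1.96 / ln(10)."""
    return 170 * math.sqrt(4 * (1 - s) * s - d) / (s * (1 - s) * math.sqrt(n))

# Robert Hyatt's match again: s = 0.645, d = 0.245, N = 30080
print(round(elo_error_rot(0.645, 0.245, 30080), 2))  # ~3.51
```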
lucasart
Posts: 3241
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by lucasart »

gladius wrote: I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming 60% draw rate, and 10 elo advantage. It gives this:

ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000

However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%

I would tend to trust stat_util.py more, but I'm not honestly sure.
Discussions about how to calculate a p-value and a confidence interval really never end in this forum!

Don't listen to charlatans: stat_util.py is correct.

The code is really self-explanatory:

Code: Select all

def get_elo(WLD):
  # win/loss/draw ratio
  N = sum(WLD)
  w = float(WLD[0])/N
  l = float(WLD[1])/N
  d = float(WLD[2])/N

  # mu is the empirical mean of the variables (Xi), assumed i.i.d.
  mu = w + d/2

  # stdev is the empirical standard deviation of the random variable (X1+...+X_N)/N
  stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)

  # 95% confidence interval for mu
  mu_min = mu + phi_inv(0.025) * stdev
  mu_max = mu + phi_inv(0.975) * stdev

  el = elo(mu)
  elo95 = (elo(mu_max) - elo(mu_min)) / 2
  los = phi((mu-0.5) / stdev)

  return el, elo95, los
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
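The snippet relies on helpers phi, phi_inv and elo that are not shown in the post. A self-contained sketch of the same computation follows; the helper definitions below (a standard normal CDF/quantile via Python's statistics.NormalDist, and the usual logistic score-to-Elo mapping) are assumptions about what the omitted helpers do, not the actual stat_util.py code:

```python
import math
from statistics import NormalDist

_norm = NormalDist()                      # standard normal distribution
phi, phi_inv = _norm.cdf, _norm.inv_cdf   # assumed definitions of the omitted helpers

def elo(mu):
    # assumed logistic score -> Elo mapping
    return -400 * math.log10(1 / mu - 1)

def get_elo(WLD):
    # win/loss/draw ratios
    N = sum(WLD)
    w, l, d = (float(x) / N for x in WLD)

    # mu: empirical mean of the per-game scores (win=1, loss=0, draw=0.5)
    mu = w + d / 2

    # empirical standard deviation of the mean score
    stdev = math.sqrt(w * (1 - mu) ** 2 + l * mu ** 2 + d * (0.5 - mu) ** 2) / math.sqrt(N)

    # 95% confidence interval for mu, mapped through elo()
    mu_min = mu + phi_inv(0.025) * stdev
    mu_max = mu + phi_inv(0.975) * stdev
    return elo(mu), (elo(mu_max) - elo(mu_min)) / 2, phi((mu - 0.5) / stdev)

print(get_elo([1600, 1400, 4000]))  # roughly (9.93, 5.33, 0.9999)
```

For the 1600/1400/4000 totals discussed above this reproduces the 9.93 Elo figure; the confidence interval comes out near ±5.3 (the exact decimals depend on the assumed helper implementations).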
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Laskos »

lucasart wrote:
gladius wrote: I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming 60% draw rate, and 10 elo advantage. It gives this:

ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000

However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%

I would tend to trust stat_util.py more, but I'm not honestly sure.
Discussions about how to calculate a p-value and a confidence interval really never end in this forum!

Don't listen to charlatans: stat_util.py is correct.

The code is really self-explanatory:

Code: Select all

def get_elo(WLD):
  # win/loss/draw ratio
  N = sum(WLD)
  w = float(WLD[0])/N
  l = float(WLD[1])/N
  d = float(WLD[2])/N

  # mu is the empirical mean of the variables (Xi), assumed i.i.d.
  mu = w + d/2

  # stdev is the empirical standard deviation of the random variable (X1+...+X_N)/N
  stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)

  # 95% confidence interval for mu
  mu_min = mu + phi_inv(0.025) * stdev
  mu_max = mu + phi_inv(0.975) * stdev

  el = elo(mu)
  elo95 = (elo(mu_max) - elo(mu_min)) / 2
  los = phi((mu-0.5) / stdev)

  return el, elo95, los
Sorry, I didn't follow the discussion, but for Wins: 1600 Losses: 1400 Draws: 4000, why does it give an 11.56 Elo point difference when it should be 9.93?
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by gladius »

Laskos wrote:
lucasart wrote:
gladius wrote: I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming 60% draw rate, and 10 elo advantage. It gives this:

ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000

However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%

I would tend to trust stat_util.py more, but I'm not honestly sure.
Discussions about how to calculate a p-value and a confidence interval really never end in this forum!

Don't listen to charlatans: stat_util.py is correct.

The code is really self-explanatory:

Code: Select all

def get_elo(WLD):
  # win/loss/draw ratio
  N = sum(WLD)
  w = float(WLD[0])/N
  l = float(WLD[1])/N
  d = float(WLD[2])/N

  # mu is the empirical mean of the variables (Xi), assumed i.i.d.
  mu = w + d/2

  # stdev is the empirical standard deviation of the random variable (X1+...+X_N)/N
  stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)

  # 95% confidence interval for mu
  mu_min = mu + phi_inv(0.025) * stdev
  mu_max = mu + phi_inv(0.975) * stdev

  el = elo(mu)
  elo95 = (elo(mu_max) - elo(mu_min)) / 2
  los = phi((mu-0.5) / stdev)

  return el, elo95, los
Sorry, I didn't follow the discussion, but for Wins: 1600 Losses: 1400 Draws: 4000, why does it give an 11.56 Elo point difference when it should be 9.93?
It's because I can't type! I entered draws as 3000 when testing stat_util.py. When using the correct 1600,1400,4000, I get:
Elo: 9.9294334900128192
+-: 5.3203271189105408,
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Laskos »

gladius wrote:
Laskos wrote:
lucasart wrote:
gladius wrote: I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming 60% draw rate, and 10 elo advantage. It gives this:

ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000

However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%

I would tend to trust stat_util.py more, but I'm not honestly sure.
Discussions about how to calculate a p-value and a confidence interval really never end in this forum!

Don't listen to charlatans: stat_util.py is correct.

The code is really self-explanatory:

Code: Select all

def get_elo(WLD):
  # win/loss/draw ratio
  N = sum(WLD)
  w = float(WLD[0])/N
  l = float(WLD[1])/N
  d = float(WLD[2])/N

  # mu is the empirical mean of the variables (Xi), assumed i.i.d.
  mu = w + d/2

  # stdev is the empirical standard deviation of the random variable (X1+...+X_N)/N
  stdev = math.sqrt(w*(1-mu)**2 + l*(0-mu)**2 + d*(0.5-mu)**2) / math.sqrt(N)

  # 95% confidence interval for mu
  mu_min = mu + phi_inv(0.025) * stdev
  mu_max = mu + phi_inv(0.975) * stdev

  el = elo(mu)
  elo95 = (elo(mu_max) - elo(mu_min)) / 2
  los = phi((mu-0.5) / stdev)

  return el, elo95, los
Sorry, I didn't follow the discussion, but for Wins: 1600 Losses: 1400 Draws: 4000, why does it give an 11.56 Elo point difference when it should be 9.93?
It's because I can't type! I entered draws as 3000 when testing stat_util.py. When using the correct 1600,1400,4000, I get:
Elo: 9.9294334900128192
+-: 5.3203271189105408,
These numbers are correct. 9.93 and 5.32. The last one you can derive using Elo error = 170*Sqrt[4*(1 - s)*s - d] / (s*(1 - s)*Sqrt[N]), but it's all the same O(1) formula, so use the Lucas way.
User avatar
Ajedrecista
Posts: 2121
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: 19 days from SF 4 release and about ~30 Elo gain!

Post by Ajedrecista »

Hello:

I see that some patches fail SPRT with a score > 50% from time to time in the SF testing framework, and some people there want to give these patches an additional try. My question is: how often does this scenario occur? I wrote my own SPRT simulator, inspired by this post by Lucas. Here are my results for stages I and II:

Code: Select all

Stage I --> SPRT (-1.5, 4.5):
alpha = beta = 0.05 (5%); 'a priori' drawelo = 240.
10000 simulations each time.

Bayeselo        Passes        Fails       <Games>        Fails with score > 50%        Fails with score = 50%
   0.5           2702          7298        26316                  1493                           27
   1             3782          6218        28037                  1423                           25
   1.5           4995          5005        28599                  1211                           21
   2             6196          3804        27908                   850                           13
   2.5           7249          2751        26427                   552                           12
   3             8127          1873        24447                   345                            7

=============================================================================================================

Stage II --> SPRT (0, 6):
alpha = beta = 0.05 (5%); 'a priori' drawelo = 270.
10000 simulations each time.

Bayeselo        Passes        Fails       <Games>        Fails with score > 50%        Fails with score = 50%
   0.5            791          9209        20872                  3477                           43
   1             1190          8810        23704                  3830                           34
   1.5           1808          8192        26478                  3950                           32
   2             2737          7263        28387                  3733                           26
   2.5           3796          6204        30394                  3358                           29
   3             5008          4992        30719                  2741                           23
The probabilities of passing both stages are:

Code: Select all

Bayeselo = 0.5: P ~ 0.2702*0.0791 ~  2.14%
Bayeselo = 1:   P ~ 0.3782*0.1190 ~  4.50%
Bayeselo = 1.5: P ~ 0.4995*0.1808 ~  9.03%
Bayeselo = 2:   P ~ 0.6196*0.2737 ~ 16.96%
Bayeselo = 2.5: P ~ 0.7249*0.3796 ~ 27.52%
Bayeselo = 3:   P ~ 0.8127*0.5008 ~ 40.70%
If a patch passes Stage I, it then plays at Stage II. The average number of games until the SPRT stops (counting a game at Stage II as roughly four times longer than a game at Stage I):

Code: Select all

Bayeselo = 0.5: <games> ~ 26316 + 4*20872 = 109804 @ 15+0.05 ~ 27451 @ 60+0.05
Bayeselo = 1:   <games> ~ 28037 + 4*23704 = 122853 @ 15+0.05 ~ 30713 @ 60+0.05
Bayeselo = 1.5: <games> ~ 28599 + 4*26478 = 134511 @ 15+0.05 ~ 33628 @ 60+0.05
Bayeselo = 2:   <games> ~ 27908 + 4*28387 = 141456 @ 15+0.05 ~ 35364 @ 60+0.05
Bayeselo = 2.5: <games> ~ 26427 + 4*30394 = 148003 @ 15+0.05 ~ 37001 @ 60+0.05
Bayeselo = 3:   <games> ~ 24447 + 4*30719 = 147323 @ 15+0.05 ~ 36831 @ 60+0.05
I hope that someone can confirm my results. Needless to say, my simulator is far from perfect.

Regards from Spain.

Ajedrecista.
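A minimal simulator along these lines can be sketched in Python as follows. It assumes the standard BayesElo trinomial model with a fixed 'a priori' drawelo and Wald's SPRT stopping bounds; fishtest's real implementation re-estimates drawelo from the data, so the numbers will only roughly match the tables above.

```python
import math
import random

def bayeselo_probs(bayeselo, drawelo):
    """Win/draw/loss probabilities under the BayesElo draw model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-bayeselo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((bayeselo + drawelo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def sprt_run(true_elo, elo0, elo1, drawelo=240.0, alpha=0.05, beta=0.05,
             max_games=400000):
    """Simulate one SPRT(elo0, elo1) run; return (passed, games_played)."""
    lower = math.log(beta / (1.0 - alpha))   # stop and fail below this LLR
    upper = math.log((1.0 - beta) / alpha)   # stop and pass above this LLR
    p_true = bayeselo_probs(true_elo, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    llr_inc = [math.log(p1[i] / p0[i]) for i in range(3)]  # per-result LLR step
    llr, games = 0.0, 0
    while lower < llr < upper and games < max_games:
        u = random.random()
        i = 0 if u < p_true[0] else (1 if u < p_true[0] + p_true[1] else 2)
        llr += llr_inc[i]
        games += 1
    return llr >= upper, games

# Stage I bounds SPRT(-1.5, 4.5); a patch worth BayesElo 1.5 (the midpoint)
# should pass roughly half the time, in ~28000 games on average.
random.seed(1)
runs = [sprt_run(1.5, -1.5, 4.5) for _ in range(100)]
pass_rate = sum(p for p, _ in runs) / len(runs)
avg_games = sum(g for _, g in runs) / len(runs)
print(pass_rate, avg_games)
```

With only 100 runs per point the pass fractions are noisier than the 10000-simulation tables above, but they should land in the same neighbourhood.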