Search time variance samples

Discussion of chess software programming and technical issues.

Moderator: Ras

abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Search time variance samples.

Post by abulmo »

Ajedrecista wrote:Hello Marcel:

I think you are referring to sqrt[SUM{[(t_i - min.(t)]²; i=1, N}/(N - 1)]
, right?
I do not think so. You are computing the deviation of t_i from the min, not the variance of min.
Richard
User avatar
Ajedrecista
Posts: 2208
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Search time variance samples.

Post by Ajedrecista »

Hello Richard:
abulmo wrote:
Ajedrecista wrote:Hello Marcel:

I think you are referring to sqrt[SUM{[(t_i - min.(t)]²; i=1, N}/(N - 1)]
, right?
I do not think so. You are computing the deviation of t_i from the min, not the variance of min.
I must admit that I am clueless now. I have just taken a quick look here and have done the calculations of (sigma_y)² > ... I got an harmonic mean of more less 9.2616692580335054, so substituting I get (sigma_y)² ~ 0.16394833859619505; sigma_y ~ 0.40490534523045636 < s, but I think that it is not the 'variance of the minimum'. I searched a little on Internet but I did not find anything, so I can not calculate it properly; luckily, I made the data available, so somebody can calculate it. I would do it myself but I need the formula... :oops:

Regards from Spain.

Ajedrecista.
petero2
Posts: 734
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Search time variance samples.

Post by petero2 »

Daniel Shawul wrote:
but also the variance of the minimum is lower than the variance of the mean.
Are u sure about this? It seems to me that depends on the underlying distribution from what I vaguely remember of order statistics.
Indeed it depends on the distribution. If we have a random variable X that takes the value 0 with probability 0.1 and the value 1 with probability 0.9, and take 10 measurements at a time, and assume that measurements are independent, we get for the average of 10:

Code: Select all

mean = 0.1 * 0 + 0.9 * 1 = 0.9
var  = (0.1 * (0 - 0.9)^2 + 0.9 * (1 - 0.9)^2) / 10 = 0.009
However, for a random variable Y that is the minimum of 10 independent samples of X, we get:

Code: Select all

P(Y=1) = 0.9^10 ~= 0.34868
P(Y=0) = 1 - P(Y=1) ~= 0.65132

mean = 0 * P(Y=0) + 1 * P(Y=1) ~= 0.34868
var  = P(Y=0)*(0-mean)^2+P(Y=1)*(1-mean)^2 ~= 0.22710
So the statement that "the variance of the minimum is lower than the variance of the mean" is not true for all distributions and all samples sizes. It may still be true for the distributions you typically get when you measure the execution time of a program running on a typical computer system though.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Search time variance samples.

Post by Daniel Shawul »

I don't think you got my point. The variance of a sample mean (aka standard error) is not necessarily greater than the variance of a sample minimum. This is similar to a min-max update where order statistics is extensively used. To be more specific, the cdf for max is F(x)^n and for mean 1-(1-F(x))^N. Finding the variance of the minimum is a bit complicated but still workable as shown here. You can test this in a simple script by generating uniform and normally distributed random variables. You will see that for the normal dist variance of min/max can be bigger that of the mean's. So if Jesus did the tests multiple times with sample of 100, you shouldn't expect the variance of the minimum to be smaller than the variance of the mean, because it very much depends on the distribution of the perft times which is unknown. I am not even sure if it is invariant of sample size..
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Search time variance samples.

Post by Daniel Shawul »

Indeed I get observed similar behaviour.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search time variance samples.

Post by bob »

Daniel Shawul wrote:Indeed I get observed similar behaviour.
The only advantage I see for using the smallest time is that it is the time that has the least "system noise" interjected. Poor memory layout producing fewer cache hits, or fewer context switches (fewer network packets during that sample, maybe) and so forth. You eventually approach the "minimum time" the code will take on that particular hardware, which is not a bad number to measure. But it is painful to get enough samples to make sure you get close to that lower bound..
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Search time variance samples.

Post by abulmo »

As we do not know the fucntion distribution, I think the best method to estimate the variance of the min of 100 samples is resampling methods like bootstrap.
.
Richard
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Search time variance samples.

Post by mvk »

bob wrote:But it is painful to get enough samples to make sure you get close to that lower bound..
The suggestion is/was that it is easier to estimate that lower bound than the mean with the same precision. As posters have pointed out, that depends on the distribution of the samples. I have observed this effect in my measurements long time ago, but I would have to dig it up (or retest to see if it still holds on my current system). I have not attempted to demonstrate this analytically, assuming, for example, a Poisson distribution. Btw, I typically used a sample size of n=10 in my tests.