Search time variance samples

abulmo · Post by **abulmo** » Tue Aug 27, 2013 7:45 pm

Ajedrecista wrote:Hello Marcel:

I think you are referring to sqrt[SUM{[(t_i - min.(t)]²; i=1, N}/(N - 1)]
, right?

I do not think so. You are computing the deviation of t_i from the min, not the variance of min.

Ajedrecista · Post by **Ajedrecista** » Tue Aug 27, 2013 8:44 pm

Hello Richard:

abulmo wrote:
Ajedrecista wrote:Hello Marcel:

I think you are referring to sqrt[SUM{[(t_i - min.(t)]²; i=1, N}/(N - 1)]
, right?
I do not think so. You are computing the deviation of t_i from the min, not the variance of min.

I must admit that I am clueless now. I have just taken a quick look here and have done the calculations of (sigma_y)² > ... I got an harmonic mean of more less 9.2616692580335054, so substituting I get (sigma_y)² ~ 0.16394833859619505; sigma_y ~ 0.40490534523045636 < s, but I think that it is not the 'variance of the minimum'. I searched a little on Internet but I did not find anything, so I can not calculate it properly; luckily, I made the data available, so somebody can calculate it. I would do it myself but I need the formula...

Regards from Spain.

Ajedrecista.

petero2 · Post by **petero2** » Tue Aug 27, 2013 8:55 pm

Daniel Shawul wrote:
but also the variance of the minimum is lower than the variance of the mean.
Are u sure about this? It seems to me that depends on the underlying distribution from what I vaguely remember of order statistics.

Indeed it depends on the distribution. If we have a random variable X that takes the value 0 with probability 0.1 and the value 1 with probability 0.9, and take 10 measurements at a time, and assume that measurements are independent, we get for the average of 10:

Code: Select all

mean = 0.1 * 0 + 0.9 * 1 = 0.9
var  = (0.1 * (0 - 0.9)^2 + 0.9 * (1 - 0.9)^2) / 10 = 0.009

However, for a random variable Y that is the minimum of 10 independent samples of X, we get:

Code: Select all

P(Y=1) = 0.9^10 ~= 0.34868
P(Y=0) = 1 - P(Y=1) ~= 0.65132

mean = 0 * P(Y=0) + 1 * P(Y=1) ~= 0.34868
var  = P(Y=0)*(0-mean)^2+P(Y=1)*(1-mean)^2 ~= 0.22710

So the statement that "the variance of the minimum is lower than the variance of the mean" is not true for all distributions and all samples sizes. It may still be true for the distributions you typically get when you measure the execution time of a program running on a typical computer system though.

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 27, 2013 9:02 pm

I don't think you got my point. The variance of a sample mean (aka standard error) is not necessarily greater than the variance of a sample minimum. This is similar to a min-max update where order statistics is extensively used. To be more specific, the cdf for max is F(x)^n and for mean 1-(1-F(x))^N. Finding the variance of the minimum is a bit complicated but still workable as shown here. You can test this in a simple script by generating uniform and normally distributed random variables. You will see that for the normal dist variance of min/max can be bigger that of the mean's. So if Jesus did the tests multiple times with sample of 100, you shouldn't expect the variance of the minimum to be smaller than the variance of the mean, because it very much depends on the distribution of the perft times which is unknown. I am not even sure if it is invariant of sample size..

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 27, 2013 9:13 pm

Indeed I get observed similar behaviour.

bob · Post by **bob** » Tue Aug 27, 2013 11:40 pm

Daniel Shawul wrote:Indeed I get observed similar behaviour.

The only advantage I see for using the smallest time is that it is the time that has the least "system noise" interjected. Poor memory layout producing fewer cache hits, or fewer context switches (fewer network packets during that sample, maybe) and so forth. You eventually approach the "minimum time" the code will take on that particular hardware, which is not a bad number to measure. But it is painful to get enough samples to make sure you get close to that lower bound..

abulmo · Post by **abulmo** » Wed Aug 28, 2013 7:29 am

As we do not know the fucntion distribution, I think the best method to estimate the variance of the min of 100 samples is resampling methods like bootstrap.
.

mvk · Post by **mvk** » Wed Aug 28, 2013 8:10 pm

bob wrote:But it is painful to get enough samples to make sure you get close to that lower bound..

The suggestion is/was that it is easier to estimate that lower bound than the mean with the same precision. As posters have pointed out, that depends on the distribution of the samples. I have observed this effect in my measurements long time ago, but I would have to dig it up (or retest to see if it still holds on my current system). I have not attempted to demonstrate this analytically, assuming, for example, a Poisson distribution. Btw, I typically used a sample size of n=10 in my tests.

Search time variance samples

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.