Search time variance samples

sje · Post by **sje** » Sat Aug 24, 2013 6:22 pm

Search time variance for a fixed initial condition is a problem in all Unix platforms because of the indeterminacy introduced by background tasks, memory layout, paging, etc. I have found that this variance is so large that it makes single-run time measurements of small program changes meaningless.

Here is a sample of Symbolic's output running the same perft(6) call eight times in a row on a quadcore machine. The Pt field is total processor time and the Wt filed is wall (elapsed) time:

Code: Select all

[] pcfull 6
PathFull(6): 119,060,324   Pt: 38.563   Wt: 10.742   F/P: 1.10829e+07/9.0229e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.661   Wt: 9.543   F/P: 1.24761e+07/8.01535e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.044   Wt: 9.179   F/P: 1.29701e+07/7.71007e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 38.181   Wt: 10.768   F/P: 1.10562e+07/9.04467e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.210   Wt: 9.275   F/P: 1.28361e+07/7.79052e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 36.307   Wt: 9.933   F/P: 1.1986e+07/8.34305e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.225   Wt: 9.290   F/P: 1.2815e+07/7.80337e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.930   Wt: 9.724   F/P: 1.22427e+07/8.16811e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four

Ajedrecista · Post by **Ajedrecista** » Sun Aug 25, 2013 12:05 pm

Hello Steven:

sje wrote:Search time variance for a fixed initial condition is a problem in all Unix platforms because of the indeterminacy introduced by background tasks, memory layout, paging, etc. I have found that this variance is so large that it makes single-run time measurements of small program changes meaningless.

Here is a sample of Symbolic's output running the same perft(6) call eight times in a row on a quadcore machine. The Pt field is total processor time and the Wt filed is wall (elapsed) time:

Code: Select all

[] pcfull 6
PathFull(6): 119,060,324   Pt: 38.563   Wt: 10.742   F/P: 1.10829e+07/9.0229e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.661   Wt: 9.543   F/P: 1.24761e+07/8.01535e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.044   Wt: 9.179   F/P: 1.29701e+07/7.71007e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 38.181   Wt: 10.768   F/P: 1.10562e+07/9.04467e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.210   Wt: 9.275   F/P: 1.28361e+07/7.79052e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 36.307   Wt: 9.933   F/P: 1.1986e+07/8.34305e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.225   Wt: 9.290   F/P: 1.2815e+07/7.80337e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four
[] pcfull 6
PathFull(6): 119,060,324   Pt: 35.930   Wt: 9.724   F/P: 1.22427e+07/8.16811e-08
Total: one hundred nineteen million sixty thousand three hundred twenty-four

I ran Perft(7) 100 times with JetChess 1.0.0.0 (256 MB of hash):

Code: Select all

Time in seconds:

9.668
9.801
9.67
9.186
8.748
8.828
9.59
9.296
8.998
8.976
9.107
8.635
8.649
8.6
8.615
8.647
8.649
8.65
8.555
8.718
8.758
9.25
9.631
9.71
9.634
9.098
8.995
9.058
9.619
9.045
9.599
9.22
9.765
9.613
8.776
9.687
9.058
9.041
8.7
9.599
9.589
9.461
9.599
9.434
9.658
9.562
9.653
9.429
9.462
9.421
9.528
9.609
9.914
9.64
9.731
9.678
9.668
9.634
9.834
9.512
9.823
9.719
9.761
9.61
9.673
9.622
9.226
8.7
9.568
8.675
8.705
9.653
8.874
9.417
8.766
8.803
8.706
9.467
9.447
8.919
9.19
8.657
9.583
9.544
8.961
9.428
9.699
8.947
8.666
8.672
8.621
9.643
8.679
9.53
9.695
9.412
9.573
9.416
9.83
9.696

Code: Select all

Median: between 9.429 and 9.434 seconds.
Sample mean = <t> = 9.28034 seconds.
Sample standard deviation = s ~ 0.41438800487690107 seconds.
s/<t> ~ 0.0446522438700415148
<t>/s ~ 22.395291105872711

I use Windows XP; my computer is an Intel Pentium D930 at 3 GHz.

Regards from Spain.

Ajedrecista.

sje · Post by **sje** » Sun Aug 25, 2013 4:27 pm

It looks like Windows XP has similar variance characteristics to those of Unix.

The more a program uses available resources (CPU, RAM), the more it is affected by background processing and indeterminacy.

michiguel · Post by **michiguel** » Sun Aug 25, 2013 5:50 pm

sje wrote:It looks like Windows XP has similar variance characteristics to those of Unix.

The more a program uses available resources (CPU, RAM), the more it is affected by background processing and indeterminacy.

I don't have this variability at all. Times are almost identical with perft 6 every single time (Linux AMD Phenom(tm) II X4 965 Processor × 4). Would it be because I do not use hash in perft?

When I do searches I always observe variations that may be around ~1%. I observed that this variability in a laptop I measured was way bigger, which made my experiments on compiler switches impossible.

Miguel

bob · Post by **bob** » Mon Aug 26, 2013 6:23 pm

michiguel wrote:
sje wrote:It looks like Windows XP has similar variance characteristics to those of Unix.

The more a program uses available resources (CPU, RAM), the more it is affected by background processing and indeterminacy.
I don't have this variability at all. Times are almost identical with perft 6 every single time (Linux AMD Phenom(tm) II X4 965 Processor × 4). Would it be because I do not use hash in perft?

When I do searches I always observe variations that may be around ~1%. I observed that this variability in a laptop I measured was way bigger, which made my experiments on compiler switches impossible.

Miguel

There is one particularly ugly problem dealing with where a program is laid out in physical memory, since that affects how it maps into cache. You can get a poor memory layout that causes it to skip some of the blocks in cache. Each time you load a program into memory, this will change. There is a "page-coloring" algorithm that solves this, but I don't know if it ever made it into any Linux kernel (Richard Gooch wrote one particularly ugly implementation years ago. I wrote one that was much cleaner but it changed the system call to allocate a new physical page and was frowned on for changing that).

You can't just run a program several times. Once it gets into memory, the thing tends to sit there ready for the next execution. I generally run a version of Crafty and ask for max hash. Then I terminate and run a different version (different executable name, anyway) and let it get loaded again since the previous run flushed everything due to max memory allocation.

I just tested perft 6 on my linux box and got times between 7.08 and 7.45... I've generally found that resolution beyond 300ms or so is problematic. That is the basic kernel time quantum, for normal processes anyway. Newer kernels drop that to 200 or so but higher priority processes get more.

mvk · Post by **mvk** » Tue Aug 27, 2013 5:51 pm

Ajedrecista wrote:Hello Steven:

I ran Perft(7) 100 times with JetChess 1.0.0.0 (256 MB of hash):
Code: Select all
Time in seconds:

9.668
9.801
... snip ...
Code: Select all
Median: between 9.429 and 9.434 seconds.
Sample mean = <t> = 9.28034 seconds.
Sample standard deviation = s ~ 0.41438800487690107 seconds.
s/<t> ~ 0.0446522438700415148
<t>/s ~ 22.395291105872711
I use Windows XP; my computer is an Intel Pentium D930 at 3 GHz.

Regards from Spain.

Ajedrecista.

One piece of advice when making multiple runs: report the minimum sample instead of the mean. Not only is it more accurate for what you want to measure (what your code does, and not what gets added randomly), but also the variance of the minimum is lower than the variance of the mean. So you get more precision out of your measurement.

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 27, 2013 6:53 pm

but also the variance of the minimum is lower than the variance of the mean.

Are u sure about this? It seems to me that depends on the underlying distribution from what I vaguely remember of order statistics. Nitpick alert, in the degenerate case that the sample size is 1, var(mean(x)) = var(min(x)) for iid variables.

Ajedrecista · Post by **Ajedrecista** » Tue Aug 27, 2013 7:14 pm

Hello Marcel:

mvk wrote:One piece of advice when making multiple runs: report the minimum sample instead of the mean. Not only is it more accurate for what you want to measure (what your code does, and not what gets added randomly), but also the variance of the minimum is lower than the variance of the mean. So you get more precision out of your measurement.

I think you are referring to sqrt[SUM{[(t_i - min.(t)]²; i=1, N}/(N - 1)], right? It is the first time I read about it. Well, you will get variances of the minimum and maximum for the same price:

Code: Select all

Min.(t) = 8.555 seconds.
Max.(t) = 9.914 seconds.

s_min ~ 0.83854031422815837 seconds.
s_max ~ 0.75980143604906553 seconds.

Well, I have given to you standard deviations instead of variances. If I calculated correctly, s_min > s.

Regards from Spain.

Ajedrecista.

mvk · Post by **mvk** » Tue Aug 27, 2013 7:17 pm

Yes, because the error is one-sided, that is, never faster than the thing you want to measure: the minimum time to do the same job is fixed by the binary.

mvk · Post by **mvk** » Tue Aug 27, 2013 7:20 pm

Ajedrecista wrote: If I calculated correctly, s_min > s.

Hmm, that is different from my observations. Interesting... I'm not near my data at this moment, so I have to dig into it later.

Search time variance samples

Search time variance samples

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.

Re: Search time variance samples.