Reliable speed comparison: some math required

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

mcostalba
Posts: 2679
Joined: Sat Jun 14, 2008 7:17 pm

Re: Reliable speed comparison: some math required

Post by mcostalba » Tue Feb 27, 2018 8:53 pm

What about this one?

http://www.brendangregg.com/perf.html#CPUstatistics

This could be the final weapon! In particular, look at the table in the above link:

Code: Select all

5,649,595,479 cycles 

See also https://perf.wiki.kernel.org/index.php/Tutorial

syzygy
Posts: 4343
Joined: Tue Feb 28, 2012 10:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy » Tue Feb 27, 2018 8:59 pm

Without background noise and running just a single bench at a time:

Code: Select all

$ taskset -c 4 cfish bench 2>&1 >/dev/null | grep second
Results:

Code: Select all

Nodes/second    : 2775955
Nodes/second    : 2778906
Nodes/second    : 2777430
Nodes/second    : 2778906
Nodes/second    : 2780385
Nodes/second    : 2777430
Nodes/second    : 2778906
Nodes/second    : 2780385
Nodes/second    : 2777430
Nodes/second    : 2775955
Nodes/second    : 2777430
Nodes/second    : 2777430
So max speed is 2780385.
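For reference, the spread of the twelve samples above can be quantified in a few lines of Python (values copied from the listing):

```python
# Spread of the twelve Nodes/second samples quoted above.
samples = [
    2775955, 2778906, 2777430, 2778906, 2780385, 2777430,
    2778906, 2780385, 2777430, 2775955, 2777430, 2777430,
]

max_nps = max(samples)
min_nps = min(samples)
rel_spread = (max_nps - min_nps) / min_nps

print(max_nps)               # 2780385
print(min_nps)               # 2775955
print(f"{rel_spread:.4%}")   # 0.1596%
```

With the bench pinned to one core and no background load, the max-to-min spread is only about 0.16%.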

User avatar
lucasart
Posts: 3014
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart
Contact:

Re: Reliable speed comparison: some math required

Post by lucasart » Wed Feb 28, 2018 1:56 am

syzygy wrote:
lucasart wrote:
syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
I took the liberty to create some noise:

Code: Select all

run       base       test     diff
  1    1929540    2652016  +722476
  2    1929540    2657409  +727869
  3    2645305    1925985  -719320
  4    2670988    2639961   -31027
  5    2665540    2669624    +4084
  6    2625376    2666900   +41524
  7    2604446    2604446       +0
  8    2673720    2664181    -9539
  9    2637297    2006573  -630724
 10    1930965    2662824  +731859
 11    2670988    2670988       +0
 12    2672353    2677829    +5476
 13    2668261    2672353    +4092
 14    2660113    2666900    +6787
 15    2670988    2639961   -31027
 16    2444866    2460981   +16115
 17    2574937    2656058   +81121
 18    1926695    2653362  +726667
 19    1921736    2669624  +747888
 20    1926695    2656058  +729363

Result of  20 runs
==================
base (cfish          ) =    2422517  +/- 147234
test (cfish          ) =    2578702  +/- 94082
diff                   =    +156184  +/- 191677

speedup        = +0.0645
P(speedup > 0) =  0.9446

CPU: 6 x Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Hyperthreading: on
And?

I don't see that you have proven anything here.

You need to compare the 2 methods. Running one sample of one method achieves nothing. You should do several runs of each method (sequential vs. concurrent) and compare.

And the only thing you should compare is the standard deviation of the "diff" vector. We don't care about base and test in isolation: of course those are noisier in concurrent than in sequential testing; that's trivial and misses the point. The point is that concurrent testing has more noise, but more *correlated* noise, such that the stdev of the diff is lower.

And you should definitely not take any liberties with the testing protocol. All else should be equal between the two methods (as much as possible), with no external noise other than the bare minimum possible on your machine/OS.
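The point about correlated noise cancelling in the difference can be illustrated with a small simulation (all numbers below are synthetic, not real bench data):

```python
import random
import statistics

random.seed(42)

BASE, TEST = 2_650_000, 2_660_000   # hypothetical true speeds (nps)
N = 1000

# Sequential runs: each measurement gets its own independent noise.
seq_diff = [
    (TEST + random.gauss(0, 50_000)) - (BASE + random.gauss(0, 50_000))
    for _ in range(N)
]

# Concurrent runs: both engines see the same machine-load noise at the
# same moment, so the shared component cancels in the difference.
con_diff = []
for _ in range(N):
    shared = random.gauss(0, 50_000)    # load affecting both processes
    own = 10_000                        # smaller per-process noise
    con_diff.append(
        (TEST + shared + random.gauss(0, own))
        - (BASE + shared + random.gauss(0, own))
    )

print(statistics.stdev(seq_diff))  # roughly sqrt(2) * 50_000 ~ 70_000
print(statistics.stdev(con_diff))  # roughly sqrt(2) * 10_000 ~ 14_000
```

Under this (assumed) noise model, the individual concurrent measurements are noisier, but the stdev of the diff vector is several times smaller, which is exactly the quantity that matters for the comparison.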
Last edited by lucasart on Wed Feb 28, 2018 2:09 am, edited 1 time in total.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

User avatar
lucasart
Posts: 3014
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart
Contact:

Re: Reliable speed comparison: some math required

Post by lucasart » Wed Feb 28, 2018 1:59 am

syzygy wrote:Without background noise and running just a single bench at a time:

Code: Select all

$ taskset -c 4 cfish bench 2>&1 >/dev/null | grep second
Results:

Code: Select all

Nodes/second    : 2775955
Nodes/second    : 2778906
Nodes/second    : 2777430
Nodes/second    : 2778906
Nodes/second    : 2780385
Nodes/second    : 2777430
Nodes/second    : 2778906
Nodes/second    : 2780385
Nodes/second    : 2777430
Nodes/second    : 2775955
Nodes/second    : 2777430
Nodes/second    : 2777430
So max speed is 2780385.
Same comment as my previous post. This data proves nothing.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

bob
Posts: 20358
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Reliable speed comparison: some math required

Post by bob » Wed Feb 28, 2018 2:26 am

lucasart wrote:
syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
It has been verified MANY times in the past. Even worse, testing in parallel can result in some CPUs throttling back due to heat or power consumption, which adds MORE noise.

The concept of pinning a process to a specific CPU is always a good place to start. If a process bounces to a different CPU, that CPU has to "throttle up". The more bounces, the slower it runs.

My advice: (1) disable hyper-threading if possible; (2) disable turbo-boost if possible; (3) run in single-user mode (Linux) to reduce system noise; (4) even unplug/disable the network, which can cause overhead.

Then run a quick test to "warm up" the CPU and cache (get the important stuff loaded), then run the real test. You can get pretty consistent timing. If your tests run for a few minutes, you will get repeatability to within a second nearly every time. Measuring "short things" is a basic no-no for benchmarking. If you want to go as far as I did, run a lightweight kernel: no virtual memory or anything there, and rock-solid repeatability.
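A minimal sketch of the warm-up-then-measure protocol described above; the engine path in the usage comment is a placeholder, not a tested command:

```python
import subprocess
import sys
import time

def timed_runs(cmd, warmup=1, runs=3):
    """Run `cmd` several times, discarding warm-up runs, and return the
    wall-clock time of each measured run in seconds."""
    for _ in range(warmup):   # prime CPU frequency scaling and caches
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - t0)
    return times

# Hypothetical usage ("./stockfish" is a placeholder path):
# print(timed_runs(["./stockfish", "bench"]))
```

This only handles the warm-up part; pinning to one core (e.g. via taskset) and quieting the machine still have to be done outside the script.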

Jouni
Posts: 1868
Joined: Wed Mar 08, 2006 7:15 pm

Re: Reliable speed comparison: some math required

Post by Jouni » Wed Feb 28, 2018 6:29 am

I think the problem is not the math, but that there are too many things to consider: Linux/Windows, AMD/Intel, popcount, BMI, 64-bit, single/multi-threaded, etc.!
Jouni

Fulvio
Posts: 98
Joined: Fri Aug 12, 2016 6:43 pm

Re: Reliable speed comparison: some math required

Post by Fulvio » Wed Feb 28, 2018 8:46 am

mcostalba wrote:Given N noisy measures of speed of NEW and N noisy measures of MASTER, we want to know:

1. If NEW is faster than MASTER with 95% reliability

2. The maximum speed-up of NEW, defined as (Speed NEW - Speed MASTER) / Speed MASTER, that is guaranteed with 95% reliability

The second point needs clarification. Suppose NEW is faster than MASTER by 0.5% with a 95% guarantee; then it is also faster by at least 0.4%, and by any smaller margin. On the contrary, it will not be faster by 0.8% with 95% reliability, maybe just with 30% reliability. We want to find the maximum speed-up (in this case 0.5%) that holds with 95% reliability.
What's wrong with the simple approach?
Measure alternately MASTER, NEW, MASTER, etc., let's say 111 times (the number of repetitions depends on how noisy the system is).
Then calculate the difference values, comparing each NEW measurement with the MASTER measurements immediately before and after it.

Code: Select all

for (int i = 0; i < 110; i++) {
	v[i] -= v[i + 1];
	if ((i & 1) == 0)
		v[i] = -v[i];
}
Remove the last element and the first 10 elements, then sort the vector in descending order.
v[0] will be the maximum gain and v[99] the minimum gain.
v[94] will be the minimum gain obtained in at least 95% of the measurements.
The index of the first negative element is the percentage of measurements with a gain.
The middle element of the vector will be the median, etc...
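A sketch of this alternating-measurement scheme in Python, run on synthetic data (the speeds and noise level are invented for illustration):

```python
# Measurements alternate MASTER, NEW, MASTER, NEW, ... (111 in total),
# so v[0], v[2], ... are MASTER samples and v[1], v[3], ... are NEW.
import random

random.seed(1)
v = []
for i in range(111):
    true_speed = 2_650_000 if i % 2 == 0 else 2_670_000   # synthetic nps
    v.append(true_speed + random.gauss(0, 20_000))        # plus noise

# Each adjacent pair yields one "gain" sample, oriented so that a
# positive value means NEW was faster than its neighbouring MASTER run.
gains = []
for i in range(110):
    d = v[i] - v[i + 1]
    if i % 2 == 0:      # v[i] is MASTER, v[i + 1] is NEW: flip the sign
        d = -d
    gains.append(d)

# Drop the first 10 samples (warm-up) and sort in descending order,
# leaving 100 gain values indexed 0..99 as in the post above.
gains = sorted(gains[10:], reverse=True)
print(gains[0])     # maximum observed gain
print(gains[94])    # minimum gain seen in at least 95% of the samples
```

With real bench output, v would be filled from the measured nodes/second values instead of the simulated ones.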

mcostalba
Posts: 2679
Joined: Sat Jun 14, 2008 7:17 pm

Re: Reliable speed comparison: some math required

Post by mcostalba » Thu Mar 01, 2018 4:23 pm

I have done it ! :-)

Finally I was able to enable performance counters on my VMware virtual Linux and run the amazing perf tool.

This is the magical command line:

Code: Select all

sudo perf stat -r 5 -a -B -e cycles:u ./stockfish bench > /dev/null

It _counts_ the CPU cycles of a bench run, repeating the measurement 5 times and averaging the result, which is shown below:

Code: Select all

 Performance counter stats for 'system wide' (5 runs):

    11.694.771.724      cycles:u                                                      ( +-  0,08% )

       4,370493245 seconds time elapsed                                          ( +-  0,23% )

Please note the amazing precision and repeatability of the measurement, just +/- 0,08%. Moreover there is no influence from other OS activity, time, disk, etc.: it just counts cycles (userspace cycles, to be precise).
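To compare two builds from such perf figures, the relative errors of independent measurements combine in quadrature. A small sketch with hypothetical cycle counts (only the first number is from the output above; the second is invented):

```python
# Given two perf cycle counts (mean and relative stddev), estimate the
# speedup and propagate the uncertainty.
base_cycles, base_rel_err = 11_694_771_724, 0.0008   # master: +/- 0.08%
test_cycles, test_rel_err = 11_600_000_000, 0.0008   # hypothetical patch

speedup = base_cycles / test_cycles - 1.0            # fewer cycles = faster
# Relative errors add in quadrature for a ratio of independent measures.
speedup_err = (base_rel_err**2 + test_rel_err**2) ** 0.5

print(f"speedup = {speedup:+.4%} +/- {speedup_err:.4%}")
```

At +/- 0.08% per measurement, the combined uncertainty is about 0.11%, so even speedups well below 1% would be clearly resolvable this way.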

BeyondCritics
Posts: 335
Joined: Sat May 05, 2012 12:48 pm
Location: Bergheim

Re: Reliable speed comparison: some math required

Post by BeyondCritics » Fri Mar 02, 2018 12:26 am

bob wrote:...
If you want to go as far as I did, run a lightweight kernel. No virtual memory or anything there, rock-solid repeatability.
I have googled it, but it appears you cannot disable virtual memory in Linux. What kernel did you run?
And what metrics do you use for testing?

bob
Posts: 20358
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Reliable speed comparison: some math required

Post by bob » Fri Mar 02, 2018 1:45 am

Jouni wrote:I think the problem is not math, but too many things to consider: linux/windows, AMD/Intel, popcount, bmi, 64 bit single/multi etc!?
Certainly isn't math. It takes a little thought and preparation to get good benchmarking results...

Post Reply