Reliable speed comparison: some math required

syzygy
Posts: 4376
Joined: Tue Feb 28, 2012 10:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy » Tue Feb 27, 2018 9:27 am

Kotlov wrote:I also noticed that the first test will be faster than the next one, it probably depends on the temperature of the processor.
Probably turboboost that only works for a short time.

It can also be exactly the other way around because of CPU frequency scaling (it takes time to ramp up from 1.2 GHz to 4.2 GHz).

lucasart
Posts: 3023
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: Reliable speed comparison: some math required

Post by lucasart » Tue Feb 27, 2018 9:29 am

syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Kotlov
Posts: 197
Joined: Fri Jul 10, 2015 7:23 pm
Location: Russia

Re: Reliable speed comparison: some math required

Post by Kotlov » Tue Feb 27, 2018 9:54 am

I use something like this:
[image not preserved]
For example, this picture is typical for a slight speed improvement.
(third column)

BeyondCritics
Posts: 336
Joined: Sat May 05, 2012 12:48 pm
Location: Bergheim

Re: Reliable speed comparison: some math required

Post by BeyondCritics » Tue Feb 27, 2018 2:33 pm

Kotlov wrote:I use something like this:
[image not preserved]
For example, this picture is typical for a slight speed improvement.
(third column)
This looks extremely sophisticated :-)
Can you explain what exactly you are doing here? This would be of interest to other developers!

BeyondCritics
Posts: 336
Joined: Sat May 05, 2012 12:48 pm
Location: Bergheim

Re: Reliable speed comparison: some math required

Post by BeyondCritics » Tue Feb 27, 2018 2:55 pm

I think the most efficient setup is to first measure the mean and variance of your chosen test metric very carefully for "MASTER", and then use a simple z-test for each "NEW" variant.
As already discussed, you need a dedicated machine for that, since otherwise fluctuating variance will seriously hamper you.
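
A minimal sketch of that setup, assuming the metric is nodes per second (higher is better) and that MASTER was measured often enough to treat its sample mean and standard deviation as known. The function name `z_test` and all sample figures are hypothetical.

```python
import math
import statistics


def z_test(master_samples, new_samples):
    """One-sided z-test of NEW against a carefully measured MASTER.

    Treats MASTER's sample mean and standard deviation as the known
    population parameters, which is only reasonable if MASTER was
    measured many times on a quiet machine.
    """
    mu = statistics.mean(master_samples)
    sigma = statistics.stdev(master_samples)
    n = len(new_samples)
    # Standard error of the mean of n NEW runs under the MASTER variance.
    z = (statistics.mean(new_samples) - mu) / (sigma / math.sqrt(n))
    # Standard normal CDF at z; 1 - phi is the one-sided p-value
    # for the null hypothesis "NEW is not faster than MASTER".
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z, phi


# Hypothetical nodes-per-second figures, for illustration only.
master = [2604446, 2625376, 2637297, 2645305, 2660113, 2665540]
new = [2662824, 2666900, 2669624]
z, phi = z_test(master, new)
print(f"z = {z:.2f}, Phi(z) = {phi:.4f}")
```

The test is only as good as the MASTER estimate: if the machine's variance drifts between the MASTER and NEW measurements, the z value is no longer meaningful.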

syzygy
Posts: 4376
Joined: Tue Feb 28, 2012 10:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy » Tue Feb 27, 2018 7:02 pm

lucasart wrote:
mar wrote:I typically do something very simple: n runs for each version, pick the fastest one for each, then simply compare.
Not very scientific but works well for me.
In theory, with noisy observations, it's best to choose the median. It's a more robust statistic than the max. By the way, that's what I do in my engine (median of 5 runs).
The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
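
The two summaries can be put side by side. A small sketch with hypothetical nodes-per-second samples in which two of five runs were disturbed:

```python
import statistics

# Hypothetical nodes-per-second results from five runs of one binary;
# two runs were slowed down by background activity.
nps = [2670988, 1929540, 2665540, 2673720, 1930965]

fastest = max(nps)               # the least disturbed run
median = statistics.median(nps)  # middle value of the sorted samples

print(f"fastest = {fastest}, median = {median}")
```

If noise can only slow a run down, the maximum is the sample least affected by it. The median stays close to the undisturbed speed only as long as fewer than half of the runs are disturbed.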

AlvaroBegue
Posts: 913
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Reliable speed comparison: some math required

Post by AlvaroBegue » Tue Feb 27, 2018 7:10 pm

syzygy wrote: The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.
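
A rough sketch of that procedure, assuming the benchmark is an external command timed by wall clock. The helper names `time_runs` and `best_time` are made up for this example, and the command shown is a placeholder workload rather than a real engine bench.

```python
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times back to back; return wall-clock seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def best_time(times, discard_first=True):
    """Lowest running time, optionally ignoring the first run."""
    kept = times[1:] if discard_first and len(times) > 1 else times
    return min(kept)


if __name__ == "__main__":
    # Placeholder workload; replace with the engine's bench command.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"best of the remaining runs: {best_time(time_runs(cmd)):.3f} s")
```

Discarding the first run only makes sense when the runs are executed back to back, so that any short-lived boost is spent before the measurements that count.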

syzygy
Posts: 4376
Joined: Tue Feb 28, 2012 10:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy » Tue Feb 27, 2018 7:42 pm

AlvaroBegue wrote:
syzygy wrote:The only effect noise can have here is to decrease speed; a bench is not going to run faster if the OS interrupts it more often. So max speed corresponds to the least noisy measurement.
I was about to post that. The only problem with this is Turbo Boost. If you can disable it, that's probably best. If not, you can run your program a few times in a row. The first execution could be benefiting from Turbo Boost, so you can discard its measurement. Then pick the lowest running time of the lot.

That has worked well for me in the past.
The lim sup should do :)

My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance". But running many benches in a row and taking the lim sup should also solve that.

What remains (apart from background processes) is cpu throttling on laptops. If the laptop heats up too much, it will clock down.

zullil
Posts: 5130
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA
Full name: Louis Zulli

Re: Reliable speed comparison: some math required

Post by zullil » Tue Feb 27, 2018 8:05 pm

syzygy wrote:
My desktop PC maintains turboboost speed for an indefinite period of time. The main problem is cpu scaling, but this can be overcome on Linux with "cpupower frequency-set -g performance".
On Xeon boxes running Linux, the only approach that has proved reliable for me is to enable turboboost in BIOS but also to write 100 in the following file:

/sys/devices/system/cpu/intel_pstate/min_perf_pct

Finding decent documentation on Intel p-states was a challenge, though I haven't tried recently.
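
Before a benchmark session it can help to check that these settings are actually in effect. A read-only sketch, assuming a Linux system that exposes the usual cpufreq sysfs files; the helper name `read_sysfs` is hypothetical, and the second file exists only when the intel_pstate driver is in use.

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Standard Linux cpufreq location for the governor of CPU 0.
governor = read_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
# File named in the post above; only present with the intel_pstate driver.
min_perf = read_sysfs("/sys/devices/system/cpu/intel_pstate/min_perf_pct")

if governor != "performance":
    print(f"warning: governor is {governor!r}, not 'performance'")
if min_perf is not None and min_perf != "100":
    print(f"warning: intel_pstate min_perf_pct is {min_perf}, not 100")
```

The script only reads the files; changing them still requires root, as with the cpupower command quoted above.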

syzygy
Posts: 4376
Joined: Tue Feb 28, 2012 10:56 pm

Re: Reliable speed comparison: some math required

Post by syzygy » Tue Feb 27, 2018 8:41 pm

lucasart wrote:
syzygy wrote:Testing in parallel is only more noisy
Did you verify this hypothesis of yours with empirical data? I suggest you try…
I took the liberty to create some noise:

run       base       test     diff
  1    1929540    2652016  +722476
  2    1929540    2657409  +727869
  3    2645305    1925985  -719320
  4    2670988    2639961   -31027
  5    2665540    2669624    +4084
  6    2625376    2666900   +41524
  7    2604446    2604446       +0
  8    2673720    2664181    -9539
  9    2637297    2006573  -630724
 10    1930965    2662824  +731859
 11    2670988    2670988       +0
 12    2672353    2677829    +5476
 13    2668261    2672353    +4092
 14    2660113    2666900    +6787
 15    2670988    2639961   -31027
 16    2444866    2460981   +16115
 17    2574937    2656058   +81121
 18    1926695    2653362  +726667
 19    1921736    2669624  +747888
 20    1926695    2656058  +729363

Result of  20 runs
==================
base (cfish          ) =    2422517  +/- 147234
test (cfish          ) =    2578702  +/- 94082
diff                   =    +156184  +/- 191677

speedup        = +0.0645
P(speedup > 0) =  0.9446

CPU: 6 x Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Hyperthreading: on
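
The summary figures can be recomputed from the 20 runs in the table. The post does not show how the tool derives its numbers, so the sketch below is an assumption: it treats the runs as paired, uses the standard error of the mean difference, and applies a normal approximation. It recovers the reported means and speedup, and the probability comes out close to the reported one; the "+/-" columns are not reproduced here.

```python
import math
import statistics

# (base, test) nodes per second for the 20 runs in the table above.
runs = [
    (1929540, 2652016), (1929540, 2657409), (2645305, 1925985),
    (2670988, 2639961), (2665540, 2669624), (2625376, 2666900),
    (2604446, 2604446), (2673720, 2664181), (2637297, 2006573),
    (1930965, 2662824), (2670988, 2670988), (2672353, 2677829),
    (2668261, 2672353), (2660113, 2666900), (2670988, 2639961),
    (2444866, 2460981), (2574937, 2656058), (1926695, 2653362),
    (1921736, 2669624), (1926695, 2656058),
]

base = [b for b, _ in runs]
test = [t for _, t in runs]
diff = [t - b for b, t in runs]

mean_base = statistics.mean(base)
mean_test = statistics.mean(test)
mean_diff = statistics.mean(diff)
# Standard error of the mean paired difference (assumed method).
se_diff = statistics.stdev(diff) / math.sqrt(len(diff))
speedup = mean_diff / mean_base
# Normal approximation: probability that the true mean difference is > 0.
p_positive = 0.5 * (1.0 + math.erf(mean_diff / se_diff / math.sqrt(2.0)))

print(f"base = {mean_base:.0f}")
print(f"test = {mean_test:.0f}")
print(f"diff = {mean_diff:+.0f}")
print(f"speedup        = {speedup:+.4f}")
print(f"P(speedup > 0) = {p_positive:.4f}")
```

Both binaries are the same engine (cfish), so the apparent 6% speedup comes entirely from the disturbed runs near 1.93 million nodes per second, which happen to fall more often on the base side.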
