Serial insignificance

Post by **hgm** » Fri Aug 02, 2013 1:41 pm

I am repairing a lot of minor bugs in one of my engines, and I always test the fixed version in an 800-game match against the previous version, to guard against the possibility that I have inadvertently broken something. Usually the fixed version is not significantly worse, and more often than not it beats the old version, but by an insignificant margin. This puts me in a position where I have a long chain of versions A, B, C, ..., which compare as B >~ A, C >~ B, D >~ C, ..., with none of the >~ really significant.

It struck me, however, that if the chain is long enough, the comparison becomes significant. E.g. if each new version scored half a standard deviation SD (of a single match) better than the previous one, and I did that 16 times, the accumulated empirical difference would be 16 x SD/2 = 8 SD, while the accumulated statistical error in that difference would be sqrt(16) x SD = 4 SD (for a sum of independently measured quantities the variances add, not the standard deviations). So the lead of the 16th version over the original is now two times the statistical error in the accumulated difference, which is significant.
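The arithmetic can be checked with a minimal Python sketch. It assumes, as in the example above, that every step gains exactly half a single-match SD and that the matches are independent; the function name `accumulated_z` and its parameters are made up for illustration only.

```python
import math

def accumulated_z(steps, gain_per_step_sd=0.5):
    """Return (difference, error, z), all in units of the single-match SD."""
    diff = steps * gain_per_step_sd   # the measured leads add linearly
    err = math.sqrt(steps)            # variances add, so the error grows as sqrt(steps)
    return diff, err, diff / err

for n in (1, 4, 16):
    diff, err, z = accumulated_z(n)
    print(f"{n:2d} steps: difference = {diff:.1f} SD, error = {err:.1f} SD, z = {z:.1f}")
```

Under these assumptions the ratio grows as sqrt(steps)/2, so it reaches 2 only at the 16th step.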

I never realized that before, and it actually seems pretty funny!

Post by **Rein Halbersma** » Fri Aug 02, 2013 1:52 pm

It's reminiscent of some bird species around the Arctic Circle: every few hundred miles or so there is a new subspecies that can still interbreed with its neighbors, but the ones in Alaska and in Siberia, on opposite sides of the Bering Strait, can't. This phenomenon pops up wherever there is a continuous chain of small differences, where all of a sudden the accumulated difference is noticeable (e.g. beer labels change every year, and you generally don't notice, unless you compare bottles from, say, 10 years apart).
