Serial insignificance

Discussion of chess software programming and technical issues.

hgm
Posts: 28472
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Serial insignificance

Post by hgm

I am repairing a lot of minor bugs in one of my engines, and I always test each fixed version in an 800-game match against the previous version, to guard against the possibility that I have inadvertently broken something. Usually the fixed version is not significantly worse, and more often than not it beats the old version, but by an insignificant margin. This leaves me with a long chain of versions A, B, C, ..., which compare as B >~ A, C >~ B, D >~ C, ..., none of the >~ really significant.
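To put a number on "insignificant" for an 800-game match, here is a minimal sketch (not from the original post) of the standard deviation of the match score. It assumes independent games between two engines of nearly equal strength, with wins and losses equally likely; the function name match_score_sd and its draw_rate parameter are hypothetical.

```python
import math


def match_score_sd(n_games: int, draw_rate: float = 0.0) -> float:
    """Standard deviation of the average score (fraction of points) in an
    n_games match between two engines of roughly equal strength.

    Assumptions: games are independent, the expected score per game is 0.5,
    and wins and losses are equally likely (each (1 - draw_rate) / 2).
    """
    # Per-game variance: a win (1) and a loss (0) each deviate 0.5 from the
    # mean, a draw (0.5) deviates 0, so var = (1 - draw_rate) * 0.25.
    per_game_var = 0.25 * (1.0 - draw_rate)
    # The variance of a mean of n independent games is per_game_var / n.
    return math.sqrt(per_game_var / n_games)


if __name__ == "__main__":
    for d in (0.0, 0.3):
        sd = match_score_sd(800, d)
        print(f"draw rate {d:.0%}: 1 SD = {sd:.2%} of the score")
```

Without draws one SD is 0.5/sqrt(800), about 1.8% of the score; draws make it smaller. A fix worth half an SD therefore moves the match score by less than one percentage point.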

It struck me, however, that if the chain is long enough, the comparison between the first and the last version can become significant. E.g. if each new version scored half a standard deviation SD (of a single match) better than the previous one, and I did that 16 times, the accumulated empirical difference would be 16 x SD/2 = 8 SD, while the accumulated statistical error in that difference would be only sqrt(16) SD = 4 SD (for a sum of independently measured quantities the variances add, not the standard deviations). So the measured advantage of the 16th version over the original is then twice the statistical error in the accumulated difference, a 2-sigma result, which is significant.
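The arithmetic above can be checked with a short sketch (Python, not from the original post). It models each consecutive-version match as an independent measurement with normally distributed error of 1 SD, which is an assumption (a reasonable normal approximation for 800-game matches). The function names are hypothetical; the Monte Carlo part verifies the sqrt(n) rule by asking how often 16 matches with no real improvement would add up to 8 SD by luck.

```python
import math
import random


def chain_z(n_links: int, step_in_sd: float) -> float:
    """z-score of the first-vs-last difference obtained by adding up n_links
    independent consecutive-version matches, each measuring an improvement
    of step_in_sd single-match standard deviations."""
    accumulated_diff = n_links * step_in_sd  # the measured differences add up
    accumulated_err = math.sqrt(n_links)     # variances add, so the SD grows as sqrt(n)
    return accumulated_diff / accumulated_err


def one_sided_confidence(z: float) -> float:
    """Standard normal CDF: probability that a N(0, 1) draw is below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def null_exceedance(n_links: int = 16, threshold_sd: float = 8.0,
                    trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo check of the sqrt(n) rule: if no version were really better
    (true step 0), how often would the summed measured differences reach
    threshold_sd by luck alone? Each match result is modelled as N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.gauss(0.0, 1.0) for _ in range(n_links))
        if total >= threshold_sd:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    z = chain_z(16, 0.5)  # 8 SD / 4 SD
    print(f"z = {z:.1f}, one-sided confidence = {one_sided_confidence(z):.3f}")
    print(f"null hypothesis reaches 8 SD in {null_exceedance():.2%} of trials")
```

The first line prints z = 2.0 with a one-sided confidence of about 0.977, and the simulated chance of reaching 8 SD without any real improvement comes out near 2.3%, matching the analytic value. For comparison, under the same model a direct match between the first and last version would carry an error of about 1 SD rather than 4 SD.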

I never realized that before, and it actually seems pretty funny! 8-)
Rein Halbersma
Posts: 771
Joined: Tue May 22, 2007 11:13 am

Re: Serial insignificance

Post by Rein Halbersma

It's reminiscent of some bird species around the Arctic Circle: every few hundred miles or so there is a new subspecies that can still interbreed with its neighbors, but the ones in Alaska and in Siberia, on either side of the Bering Strait, can't. This phenomenon pops up wherever there is a continuous chain of small differences: each step goes unnoticed, but the accumulated difference between the two ends is obvious (e.g. beer labels change a little every year, and you generally don't notice unless you compare bottles, say, 10 years apart).