Statistical Significance

Discussion of chess software programming and technical issues.


CRoberson
Posts: 2091
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Statistical Significance

Post by CRoberson »

As I've stated before, I get much more repeatable results from
my tests than Bob or some of the rest of you.

At least I used to.

Then I added two features: easy move and a fail-low timer extension.
I think the timer extension is the issue.

Bob,
You might try turning that off in Crafty and testing the statistical
significance again to see if that is indeed the culprit.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Statistical Significance

Post by bob »

CRoberson wrote: As I've stated before, I get much more repeatable results from
my tests than Bob or some of the rest of you.

At least I used to.

Then I added two features: easy move and failing low timer extension.
I think the timer extension is the issue.

Bob,
You might try turning that off in Crafty and testing the statistical
significance again to see if that is indeed the culprit.

It isn't.

In fact, I have run thousands of games where the search is limited by the number of nodes. For example, limit each search to 3,000,000 nodes for Crafty vs. Crafty, then re-run the same 160 games with 3,001,000 nodes (1,000 more per search), and the results vary significantly from game to game.
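A minimal sketch of the game-by-game comparison described above, assuming each run is recorded as a list of per-game results for the engine under test ("W", "D", "L"), with both runs playing the same games in the same order. The function names and the four-game data are hypothetical, chosen only to illustrate the check; they are not Bob's test harness or his results.

```python
def changed_results(run_a, run_b):
    """Indices of games whose result differs between two runs of the same games."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must contain the same games in the same order")
    return [i for i, (a, b) in enumerate(zip(run_a, run_b)) if a != b]


def score(run):
    """Match score for the engine under test: win = 1, draw = 0.5, loss = 0."""
    points = {"W": 1.0, "D": 0.5, "L": 0.0}
    return sum(points[r] for r in run)


# Hypothetical example: two runs of the same four games,
# the second one with a node limit 1,000 nodes higher.
run_3000000 = ["W", "D", "L", "W"]
run_3001000 = ["W", "L", "L", "D"]
print(changed_results(run_3000000, run_3001000))  # [1, 3]
print(score(run_3000000), score(run_3001000))     # 2.5 1.5
```

Two runs can have the same total score while individual games changed, so comparing game by game is a stricter repeatability check than comparing match totals.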

The issue is timing. If your program searches 1M nodes per second, each millisecond corresponds to about 1,000 nodes, and the operating system can't provide anywhere near 1 ms timing accuracy. So your search will vary by well over 1,000 nodes per search, which will produce a different move at one or more points in the game, and that is all it takes to change the result.
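The arithmetic behind this argument can be sketched as follows, assuming the timer error is expressed in milliseconds. The function name is hypothetical, and the 10 ms value in the second call is only an illustrative input, not a measured operating-system figure.

```python
def node_jitter(nodes_per_second, timer_error_ms):
    """Approximate spread in nodes searched caused by timer inaccuracy.

    If the time check can fire timer_error_ms early or late, the search
    stops roughly this many nodes earlier or later than intended.
    """
    return nodes_per_second * timer_error_ms // 1000


print(node_jitter(1_000_000, 1))   # 1000: as large as the 3,000,000 vs. 3,001,000 change
print(node_jitter(1_000_000, 10))  # 10000 (10 ms is illustrative, not a measurement)
```

Even a 1 ms error at 1M nodes per second is as large as the deliberate 1,000-node change in the node-limited experiment, which is already enough to alter game results.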
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Statistical Significance

Post by hgm »

It depends on the engine. I tested this with uMax 1.6 vs. Eden 0.11, and on average the first 40 moves of each game were the same. Since many games (counting from the Silver positions) lasted fewer than 40 moves, many games were repeated identically.
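A sketch of how this kind of repetition could be measured, assuming each game is stored as a list of moves starting from its Silver position and both runs play the same games in the same order. The helper names and the two short example games are hypothetical; this is not the procedure used in the uMax vs. Eden test.

```python
def common_prefix_length(moves_a, moves_b):
    """Number of leading moves that are identical in two games."""
    n = 0
    for a, b in zip(moves_a, moves_b):
        if a != b:
            break
        n += 1
    return n


def average_identical_moves(run_a, run_b):
    """Average length of the identical opening stretch over all game pairs."""
    lengths = [common_prefix_length(a, b) for a, b in zip(run_a, run_b)]
    return sum(lengths) / len(lengths)


def fully_repeated(run_a, run_b):
    """Count of games that were repeated move for move."""
    return sum(1 for a, b in zip(run_a, run_b) if a == b)


# Hypothetical example with two short games per run.
run_a = [["e4", "e5", "Nf3"], ["d4", "d5", "c4"]]
run_b = [["e4", "e5", "Nf3"], ["d4", "Nf6", "c4"]]
print(average_identical_moves(run_a, run_b))  # 2.0
print(fully_repeated(run_a, run_b))           # 1
```

A game that ends before the runs diverge counts as fully repeated, which is why a high average identical stretch combined with short games implies many exact repeats.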

I can't test this with Joker, because Joker randomizes its moves even when playing with the same node limit.