Testing question

Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Post by Kempelen »

Last week I was testing different futility margin values for Rodin. My tests (2 tournaments of 2200 games against 11 opponents, for two versions of Rodin) gave the same result, so I was not able to distinguish between futility margin A and B. Seeing this, I decided to disable search features (LMR, null move, etc.) and run the tournaments again with weaker opponents. To my surprise, the 2 results were now clearly distinguishable and I could see which futility margin is best.
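As a rough illustration of why two 2200-game runs can look identical, here is a minimal sketch that turns a win/draw/loss count into an Elo estimate with an approximate 95% interval. The function names and the 800/600/800 result are hypothetical, chosen only for the example; they are not Rodin's actual numbers, and the normal approximation is an assumption.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) using a normal approximation
    to the distribution of the mean per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))

# Hypothetical 2200-game result (made-up numbers for illustration).
elo, lo, hi = elo_with_margin(wins=800, draws=600, losses=800)
print(f"{elo:+.1f} Elo, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

With these made-up counts the interval is roughly 12 Elo either side of the estimate, and the uncertainty on the difference between two such runs is larger still, so a futility margin change worth only a few Elo would be hard to see at this sample size.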

Since the results were more conclusive with a few features disabled, my question is: why not test features with small Elo gains this way, where the results are more visible? Surely it has drawbacks. Which ones?

Thanks
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Testing question

Post by bob »

Kempelen wrote: Last week I was testing different futility margin values for Rodin. My tests (2 tournaments of 2200 games against 11 opponents, for two versions of Rodin) gave the same result, so I was not able to distinguish between futility margin A and B. Seeing this, I decided to disable search features (LMR, null move, etc.) and run the tournaments again with weaker opponents. To my surprise, the 2 results were now clearly distinguishable and I could see which futility margin is best.

Since the results were more conclusive with a few features disabled, my question is: why not test features with small Elo gains this way, where the results are more visible? Surely it has drawbacks. Which ones?

Thanks

The reason is this. You are testing as if there is no correlation between the terms you turn off and the one you are adjusting. But what if they do interact? Turning things off is not a good idea unless you are _certain_ there is no possible interaction between the two. If you disable some time-allocation code, you can be pretty sure it won't affect evaluation terms, and vice versa. But if you are fiddling with search (which futility pruning certainly is), then turning things on or off can easily produce misleading or completely wrong results.
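One way to check for the interaction described above is to measure the margin change in both configurations and compare the two effects. The sketch below assumes four ratings have already been measured; the dictionary keys, the function name, and every number are hypothetical and only show the arithmetic.

```python
def margin_effects(ratings):
    """Given Elo ratings keyed by (configuration, margin), return
    (effect_full, effect_stripped, interaction), where each effect
    is rating(margin B) - rating(margin A)."""
    effect_full = ratings[("full", "B")] - ratings[("full", "A")]
    effect_stripped = ratings[("stripped", "B")] - ratings[("stripped", "A")]
    # A non-zero interaction means the stripped-down result does not
    # carry over to the full engine.
    return effect_full, effect_stripped, effect_stripped - effect_full

# Hypothetical ratings (invented for illustration, not measured data).
ratings = {
    ("full", "A"): 0.0,
    ("full", "B"): 2.0,
    ("stripped", "A"): -300.0,
    ("stripped", "B"): -270.0,
}

full, stripped, interaction = margin_effects(ratings)
print(f"full: {full:+.1f}  stripped: {stripped:+.1f}  interaction: {interaction:+.1f}")
```

In this invented case margin B looks 30 Elo better with LMR and null move disabled but only 2 Elo better in the full engine, so the clear result from the stripped-down test would overstate what the change is worth in the engine that actually gets released. The measured effects also carry their own error bars, which a real comparison would need to take into account.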