Estimating error bars in engine tourneys: a problem sometimes?

Post by **ozziejoe** » Sun May 20, 2007 4:27 am

According to my understanding, the error bars are based on the number of games and the draw rate. Engines are said to differ if their error bars do not overlap.
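To make that concrete, here is a rough sketch of how such an error bar could be computed from just the number of games and the draw rate. This is an assumption about the method, not a description of what any particular rating list actually does: it treats each game as an independent draw scoring 1, 0.5, or 0, uses a normal approximation, and converts scores to Elo with the usual logistic formula. The function names `score_std_error` and `elo_interval` are made up for illustration.

```python
import math

def score_std_error(n_games, draw_rate, score=0.5):
    """Standard error of the mean score, assuming independent games."""
    win_rate = score - draw_rate / 2.0
    loss_rate = 1.0 - draw_rate - win_rate
    if win_rate < 0 or loss_rate < 0:
        raise ValueError("draw_rate is too high for this score")
    # Per-game variance of a result in {1, 0.5, 0}: E[x^2] - E[x]^2.
    var_game = win_rate + 0.25 * draw_rate - score ** 2
    return math.sqrt(var_game / n_games)

def to_elo(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(n_games, draw_rate, score=0.5, z=1.96):
    """Approximate (low, high) Elo interval; z=1.96 is roughly 95%."""
    se = score_std_error(n_games, draw_rate, score)
    return to_elo(score - z * se), to_elo(score + z * se)

if __name__ == "__main__":
    # Hypothetical numbers: 500 games, 40% draws, an even score.
    low, high = elo_interval(500, 0.40)
    print(f"Elo interval: {low:+.1f} to {high:+.1f}")
```

Note that nothing in this calculation knows which opponents or openings were used. It depends only on the game count, the draw rate, and the score.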

I am wondering if this is an excessively conservative way of doing things in some circumstances.

Consider the following two scenarios (assume draw rates are constant; assume also a single processor for now; finally, assume that we know the true Elo of engines X1 to X10, and that they are equivalent).

Scenario 1: Rybka 2.1 and 2.2 each play 500 games against the same five opponents (X1 to X5), using the same hardware and the exact same openings. Variance due to factors other than engine strength is truly minimal.

Scenario 2: Rybka 2.1 plays 500 games against opponents X1 to X5, and Rybka 2.2 plays 500 games against opponents X6 to X10 (i.e., not the same opponents). Openings are also randomly sampled, so the openings that Rybka 2.1 plays are not the same as the openings Rybka 2.2 plays. Variance in performance is now due in part to the random sample of engines and openings, as well as to engine strength.

Let's assume that in both cases 2.2 scores 10 games better than 2.1.

In both cases, using the present methods for estimating error, the error bars will be the same, and the odds of saying the engines are different are the same. Yet a 10-game difference in scenario 1 seems much more impressive, because the methods in scenario 1 have minimized the variance from factors other than engine strength.
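The intuition here can be sketched with a toy model. Everything in it is an assumption for illustration: each game's score is taken to be engine strength plus an additive opening effect plus independent per-game noise, and `game_var` and `opening_var` are hypothetical variances, not measured values. Under that model, a shared opening affects both engines equally and cancels out of the difference, while independently sampled openings do not cancel.

```python
import math

def diff_std_error(n_games, game_var, opening_var, shared_openings):
    """Standard error of the difference in mean score between two engines.

    Toy model (an assumption): score = strength + opening effect + noise,
    with each engine playing n_games, one game per opening.
    """
    # Per-game noise never cancels: both engines contribute it.
    var = 2.0 * game_var / n_games
    if not shared_openings:
        # Different random openings for each engine: the opening effects
        # add variance to each engine's mean and do not cancel.
        var += 2.0 * opening_var / n_games
    # With shared openings the same effect hits both engines, so it
    # drops out of the difference and nothing is added.
    return math.sqrt(var)

if __name__ == "__main__":
    # Hypothetical values: 500 games each, per-game variance 0.15,
    # opening-effect variance 0.02 (both in score units squared).
    paired = diff_std_error(500, 0.15, 0.02, shared_openings=True)
    unpaired = diff_std_error(500, 0.15, 0.02, shared_openings=False)
    edge = 10 / 500  # a 10-game lead expressed as a mean score difference
    print(f"scenario 1: z = {edge / paired:.2f}")
    print(f"scenario 2: z = {edge / unpaired:.2f}")
```

In this sketch the scenario 1 standard error is always the smaller of the two, but how much smaller depends entirely on how large the opening effect is relative to the per-game noise.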

Can anybody explain if my reasoning is wrong? (Play nicely, Uri.)

best
Joseph