lucasart wrote:
Sven Schüle wrote:
Kempelen wrote:
 Ferdy wrote: Your methodology suits your goal. There is no point testing 1. b3 e5 positions if they are not in your engine's repertoire. Only use test positions where you want your engine to be. Of course there are drawbacks, but they can be overcome, as you say, by using a large number of selected positions. Perhaps start with a smaller number of positions and, as your engine improves on them, add other positions to its repertoire. But I have a bad feeling about it; to me the engine should be able to handle all kinds of positions: blocked, open, full of pinned pieces, etc.

I think you misunderstand my idea. The goal is not to test only a limited set of opening positions, but a large and varied set of middle-game starting positions. The point is to always repeat the games from the same positions, but with enough positions that the engine's play is varied.

The point is, the positions are selected once at random, but then the same positions are always used for testing. That's exactly what Bob has been doing for a long time now, and lots of other people too, so it is not a new method but a kind of "de facto standard". I recall there were long discussions about the details a few years ago. Doing it that way, instead of choosing new positions at random each time, has been found to result in lower error bars, as far as I remember. I guess Bob and the other experts in statistics can explain the exact reasons.

Sven

It seems pretty obvious that it lowers the error bar. In fact the whole estimation model implicitly assumes that you do this.

Let's say that the score of engine A vs B is distributed under a probability law P(mu,sigma) with mean mu and stdev sigma. That means that, given equal chances from the starting position, the distribution of the result should be P(mu,sigma). However, if a position is chosen that favors A or B, then the distribution will be something like Q(position)P(mu,sigma), where Q(position) is centered around 1 and is greater or less than 1 depending on whether A or B is favored. The fact that E(Q)=1 may still ensure an unbiased estimator, but with a higher variance...
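
Written a bit more cleanly, the extra variance is what the law of total variance predicts. This is only a sketch: it uses the standard additive decomposition rather than the multiplicative Q(position)P(mu,sigma) notation above, and the symbols X (score of one game), S (starting position) and N (games per tournament, one per position) are introduced here for illustration.

```latex
% X: score of one game, S: starting position (random when drawn fresh).
\operatorname{Var}(X) = \mathbb{E}\big[\operatorname{Var}(X \mid S)\big] + \operatorname{Var}\big(\mathbb{E}[X \mid S]\big)

% Mean score over N games with the positions s_1, ..., s_N held fixed:
\operatorname{Var}(\bar{X} \mid s_1, \dots, s_N) = \frac{1}{N^2} \sum_{i=1}^{N} \operatorname{Var}(X \mid S = s_i)

% Mean score over N games with positions drawn fresh (i.i.d.) each tournament:
\operatorname{Var}(\bar{X}) = \frac{1}{N} \Big( \mathbb{E}\big[\operatorname{Var}(X \mid S)\big] + \operatorname{Var}\big(\mathbb{E}[X \mid S]\big) \Big)
```

The second term in the last line is the spread of the expected score across positions. It is non-negative and only shows up when the positions themselves are re-drawn.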

No need to be an expert in statistics to understand it, at least intuitively. You can write it cleanly too, and it isn't hard!

PS: please no ball busting on the details, I purposely made the math notation oversimplified.

I think this is not about positions favoring either side A or B; the selected positions have to be "balanced". Instead, it is all about the choice between
a) always using the same set of starting positions (for each single "test tournament"), or
b) repeating the step of choosing a set of starting positions for each "test tournament".

The claim was that method a) results in lower error bars. That is not my own claim, but what I recall someone else mentioning in the past.
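
For anyone who wants to check the a) versus b) claim numerically, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, games are win/loss only (no draws), each position shifts the expected score by a uniformly distributed bias, and the quantity measured is the score difference between two versions of an engine whose true gain is 0.02.

```python
import random
import statistics


def draw_positions(n, spread, rng):
    """Draw n hypothetical positions; each is just a bias on the expected score."""
    return [rng.uniform(-spread, spread) for _ in range(n)]


def tournament_score(positions, strength, rng):
    """Mean score over one game per position (win = 1, loss = 0, no draws)."""
    wins = 0
    for bias in positions:
        # Expected score for this game, clamped to a valid probability.
        p = min(1.0, max(0.0, 0.5 + strength + bias))
        if rng.random() < p:
            wins += 1
    return wins / len(positions)


def simulate(n_positions=200, n_trials=2000, spread=0.45, gain=0.02, seed=1):
    """Return (stdev with a fixed set, stdev with fresh sets) of the measured gain."""
    rng = random.Random(seed)
    fixed = draw_positions(n_positions, spread, rng)  # chosen once, then reused
    diff_fixed, diff_fresh = [], []
    for _ in range(n_trials):
        # Method a): both versions are tested on the same fixed set.
        new = tournament_score(fixed, gain, rng)
        old = tournament_score(fixed, 0.0, rng)
        diff_fixed.append(new - old)
        # Method b): every tournament draws its own set of positions.
        new = tournament_score(draw_positions(n_positions, spread, rng), gain, rng)
        old = tournament_score(draw_positions(n_positions, spread, rng), 0.0, rng)
        diff_fresh.append(new - old)
    return statistics.stdev(diff_fixed), statistics.stdev(diff_fresh)


if __name__ == "__main__":
    std_a, std_b = simulate()
    print(f"a) fixed set : stdev of measured gain = {std_a:.4f}")
    print(f"b) fresh sets: stdev of measured gain = {std_b:.4f}")
```

Under these assumptions method a) should show the smaller spread, and the gap grows with how much the positions differ from one another (the `spread` parameter). With nearly balanced positions the two methods come out almost the same, so how much this matters in real testing depends on the position set.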

Sven
