Would an Oracle Test-book be possible?

voyagerOne · Post by **voyagerOne** » Wed May 13, 2015 5:12 pm

Mostly a theoretical question.

An Oracle opening book is a book that contains NO more than 100 openings.

Basically, you test two engines with this book.

Based on the scores from the 200 games...it will always determine with 99% accuracy which engine is superior...or whether the ELO difference is negligible.

So instead of testing 100,000s of games...one only needs to run 200 games.
............

My first thought of an Oracle book is to use the first 20 opening moves. (Perft 1)

hgm · Post by **hgm** » Wed May 13, 2015 6:01 pm

It depends on what you consider 'negligible'.

With 200 games you will have a 40%/sqrt(200) = 2.8% standard deviation, or about 20 Elo. To have 99% confidence you need about 2.5 STD, or 50 Elo. As 50 Elo is indeed a negligible Elo difference, I guess most sets of 100 openings that are sufficiently different will do fine.
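
The arithmetic in this post can be reproduced with a short sketch. It assumes the 40% per-game standard deviation and the 2.5-sigma threshold quoted above, treats the games as independent, and converts score to Elo with the standard logistic Elo formula. The function names are illustrative only and do not come from the thread.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def margin_after(games, per_game_sd=0.40, sigmas=2.5):
    # per_game_sd=0.40 and sigmas=2.5 are the figures quoted in the post;
    # the games are assumed to be statistically independent.
    sd = per_game_sd / math.sqrt(games)   # standard deviation of the match score
    margin = sigmas * sd                  # score margin needed above 50%
    return sd, margin, elo_from_score(0.5 + margin)

sd, margin, elo = margin_after(200)
print(f"sd = {sd:.1%}, margin = {margin:.1%}, about {elo:.0f} Elo")
```

With 200 games this should land close to the 2.8% and 50 Elo figures given in the post.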

voyagerOne · Post by **voyagerOne** » Wed May 13, 2015 7:19 pm

What I am trying to say is that SF Test framework plays tens of thousands of games to determine if a patch should be approved or not.

Instead of playing all those games...the Oracle book can make the same decision by only playing 200 games.

What I mean by negligible is that the ELO difference between the two engines is within +/- 1-2 ELO.

hgm · Post by **hgm** » Wed May 13, 2015 7:33 pm

Well, for 1 Elo with 99% confidence, you will need 200,000 games, and that is only if the book is perfect (i.e. if the initial positions are all sufficiently different to guarantee independent games). This cannot be helped.
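
To see how the required number of games scales, the same reasoning can be inverted: pick an Elo difference and solve for the number of games. This is a rough sketch, again assuming independent games, a per-game standard deviation of 40% and a 2.5-sigma threshold (both taken from the earlier post, not from this one); `games_needed` is a hypothetical helper name.

```python
import math

def expected_score(elo_diff):
    """Expected score for a given Elo advantage under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, per_game_sd=0.40, sigmas=2.5):
    # Assumed inputs: per_game_sd and sigmas are illustrative defaults.
    # A higher draw rate lowers per_game_sd and therefore the game count.
    margin = expected_score(elo_diff) - 0.5
    return math.ceil((sigmas * per_game_sd / margin) ** 2)

for diff in (50, 10, 2, 1):
    print(f"{diff:>3} Elo: about {games_needed(diff):,} games")
```

Under these assumptions the count for 1 Elo comes out in the hundreds of thousands, the same order of magnitude as the figure in the post. The exact number shifts with the draw rate and the confidence convention used. Because the count grows with the inverse square of the margin, halving the Elo difference roughly quadruples the games required.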

Robert Pope · Post by **Robert Pope** » Wed May 13, 2015 8:00 pm

voyagerOne wrote:What I am trying to say is that SF Test framework plays tens of thousands of games to determine if a patch should be approved or not.

Instead of playing all those games...the Oracle book can make the same decision by only playing 200 games.

What I mean by negligible is that the ELO difference between the two engines is within +/- 1-2 ELO.

I think the other piece of the equation is the game score.

With 200 games, you can determine that A is better than B, but it has to score about 55% to do so. With 1000 games, you can prove the same thing with a 52% score, which is a lower hurdle because the random variation is smaller.

The fewer games you test with, the more spectacularly you have to dominate the opponent in order to prove actual superiority.
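
The 55% and 52% hurdles are consistent with a threshold of roughly two standard deviations. The sketch below tabulates the score needed for several match lengths. The 2-sigma threshold and the 40% per-game standard deviation are assumptions, since the post states neither, and `required_score` is an illustrative name.

```python
import math

def required_score(games, per_game_sd=0.40, sigmas=2.0):
    # Assumptions: independent games, 40% per-game standard deviation,
    # and a 2-sigma threshold (roughly 95% confidence).
    return 0.5 + sigmas * per_game_sd / math.sqrt(games)

for n in (200, 1000, 10_000, 100_000):
    print(f"{n:>7} games: need about {required_score(n):.1%}")
```

The hurdle shrinks only with the square root of the number of games, which is why detecting very small differences takes so many of them.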

voyagerOne · Post by **voyagerOne** » Wed May 13, 2015 8:27 pm

The key word here is "Oracle".

hgm · Post by **hgm** » Wed May 13, 2015 8:44 pm

You cannot make an oracle from opening lines any more than you can make a nuclear reactor by stacking ice cubes. And if you had a true oracle, you would need no lines at all. You would just feed it the names, and ask which was better.
