Engine testing with opening books - why?

kranium · Post by **kranium** » Mon Dec 15, 2008 10:51 am

I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match, and its results? i.e. one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (on the other hand, another engine might get lucky and receive a multitude of superior positions!).

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm

Graham Banks · Post by **Graham Banks** » Mon Dec 15, 2008 11:06 am

Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
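A minimal sketch of that pairing scheme, assuming the openings are available as a plain list (the engine names and opening strings here are placeholders, not taken from any real GUI or tournament manager):

```python
from itertools import combinations

def paired_schedule(engines, openings):
    """Return (white, black, opening) games in which every pair of engines
    plays each opening twice, once with each colour."""
    games = []
    for a, b in combinations(engines, 2):
        for opening in openings:
            games.append((a, b, opening))  # a has White
            games.append((b, a, opening))  # same opening, colours reversed
    return games

# Hypothetical engines and opening lines, for illustration only.
schedule = paired_schedule(["EngineA", "EngineB"], ["e4 e5 Nf3", "d4 d5 c4"])
for white, black, opening in schedule:
    print(f"{white} (White) vs {black} (Black) from: {opening}")
```

Because both engines get the same position with both colours, an unbalanced opening hurts or helps each side once, rather than favouring whichever engine happened to draw it.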

Kirill Kryukov · Post by **Kirill Kryukov** » Mon Dec 15, 2008 12:34 pm

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match, and its results? i.e. one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (on the other hand, another engine might get lucky and receive a multitude of superior positions!).

Yes, it does introduce inaccuracy, but I think it is still better than the alternatives. See below.

kranium wrote:what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

An engine can be tuned to play well from the Nunn positions. A simple way is to test it and make sure it does not play bad moves from those positions. More sophisticated tuning can use a built-in mini book for those positions. This could be made very difficult to detect (in a closed-source engine), and it would give a substantial advantage to such an engine in a tournament that uses Nunn positions. This is one of the ways in which using a small set of positions can distort the ratings. Note that this distortion is not statistical noise, so it won't disappear when you run more games. If I ever write an engine of my own, I will make sure that it plays well (compared to its normal play) from the popular testing positions.

I use a broad opening book in my tests. It is similar to using a subset of a very large set of positions. This method may have more noise, as there are some unbalanced positions, but it is relatively free from distortion. At least it would take much more effort to tune an engine for all of those positions, or to make a hidden built-in book for them.
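To illustrate the "subset from a very large set" idea, here is a rough sketch, assuming the book has already been flattened into a list of positions (the position labels and the seed are made up for the example):

```python
import random

def sample_openings(book_positions, n, seed=None):
    """Pick n distinct positions from a large pool.

    A fixed seed makes the draw repeatable, so the same subset can be
    reused for another match if desired.
    """
    rng = random.Random(seed)
    return rng.sample(book_positions, n)

# Hypothetical pool of 10,000 labelled positions standing in for a broad book.
pool = [f"position_{i}" for i in range(10_000)]
subset = sample_openings(pool, 50, seed=2008)
```

With a pool this large, a hidden built-in book would have to cover thousands of positions to gain anything, which is the point made above.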

kranium wrote:or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm

Engines tend to play the same moves in the same position (unless using some Monte Carlo mode). So this could work if you need just 2 games per pair of engines. If you need 30 or 100 games, it will seriously distort the results. (The games may not be completely identical due to random timing differences.) Usually we assume that the results of two different games are independent of each other. In this case, however, the results will be very far from independent.
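A small numeric sketch of why the lost independence matters, assuming the idealized case where the engines are fully deterministic and every repeated game is identical (real games would differ slightly because of timing, as noted above):

```python
import math
import statistics

def naive_standard_error(results):
    """Standard error of the mean score, computed as if games were independent."""
    return statistics.stdev(results) / math.sqrt(len(results))

# One game pair from the start position: a win with White, a loss with Black.
one_pair = [1.0, 0.0]
# Idealized assumption: the same two games simply repeat 50 times (100 games).
repeated = one_pair * 50

se_pair = naive_standard_error(one_pair)
se_repeated = naive_standard_error(repeated)
print(f"2 games:   mean={statistics.mean(one_pair):.2f}  naive SE={se_pair:.3f}")
print(f"100 games: mean={statistics.mean(repeated):.2f}  naive SE={se_repeated:.3f}")
```

The naive error bar shrinks by roughly a factor of ten, yet the 100 games contain no more information than the original 2. That is the distortion: the result looks far more certain than it is.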

I think it is important to realize that there is no absolute "playing strength". Playing strength can be defined only for specific conditions, which include the set of opponents, time control, opening book, etc. So when a tester is running an engine-engine match, he has to be aware that he is not measuring some absolute strength, but only strength under the specific conditions he is using. With this in mind, a tester should always ask himself "What do I learn from my tests?".

I think a good test should give results that are relevant in the context in which people are using engines. Most people use engines to analyze positions. Advanced chess (correspondence, freestyle, etc.), opening preparation, analyzing your own games or games of your opponents - these are typical uses of engines by many people. So I try to do my testing in a way that makes the results meaningful for these people.

When you suggest a different testing method, a good idea would be to provide a rationale. What applications or uses are relevant to the results of your tests? Who should be interested in your results? Coming back to your suggestion to test from the starting position: how many people do you know who actually let engines think in the starting position of chess?

Note that from this point of view, engine tests with own books are useless to most engine users. This includes WCCC, SSDF. These tests are not answering the question of which engine is better for the user, but simply allow programmers to compete, which is a totally different perspective. I think it is important to make a distinction between these two perspectives. A competition of programmers may also be fun to follow, but the results are not very directly related to the user's perspective. This is another thing to consider when discussing approaches to engine testing.

Best,
Kirill

kranium · Post by **kranium** » Mon Dec 15, 2008 2:53 pm

thanks Kirill for your detailed response,
what you say makes sense.

I guess Graham's suggestion is the solution really:

Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.

As far as you know, can this be done with most (or any) GUIs or tournament managers?

Gandalf · Post by **Gandalf** » Mon Dec 15, 2008 3:35 pm

kranium wrote:
Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
As far as you know, can this be done with most (or any) GUIs or tournament managers?

ChessBase 9 does this by default when running an engine-engine match.

M ANSARI · Post by **M ANSARI** » Mon Dec 15, 2008 4:08 pm

Yes, if you were to use one set of opening positions as a standard, you would encourage an engine maker to tune his engine to that particular set, which would not give you true insight into engine strength. It is best to use many different opening books from reputable authors and truncate the books at 7 moves, 10 moves, or even 20 moves.

Personally I also like to look at the games played to see if I can identify any particular weakness or strength. For example, after looking at many losses of N4, it is clear to me that the next generation of N4 will be quite a bit stronger, since N4 seems to have a weakness that can quickly be fixed. N4 does not have an accurate assessment of when connected pawns are stronger than pieces ... at least not as accurate as R3. That should not be too difficult to fix, but at the moment it is causing many losses to R3.
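As a rough sketch of what truncating a book could look like, assuming each book line is a list of moves in order and that "7 moves" means 7 full moves, i.e. 14 plies (the function name and sample lines are illustrative, not from any real book tool):

```python
def truncate_book(lines, max_moves):
    """Cut every book line to at most max_moves full moves (2 plies each)
    and drop duplicates that appear once the lines are shortened."""
    max_plies = 2 * max_moves
    truncated = (tuple(line[:max_plies]) for line in lines)
    # dict.fromkeys keeps the first occurrence of each line, in order.
    return [list(line) for line in dict.fromkeys(truncated)]

# Hypothetical three-line book; the first two lines share their first 2 moves.
book = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5"],
    ["d4", "d5", "c4", "e6"],
]
short_book = truncate_book(book, 2)
print(short_book)
```

Shorter cut-offs merge more lines into the same starting position, so the depth chosen also controls how many distinct openings the test really covers.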

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Mon Dec 15, 2008 5:25 pm

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match, and its results? i.e. one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (on the other hand, another engine might get lucky and receive a multitude of superior positions!).

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm

Well, shall I open my big mouth or better keep it closed?

I for one test all engines in my rating list (still private) with their own opening books...
I use generic books only for the bookless engines, because... Sorry, I have to go now, I see Graham running toward me with an axe in his hand.

Running for life regards,
Dr.D

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Mon Dec 15, 2008 5:27 pm

kranium wrote:thanks Kirill for your detailed response,
what you say makes sense.

I guess Graham's suggestion is the solution really:

Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
As far as you know, can this be done with most (or any) GUIs or tournament managers?

It can certainly be done under the ChessBase GUI.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Mon Dec 15, 2008 5:31 pm

M ANSARI wrote:Yes, if you were to use one set of opening positions as a standard, you would encourage an engine maker to tune his engine to that particular set, which would not give you true insight into engine strength. It is best to use many different opening books from reputable authors and truncate the books at 7 moves, 10 moves, or even 20 moves.

Personally I also like to look at the games played to see if I can identify any particular weakness or strength. For example, after looking at many losses of N4, it is clear to me that the next generation of N4 will be quite a bit stronger, since N4 seems to have a weakness that can quickly be fixed. N4 does not have an accurate assessment of when connected pawns are stronger than pieces ... at least not as accurate as R3. That should not be too difficult to fix, but at the moment it is causing many losses to R3.

Don't know, Majd, but I think that Naum is the engine making the biggest leaps toward Rybka and could be a potential candidate for knocking it down in the rating lists.

Dann Corbit · Post by **Dann Corbit** » Mon Dec 15, 2008 9:29 pm

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match, and its results? i.e. one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (on the other hand, another engine might get lucky and receive a multitude of superior positions!).

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm

If we are going to advance chess theory, opening books are a good way to go. We find out what works and what doesn't. It's one of the few places in chess where the human brain is truly useful, even in comparison with the computer.