Engine testing with opening books - why?

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

kranium
Posts: 2130
Joined: Thu May 29, 2008 10:43 am

Engine testing with opening books - why?

Post by kranium »

I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match and its results? I.e., one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (On the other hand, another engine might get lucky and receive a multitude of superior positions!)

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm
Graham Banks
Posts: 45296
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Engine testing with opening books - why?

Post by Graham Banks »

Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
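
A minimal sketch of that pairing scheme, for illustration only. The function name `paired_schedule` and the engine/opening labels are hypothetical and not taken from any GUI or tournament manager; the only idea shown is that every opening is played twice per pairing with colours swapped, so an unbalanced line helps and hurts both engines equally.

```python
from itertools import combinations


def paired_schedule(engines, openings):
    """Build a list of (white, black, opening) games.

    Every pair of engines plays every opening twice, once with each
    colour, so any imbalance in an opening is shared by both sides.
    """
    games = []
    for first, second in combinations(engines, 2):
        for opening in openings:
            games.append((first, second, opening))   # first engine has White
            games.append((second, first, opening))   # same opening, colours swapped
    return games


if __name__ == "__main__":
    # Placeholder names; real openings would come from the common book.
    demo = paired_schedule(["EngineA", "EngineB"], ["opening-1", "opening-2"])
    for game in demo:
        print(game)
```

With this layout the number of games per pairing is always twice the number of openings drawn from the book.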
gbanksnz at gmail.com
Kirill Kryukov
Posts: 519
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: Engine testing with opening books - why?

Post by Kirill Kryukov »

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match and its results? I.e., one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (On the other hand, another engine might get lucky and receive a multitude of superior positions!)
Yes it does introduce inaccuracy, but I think it is still better than the alternatives. See below.
kranium wrote:what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.
An engine can be tuned to play well from the Nunn positions. A simple way is to test and make sure the engine does not play bad moves from those positions. More sophisticated tuning can use a built-in mini book for those positions. This could be made very difficult to detect (in a closed-source engine), and it would give a substantial advantage to such an engine in a tournament that uses Nunn positions. This is one of the ways in which using a small set of positions can distort the ratings. Note that this distortion is not statistical noise, so it won't disappear when you run more games. If I ever write an engine of my own, I will make sure that it plays well (compared to its normal play) from the popular testing positions.
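
To illustrate the difference between distortion and noise, here is a rough sketch under stated assumptions: games are independent win/loss results (draws ignored), the usual logistic Elo formula applies, and the 55% score for the tuned engine is a made-up number chosen for the example, not a measurement. The function names are hypothetical.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score, logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def standard_error(score, games):
    """Standard error of the mean score for independent win/loss games."""
    return math.sqrt(score * (1.0 - score) / games)


TRUE_SCORE = 0.50    # two engines of equal strength on arbitrary positions
TUNED_SCORE = 0.55   # assumed score on the small, tuned-for test set

if __name__ == "__main__":
    bias = elo_from_score(TUNED_SCORE) - elo_from_score(TRUE_SCORE)
    for games in (100, 1000, 10000):
        noise = standard_error(TUNED_SCORE, games)
        print(f"{games:6d} games: noise in score = {noise:.4f}, "
              f"bias = {bias:.1f} Elo")
```

The noise term shrinks with the square root of the number of games, while the bias term (about 35 Elo for the assumed 55% score) is the same in every row.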

I use a broad opening book in my tests. It is similar to using a subset of a very large set of positions. This method may have more noise, as there are some unbalanced positions, but it is relatively free from distortion. At least it would take much more effort to tune an engine for all of those positions, or to make a hidden built-in book for them.
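
Drawing a subset from a large set of positions could look roughly like this. It is only a sketch: `sample_openings` and the placeholder position labels are hypothetical, and a real test would read the positions from the book or from a PGN/EPD file.

```python
import random


def sample_openings(positions, count, seed):
    """Pick `count` distinct positions from a large pool.

    A fixed seed makes the selection reproducible, so the same test
    can be rerun later with exactly the same openings.
    """
    rng = random.Random(seed)
    return rng.sample(list(positions), count)


if __name__ == "__main__":
    # Placeholder pool standing in for a broad opening book.
    pool = [f"position-{i}" for i in range(10000)]
    for opening in sample_openings(pool, 5, seed=2008):
        print(opening)
```

Because the pool is large and the draw is random, an engine author cannot know in advance which positions a given tester will use.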
kranium wrote:or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm
Engines tend to play the same moves in the same position (unless using some Monte Carlo mode). So this could work if you need just 2 games per pair of engines. If you need 30 or 100 games, it will seriously distort the results. (The games may not be completely identical due to random timing differences.) Usually we assume that the results of two different games are independent of each other. However, in this case the results will be very far from independent.
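
One crude way to see how far a match is from independent games is to count exact repeats. The sketch below assumes each game is available as a (white, black, moves) record; the function name `duplicate_report` is hypothetical. Counting only exact duplicates understates the problem, since games that differ by a move or two because of timing are still strongly correlated.

```python
from collections import Counter


def duplicate_report(games):
    """Return (total games, distinct games) for an iterable of
    (white, black, moves) records, where moves is a sequence of moves."""
    counts = Counter((white, black, tuple(moves)) for white, black, moves in games)
    total = sum(counts.values())
    distinct = len(counts)
    return total, distinct


if __name__ == "__main__":
    # Two deterministic engines without books, 50 games with each colour.
    line = ("e4", "e5", "Nf3", "Nc6")
    match = [("EngineA", "EngineB", line)] * 50 + [("EngineB", "EngineA", line)] * 50
    total, distinct = duplicate_report(match)
    print(f"{total} games played, {distinct} distinct games")
```

In the extreme case shown, 100 games carry the information of only 2.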

I think it is important to realize that there is no absolute "playing strength". Playing strength can be defined only for specific conditions, which include the set of opponents, time control, opening book, etc. So when a tester is running an engine-engine match, he has to be aware that he is not measuring some absolute strength, but only strength under the specific conditions he is using. With this in mind, a tester should always ask himself "What do I learn from my tests?".

I think a good test should give results that are relevant in the context in which people are using engines. Most people use engines to analyze positions. Advanced chess (correspondence, freestyle, etc.), opening preparation, analyzing your own games or games of your opponents: these are typical uses of engines by many people. So I try to do my testing in a way that makes the results meaningful for these people.

When you suggest a different testing method, a good idea would be to provide a rationale. What applications or uses are relevant to the results of your tests? Who should be interested in your results? Coming back to your suggestion to test from the starting position: how many people do you know who actually let engines think in the starting position of chess?

Note that from this point of view, engine tests with own books are useless to most engine users. This includes WCCC and SSDF. These tests do not answer the question of which engine is better for the user, but simply allow programmers to compete, which is a totally different perspective. I think it is important to make a distinction between these two perspectives. A competition of programmers may also be fun to follow, but the results are not very directly related to the user's perspective. This is another thing to consider when discussing approaches to engine testing.

Best,
Kirill
kranium
Posts: 2130
Joined: Thu May 29, 2008 10:43 am

Re: Engine testing with opening books - why?

Post by kranium »

thanks Kirill for your detailed response,
what you say makes sense.

I guess Graham's suggestion is the solution really:
Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
As far as you know, can this be done with most (or any) GUIs or tournament managers?
Gandalf

Re: Engine testing with opening books - why?

Post by Gandalf »

kranium wrote:
Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
As far as you know, can this be done with most (or any) GUIs or tournament managers?
ChessBase 9 does this by default when running an engine-engine match.
M ANSARI
Posts: 3734
Joined: Thu Mar 16, 2006 7:10 pm

Re: Engine testing with opening books - why?

Post by M ANSARI »

Yes, if you were to use one set of opening positions as a standard, you would encourage an engine maker to tune his engine to that particular set, which would not give you true insight into engine strength. Best is to use many different opening books from reputable authors and truncate the books at 7 moves or 10 moves or even 20 moves.

Personally I also like to look at the games played to see if I can identify any certain weakness or strength. For example after looking at many losses of N4, it is clear to me that the next generation of N4 will be quite a bit stronger since N4 seems to have a weakness that can quickly be fixed. N4 does not have accurate assessment of when connected pawns are stronger than pieces ... at least not as accurate as R3. That should not be too difficult to fix, but at the moment it is causing many losses to R3.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Engine testing with opening books - why?

Post by Dr.Wael Deeb »

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match and its results? I.e., one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (On the other hand, another engine might get lucky and receive a multitude of superior positions!)

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm
Well,shall I open my big mouth or better keep it closed :lol: :?:
I for one test all engines in my rating list (still private) with their own opening books....
I use generic books only for the bookless engines, because........ Sorry, I have to go now, I see Graham running toward me with an axe in his hand :shock:
Running for life regards,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Engine testing with opening books - why?

Post by Dr.Wael Deeb »

kranium wrote:thanks Kirill for your detailed response,
what you say makes sense.

I guess Graham's suggestion is the solution really:
Graham Banks wrote:Ideally when using a common opening book, it is best to let each engine play from the particular opening position that arises as both White and Black.
As far as you know, can this be done with most (or any) GUIs or tournament managers?
It can be done for sure under the ChessBase GUI....
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Engine testing with opening books - why?

Post by Dr.Wael Deeb »

M ANSARI wrote:Yes, if you were to use one set of opening positions as a standard, you would encourage an engine maker to tune his engine to that particular set, which would not give you true insight into engine strength. Best is to use many different opening books from reputable authors and truncate the books at 7 moves or 10 moves or even 20 moves.

Personally I also like to look at the games played to see if I can identify any certain weakness or strength. For example after looking at many losses of N4, it is clear to me that the next generation of N4 will be quite a bit stronger since N4 seems to have a weakness that can quickly be fixed. N4 does not have accurate assessment of when connected pawns are stronger than pieces ... at least not as accurate as R3. That should not be too difficult to fix, but at the moment it is causing many losses to R3.
Don't know, Majd, but I think that Naum is the engine making the biggest leaps toward Rybka and could be a potential candidate for knocking it down in the rating lists 8-)
Dann Corbit
Posts: 12817
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Engine testing with opening books - why?

Post by Dann Corbit »

kranium wrote:I don't understand why many (or most) chess engine test tournaments and test matches are run using a common opening book...?

I can't see any benefit...
consider:

Any opening book (even the best constructed) will contain lines that are superior (or inferior) to other lines in the book.

Doesn't this introduce chance (and therefore possible inaccuracy) into the test match and its results? I.e., one engine may end up in opening positions which are in fact inferior to other positions in the book. These positions are forced upon the engine. (On the other hand, another engine might get lucky and receive a multitude of superior positions!)

what about:

1) using the most common starting positions (or perhaps Nunn, Noomen, etc.) and each pairing plays 2 games (one as white and one as black) from the start position.

or (better yet, IMHO)

2) simply letting the engines play their own opening moves ...i.e. let the engine make 100% of the moves without opening book (instead of being forced into a particular position (randomly?) chosen by the opening book).

Norm
If we are going to advance chess theory, opening books are a good way to go. We find out what works and what doesn't. It's one of the few places in chess where the human brain is truly useful, even in comparison with the computer.