The influence of books on test results.

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: The influence of books on test results.

Post by Adam Hair »

Laskos wrote:
Adam Hair wrote:
I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors.
Even better is to use a large set of positions with reversed colours. Randomness reduces the errors as 1/sqrt(N), but reducing them as 1/sqrt(N/2) using reversed colours is even better, as the errors due to "wrong" opening positions cancel much better using reversed colours. Your argument is probably valid for matches of millions of games or so.
That would depend on the percentage of "wrong" positions. If the percentage is low enough, then not using reversed colors wins out.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The influence of books on test results.

Post by Laskos »

Adam Hair wrote:
Laskos wrote:
Adam Hair wrote:
I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors.
Even better is to use a large set of positions with reversed colours. Randomness reduces the errors as 1/sqrt(N), but reducing them as 1/sqrt(N/2) using reversed colours is even better, as the errors due to "wrong" opening positions cancel much better using reversed colours. Your argument is probably valid for matches of millions of games or so.
That would depend on the percentage of "wrong" positions. If the percentage is low enough, then not using reversed colors wins out.
Yes, and the number of pretty deterministic or other "wrong" openings is non-negligible (say 5 or 10%) even for engine-analysed positions. This is especially important when testing very closely matched engines, and many tests are about that.

Kai
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: The influence of books on test results.

Post by Houdini »

A consideration is that at very fast TC there's not a huge correlation between the 2 games played with reversed colors from the same position. For most purposes they can be considered as more or less independent games.

So I prefer playing reversed colors, as it certainly removes any systematic bias due to improper opening positions.
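
To put a rough number on this, here is a minimal sketch (not something measured in this thread) of how the correlation between the two games of a reversed-color pair changes the effective sample size. It assumes every game has the same result variance and that `pair_correlation` is the correlation between the two results of a pair, seen from one engine's side. Both function names are made up for illustration.

```python
import math


def effective_games(n_games, pair_correlation):
    """Independent games worth the same as n_games played in reversed-color pairs.

    The variance of the mean score over n_games arranged in pairs is
    sigma^2 * (1 + rho) / n_games, so the effective count is n / (1 + rho).
    """
    return n_games / (1.0 + pair_correlation)


def error_inflation(pair_correlation):
    """Factor by which the error bar grows (or shrinks, if rho < 0) versus independent games."""
    return math.sqrt(1.0 + pair_correlation)


if __name__ == "__main__":
    for rho in (-0.1, 0.0, 0.1, 0.25):
        print(rho, round(effective_games(20000, rho)), round(error_inflation(rho), 3))
```

With a correlation near zero, as suggested above for very fast TC, 20,000 paired games count as about 20,000 independent ones. A correlation of 0.25 would make them worth about 16,000.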

Robert
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: The influence of books on test results.

Post by geots »

Adam Hair wrote:
Laskos wrote:
Adam Hair wrote:
I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors.
Even better is to use a large set of positions with reversed colours. Randomness reduces the errors as 1/sqrt(N), but reducing them as 1/sqrt(N/2) using reversed colours is even better, as the errors due to "wrong" opening positions cancel much better using reversed colours. Your argument is probably valid for matches of millions of games or so.
That would depend on the percentage of "wrong" positions. If the percentage is low enough, then not using reversed colors wins out.


Adam, Robert may be an exception, given his reason. But in general I think it is a very bad idea for programmers to try to get experienced and competent testers to "see the light" about their preferences for books, depth of openings, or anything related. IMO, they are treading in an area that is none of their business. If they want their engine tested by a rating group or a single competent tester, they accept the rules the testers go by and refrain from arguing about methods.


george
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: The influence of books on test results.

Post by lkaufman »

Sedat Canbaz wrote:
lkaufman wrote:
velmarin wrote:Which system do you use: a ChessBase book, a Shredder GUI book?
Which system?

How many moves do you want? 4, 5, 6, 8?

How many different openings do you need?

It can be prepared if you give the data.

I am preparing one with 8 moves in the Fritz GUI; it always ends with a Black move,
the idea being that the White side always starts thinking first.
We have always made our own test books, because we need at least 10,000 positions (to run 20,000 games). I don't know any publicly available test books like this, please tell me if there are any. I think CCRL and CEGT seem to use books averaging about 8 moves per side. All positions should be ones that have occurred a reasonable number of times in master play, so we can be pretty sure that White has just a fairly normal advantage. Which publicly available books come closest to meeting this description?
Dear Larry,

Just my two cents over this issue,

I don't suggest using a large opening book from which the engines will play 10,000 positions,

because I am afraid that the engines' Elo performance will suffer from such a variety of openings (there are many holes in such a huge database)

Best,
Sedat
Well, if we want samples of 20,000 games to bring the error bars down to the level of typical changes, how do you propose that we do that without 10,000 positions in the book?
I also don't quite know what you mean about the engines' Elo suffering from holes. Are you saying that since some of the positions may be winning or nearly so for one side, the result will push the Elo difference closer to zero than it should be? If that is your point, it is probably true, but I think it is a pretty minor effect with color reversal, and in any case it should not affect the ranking of the engines, just the Elo differences. Or are you saying something else entirely?
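
As a rough illustration of why something like 20,000 games is needed, the sketch below converts a match score into an Elo difference with the usual logistic formula and estimates a 95% error margin from the per-game variance. The 40% draw ratio, the even score, and the function names are assumptions chosen for illustration, and the games are treated as independent.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(n_games, score=0.5, draw_ratio=0.4):
    """Approximate 95% Elo error margin for n_games independent games."""
    win = score - draw_ratio / 2.0
    loss = 1.0 - win - draw_ratio
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (win * (1.0 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + loss * score ** 2)
    se = math.sqrt(var / n_games)
    # Half-width of the Elo interval obtained from the score interval endpoints.
    upper = elo_from_score(score + 1.96 * se)
    lower = elo_from_score(score - 1.96 * se)
    return (upper - lower) / 2.0


if __name__ == "__main__":
    for n in (1000, 5000, 20000):
        print(n, round(elo_margin_95(n), 1))
```

Under these assumptions, 20,000 games give a margin a little under ±4 Elo, and quartering the number of games roughly doubles it.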

I am not arguing with you here, I'm sincerely interested in your feedback. Thanks in advance.

Larry
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: The influence of books on test results.

Post by carldaman »

I hope everyone realizes that playing each opening with reversed colors only makes sense if the resulting position (where the book ends) has a lot of fight in it, and both sides have chances.

For example, if the book ends with a clear advantage to White, then that will lead to real rating distortions, especially in matches between engines of unequal strength. The stronger engine will win with White as expected, but the weaker engine will also win far too many times with White.

Likewise, if the book ends with a very dead/drawish position, the weaker engine will again benefit by drawing too often due to the opening.

In these cases, playing the same opening with both colors would only do harm to the test. Sorry if I'm stating the obvious, but a lot of people seem to treat testing with reversed colors as being fair by default.
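
A toy model makes the size of this distortion easier to see. Assume a fraction of the book positions are "dead", meaning either decided for White or dead drawn, so that each reversed-color pair from them ends 1-1 no matter who plays. All other games follow the standard logistic Elo expectation. The function names and the `dead_fraction` parameter are assumptions made up for this sketch.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger engine under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse of expected_score."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def measured_elo(true_elo, dead_fraction):
    """Elo difference a test would report when dead openings always split 1-1."""
    normal = expected_score(true_elo)
    # Dead openings contribute exactly 50% regardless of engine strength.
    observed = (1.0 - dead_fraction) * normal + dead_fraction * 0.5
    return elo_from_score(observed)


if __name__ == "__main__":
    for frac in (0.0, 0.05, 0.10, 0.25):
        print(frac, round(measured_elo(100.0, frac), 1))
```

In this model a true 100 Elo gap with 10% dead openings is reported as roughly 90 Elo. The mapping is monotonic, so the ranking of the engines does not change; only the differences shrink.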

Regards,
CL
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: The influence of books on test results.

Post by lkaufman »

Adam Hair wrote:
lkaufman wrote:Of all the factors that can influence test results, such as time limit, increment vs. repeating controls, ponder, hardware, etc., the one we are currently most interested in is the effect of opening books/testsuites. Our own distributed tester uses a five move book, rather shorter than that used by most testers. Since it shows a sixteen elo lead for Komodo 5 over Houdini 1.5 (after over 11k games) which is not shown by the testing agencies, and since the only result on this forum showing Komodo 5 beating Houdini 2 in a long match used a four move book, we decided to make a new testbook that is more typical of books normally used in tests - it averages six moves, but some popular lines are much longer than this. Based on hyper-fast testing, our performance drops by 12 Elo playing against Critter (the closest opponent at hyperspeed levels) after 6700 games. So assuming this would also be true at the normal blitz levels used in the distributed test, this would appear to account for most of the discrepancy between our own test results and the others.
Has anyone else run long tests to compare the effect of different opening books on test results? The tests would have to be several thousand games long, but can be at very fast levels.
Probably we will modify our tester to use this or a similar new book, so that future results will be better predicted by it. My conclusion is that Komodo is better than other top programs at playing the early opening, but the longer the book line supplied, the less valuable this asset becomes. Perhaps switching to a more normal book for testing will gradually help Komodo as different features are tuned using this new book.
I never considered the opening book to be much of a factor in test results (assuming colors are switched for each book position tested), but I am gradually becoming a believer.
My testing used a set of 18,000 positions that all were 4 moves deep. These positions were derived from the databases of the CCRL, CEGT, SWCR, UEL, and my own games. Though I am certain that there are some unbalanced positions in this set, for the most part they are neither too unbalanced nor too drawish. White's score in my games has been just under 53%.

I do not use reversed colors. Doing so automatically reduces the independence of the positions used, which increases the actual error of the measurements. I depend on randomness to keep White (or Black) bias low. I think that shows in the White score of my games, which includes many more games than just those played by the Also-Ran engines.

I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors. There is some variance that comes into play by not using reversed colors, especially if the pool of opponents is wide. But, it is more than offset (in my experience) by the large number of positions used, covering more situations that would be found in general.

I realize that you would like to adjust Komodo's testing in such a way that it would better predict the results of the rating list testers. And possibly you could achieve this. But it is not certain that it would make Komodo better (stronger). It could even make it worse.
Ironically, your testing is similar to the way we have been testing up until now. But I have to agree with Robert here, it seems better to have a book that represents openings more or less in proportion to their use in master play.
Question: Have you found that your own results differ noticeably from those of other testers who use more conventional, deeper books/ test sets? For example, have you found that Komodo scores better, worse, or about the same with your book than with standard ones? Thanks.
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: The influence of books on test results.

Post by lkaufman »

Houdini wrote:A consideration is that at very fast TC there's not a huge correlation between the 2 games played with reversed colors from the same position. For most purposes they can be considered as more or less independent games.

So I prefer playing reversed colors, as it certainly removes any systematic bias due to improper opening positions.

Robert
I agree completely. See, I don't always oppose you!
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: The influence of books on test results.

Post by Sedat Canbaz »

lkaufman wrote:
Sedat Canbaz wrote:
lkaufman wrote:
velmarin wrote:Which system do you use: a ChessBase book, a Shredder GUI book?
Which system?

How many moves do you want? 4, 5, 6, 8?

How many different openings do you need?

It can be prepared if you give the data.

I am preparing one with 8 moves in the Fritz GUI; it always ends with a Black move,
the idea being that the White side always starts thinking first.
We have always made our own test books, because we need at least 10,000 positions (to run 20,000 games). I don't know any publicly available test books like this, please tell me if there are any. I think CCRL and CEGT seem to use books averaging about 8 moves per side. All positions should be ones that have occurred a reasonable number of times in master play, so we can be pretty sure that White has just a fairly normal advantage. Which publicly available books come closest to meeting this description?
Dear Larry,

Just my two cents over this issue,

I don't suggest using a large opening book from which the engines will play 10,000 positions,

because I am afraid that the engines' Elo performance will suffer from such a variety of openings (there are many holes in such a huge database)

Best,
Sedat
Well, if we want samples of 20,000 games to bring the error bars down to the level of typical changes, how do you propose that we do that without 10,000 positions in the book?
I also don't quite know what you mean about the engines' Elo suffering from holes. Are you saying that since some of the positions may be winning or nearly so for one side, the result will push the Elo difference closer to zero than it should be? If that is your point, it is probably true, but I think it is a pretty minor effect with color reversal, and in any case it should not affect the ranking of the engines, just the Elo differences. Or are you saying something else entirely?

I am not arguing with you here, I'm sincerely interested in your feedback. Thanks in advance.

Larry
Hello dear Larry,

For several years (as you may know) I have been organizing serious book competitions.

For example, during that time I have probably tested nearly 1,000 opening books...
And unfortunately, so far I have not found any opening book that allows playing 10,000 positions.

It's true that during my Perfect book beta testing (before starting to play on the Playchess server)
I tested my book with more than 20,000 games per player; however, I did not use varied openings.
Otherwise I could not have reached the best performance in that period of time.

About the current issue,
My advice is: please use a very short optimized book or set of positions rather than varied book positions.
Believe me, you will be disappointed if you use 10,000 positions.
Actually it's not so hard to check this; one only needs to create such a book and play games with it (10,000 positions).

Frankly, I should admit that if the question were:
-Can you create a superior neutral short book that allows playing even 1,000 opening positions?
-My answer would be: NO
*And I would prefer to create a short book that allows playing around 200-300 positions.

*Actually, it's not because it is 'not possible' to create such a varied book (based on thousands of positions).
*It's just because the winning percentage for many engines will be less than 50% for White and 40% for Black.

In other words,
I hate games that are lost due to critical or weak openings; for me each SCCT game is really very important!

Btw, for more than 10 years I have put a lot of effort into my Perfect book series,
and unfortunately there are still some lines in my latest book where some engines' performance suffers...
My main goal is that each opening line/position should have a winning percentage of around 55% for White and 45% for Black.
For more details:
http://www.sedatcanbaz.com/chess/downlo ... 011-books/




Best Wishes,
Sedat
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: The influence of books on test results.

Post by Adam Hair »

Laskos wrote:
Adam Hair wrote:
Laskos wrote:
Adam Hair wrote:
I have evidence that using a large set of positions without reversed colors is much better than using a small set of positions with reversed colors.
Even better is to use a large set of positions with reversed colours. Randomness reduces the errors as 1/sqrt(N), but reducing them as 1/sqrt(N/2) using reversed colours is even better, as the errors due to "wrong" opening positions cancel much better using reversed colours. Your argument is probably valid for matches of millions of games or so.
That would depend on the percentage of "wrong" positions. If the percentage is low enough, then not using reversed colors wins out.
Yes, and the number of pretty deterministic or other "wrong" openings is non-negligible (say 5 or 10%) even for engine-analysed positions. This is especially important when testing very closely matched engines, and many tests are about that.

Kai
5% to 10% is not enough to tip the scale towards reversed colors.

However, the percentage of essentially deterministic openings could grow as the opponents become closer in strength.
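
The trade-off being argued here can be written down as a toy model. This is a sketch under stated assumptions, not the analysis behind anyone's numbers in this thread. Assume two equally strong engines and a fraction of "decided" openings that White always wins. Every other game has result variance `normal_var` (0.15 corresponds to 40% draws), and the two games of a normal reversed-color pair have correlation `pair_correlation`. All names and default values are hypothetical.

```python
def variance_unpaired(decided_fraction, normal_var=0.15):
    """Per-game score variance with random colors and no reversal.

    A decided opening gives a win or a loss depending only on the randomly
    assigned color, so its result has variance 0.25.
    """
    return (1.0 - decided_fraction) * normal_var + decided_fraction * 0.25


def variance_paired(decided_fraction, pair_correlation, normal_var=0.15):
    """Per-game equivalent variance with reversed colors.

    Decided openings always score exactly 1 point out of 2, so they add no
    variance; normal pairs are inflated by (1 + correlation).
    """
    return (1.0 - decided_fraction) * normal_var * (1.0 + pair_correlation)


def break_even_correlation(decided_fraction, normal_var=0.15):
    """Pair correlation at which both designs have the same error."""
    return 0.25 * decided_fraction / ((1.0 - decided_fraction) * normal_var)


if __name__ == "__main__":
    for frac in (0.05, 0.10, 0.20):
        print(frac, round(break_even_correlation(frac), 3))
```

With 10% decided openings the break-even correlation is about 0.19 (about 0.09 at 5%). In this model, not reversing colors comes out ahead only if the games of a normal pair are more strongly correlated than that, and the threshold rises as the share of decided openings grows.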