This experiment should be used as a model for future books.

pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Post by pichy »

I wondered how much benefit Rybka gets from using an opening book, so I ran three matches of 40 games each.

I used Rybka.abk as the test book for my experiment. According to Dr. Wael Deeb, he has a later opening book, q8.abk, which is not released yet and could be at least 120 Elo stronger.

This is what I have found so far. Opening books are built from games of players who are rated lower than Rybka 2.3.2a, so even with a great opening book like Rybka.abk, Rybka 2.3.2a without a book sometimes plays no worse than it does with the book.

Here is my experiment:

For these 40 games both programs used the Rybka.abk opening book.

#   Engine               Score     Results                                    S-B
1:  Rybka v2.3.2a.w32    26.0/40   =====100=11===0111=011101111111=101110=0   364.00
2:  Cyclone xTreme       14.0/40   =====011=00===1000=100010000000=010001=1   364.00

40 games played / Tournament is finished
Name of the tournament: Arena tournament
Site/ Country: Jorge-07E2FB46AF, United States
Level: Tournament Game in 5 Minutes
Hardware: AMD Athlon(TM) XP 2000+ 1662 MHz with 992 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 3 (Build 2600)
PGN-File: C:\Program Files\Arena\Books\arena.pgn

---------------------------------------------------

For these 40 games both programs played without any opening book.

#   Engine               Score     Results                                    S-B
1:  Rybka v2.3.2a.w32    22.0/40   101111=1=1000001=01==11===01011==0011001   396.00
2:  Cyclone xTreme       18.0/40   010000=0=0111110=10==00===10100==1100110   396.00

40 games played / Tournament is finished
Name of the tournament: Arena tournament
Site/ Country: Jorge-07E2FB46AF, United States
Level: Tournament Game in 5 Minutes
Hardware: AMD Athlon(TM) XP 2000+ 1662 MHz with 992 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 3 (Build 2600)
PGN-File: C:\Program Files\Arena\Books\arena.pgn

---------------------------------------------------
For these 40 games Rybka played without any opening book, while Cyclone xTreme used the Rybka.abk opening book.



#   Engine               Score     Results                                    S-B
1:  Rybka v2.3.2a.w32    25.5/40   0=1=11=1=10=111111==0100=01====11111===0   369.75
2:  Cyclone xTreme       14.5/40   1=0=00=0=01=000000==1011=10====00000===1   369.75

40 games played / Tournament is finished
Name of the tournament: Arena tournament
Site/ Country: Jorge-07E2FB46AF, United States
Level: Tournament Game in 5 Minutes
Hardware: AMD Athlon(TM) XP 2000+ 1662 MHz with 992 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 3 (Build 2600)
PGN-File: C:\Program Files\Arena\Arena.pgn
---------------------------------------------------

PS: My point is that before any opening book author releases a book, they should test it with two programs of their choice and check whether the programs perform better with the book, about the same, or worse than when the same two programs play without any opening book.
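
One way to compare the three matches above is to turn each score into an Elo difference with a rough 95% interval. The sketch below is only an illustration: it assumes the usual logistic Elo formula and a normal approximation for the error, and the names MATCHES, elo_from_score and summarize are made up for this example. The result strings are copied from Rybka's rows in the three cross tables.

```python
import math

# Result strings copied from Rybka's rows in the three cross tables above.
# '1' = win, '=' = draw, '0' = loss, all from Rybka's point of view.
MATCHES = {
    "both with Rybka.abk": "=====100=11===0111=011101111111=101110=0",
    "both without book": "101111=1=1000001=01==11===01011==0011001",
    "only Cyclone with Rybka.abk": "0=1=11=1=10=111111==0100=01====11111===0",
}


def elo_from_score(p):
    """Elo difference implied by a score fraction p (logistic Elo model)."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return 400.0 * math.log10(p / (1.0 - p))


def summarize(results, z=1.96):
    """Return (score fraction, Elo estimate, lower Elo bound, upper Elo bound)."""
    n = len(results)
    wins = results.count("1")
    draws = results.count("=")
    losses = results.count("0")
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score, then the standard error of the mean.
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)
    return p, elo_from_score(p), elo_from_score(p - z * se), elo_from_score(p + z * se)


for name, results in MATCHES.items():
    p, elo, low, high = summarize(results)
    print(f"{name}: {p:.1%}, Elo {elo:+.0f} (about {low:+.0f} to {high:+.0f})")
```

With only 40 games per match the intervals should come out wide, which is worth keeping in mind when comparing the three results.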
pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Re: This experiment should be used as a model for future boo

Post by pichy »

SzG wrote: In my opinion any decent opening book gives the same results. Books are only to provide variety. For testing purposes you simply use a filter condition to avoid losing opening lines. Any such claim as a 'generic high-quality book' is hocus-pocus. A high-quality book is always engine-specific.
You could be right, but you are implying that Dr. Wael Deeb's private opening book q8.abk is nothing but hocus-pocus and not really 120 Elo stronger.

PS: I believe that you might be wrong. Suppose I compare the same engine, let's say Rybka 3, using a database of Fischer's opening lines and variations from the '70s against Rybka 3 using Kasparov's latest opening theory from 5 years ago plus what he has been working on with Carlsen. According to your hocus-pocus theory, would Rybka 3 with the Kasparov openings really not score at least 30 rating points higher?
Last edited by pichy on Sun Oct 25, 2009 4:06 pm, edited 2 times in total.
CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: This experiment should be used as a model for future boo

Post by CRoberson »

You have a very good point.

However, your data doesn't show it (not really). With only 40 games the statistical Elo margins are big enough
that you can't tell if one of these is better or worse. Your results could have been luck.

I agree with your idea, just not the number of games. I would suggest 200 games. Then you can tell a difference of +/- 40 Elo.
But how do you do it? Could the test be biased by the programs chosen? Yes, it could.

Here are some ideas to remove the bias:
1) use the same program against itself.
2) use several programs in the experiment.

If you are looking for a quick test, I'd use the same program against itself. Then you only need to run 200 games. Run the program
with book against the same program without book. If the program without a book wins by more than the margins then book = bad.

#1 doesn't eliminate the bias completely. Some programs will perform better with a given book than others. Despite common thought,
programs do have playing styles and that effect is observable.
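
To put rough numbers on the margins mentioned above, here is a minimal sketch. It assumes two near-equal opponents, a draw rate of about 30% (an assumption, roughly what the 40-game matches in this thread showed), a normal approximation, and the logistic Elo formula. elo_margin is a made-up helper name, not part of any tool.

```python
import math


def elo_margin(games, draw_rate=0.3, z=1.96):
    """Approximate 95% Elo margin for a match between near-equal opponents.

    draw_rate is an assumed fraction of drawn games; z=1.96 gives ~95%.
    """
    # At an expected score of 0.5, every decisive game deviates by 0.5
    # from the mean and every draw by 0, so the per-game variance is:
    var = 0.25 * (1.0 - draw_rate)
    se = math.sqrt(var / games)  # standard error of the score fraction
    p_high = 0.5 + z * se        # upper end of the score interval
    return 400.0 * math.log10(p_high / (1.0 - p_high))


for n in (40, 200, 1000):
    print(f"{n} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions the margin for 200 games comes out near the +/- 40 Elo quoted above, and the margin for 40 games is more than twice as wide.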
pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Re: This experiment should be used as a model for future boo

Post by pichy »

Your first option seems to be the most logical for a quick test. I just started 200 games: Rybka 2.3.2a-A with Rybka.abk vs. Rybka 2.3.2a-B without an opening book.
pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Re: This experiment should be used as a model for future boo

Post by pichy »

In the 200-game test, Rybka 2.3.2a-B (without an opening book) is so far leading Rybka 2.3.2a-A (with Rybka.abk).


#   Engine                 Score     Results                S-B
1:  Rybka v2.3.2a.w32-B    11.0/20   =1010=1=======01=1==   99.00
2:  Rybka v2.3.2a.w32-A    9.0/20    =0101=0=======10=0==   99.00

20 of 200 games played
Name of the tournament: Arena tournament
Site/ Country: Jorge-07E2FB46AF, United States
Level: Tournament Game in 5 Minutes
Hardware: AMD Athlon(TM) XP 2000+ 1662 MHz with 992 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 3 (Build 2600)
PGN-File: C:\Program Files\Arena\Books\arena.pgn
pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Re: This could be the reason why Chess programs are so tough

Post by pichy »

So far Rybka 2.3.2a-B (without an opening book) is still leading Rybka 2.3.2a-A (with Rybka.abk). This could be the reason why chess programs are so tough to beat in Chess960: even without an opening book database, they are already starting to beat the best opening books.

#   Engine                 Score     Results                                                S-B
1:  Rybka v2.3.2a.w32-B    28.0/52   =1010=1=======01=1==011100===11101=1==0==100=====1=0   672.00
2:  Rybka v2.3.2a.w32-A    24.0/52   =0101=0=======10=0==100011===00010=0==1==011=====0=1   672.00

52 of 200 games played
Name of the tournament: Arena tournament
Site/ Country: Jorge-07E2FB46AF, United States
Level: Tournament Game in 5 Minutes
Hardware: AMD Athlon(TM) XP 2000+ 1662 MHz with 992 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 3 (Build 2600)
PGN-File: C:\Program Files\Arena\Books\arena.pgn