Impressive Preliminary Results of Rybka 3 by Larry Kaufman!

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob »

Uri Blass wrote:
bob wrote:
Milton wrote:By lkaufman Date 2008-07-08 09:51 Since yesterday I've been testing a version of Rybka that is very close to Rybka 3, with the improved scaling and all my latest eval terms added. I'm running it against 2.3.2a mp. It appears that on a direct match basis, we will reach the goal of a 100 Elo gain, at least on quads. As of now, after 900 games total, the lead is 110 Elo (105 Elo on quads, 120 on my octal). This is with both programs using the same short generic book, each taking White once in every opening. To achieve this result Rybka 3 has to win about 4 games for each win by 2.3.2a on the quads and about 5 for 1 on the octal, due to draws. How this will translate to gains on the rating lists remains to be seen.
Personally I think this is a _terrible_ way of estimating Elo gain. I quit doing this years ago because it horribly inflates the Elo for a simple reason...

When you add some new piece of knowledge that might be helpful here and there, and that is the _only_ difference between the two engines, then any rating change is a direct result of that change plus the normal randomness that games between equal opponents produces. Since the two programs are identical except for the new piece of knowledge, the one with the new piece will occasionally use it to win a game.

But in real games between _different_ opponents, that new piece of knowledge might produce absolutely no improvement at all, or one so small that it takes thousands of games to measure. Once you think about it for a few minutes, you see why this is pretty meaningless. The fact that it produces _any_ improvement is certainly significant, but the fact that it produces a 100 Elo improvement is worthless...

I could probably find some test results to show this, as at times we add an old version of Crafty to our gauntlet for testing, and new changes tend to exaggerate that score compared to the scores against other programs in the mix.
Your assumption ("it horribly inflates the Elo") seems not to be correct here.

Larry explained that the new knowledge also made Rybka slower, so it was outsearched by the older Rybka.

He claims that this made the improvement smaller in Rybka-Rybka games (relative to Rybka against other opponents).

Tests against other opponents reported in the Rybka forum suggest a slightly bigger improvement than in Rybka-Rybka games.

Uri
I don't get your point. _Any_ knowledge added to a program will slow it down. So this test will answer the question "is the new knowledge worth the loss in speed" in a pretty effective way. No dispute there. But it will _not_ answer the question "How much _better_ is the new version than the old version?" with any degree of accuracy. The new knowledge will get an exaggerated result against the old program, unless the speed loss offsets the new knowledge or worse...

Anything is possible in computer chess. And on occasion the N vs N+1 testing might well produce more accurate answers than N+1 vs the world. But not generally, which was the point I tried to make.
Martin Thoresen
Posts: 1833
Joined: Thu Jun 22, 2006 12:07 am

Post by Martin Thoresen »

Jeroen came up with some new numbers against opponents other than Rybka:

I decided to run some matches against competing programs to verify whether the hundred Elo gains I've measured in direct play will show up in the rating lists which test against unrelated programs.

Based solely on 1'+1" quad games against Deep Fritz 10 and Deep Shredder 11 (and bearing in mind Rybka 3 is still not quite finalized), it appears that the answer is "yes".

Results so far: against Deep Fritz 10, +128=31-10 for +300 Elo;

against Deep Shredder 11 +98=34-8 for +265 Elo.

Based on CCRL blitz ratings (CEGT doesn't have a blitz rating for Deep Fritz 10 quad) that works out to a performance rating of 3242 against Fritz and 3296 (!) against Shredder, or about 3267 vs. 3132 for Rybka 2.3.2a quad, a gain of 135 Elo. It's only two opponents and the quick time control probably favors the stronger program slightly, but it looks pretty clear that the rating lists will confirm the 100+ rating gains (at least in blitz) for Rybka 3 quad as measured in direct play.
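For anyone who wants to check the arithmetic: the +300 and +265 figures quoted above are consistent with the standard logistic Elo formula applied to the win/draw/loss counts. The sketch below assumes that formula is the one that was used (the post does not say), and the function names are only illustrative.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_diff(fraction):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(fraction / (1.0 - fraction))


# Results quoted in the post: +128=31-10 vs Deep Fritz 10, +98=34-8 vs Deep Shredder 11
fritz = elo_diff(score_fraction(128, 31, 10))
shredder = elo_diff(score_fraction(98, 34, 8))
print(round(fritz), round(shredder))  # 300 265
```

Adding each difference to the opponent's list rating gives the performance rating for that match, which is how a figure like 3242 against Fritz would be obtained from a CCRL rating for Deep Fritz 10.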
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Post by George Tsavdaris »

bob wrote: When you add some new piece of knowledge that might be helpful here and there, and that is the _only_ difference between the two engines, then any rating change is a direct result of that change plus the normal randomness that games between equal opponents produces. Since the two programs are identical except for the new piece of knowledge, the one with the new piece will occasionally use it to win a game.

But in real games between _different_ opponents, that new piece of knowledge might produce absolutely no improvement at all, or one so small that it takes thousands of games to measure. Once you think about it for a few minutes, you see why this is pretty meaningless. The fact that it produces _any_ improvement is certainly significant, but the fact that it produces a 100 Elo improvement is worthless...

I could probably find some test results to show this, as at times we add an old version of Crafty to our gauntlet for testing, and new changes tend to exaggerate that score compared to the scores against other programs in the mix.
I raised the exact same objection some time ago.
A change may do better in a Rybka_b1 versus Rybka_b2 contest, but on the rating lists, where Rybka_b1 and Rybka_b2 face different opponents, things might be different.

The answer I got was that this objection isn't so logical: Rybka's play is quite balanced in all aspects of chess, so an improvement against its previous version will be an improvement against all programs.

I don't quite accept this idea, but so far, judging from their testing method (if something works in Rybka_b1 vs. Rybka_b2 in Elo terms, keep it), it really works.
Nimzovik
Posts: 1831
Joined: Sat Jan 06, 2007 11:08 pm

Post by Nimzovik »

Yes, knowledge and speed are a trade-off... However, with the computers of today (or tomorrow), is this going to be relevant for long? Yes, programming the code for knowledge (of closed positions) is daunting... however, someone is going to plunge into the abyss sooner or later... No? Then again, something like Shredder's triple brain concept, where the positional engine's eval is made relevant and its analysis is compared with that of the engine that plays open positions and perhaps primarily tactics, is frightening to attempt, I am sure, given the question of which engine's eval the program should trust. DEEP SIGH... :cry:
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob »

Nimzovik wrote: Yes, knowledge and speed are a trade-off... However, with the computers of today (or tomorrow), is this going to be relevant for long? Yes, programming the code for knowledge (of closed positions) is daunting... however, someone is going to plunge into the abyss sooner or later... No? Then again, something like Shredder's triple brain concept, where the positional engine's eval is made relevant and its analysis is compared with that of the engine that plays open positions and perhaps primarily tactics, is frightening to attempt, I am sure, given the question of which engine's eval the program should trust. DEEP SIGH... :cry:
One day, perhaps. But I doubt it will be in my lifetime. That is _still_ a long way away from reality.
Jeroen
Posts: 501
Joined: Wed Mar 08, 2006 9:49 pm

Post by Jeroen »

These are results from Larry's testing. Currently I have:

Rybka 3 (+ R3.ctg) - Rybka 2.3.2a (+ RII.ctg) 72-28
Rybka 3 (+ R3.ctg) - Toga II 1.4 beta5c (+ Compmaster.ctg) 80.5-19.5
Rybka 3 (+ R3.ctg) - Hiarcs 12 (+ H12 book) 79-21

Note that these are Rybka 3 beta versions; in the first 2 matches the Leiden version of Rybka played (it won the ICT Leiden at the end of May).
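A rough sketch of how much uncertainty a match of this size carries. It assumes each match above was 100 games at one point per game (the scores sum to 100, but the game count is not stated) and uses the usual logistic Elo model. Draw counts are not given, so the standard error uses the no-draw case, which is the widest possible for a given score. The function names are illustrative.

```python
import math


def elo_from_score(fraction):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(fraction / (1.0 - fraction))


def elo_interval(points, games, z=1.96):
    """Return (low, estimate, high) Elo for a match result.

    The per-game variance is s*(1-s) minus a quarter of the draw rate, so
    ignoring draws gives a conservative (too wide) interval.
    """
    s = points / games
    se = math.sqrt(s * (1.0 - s) / games)
    low = max(s - z * se, 1e-9)       # clamp so the logarithm stays defined
    high = min(s + z * se, 1.0 - 1e-9)
    return elo_from_score(low), elo_from_score(s), elo_from_score(high)


# Match scores from the post, assumed to be out of 100 games each
for opponent, points in [("Rybka 2.3.2a", 72.0), ("Toga II 1.4 beta5c", 80.5), ("Hiarcs 12", 79.0)]:
    low, mid, high = elo_interval(points, 100)
    print(f"{opponent}: {mid:+.0f} Elo (roughly {low:+.0f} to {high:+.0f})")
```

With only 100 games the interval spans well over a hundred Elo, so these match scores indicate a large gain without pinning down its size.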
ArmyBridge

Post by ArmyBridge »

Jeroen wrote:These are results from Larry's testing. Currently I have:

Rybka 3 (+ R3.ctg) - Rybka 2.3.2a (+ RII.ctg) 72-28
Rybka 3 (+ R3.ctg) - Toga II 1.4 beta5c (+ Compmaster.ctg) 80.5-19.5
Rybka 3 (+ R3.ctg) - Hiarcs 12 (+ H12 book) 79-21

Note that these are Rybka 3 beta versions; in the first 2 matches the Leiden version of Rybka played (it won the ICT Leiden at the end of May).
I think it would be better to test Rybka 3 with a generic book, just like CEGT and CCRL do, but... it is very impressive to see Rybka 2.3.2a totally destroyed by Rybka 3 :shock:
Henrik Dinesen
Posts: 877
Joined: Wed Mar 08, 2006 9:52 pm
Location: Denmark

Post by Henrik Dinesen »

ArmyBridge wrote:
Jeroen wrote:These are results from Larry's testing. Currently I have:

Rybka 3 (+ R3.ctg) - Rybka 2.3.2a (+ RII.ctg) 72-28
Rybka 3 (+ R3.ctg) - Toga II 1.4 beta5c (+ Compmaster.ctg) 80.5-19.5
Rybka 3 (+ R3.ctg) - Hiarcs 12 (+ H12 book) 79-21

Note that these are Rybka 3 beta versions; in the first 2 matches the Leiden version of Rybka played (it won the ICT Leiden at the end of May).
I think it would be better to test Rybka 3 with a generic book, just like CEGT and CCRL do, but... it is very impressive to see Rybka 2.3.2a totally destroyed by Rybka 3 :shock:
Heh, which book do you expect the Rybka bookcooker to use, if not his own?
;)
Henrik
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Post by Dr.Wael Deeb »

ArmyBridge wrote:
Jeroen wrote:These are results from Larry's testing. Currently I have:

Rybka 3 (+ R3.ctg) - Rybka 2.3.2a (+ RII.ctg) 72-28
Rybka 3 (+ R3.ctg) - Toga II 1.4 beta5c (+ Compmaster.ctg) 80.5-19.5
Rybka 3 (+ R3.ctg) - Hiarcs 12 (+ H12 book) 79-21

Note that these are Rybka 3 beta versions; in the first 2 matches the Leiden version of Rybka played (it won the ICT Leiden at the end of May).
I think it would be better to test Rybka 3 with a generic book, just like CEGT and CCRL do, but... it is very impressive to see Rybka 2.3.2a totally destroyed by Rybka 3 :shock:
No, a generic opening book will harm Rybka's performance...
Note that the new Rybka will be packaged with its own commercial opening book, so what's the point of testing with a generic one :!: :?:
mclane
Posts: 18753
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Post by mclane »

so what's the point of testing with a generic one
--------------
The idea is to see how much progress comes from the engine.

Nobody is really interested in the book, IMO.