Some results from Larry with finalized Beta

Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

George Tsavdaris wrote:
SzG wrote:
Roman Hartmann wrote:a new version of Rybka which is 200 points stronger I'm not so sure about that anymore.

best regards
Roman
Not 200, only 100. I leave the rest to you...
But the results so far indicate it would be about +200 ELO better than its predecessor:

Rybka 3 vs. Rybka 2.3.2a mp on octal, +151=83-22 for +192 Elo.
Rybka 3 vs. Rybka 2.3.2a mp on quad, 40/1' repeating: +47=24-7 for +196 Elo.

Why do you believe the opposite?
Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +85=46-11 for +200 Elo. Quad Zappa blitz rating: 2982 (see http://www.husvankempen.de/nunn/40_4_Ra ... liste.html). Performance: 2982 + 200 = 3182. Quad Rybka on this list: 3081. Improvement: 3182 - 3081 = 101 Elo.

Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Quad Shredder blitz rating: 2955. Performance: 2955 + 254 = 3209. Improvement: 3209 - 3081 = 128 Elo.

Rybka 3 vs. Hiarcs 12 on quad, 40/1' repeating: +72=23-5 for +282 Elo. Quad Hiarcs blitz rating: 2970. Performance: 2970 + 282 = 3252. Improvement: 3252 - 3081 = 171 Elo.

The match against Naum is on an octal, and I have no data on the strength of Naum and Rybka on octals.

Across the 3 quad matches the average increase is 133 Elo.
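
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the usual logistic Elo formula (difference = 400 * log10(p / (1 - p)), where p is the score fraction), which may not be exactly what was used for the quoted figures; the function and variable names are made up for illustration, and the list ratings (2982, 2955, 2970, 3081) are simply the ones quoted in this post. The Elo differences it produces agree with the quoted ones to within about a point.

```python
import math

def elo_from_result(wins, draws, losses):
    """Elo difference implied by a match score, using the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    return 400 * math.log10(score / (1 - score))

LIST_RYBKA_QUAD = 3081  # quad Rybka rating on the list quoted in the post

# (opponent, list rating, wins, draws, losses), all taken from the post
matches = [
    ("Zappa Mexico II", 2982, 85, 46, 11),
    ("Deep Shredder 11", 2955, 61, 21, 6),
    ("Hiarcs 12", 2970, 72, 23, 5),
]

improvements = []
for name, rating, w, d, l in matches:
    diff = elo_from_result(w, d, l)
    performance = rating + diff                 # performance rating vs. this opponent
    improvement = performance - LIST_RYBKA_QUAD  # gain over the listed quad Rybka
    improvements.append(improvement)
    print(f"{name}: {diff:+.0f} Elo, performance {performance:.0f}, "
          f"improvement {improvement:+.0f}")

average_improvement = sum(improvements) / len(improvements)
print(f"Average improvement: {average_improvement:+.0f} Elo")
```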
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Some results from Larry with finalized Beta

Post by M ANSARI »

So far in my testing it seems Rybka 3 is much more than 100 Elo stronger against all opponents. It is scoring 91% against DF10.1, 84% against Zappa Mexico II and 68% against 2.3.2a ... so far around 80 games have been played, and in this tourney Rybka 3 has lost only 1 game ... that was against 2.3.2a. This is on a quad at 4.2 GHz at 5 0.

On my octa at 4.8 GHz it is so far 4 wins, 1 loss and 3 draws ... against both Rybka 2.3.2a and Zappa Mexico II. This is at 16 0 time control ... and this follows my earlier results, where Rybka 2.3.2a held only the smallest edge against Zappa Mexico II on this powerful hardware at longer time controls. It looks like Rybka 3 is strong at all time controls, and I am going to let them play a 50-game gauntlet at 16 0 and see how things go.

I must say it is a delight to watch some of these games ... Rybka 3 really has a totally different playing style.
Last edited by M ANSARI on Thu Jul 24, 2008 1:17 pm, edited 1 time in total.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

Ovyron wrote:Agreed, I've always been against games at such time controls and think that the results are very random compared with longer time controls.
Is the time control the problem?
You should know that a time control of:
•40/1' on an octal computer, as this Rybka played here, is roughly equivalent to:
•40/40 on a Pentium III 1000 MHz from some years ago (around 2000).

Were the games that we played with 40/40 control in 2000 random? :wink:

Even worse:
This 40/1' on an octal computer, as this Rybka played here, is roughly equivalent to:
•40/120 on the Pentium 90 MHz we had in 1996.

Were the games that we played with 40/120 time control in 1996 random? :wink:

What you have to understand is that the time control by itself is irrelevant; what matters is the time control together with the hardware it is played on.
And if you say that all 40/1' games on an octal of today are pointless and random, then you should accept that ALL the games in ALL time controls of the past, before 2000 or so, were pointless and random!
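
To put numbers on these equivalences: if we assume (and this is only a rough assumption) that the amount of search per move is time multiplied by hardware speed, then the time controls above imply the following effective speedups. The function name is invented for illustration.

```python
def implied_speedup(old_minutes_per_40, new_minutes_per_40):
    """Effective speedup needed for the shorter control to give the same
    search per move as the longer one (assumes work = time * speed)."""
    return old_minutes_per_40 / new_minutes_per_40

# Time controls quoted in the post, in minutes per 40 moves
print(implied_speedup(40, 1))   # Pentium III 1000 MHz at 40/40 vs. octal at 40/1'
print(implied_speedup(120, 1))  # Pentium 90 MHz at 40/120 vs. octal at 40/1'
```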

Ovyron wrote:Also, note that the difference to Zappa Mexico isn't as amazing.
This result.....:
Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +85=46-11 for +200 Elo.
.....isn't that amazing?? :shock: :shock: :shock:

Oh well, you are right, it's not +700 Elo but only +200....
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

Marc MP wrote:
George Tsavdaris wrote:
SzG wrote:
Roman Hartmann wrote:a new version of Rybka which is 200 points stronger I'm not so sure about that anymore.

best regards
Roman
Not 200, only 100. I leave the rest to you...
But the results so far indicate it would be about +200 ELO better than its predecessor:

Rybka 3 vs. Rybka 2.3.2a mp on octal, +151=83-22 for +192 Elo.
Rybka 3 vs. Rybka 2.3.2a mp on quad, 40/1' repeating: +47=24-7 for +196 Elo.

Why do you believe the opposite?
Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +85=46-11 for +200 Elo. Quad Zappa blitz rating: 2982 (see http://www.husvankempen.de/nunn/40_4_Ra ... liste.html). Performance: 2982 + 200 = 3182. Quad Rybka on this list: 3081. Improvement: 3182 - 3081 = 101 Elo.

Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Quad Shredder blitz rating: 2955. Performance: 2955 + 254 = 3209. Improvement: 3209 - 3081 = 128 Elo.

Rybka 3 vs. Hiarcs 12 on quad, 40/1' repeating: +72=23-5 for +282 Elo. Quad Hiarcs blitz rating: 2970. Performance: 2970 + 282 = 3252. Improvement: 3252 - 3081 = 171 Elo.

The match against Naum is on an octal, and I have no data on the strength of Naum and Rybka on octals.

Across the 3 quad matches the average increase is 133 Elo.
Yes, I should have been more accurate:

What I meant with this:
----------
But the results so far indicate it would be about +200 Elo better than its predecessor:

Rybka 3 vs. Rybka 2.3.2a mp on octal, +151=83-22 for +192 Elo.
Rybka 3 vs. Rybka 2.3.2a mp on quad, 40/1' repeating: +47=24-7 for +196 Elo.
----------
is that the results so far indicate it would be about +200 Elo better than its predecessor when playing against it.

Elo is just a way to measure performance when a player plays against another player or many players.

And those results mean Rybka 3 will be about +200 Elo above Rybka 2.3.2a in a Rybka 3 vs. Rybka 2.3.2a match.
This Elo difference cannot be transferred to matches against other engines, for example those on the CCRL list. It simply can't!
But as we know, there is a strong correlation: if there is an improvement in Elo in R3 vs. R2.3.2a (200 Elo), then most probably we will also see an improvement when these engines play against other engines (as in CEGT, for example). Not that exact value, but the improvement.
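
As a reminder of what a rating difference means in a head-to-head match, here is a small sketch using the standard logistic Elo expectation, E = 1 / (1 + 10^(-d/400)). The function name is just for illustration.

```python
def expected_score(elo_diff):
    """Expected score fraction for the side that is elo_diff points ahead,
    under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

for diff in (100, 200):
    print(f"{diff:+d} Elo -> expected score {expected_score(diff):.1%}")
```

Under this model +200 Elo corresponds to scoring roughly 76% head to head, which is about what the two Rybka 3 vs. Rybka 2.3.2a matches show.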
Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

George Tsavdaris wrote: Yes, I should have been more accurate:

What I meant with this:
----------
But the results so far indicate it would be about +200 Elo better than its predecessor:

Rybka 3 vs. Rybka 2.3.2a mp on octal, +151=83-22 for +192 Elo.
Rybka 3 vs. Rybka 2.3.2a mp on quad, 40/1' repeating: +47=24-7 for +196 Elo.
----------
is that the results so far indicate it would be about +200 Elo better than its predecessor when playing against it.
I see. The precision is important.

George Tsavdaris wrote: Elo is just a way to measure performance when a player plays against another player or many players.

And those results mean Rybka 3 will be about +200 Elo above Rybka 2.3.2a in a Rybka 3 vs. Rybka 2.3.2a match.
This Elo difference cannot be transferred to matches against other engines, for example those on the CCRL list. It simply can't!
But as we know, there is a strong correlation: if there is an improvement in Elo in R3 vs. R2.3.2a (200 Elo), then most probably we will also see an improvement when these engines play against other engines (as in CEGT, for example). Not that exact value, but the improvement.
I would think that measuring the improvement by comparing the engine with its previous version would bias the Elo increase upward, and that is what we observe here: +133 Elo on average in the quad matches against other engines versus +196 Elo against the previous version.

Anyway, it looks like Rybka 3 will almost certainly be more than 200 Elo stronger than its nearest opponent (apart from earlier Rybkas). We will know with better precision soon.
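
On precision, here is a rough sketch of how wide the error bars on a single match are. It assumes independent games and a normal approximation for the mean score (both simplifications, and only sensible for scores well away from 0% and 100%); the function names are invented for illustration.

```python
import math

def elo(score):
    """Elo difference for a score fraction (logistic model)."""
    return 400 * math.log10(score / (1 - score))

def elo_interval(wins, draws, losses, z=1.96):
    """Approximate 95% interval for the Elo difference of a single match."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0 points)
    variance = (wins * (1 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * mean ** 2) / games
    margin = z * math.sqrt(variance / games)  # margin on the mean score
    return elo(mean - margin), elo(mean), elo(mean + margin)

# Quad match against Rybka 2.3.2a quoted in this thread: +47=24-7
low, mid, high = elo_interval(47, 24, 7)
print(f"{mid:+.0f} Elo, roughly {low:+.0f} to {high:+.0f}")
```

With only 78 games the interval comes out wide (more than 50 Elo on either side), which is why more games are needed for better precision.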
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Some results from Larry with finalized Beta

Post by Ovyron »

I guess as technology advances games of the past look more and more random :lol:
George Tsavdaris wrote:Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +85=46-11 for +200 Elo.
.....isn't that amazing?? :shock: :shock: :shock:


I meant that results against Rybka 2.3.2a were so amazing that these pale a bit. ;)
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

SzG wrote: Never compare an engine to its predecessor.
Current Rybka is about 100 above Zappa. New Rybka is 200 above. 200-100=100. Other results are similar.
No, this is not correct!

We can compare an engine to its predecessor. There are reasons to do it.

You may play only engine-engine games, and you may be right that the +200 Elo Rybka 3 has over its predecessor will not show up as the same number when it plays against other engines. BUT engine-engine play is not the only field where we need Rybka. Correspondence chess is one! Analysis is another!

And if Rybka 3 is +200 Elo better than its predecessor in a head-to-head match, then most probably its analysis will also be much better in quality, most probably worth about +200 Elo.

And actually, for seeing how much Rybka has improved for analysis purposes, playing games against its predecessor is probably the best way to find out!
GenoM
Posts: 910
Joined: Wed Mar 08, 2006 9:46 pm
Location: Plovdiv, Bulgaria

Re: Some results from Larry with finalized Beta

Post by GenoM »

Maybe the book is worth +50-100 Elo, don't you think?
take it easy :)
Uri Blass
Posts: 10102
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some results from Larry with finalized Beta

Post by Uri Blass »

GenoM wrote:Maybe the book is worth +50-100 Elo, don't you think?
The book is not relevant, because many tests are run from fixed positions with no book (I believe it is the same for this test).

Uri
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

GenoM wrote:Maybe the book is worth +50-100 Elo, don't you think?
No. As Uri said, the book cannot explain these amazing results: Larry (he is the one who did the tests) said he used the HS200 set, with each opening played twice (alternating colors).