Some results from Larry with finalized Beta

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some results from Larry with finalized Beta

Post by bob »

Albert Silver wrote:
bob wrote:Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
You are not wrong.

Albert
So we are in the middle of a marketing hyperbole blitz???
Uri Blass
Posts: 10102
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some results from Larry with finalized Beta

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
bob wrote:
Eelco de Groot wrote:Results as of this morning, they are still being updated:
I now have what is supposed to be the final Rybka 3 engine (not GUI), except perhaps for cosmetic things and any bug fixes. I'm running test matches with opposing programs and with Rybka 2.32a mp. All tests are on my two quads and one octal. I'll report results here generally when they reach 100 games.

First result: Rybka 3 vs. Rybka 2.32a mp on octal, game/1': after 106 games, +54=43-9 for +157 Elo.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).

Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).

Comment: overnight I'm switching the Naum and Hiarcs tests between machines, in case Naum has some problem utilizing octal computer. Results in the morning (US).

Sixth result: Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Will run overnight.
I think we could call this a Quantum Leap, wouldn't you agree? The numbers speak for themselves. Pretty devastating for now when you view it from the competition's side... But in the longer run I think this is good for computer-chess. Congratulations to Larry, Vas, Jeroen and all other members of the Rybka team!

Eelco
Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
It is possible to compare with CCRL results against the same opponents.
The opening choice may be different and the time control is also different (40/4 against 40/1), so the comparison is not perfect,
but both played from fixed opening positions.

Larry used openings that are used by some of the CEGT testers.

http://computerchess.org.uk/ccrl/4040.l ... 4-bit_4CPU
My point was that his "+306 Elo" is incorrect. To most readers, that suggests the new Rybka is 306 Elo better than the old. It would have been far more informative to say "Rybka 2 +175, Rybka 3 +306, for a gain of +131 Elo over the old version." I can't begin to interpret the current numbers and certainly don't believe any +300 Elo claims...
It is clear that he did not mean +306 Elo relative to the previous version, and he also edited his post to give an estimate of +166 relative to the previous version (see the overall summary at the bottom of the post).

Of course things may be different at slower time controls, but 166 is an estimate based on a comparison with the CEGT blitz ratings; comparing with CCRL, or including Rybka-Rybka games, could give a bigger number than 166.

http://rybkaforum.net/cgi-bin/rybkaforu ... 5#pid78311

Uri
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Some results from Larry with finalized Beta

Post by Albert Silver »

bob wrote:
Albert Silver wrote:
bob wrote:Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
You are not wrong.

Albert
So we are in the middle of a marketing hyperbole blitz???
Don't know that I'd go that far. The stats and results don't suggest a 300 elo improvement anywhere. After all, a score of +76=22-2 does indeed represent a performance of +330 Elo.

I think others are the ones responsible for misinterpreting.
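
For anyone who wants to check the arithmetic behind the quoted match scores, here is a minimal Python sketch. It assumes the standard logistic Elo model (the thread does not say which formula or table Larry used), and the function name `elo_from_wdl` is made up for illustration.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match score under the logistic Elo model.

    Assumption: this is the usual 400 * log10(s / (1 - s)) conversion;
    the thread does not state which conversion was actually used.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))

# Scores quoted in this thread (Rybka 3 against the named opponent).
results = {
    "Rybka 2.32a mp (first report)": (54, 43, 9),
    "Hiarcs12": (72, 23, 5),
    "Naum 3.1": (76, 22, 2),
}
for name, wdl in results.items():
    print(f"{name}: {elo_from_wdl(*wdl):+.0f} Elo")
```

Under that assumption the three scores come out at roughly +157, +282 and +330, matching the figures quoted in the thread. Each number is the performance against that one opponent in that one match, not a gain over Rybka 2.3.2a.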
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

bob wrote:
Uri Blass wrote:
bob wrote:
Eelco de Groot wrote:Results as of this morning, they are still being updated:
I now have what is supposed to be the final Rybka 3 engine (not GUI), except perhaps for cosmetic things and any bug fixes. I'm running test matches with opposing programs and with Rybka 2.32a mp. All tests are on my two quads and one octal. I'll report results here generally when they reach 100 games.

First result: Rybka 3 vs. Rybka 2.32a mp on octal, game/1': after 106 games, +54=43-9 for +157 Elo.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).

Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).

Comment: overnight I'm switching the Naum and Hiarcs tests between machines, in case Naum has some problem utilizing octal computer. Results in the morning (US).

Sixth result: Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Will run overnight.
I think we could call this a Quantum Leap, wouldn't you agree? The numbers speak for themselves. Pretty devastating for now when you view it from the competition's side... But in the longer run I think this is good for computer-chess. Congratulations to Larry, Vas, Jeroen and all other members of the Rybka team!

Eelco
Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
It is possible to compare with CCRL results against the same opponents.
The opening choice may be different and the time control is also different (40/4 against 40/1), so the comparison is not perfect,
but both played from fixed opening positions.

Larry used openings that are used by some of the CEGT testers.

http://computerchess.org.uk/ccrl/4040.l ... 4-bit_4CPU
My point was that his "+306 Elo" is incorrect. To most readers, that suggests the new Rybka is 306 Elo better than the old. It would have been far more informative to say "Rybka 2 +175, Rybka 3 +306, for a gain of +131 Elo over the old version." I can't begin to interpret the current numbers and certainly don't believe any +300 Elo claims...
I find the way they present the information very misleading. I wonder if the "beta testers" from the Rybka team realize this, or maybe I should call them "sales representatives".

It starts like that, comparing the new version to the old:


Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).
So they start by showing the elo difference between the two versions in self play. Ok, I find that to be a doubtful exercise, but ok.
Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).
And now the elo gains reported are: "+200", "+282", "+330".

That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents, while it should compare the "actual Elo difference between Rybka 3 and the given opponents" MINUS the "actual Elo difference between Rybka 2.3.2a and the given opponents".

I did the math for those interested here: http://64.68.157.89/forum/viewtopic.php ... =&start=20

But it seems we are still plagued with "Rybka sales representatives".
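
The subtraction described above can be sketched in a few lines of Python. This is only an illustration: `gain_over_old` and the opponent name are hypothetical, and the numbers are bob's "+175" / "+306" example from the quoted text, not measured results for either Rybka version.

```python
def gain_over_old(new_vs_opp, old_vs_opp):
    """Per-opponent Elo gain of a new version over an old one.

    Both arguments map opponent name -> Elo difference measured against
    that opponent. Only opponents present in both mappings are compared.
    """
    common = sorted(new_vs_opp.keys() & old_vs_opp.keys())
    if not common:
        raise ValueError("no common opponents to compare")
    gains = {opp: new_vs_opp[opp] - old_vs_opp[opp] for opp in common}
    mean_gain = sum(gains.values()) / len(gains)
    return gains, mean_gain

# Illustrative figures only: bob's "+306" / "+175" example, attached to a
# placeholder opponent name. These are NOT measured results.
new_version = {"opponent X": 306}
old_version = {"opponent X": 175}

gains, mean_gain = gain_over_old(new_version, old_version)
print(gains, mean_gain)
```

With more than one common opponent the same function would report each per-opponent gain plus their plain average, which is one simple way (among several) to summarise "improvement against a suite of common opponents".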
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

Marc MP wrote: I find the way they present the information very misleading. I wonder if the "beta testers" from the Rybka team realize this, or maybe I should call them "sales representatives".

It starts like that, comparing the new version to the old:


Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).
So they start by showing the elo difference between the two versions in self play. Ok, I find that to be a doubtful exercise, but ok.
Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).
And now the elo gains reported are: "+200", "+282", "+330".

That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents.
This doesn't seem misleading to me.
What it compares is plain clear!
You have a series of matches and next to each of them the Elo for that match.
If you take it the way you take it, then the responsibility is not theirs, but yours. :D

while it should compare the "actual elo difference between Rybka 3 and the given opponents" MINUS "actual elo difference between Rybka 2.3.2a and the given opponents".
And why SHOULD it compare that?
It's more valuable information, yes, for sure, but what they do now is also clear and not misleading.
Of course it does not answer the question of how much Rybka 3 has improved over Rybka 2.3.2a when playing other programs, but it is still not misleading.

More than that, the passage you quoted from them above states exactly what you claim they are misleading us about by not saying. But they clearly say it!!
------
Comment(By Larry.K): It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
------
How much clearer should they be?
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
Eelco de Groot
Posts: 4556
Joined: Sun Mar 12, 2006 2:40 am

Re: Some results from Larry with finalized Beta

Post by Eelco de Groot »

bob wrote:
Eelco de Groot wrote:Results as of this morning, they are still being updated:
I now have what is supposed to be the final Rybka 3 engine (not GUI), except perhaps for cosmetic things and any bug fixes. I'm running test matches with opposing programs and with Rybka 2.32a mp. All tests are on my two quads and one octal. I'll report results here generally when they reach 100 games.

First result: Rybka 3 vs. Rybka 2.32a mp on octal, game/1': after 106 games, +54=43-9 for +157 Elo.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).

Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).

Comment: overnight I'm switching the Naum and Hiarcs tests between machines, in case Naum has some problem utilizing octal computer. Results in the morning (US).

Sixth result: Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Will run overnight.
I think we could call this a Quantum Leap, wouldn't you agree? The numbers speak for themselves. Pretty devastating for now when you view it from the competition's side... But in the longer run I think this is good for computer-chess. Congratulations to Larry, Vas, Jeroen and all other members of the Rybka team!

Eelco
Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
Yes, that is true. Larry does all his eval testing against older versions of Rybka, and because the Beta was still being changed by Vas' work on the search and, finally, changes in the timing algorithm, these are really just the first results of the final version against other programs. They are given as TPR (tournament performance rating), as you can see from the game results, not as improvement compared to Rybka 2.3.2. Because part of the last improvements are in the timing algorithm, it would probably be a good idea to check whether you get similar results with Fischer time controls, for instance. But the 40 moves in one minute control, repeating, was chosen because CCRL and CEGT also test with this type of control.

I don't see any deliberate marketing hyperbole here, this was just a forum post from Larry.

I think it is an interesting question whether it would be a good idea for CEGT and CCRL to test Rybka 3 also against older versions of Rybka. On the one hand, this will give a higher, arguably somewhat skewed rating for Rybka 3 because of good results against its older sibling. But the question is also whether Rybka 2.3.2's rating is inflated at the moment because there is not enough competition for it at the top of the list. And you can argue that it is much better to have the 'direct' results even between related programs, if you are going to include them in the same list in the first place.

For CCRL I know this can be chosen by displaying the "pure" or "complete" lists; those will be interesting to compare after the Rybka 3 results come in.

I'm not sure how strong the possible effect of an inflated Elo from not having opponents of about equal strength is. I suppose it depends on whether you use Bayeselo or another system for the Elo calculation, among other things, and I don't know to what degree. It should be possible to get an indication of this by comparing results between SP versions only, for instance, with results of Rybka SP against opponents on 2 and 4 threads, opponents that are relatively stronger. Most are still not strong enough though.

Eelco
Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

George Tsavdaris wrote:
Marc MP wrote: I find the way they present the information very misleading. I wonder if the "beta testers" from the Rybka team realize this, or maybe I should call them "sales representatives".

It starts like that, comparing the new version to the old:


Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).
So they start by showing the elo difference between the two versions in self play. Ok, I find that to be a doubtful exercise, but ok.
Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).
And now the elo gains reported are: "+200", "+282", "+330".

That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents.
This doesn't seem misleading to me.
What it compares is plain clear!
You have a series of matches and next to each of them the Elo for that match.
If you take it the way you take it, then the responsibility is not theirs, but yours. :D

while it should compare the "actual elo difference between Rybka 3 and the given opponents" MINUS "actual elo difference between Rybka 2.3.2a and the given opponents".
And why SHOULD it compare that?
It's more valuable information, yes, for sure, but what they do now is also clear and not misleading.
Of course it does not answer the question of how much Rybka 3 has improved over Rybka 2.3.2a when playing other programs, but it is still not misleading.

More than that, the passage you quoted from them above states exactly what you claim they are misleading us about by not saying. But they clearly say it!!
------
Comment(By Larry.K): It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
------
How much clearer should they be?
In my opinion, people are interested in buying a new product if they think there is an improvement over the old product. I think, in my opinion (I stress that), that you inflate the improvement. As simple as that. I did mention somewhere on the forum that the new Rybka will be an improvement; it's just that you exaggerate the improvement.
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Some results from Larry with finalized Beta

Post by M ANSARI »

I am not sure how many ELO's stronger Rybka 3 is in comparison to other programs ... but I am quite sure that it is at least 100 ELO stronger than any other chess engine out there ... and that is what Rybka 3 was advertised as. It is scoring approximately 85% against both DF10.1 and ZM_2 and more than 70% against Rybka 2.3.2a... at time control of 5 0 on a very overclocked Quad ... which would probably be 2x or even 3x a normal Quad. Also on Octa at 16 0 it is scoring around 74% against ZM_2 and 70% against 2.3.2a ... this would probably be equal to 30 0 or 60 0 time control on a normal Octa.

This all points to a very significant increase in strength throughout the time control range. No hype here, just facts ... and this will all be confirmed very soon at CCRL and CEGT.
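
As a rough cross-check of the "at least 100 Elo" figure, the sketch below converts an Elo difference into the expected score it implies. It assumes the standard logistic Elo model, and `expected_score` is a hypothetical helper name, not something taken from any rating tool mentioned here.

```python
def expected_score(elo_diff):
    """Expected score (between 0 and 1) for the side that is elo_diff stronger.

    Assumption: the usual logistic Elo curve, 1 / (1 + 10 ** (-d / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Expected score for a few rating edges.
for diff in (0, 100, 200, 300):
    print(f"{diff:+d} Elo -> {expected_score(diff):.1%} expected score")
```

Under this model a 100 Elo edge corresponds to scoring about 64%, so the 70% to 85% scores quoted above would sit beyond that mark against those particular opponents, at those time controls and on that hardware.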
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some results from Larry with finalized Beta

Post by bob »

Albert Silver wrote:
bob wrote:
Albert Silver wrote:
bob wrote:Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
You are not wrong.

Albert
So we are in the middle of a marketing hyperbole blitz???
Don't know that I'd go that far. The stats and results don't suggest a 300 elo improvement anywhere. After all, a score of +76=22-2 does indeed represent a performance of +330 Elo.

I think others are the ones responsible for misinterpreting.
I'd still think that the way the numbers are presented, without the old numbers for comparison, is, to the casual reader, suggesting a +330 elo performance boost. I've on rare occasions reported Elo improvements in Crafty, but always in the form "Version X tests to be about +n Elo stronger than version Y, against a suite of common opponents." Hard to misinterpret or misrepresent that kind of statement. It would be particularly ugly if the old version is +280 against the same opponent. Hard to say who is misrepresenting things, but it really could be done in a more scientific way...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some results from Larry with finalized Beta

Post by bob »

M ANSARI wrote:I am not sure how many ELO's stronger Rybka 3 is in comparison to other programs ... but I am quite sure that it is at least 100 ELO stronger than any other chess engine out there ... and that is what Rybka 3 was advertised as. It is scoring approximately 85% against both DF10.1 and ZM_2 and more than 70% against Rybka 2.3.2a... at time control of 5 0 on a very overclocked Quad ... which would probably be 2x or even 3x a normal Quad. Also on Octa at 16 0 it is scoring around 74% against ZM_2 and 70% against 2.3.2a ... this would probably be equal to 30 0 or 60 0 time control on a normal Octa.

This all points to a very significant increase in strength throughout the time control range. No hype here, just facts ... and this will all be confirmed very soon at CCRL and CEGT.
There is no quad on the planet that can be overclocked to 2x-3x so I am not sure what you are talking about. You will be lucky to overclock by 30% or so, which is not going to make the overclocked box 2x-3x faster than a normal quad...

What exactly are you trying to say, since what you actually wrote makes no sense???