Some results from Larry with finalized Beta

Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

M ANSARI wrote:I am not sure how many ELO's stronger Rybka 3 is in comparison to other programs ... but I am quite sure that it is at least 100 ELO stronger than any other chess engine out there ... and that is what Rybka 3 was advertised as. It is scoring approximately 85% against both DF10.1 and ZM_2 and more than 70% against Rybka 2.3.2a... at time control of 5 0 on a very overclocked Quad ... which would probably be 2x or even 3x a normal Quad. Also on Octa at 16 0 it is scoring around 74% against ZM_2 and 70% against 2.3.2a ... this would probably be equal to 30 0 or 60 0 time control on a normal Octa.

This all points to a very significant increase in strength throughout the time control range. No hype here just facts ... and this will all be confirmed very soon at CCRL and CEGT.
Hi M. Ansari,

May I tell you something?

I will follow, almost blindly, the advice you gave on the Rybka forum about hardware for the computer I'm planning to buy.

I'm just saying this in case some do not understand that I'm not against Rybka's improvement, just against the way it is presented...

I will for sure buy the product, but it is presented as more than 300 Elo superior to what existed before. I find that misleading, but you can think I'm wrong. No problem. This is a personal opinion. Thank you for the many pieces of advice on the Rybka forum that will help me overclock my CPU. Sincerely.
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Some results from Larry with finalized Beta

Post by M ANSARI »

I did not mean a 2x or 3x speedup in MHz ... I meant the "results" would be equivalent to those at a time control that is 2x or 3x faster. So a 3 0 result on an overclocked Quad would give an indication of a 5 0 or a 10 0 result on a normally clocked system. This was my personal guess at what the results would be like if a "normally" clocked processor were used ... which could very well be inaccurate, but I bet I am pretty close. I might just run the exact same tourney with reduced MHz speed and increased timing and compare the results. The point I was trying to make is that the tests that Larry K had done at 40 moves in 1' seem to be an accurate representation of the strength of the program ... and that as you go up in time control the strength stays pretty linear. This for me was quite a revelation, as I used to totally dismiss any usefulness of 1 minute games.

By the way ... you can overclock a Quad and an Octa to 100% or more of a normal Quad's or a normal Octa's speed ... I would say a "normal" Quad or Octa today would be running at 2.4Ghz. I will admit that the Octa's are more difficult to overclock due to their use of FBDIMMS ... but with Skulltrail and an unlocked CPU you can do pretty well. I regularly play my Octa at 8 cores running 4.8Ghz on Playchess.

One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned. It was after some strong results in some tournaments that were run that some started speculating on those elevated ELO's. I think both Vas and LK won't even consider claiming more than 100 or 110 ELO improvement. After running a few tournaments and watching R3 play I can tell you that I am quite certain that R3 will be confirmed by CCRL and CEGT to be at least 100 ELO points stronger.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some results from Larry with finalized Beta

Post by bob »

M ANSARI wrote:I did not mean a 2x or 3x speedup in MHz ... I meant the "results" would be equivalent to those at a time control that is 2x or 3x faster. So a 3 0 result on an overclocked Quad would give an indication of a 5 0 or a 10 0 result on a normally clocked system. This was my personal guess at what the results would be like if a "normally" clocked processor were used ... which could very well be inaccurate, but I bet I am pretty close. I might just run the exact same tourney with reduced MHz speed and increased timing and compare the results. The point I was trying to make is that the tests that Larry K had done at 40 moves in 1' seem to be an accurate representation of the strength of the program ... and that as you go up in time control the strength stays pretty linear. This for me was quite a revelation, as I used to totally dismiss any usefulness of 1 minute games.

By the way ... you can overclock a Quad and an Octa to 100% or more of a normal Quad's or a normal Octa's speed ... I would say a "normal" Quad or Octa today would be running at 2.4Ghz. I will admit that the Octa's are more difficult to overclock due to their use of FBDIMMS ... but with Skulltrail and an unlocked CPU you can do pretty well. I regularly play my Octa at 8 cores running 4.8Ghz on Playchess.

One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned. It was after some strong results in some tournaments that were run that some started speculating on those elevated ELO's. I think both Vas and LK won't even consider claiming more than 100 or 110 ELO improvement. After running a few tournaments and watching R3 play I can tell you that I am quite certain that R3 will be confirmed by CCRL and CEGT to be at least 100 ELO points stronger.
So comparing 3 0 to 10 0 and saying they are equivalent is not saying 3 0 is 3x faster than 10 0?

Looks to me as though that says it is _over_ 3x faster. And that's not the case. Overclocking by 25% would make a machine running 6 0 games act like it was actually playing 8 0 games...
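
The scaling argument in this exchange can be checked with a little arithmetic. The sketch below assumes, as a simplification that nobody in the thread states outright, that search speed scales linearly with clock speed, so a game at a given time control on a machine running s times faster is treated as equivalent to a game at s times that time control on a normal machine. The function names are hypothetical.

```python
def equivalent_minutes(base_minutes: float, speedup: float) -> float:
    """Time control on a normally clocked machine that matches
    `base_minutes` on a machine running `speedup` times faster.

    Assumption: search speed scales linearly with clock speed.
    """
    if base_minutes <= 0 or speedup <= 0:
        raise ValueError("base_minutes and speedup must be positive")
    return base_minutes * speedup


def implied_speedup(fast_minutes: float, normal_minutes: float) -> float:
    """Speedup implied by treating `fast_minutes` on the fast machine
    as equivalent to `normal_minutes` on the normal machine."""
    if fast_minutes <= 0 or normal_minutes <= 0:
        raise ValueError("both time controls must be positive")
    return normal_minutes / fast_minutes


if __name__ == "__main__":
    # A 25% overclock applied to 6 0 games.
    print(equivalent_minutes(6, 1.25))
    # Speedups implied by equating 3 0 with 5 0 and with 10 0.
    print(implied_speedup(3, 5))
    print(implied_speedup(3, 10))
```

Under this assumption a 25% overclock turns 6 0 into roughly 7.5 0, close to the 8 0 figure above, while equating 3 0 with 10 0 would need a speedup of more than 3x. A 4.8 GHz machine against a 2.4 GHz baseline is a 2x clock ratio, which sits between the two cases.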
Graham Banks
Posts: 41198
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Some results from Larry with finalized Beta

Post by Graham Banks »

M ANSARI wrote:One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned.
From what I've read and what I've been told by reliable sources, Rybka 3 is probably a 100+ elo improvement on 1CPU, but a bigger improvement on multi-cpu.

Regards, Graham.
gbanksnz at gmail.com
Uri Blass
Posts: 10102
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some results from Larry with finalized Beta

Post by Uri Blass »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Eelco de Groot wrote:Results as of this morning, they are still being updated:
I now have what is supposed to be the final Rybka 3 engine (not GUI), except perhaps for cosmetic things and any bug fixes. I'm running test matches with opposing programs and with Rybka 2.32a mp. All tests are on my two quads and one octal. I'll report results here generally when they reach 100 games.

First result: Rybka 3 vs. Rybka 2.32a mp on octal, game/1': after 106 games, +54=43-9 for +157 Elo.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.

Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).

Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.

Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.

Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.

Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).

Comment: overnight I'm switching the Naum and Hiarcs tests between machines, in case Naum has some problem utilizing the octal computer. Results in the morning (US).

Sixth result: Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Will run overnight.
I think we could call this a Quantum Leap, wouldn't you agree? The numbers speak for themselves. Pretty devastating for now when you view it from the competition's side... But in the longer run I think this is good for computer-chess. Congratulations to Larry, Vas, Jeroen and all other members of the Rybka team!

Eelco
Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent, so it is not possible to see the improvement...
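
For reference, the per-match figures quoted above are consistent with the usual logistic Elo model, in which a score fraction p corresponds to a rating difference of 400 * log10(p / (1 - p)). Whether Larry used exactly this formula is an assumption; the sketch below only shows that it reproduces the quoted numbers to within about a point. The function name is hypothetical.

```python
import math


def elo_diff(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result under the logistic model.

    This is the difference between the two players in that match,
    not an improvement over any earlier version.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Results as quoted in the thread (wins, draws, losses for Rybka 3).
    matches = {
        "Rybka 2.32a mp, first report": (54, 43, 9),
        "Hiarcs12": (72, 23, 5),
        "Naum 3.1": (76, 22, 2),
    }
    for opponent, (w, d, l) in matches.items():
        print(f"{opponent}: {elo_diff(w, d, l):+.1f}")
```

The Naum result, for example, comes out near +330, which is the gap between Rybka 3 and Naum in that match and says nothing by itself about the gain over Rybka 2.3.2a.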
It is possible to compare with the CCRL results against the same opponents.
The opening choice may be different and the time control is also different (40/4 against 40/1), so the comparison is not perfect,
but both played from fixed opening positions.

Larry used openings that are used by part of the cegt testers.

http://computerchess.org.uk/ccrl/4040.l ... 4-bit_4CPU
My point was that his "+306 Elo" is incorrect. To most readers, that suggests the new Rybka is 306 Elo better than the old one. It would have been far more informative to say "Rybka 2 +175, Rybka 3 +306, for a gain of +131 Elo over the old version." I can't begin to interpret the current numbers and certainly don't believe any +300 Elo claims...
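
The subtraction described here is simple enough to sketch. The example assumes that Elo differences measured against a common opponent can be subtracted directly, which holds only approximately in practice, and it uses just the illustrative +175 and +306 figures from the post. The function names and the "opponent" label are hypothetical.

```python
from typing import Dict, Tuple


def improvement(new_vs_opponent: float, old_vs_opponent: float) -> float:
    """Gain of the new version over the old one, measured through a
    common opponent. Assumes Elo differences are additive."""
    return new_vs_opponent - old_vs_opponent


def mean_improvement(results: Dict[str, Tuple[float, float]]) -> float:
    """Average gain over several opponents.

    `results` maps an opponent name to (new_vs_opponent, old_vs_opponent).
    """
    if not results:
        raise ValueError("no results given")
    gains = [improvement(new, old) for new, old in results.values()]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # Illustrative figures from the post: old version +175, new version +306.
    print(improvement(306, 175))
    print(mean_improvement({"opponent": (306, 175)}))
```

With those two figures the gain is +131, not +306. Doing this properly needs the old version's result against each opponent under the same conditions.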
It is clear that he did not mean +306 Elo relative to the previous version, and he also edited his post to give an estimate of +166 relative to the previous version (see the overall summary at the bottom of the post).

Of course things may be different at slower time controls, but 166 is an estimate based on comparison with the CEGT blitz rating, and comparing with CCRL or including Rybka-Rybka games could give a bigger number than 166.

http://rybkaforum.net/cgi-bin/rybkaforu ... 5#pid78311

Uri
Update:
They reduced the 166 figure by excluding the games against Naum.

Larry Kaufman says that Naum is significantly weaker at 40 moves in 1 minute, based on his testing at 40 moves in 2 minutes.
He says that his results, excluding Naum, suggest ratings of 3202 on CEGT and 3265 on CCRL, which mean +119 and +132 Elo relative to those lists. Considering that the rating lists use slower time controls, his estimate is 3200 on the CEGT and 3250 on the CCRL rating list.

Uri
PauloSoare
Posts: 1335
Joined: Thu Mar 09, 2006 5:30 am
Location: Cabo Frio, Brasil

Re: Some results from Larry with finalized Beta

Post by PauloSoare »

Graham Banks wrote:
M ANSARI wrote:One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned.
From what I've read and what I've been told by reliable sources, Rybka 3 is probably a 100+ elo improvement on 1CPU, but a bigger improvement on multi-cpu.

Regards, Graham.
Mon Dieu! I do not remember such a great improvement from one version of a
top engine to the next.
Does anyone remember one?

Paulo Soares
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Some results from Larry with finalized Beta

Post by Dirt »

Marc MP wrote:I will, for sure buy the product, but it is presented as more than 300 elo superior to what existed before. I find that misleading...
I agree, but I'm willing to believe it was inadvertent. This was a quick post in a forum and not a carefully prepared marketing statement. The actual results are compelling enough without exaggeration.
Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

PauloSoare wrote:
Graham Banks wrote:
M ANSARI wrote:One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned.
From what I've read and what I've been told by reliable sources, Rybka 3 is probably a 100+ elo improvement on 1CPU, but a bigger improvement on multi-cpu.

Regards, Graham.
Mon Dieu! I do not remember such a great improvement from one version of a
top engine to the next.
Does anyone remember one?

Paulo Soares
Deep Sjeng 3.0 vs Deep Sjeng 2.7, released 3 weeks ago. ;)
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Some results from Larry with finalized Beta

Post by George Tsavdaris »

Marc MP wrote:
PauloSoare wrote:
Graham Banks wrote:
M ANSARI wrote:One more thing ... I think LK mentioned that he thought the ELO improvement was around 100 ELO ... not 300 or 400 ELO as was mentioned.
From what I've read and what I've been told by reliable sources, Rybka 3 is probably a 100+ elo improvement on 1CPU, but a bigger improvement on multi-cpu.

Regards, Graham.
Mon Dieu! I do not remember such a great improvement from one version of a
top engine to the next.
Does anyone remember one?

Paulo Soares
Deep Sjeng 3.0 vs Deep Sjeng 2.7, released 3 weeks ago. ;)
It depends on what he means by "top engine".
Perhaps he counts only the top 3 on CEGT/CCRL as top engines, in which case Sjeng is not a top engine. But of course I can't be sure what exactly he means....
After his son's birth they asked him:
"Is it a boy or girl?"
YES! He replied.....
Marc MP

Re: Some results from Larry with finalized Beta

Post by Marc MP »

George Tsavdaris wrote:
Marc MP wrote:
...

And now the elo gains reported are: "+200", "+282", "+330".

That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents.

This doesn't seem misleading to me.
What it compares is plain clear!
You have a series of matches and next to them the ELO for this match.
If you take it the way you take it, then the responsibility is not theirs, but yours. :D
So this is what you call "plain clear"? I'm sorry, but in my book a "plain clear" statement looks like: "Rybka 3 is estimated to be 130 Elo superior to its predecessor".

Marc MP wrote: while it should compare the "actual elo difference between Rybka 3 and the given opponents" MINUS "actual elo difference between Rybka 2.3.2a and the given opponents".
George Tsavdaris wrote: And why SHOULD it compare that?
Because that yields more information about the strength improvement, as simple as that.
George Tsavdaris wrote: It's more valuable information yes, for sure, but what they do now is also clear and not misleading.
Of course, it does not answer the question of how much Rybka 3 has improved compared to Rybka 2.3.2a when playing other programs, but it is still not misleading.

More than that, what you quoted from them above clearly states the very thing you claim they are trying to mislead us about by not saying. But they clearly say it!!
------
Comment(By Larry.K): It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
------
How much clearer should they be?
Larry K. is an administrator on the Rybka forum and can edit his posts whenever he likes. As I'm writing this, the last edit is from: "19:20 Edited 2008-07-25 02:06". That is after your latest post to me.

The quotes from Larry K. you mentioned (and highlighted in bold and red) were posted after I made my statement. Then you ask me "How much clearer should they be?"

From what I understand, the Rybka Team (represented here by Larry K.) made corrections following the point I made (or that others made before or after me, it doesn't matter).

I hope you weren't misleading the reader by showing statements by Larry K. that weren't written yet when I posted my critique of the Rybka 3 Elo improvement methodology. I understand it could be a mistake on your part.

Have a good day,