So we are in the middle of a marketing hyperbole blitz???
Albert Silver wrote: You are not wrong.
bob wrote: Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
Albert
Some results from Larry with finalized Beta
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some results from Larry with finalized Beta
-
Uri Blass
- Posts: 10102
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
It is clear that he did not mean +306 Elo relative to the previous version, and he also edited his post to give an estimate of +166 relative to the previous version (see the overall summary at the bottom of the post).
bob wrote: My point was that his "+306 Elo" is incorrect. To most readers, that suggests the new Rybka is 306 Elo better than the old one. It would have been far more informative to say "Rybka 2 +175, Rybka 3 +306, for a gain of +131 Elo over the old version." I can't begin to interpret the current numbers and certainly don't believe any +300 Elo claims...
Uri Blass wrote: It is possible to compare with CCRL results against the same opponents.
bob wrote: Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement...
Eelco de Groot wrote: Results as of this morning, they are still being updated:
I think we could call this a Quantum Leap, wouldn't you agree? The numbers speak for themselves. Pretty devastating for now when you view it from the competition's side... But in the longer run I think this is good for computer-chess. Congratulations to Larry, Vas, Jeroen and all other members of the Rybka team!
I now have what is supposed to be the final Rybka 3 engine (not GUI), except perhaps for cosmetic things and any bug fixes. I'm running test matches with opposing programs and with Rybka 2.32a mp. All tests are on my two quads and one octal. I'll report results here generally when they reach 100 games.
First result: Rybka 3 vs. Rybka 2.32a mp on octal, game/1': after 106 games, +54=43-9 for +157 Elo.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.
Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).
Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.
Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.
Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).
Comment: overnight I'm switching the Naum and Hiarcs tests between machines, in case Naum has some problem utilizing the octal computer. Results in the morning (US).
Sixth result: Rybka 3 vs. Deep Shredder 11 on quad, 40/1' repeating: +61=21-6 for +254 Elo. Will run overnight.
Eelco
The opening choice may be different, and the time control is also different (40/4 against 40/1), so the comparison is not perfect,
but both played from fixed opening positions.
Larry used openings that are used by some of the CEGT testers.
http://computerchess.org.uk/ccrl/4040.l ... 4-bit_4CPU
Of course, things may be different at slower time controls, but 166 is an estimate based on comparison with the CEGT blitz ratings, and comparing with CCRL or including Rybka-Rybka games could give a bigger number than 166.
http://rybkaforum.net/cgi-bin/rybkaforu ... 5#pid78311
Uri
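The Elo figure attached to each of Larry's matches can be reproduced from its +W=D-L record with the usual logistic Elo formula, which makes bob's question concrete: each number is a performance against that particular opponent, not an improvement over Rybka 2.3.2a. Below is a minimal sketch, assuming draws count as half a point and ignoring error bars. The function names and the `LARRY_RESULTS` table are illustrative only, and this is not claimed to be the tool Larry actually used. On these records it lands within one Elo point of every reported figure, so the small gaps are presumably down to rounding or truncation.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of the available points scored, counting each draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("need at least one game")
    return (wins + 0.5 * draws) / games


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def performance(wins: int, draws: int, losses: int) -> float:
    """Match performance (Elo difference versus this opponent) from a +W=D-L record."""
    return elo_from_score(score_fraction(wins, draws, losses))


# Latest records from Larry's post, paired with the Elo figure he reported.
LARRY_RESULTS = {
    "Rybka 2.32a mp (octal, game/1')": ((151, 83, 22), 192),
    "Rybka 2.32a mp (quad, 40/1')": ((47, 24, 7), 196),
    "Zappa Mexico II (quad, 40/1')": ((85, 46, 11), 200),
    "Hiarcs12 (quad, 40/1')": ((72, 23, 5), 282),
    "Naum 3.1 (octal, 40/1')": ((76, 22, 2), 330),
    "Deep Shredder 11 (quad, 40/1')": ((61, 21, 6), 254),
}

for opponent, ((w, d, l), reported) in LARRY_RESULTS.items():
    perf = performance(w, d, l)
    print(f"{opponent:32s} +{w}={d}-{l}: {perf:+6.1f} (reported +{reported})")
```

Only the two Rybka 2.32a rows measure a head-to-head gain over the old version, and Larry's own comment warns that this self-play margin overstates what to expect against unrelated programs.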
-
Albert Silver
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Don't know that I'd go that far. The stats and results don't suggest a 300 Elo improvement anywhere. After all, a score of +76=22-2 does indeed represent a performance of +330 Elo.
bob wrote: So we are in the middle of a marketing hyperbole blitz??? [...]
I think others are the ones responsible for misinterpreting.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
Marc MP
I find the way they present the information very misleading. I wonder if the "beta testers" from the Rybka team realize this, or maybe I should call them "sales representatives".
bob wrote: My point was that his "+306 Elo" is incorrect. To most readers, that suggests the new Rybka is 306 Elo better than the old one. It would have been far more informative to say "Rybka 2 +175, Rybka 3 +306, for a gain of +131 Elo over the old version." I can't begin to interpret the current numbers and certainly don't believe any +300 Elo claims... [...]
It starts like that, comparing the new version to the old:
So they start by showing the Elo difference between the two versions in self-play. OK, I find that to be a doubtful exercise, but OK.
Update: now +98=61-15 for +180 Elo (!).
Update: now +136=76-19 for +193 Elo (!).
Final: +151=83-22 for +192 Elo.
Second result: Rybka 3 vs. Rybka 2.32a mp on quad, 40/1' repeating: test stopped with score +47=24-7 for +196 Elo (!!).
And now the Elo gains reported are: "+200", "+282", "+330".
Third result: Rybka 3 vs. Zappa Mexico II on quad, 40/1' repeating: +59=36-7 for +195 Elo.
Update: +64=41-9 for +182 Elo.
Final: +85=46-11 for +200 Elo.
Comment: It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
Fourth result: Rybka 3 vs. Hiarcs12 on quad, 40/1' repeating: test stopped after 100 games at +72=23-5 for +282 Elo.
Fifth result: Rybka 3 vs. Naum 3.1 on octal, 40/1' repeating: test stopped after 100 games at +76=22-2 for +330 Elo (!!).
That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents, while it should compare the "actual Elo difference between Rybka 3 and the given opponents" MINUS the "actual Elo difference between Rybka 2.3.2a and the given opponents".
I did the math for those interested here: http://64.68.157.89/forum/viewtopic.php ... =&start=20
But it seems we are still plagued with "Rybka sales representatives".
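Marc's proposed measure can be written out as a worked equation. Assuming Elo differences chain transitively (the standard modelling assumption, which holds only approximately between chess engines), the opponent's rating $R_X$ cancels, leaving the difference between the new version $R_{\text{new}}$ (Rybka 3) and the old version $R_{\text{old}}$ (Rybka 2.3.2a). With the illustrative +306 and +175 figures bob used earlier in the thread:

$$
\Delta = (R_{\text{new}} - R_X) - (R_{\text{old}} - R_X) = R_{\text{new}} - R_{\text{old}}
$$

$$
\Delta = (+306) - (+175) = +131
$$

A single match figure such as +330 against Naum supplies only the first bracket; without a matching Rybka 2.3.2a result against the same opponent under comparable conditions, $\Delta$ cannot be computed.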
-
George Tsavdaris
- Posts: 1627
- Joined: Thu Mar 09, 2006 12:35 pm
This doesn't seem misleading to me.
Marc MP wrote: I find the way they present the information very misleading. I wonder if the "beta testers" from the Rybka team realize this, or maybe I should call them "sales representatives". [...]
That is immensely misleading to me because it compares the actual Elo difference between Rybka 3 and the given opponents.
What it compares is perfectly clear!
You have a series of matches, and next to each one the Elo for that match.
If you read it the way you do, then the responsibility is not theirs, but yours.
And why SHOULD it compare that?
while it should compare the "actual Elo difference between Rybka 3 and the given opponents" MINUS the "actual Elo difference between Rybka 2.3.2a and the given opponents".
It's more valuable information, yes, for sure, but what they do now is also clear and not misleading.
Of course, it does not answer the question of how much Rybka 3 has improved compared to Rybka 2.3.2a when playing other programs, but it is still not misleading.
And more than that, what you quoted from them above makes it clear. You suggested that they are trying to mislead us by not saying it, but they clearly say it!!
------
Comment(By Larry.K): It seems that the much better time management helps much more against other Rybkas than against unrelated programs. Do not expect anywhere near a 200 Elo gain over 2.3.2a against programs unrelated to Rybka.
------
How much clearer should they be?
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
-
Eelco de Groot
- Posts: 4556
- Joined: Sun Mar 12, 2006 2:40 am
Yes, that is true. Larry does all his eval testing against older versions of Rybka, and because the Beta was still being changed by Vas' work on the search and, finally, by changes in the timing algorithm, these are really just the first results of the final version against other programs, expressed as TPR (tournament performance rating), as you can see from the game results, not as improvement compared to Rybka 2.3.2. Because part of the last improvements are in the timing algorithm, it would probably be a good idea to check whether you would get similar results with Fischer time controls, for instance. But the 40 moves in one minute control, repeating, was chosen because CCRL and CEGT also test with this type of control.
bob wrote: Correct me if I am wrong, but the numbers he is giving are not _improvement_. For example, the result against Naum simply says Rybka 3 is 330 Elo _better_ than Naum. But I didn't see any results with the previous Rybka against the same opponent that would make it possible to see the improvement... [...]
I don't see any deliberate marketing hyperbole here; this was just a forum post from Larry.
I think it is an interesting question whether CEGT and CCRL should also test Rybka 3 against older versions of Rybka. On the one hand, this will give a higher, arguably somewhat skewed, rating for Rybka 3 because of good results against its older sibling. But the question is also whether the Rybka 2.3.2 rating is currently inflated because there is not enough competition for it at the top of the list. And you can argue that it is much better to have the 'direct' results, even between related programs, if you are going to include them in the same list in the first place.
For CCRL, I know this can be chosen by displaying the "pure" or "complete" lists; it will be interesting to compare them once the Rybka 3 results come in.
I'm not sure how strong the possible Elo inflation from not having opponents of about equal strength is. I suppose it already depends on whether you use Bayeselo or another system for the Elo calculation, among other things, and I don't know to what degree. It should be possible to get an indication of this by comparing, for instance, results between SP versions only with results of Rybka SP against opponents running on 2 and 4 threads, which are relatively stronger. Most are still not strong enough, though.
Eelco
-
Marc MP
In my opinion, people are interested in buying a new product if they think there is an improvement over the old product. I think (and I stress that this is my opinion) that you inflate the improvement. As simple as that. I did mention somewhere on the forum that the new Rybka will be an improvement; it's just that you exaggerate the improvement.
George Tsavdaris wrote: This doesn't seem misleading to me. [...]
-
M ANSARI
- Posts: 3707
- Joined: Thu Mar 16, 2006 7:10 pm
I am not sure how many Elo points stronger Rybka 3 is in comparison to other programs ... but I am quite sure that it is at least 100 Elo stronger than any other chess engine out there ... and that is what Rybka 3 was advertised as. It is scoring approximately 85% against both DF10.1 and ZM_2 and more than 70% against Rybka 2.3.2a ... at a time control of 5 0 on a very overclocked Quad ... which would probably be 2x or even 3x a normal Quad. Also, on an Octa at 16 0 it is scoring around 74% against ZM_2 and 70% against 2.3.2a ... this would probably be equal to a 30 0 or 60 0 time control on a normal Octa.
This all points to a very significant increase in strength throughout the time control range. No hype here, just facts ... and this will all be confirmed very soon at CCRL and CEGT.
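The percentages in this post can be put on the same Elo scale as Larry's figures. Below is a rough sketch under the logistic Elo model that treats the approximate percentages as exact, reading "more than 70%" as 70%. No game counts are given, so no error bars can be attached. The function and table names are illustrative. It also shows what a 100 Elo edge means as an expected score: about 64%. The quoted 85%, 74% and 70% correspond to roughly +301, +182 and +147 Elo against those particular opponents. As with Larry's numbers, these are match performances; only the Rybka 2.3.2a rows are a head-to-head gain, and those are self-play.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the side that is elo_diff points stronger (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score: the Elo difference implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Approximate scores quoted in the post; no game counts are given, so no error bars.
QUOTED_SCORES = {
    "DF10.1 and ZM_2 (5 0, overclocked quad)": 0.85,
    "Rybka 2.3.2a (5 0, overclocked quad)": 0.70,
    "ZM_2 (16 0, octa)": 0.74,
    "Rybka 2.3.2a (16 0, octa)": 0.70,
}

print(f"A 100 Elo edge means an expected score of {expected_score(100):.1%}")
for opponent, score in QUOTED_SCORES.items():
    print(f"vs {opponent:40s} {score:.0%} -> about {elo_from_score(score):+.0f} Elo")
```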
-
bob
I'd still think that the way the numbers are presented, without the old numbers for comparison, suggests a +330 Elo performance boost to the casual reader. I've on rare occasions reported Elo improvements in Crafty, but always in the form "Version X tests to be about +n Elo stronger than version Y, against a suite of common opponents." That kind of statement is hard to misinterpret or misrepresent. It would be particularly ugly if the old version is +280 against the same opponent. Hard to say who is misrepresenting things, but it really could be done in a more scientific way...
Albert Silver wrote: Don't know that I'd go that far. The stats and results don't suggest a 300 Elo improvement anywhere. After all, a score of +76=22-2 does indeed represent a performance of +330 Elo. [...]
I think others are the ones responsible for misinterpreting.
-
bob
There is no quad on the planet that can be overclocked to 2x-3x, so I am not sure what you are talking about. You will be lucky to overclock by 30% or so, which is not going to make the overclocked box 2x-3x faster than a normal quad...
M ANSARI wrote: I am not sure how many Elo points stronger Rybka 3 is in comparison to other programs ... but I am quite sure that it is at least 100 Elo stronger than any other chess engine out there ... and that is what Rybka 3 was advertised as. It is scoring approximately 85% against both DF10.1 and ZM_2 and more than 70% against Rybka 2.3.2a ... at a time control of 5 0 on a very overclocked Quad ... which would probably be 2x or even 3x a normal Quad. Also, on an Octa at 16 0 it is scoring around 74% against ZM_2 and 70% against 2.3.2a ... this would probably be equal to a 30 0 or 60 0 time control on a normal Octa.
This all points to a very significant increase in strength throughout the time control range. No hype here, just facts ... and this will all be confirmed very soon at CCRL and CEGT.
What exactly are you trying to say, since what you actually wrote makes no sense???