Rybka 3: The dark truth behind the hype

George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Rybka 3: The dark truth behind the hype

Post by George Tsavdaris »

M ANSARI wrote:These positions were from Rybka 3 on 8 core Skulltrail at 4.8Ghz

Analysis by Rybka 3 :
1.Bxa7 Bxe4 2.dxe4 e5 3.bxc5 Nxc5 4.Rd1 Qb4 5.Bb6 Rxd1+ 6.Qxd1 Nc6 7.h4 Be7 8.Kh2 Qc3 9.Ne3 Bf8 10.Nd5 Qb2
± (1.16) Depth: 17 00:00:35 20688kN
1.Qxf6+
+- (1.44) Depth: 17 00:01:00 36506kN
OK thanks for analyzing this.

So 1 minute. I'm a bit puzzled by this, since I thought an overclocked octal Skulltrail QX9775 like yours would be around 14-15 times faster than Albert's 7.5-minute solution on a 2.2GHz Athlon64.

Let's see:
•Albert Silver, using 2.2GHz Athlon64 did 469 seconds and had 38 043 000 nodes.
•You, using Skulltrail at 4.8Ghz did 60 seconds and had 36 506 000 nodes.

Athlon64, single CPU at 2.2GHz: 81 kN/s
Skulltrail at 4.8GHz: 608 kN/s

That is a difference of only a factor of 7.5!
I would expect something like 14-16.

If we take the time factor we get:
469/60 = 7.8
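
The figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: the function and variable names are made up for illustration, and the node counts and times are the ones quoted above.

```python
def knps(nodes, seconds):
    """Return search speed in thousands of nodes per second."""
    return nodes / seconds / 1000.0

# Node counts and times quoted in the post above.
athlon_nodes, athlon_seconds = 38_043_000, 469
skulltrail_nodes, skulltrail_seconds = 36_506_000, 60

athlon_knps = knps(athlon_nodes, athlon_seconds)
skulltrail_knps = knps(skulltrail_nodes, skulltrail_seconds)

# Ratio of reported search speeds, and ratio of times to the solution.
nps_ratio = skulltrail_knps / athlon_knps
time_ratio = athlon_seconds / skulltrail_seconds

print(f"Athlon64:   {athlon_knps:.0f} kN/s")
print(f"Skulltrail: {skulltrail_knps:.0f} kN/s")
print(f"NPS ratio:  {nps_ratio:.1f}")
print(f"Time ratio: {time_ratio:.1f}")
```

The two ratios come out close to each other because both runs reported nearly the same node count when the solution was found.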

On this page I see:
Intel QX9775 Yorkfield 8 x 4410: 40.75 relative speed (to a Pentium III 1000).
Athlon 64 3500+ 2.2GHz: 2.59 relative speed.

So 40.75/2.59 = 15.7 speed factor.

I wonder why I don't see it in this case. Is Rybka 3's poor speedup the reason?
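
To put a number on the gap, here is a rough sketch comparing the speed factor expected from the benchmark page with the nodes-per-second ratio actually observed. The variable names are illustrative, and it assumes the benchmark's relative speeds carry over directly to Rybka 3, which may not hold.

```python
# Relative speeds quoted from the benchmark page (Pentium III 1000 = 1.0).
qx9775_relative = 40.75
athlon64_relative = 2.59

# Observed ratio of nodes per second between the two Rybka 3 analyses.
observed_nps_ratio = 7.5

# Speed factor the benchmark would predict, and the share of it observed.
expected_ratio = qx9775_relative / athlon64_relative
fraction_realized = observed_nps_ratio / expected_ratio

print(f"Expected speed factor: {expected_ratio:.1f}")
print(f"Observed / expected:   {fraction_realized:.0%}")
```

On these numbers a bit less than half of the benchmark speed factor shows up in the reported node rate.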

Anyway, since you requested them, here are some positions to test Rybka's tactical strength:

Avoid 1.hxg5?, which gets White into a lot of trouble, and play 1.Be2!, which probably wins.
[d]r2q1rk1/1b1nbpnp/1p4p1/p2p2PP/2pP4/2NBPN2/PPQB1P2/1K1R3R w - - 0 1

Play 1.Qg5! that wins.
[d]5q2/n2P1k2/2b5/8/8/3N4/4BK2/6Q1 w - - 0 1

And after 1.Qg5! Ke6+ play 2.Kg1!! that wins and not 2.Ke1? that draws.
[d]5q2/n2P4/2b1k3/6Q1/8/3N4/4BK2/8 w - - 0 2

Play 1.Rxg7+!! that wins instantly.
[d]1r1rb1k1/5ppp/4p3/1p1p3P/1q2P2Q/pN3P2/PPP4P/1K1R2R1 w - - 0 1

Avoid 1...Rxg2? and play 1...Rxd2!! that wins instantly!
[d]5rk1/pp3p2/7b/2pR4/8/2P4P/P1PNr1P1/2K4R b - - 0 1

12...b3! most probably wins.
[d]rnbqk2r/p3ppb1/3p3p/P1pP2pn/1pP1P3/5NB1/RP1N1PPP/3QKB1R b Kkq - 0 12

1.Ba3!! wins instantly.
[d]8/p3q1kp/1p2Pnp1/3pQ3/2pP4/1nP3N1/1B4PP/6K1 w - - 0 0

1.Rd8! wins.
[d]r1b1qr1k/2p3pp/4p3/1pb1PpN1/pn3N1P/6B1/PPP1QPP1/2KR3R w - - 0 1

1...Qe4!! wins instantly.
[d]1r1r2k1/2p1qp1p/p5p1/1pQB1b2/5Pn1/N1R1P1P1/PP5P/R1B3K1 b - - 0 1

1.Kh2!! wins instantly.
[d]4rrk1/1bpR1p2/1pq1pQp1/p3P2p/P1PR3P/5N2/2P2PP1/6K1 w - - 0 1
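
For anyone who wants to load these positions into a script, here is a minimal sketch of a FEN board parser in plain Python. The function name is made up for illustration; it only reads the piece placement and side to move, does not check move legality, and treats the leading [d] as the forum's diagram tag.

```python
def parse_fen_board(line):
    """Parse a FEN string, optionally prefixed by the forum's [d] tag.

    Returns (board, side_to_move). The board is a list of 8 ranks,
    rank 8 first, each a list of 8 one-character squares ('.' = empty).
    """
    fen = line.strip()
    if fen.startswith("[d]"):
        fen = fen[3:]
    fields = fen.split()
    if len(fields) < 2:
        raise ValueError("FEN needs a board field and a side to move")
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks")
    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))  # a digit means that many empty squares
            elif ch in "pnbrqkPNBRQK":
                row.append(ch)
            else:
                raise ValueError(f"unexpected character {ch!r}")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    return board, fields[1]

# The 1.Qg5! position from the list above.
position = "[d]5q2/n2P1k2/2b5/8/8/3N4/4BK2/6Q1 w - - 0 1"
board, side = parse_fen_board(position)
for row in board:
    print(" ".join(row))
print("Side to move:", side)
```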
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Rybka 3: The dark truth behind the hype

Post by Dirt »

George Tsavdaris wrote: So 1 minute. I'm a bit puzzled by this, since I thought an overclocked octal Skulltrail QX9775 like yours would be around 14-15 times faster than Albert's 7.5-minute solution on a 2.2GHz Athlon64.
An MP program generally has to look at more nodes to evaluate a position than an SP program. This is because cutoff values aren't known precisely. I have read that, in order to make comparing hardware strength easier, Rybka doesn't count the extra nodes the MP program searches.

There is also a lot of randomness in the time it takes the MP program to find the critical move, so you can't project the speedup with any accuracy.
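
To illustrate the first point, here is a toy sketch (not Rybka's algorithm, which is not public) that counts the nodes visited by plain alpha-beta on a random game tree. It compares a normal sequential search with a crude stand-in for a parallel search in which every root move is searched with a full window, as if by workers that never share bounds. All names and the tree parameters are assumptions chosen for illustration.

```python
import random

INF = float("inf")

def build_tree(depth, branching, rng):
    """Return a nested-list game tree with random integer leaf scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]

def alphabeta(node, alpha, beta, counter):
    """Negamax alpha-beta; counter[0] accumulates the visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the remaining children cannot change the result
    return best

def sequential_search(tree):
    """Ordinary search: later root moves benefit from earlier bounds."""
    counter = [0]
    return alphabeta(tree, -INF, INF, counter), counter[0]

def split_root_search(tree):
    """Each root move is searched with a full window, sharing no bounds."""
    counter = [1]  # count the root itself
    best = -INF
    for child in tree:
        best = max(best, -alphabeta(child, -INF, INF, counter))
    return best, counter[0]

tree = build_tree(depth=6, branching=4, rng=random.Random(1))
seq_value, seq_nodes = sequential_search(tree)
split_value, split_nodes = split_root_search(tree)
print("sequential:", seq_value, "nodes:", seq_nodes)
print("split root:", split_value, "nodes:", split_nodes)
```

Both searches return the same value, but the split version cannot prune with bounds it has not been told about, so it never visits fewer nodes. Real engines share bounds between threads and use much smarter splitting, so the overhead there is smaller than in this worst-case toy.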
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Rybka 3: The dark truth behind the hype

Post by Albert Silver »

Dirt wrote:
George Tsavdaris wrote: So 1 minute. I'm a bit puzzled by this, since I thought an overclocked octal Skulltrail QX9775 like yours would be around 14-15 times faster than Albert's 7.5-minute solution on a 2.2GHz Athlon64.
An MP program generally has to look at more nodes to evaluate a position than an SP program. This is because cutoff values aren't known precisely. I have read that, in order to make comparing hardware strength easier, Rybka doesn't count the extra nodes the MP program searches.

There is also a lot of randomness in the time it takes the MP program to find the critical move, so you can't project the speedup with any accuracy.
If it was the 32-bit engine, it would account for some of the difference too.

It is worth noting that I suspect Vas and Ryan (Benitez) have the same view on the 64-bit and 32-bit versions of the engine. I have been testing Rybka 3 against Fritz 11, the engine that had given Rybka 2.3.2a the toughest time (not this time, before you ask), and I asked Ryan which version he thought I should use. He instantly said the 64-bit version. I joked, "What, no mercy for the opponents?" He replied that he didn't like the description of 64-bit as "speeding up" an engine. If it was designed as a 64-bit engine, as Rybka clearly was, then being forced to run in 32 bits is a handicap. The reason I think Vas shares this sentiment is that the engines he provides are labeled either Rybka 3 or Rybka 3 32-bit.

Albert
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Rybka 3: The dark truth behind the hype

Post by George Tsavdaris »

Dirt wrote:
George Tsavdaris wrote: So 1 minute. I'm a bit puzzled by this, since I thought an overclocked octal Skulltrail QX9775 like yours would be around 14-15 times faster than Albert's 7.5-minute solution on a 2.2GHz Athlon64.
An MP program generally has to look at more nodes to evaluate a position than an SP program. This is because cutoff values aren't known precisely. I have read that, in order to make comparing hardware strength easier, Rybka doesn't count the extra nodes the MP program searches.

There is also a lot of randomness in the time it takes the MP program to find the critical move, so you can't project the speedup with any accuracy.
Yes, when comparing time to find critical lines, but what about when comparing nodes per second?

Here is what Vasik says about Rybka 2.2n2:
"This updated version of Rybka will display node counts which represent the multi-processing efficiency. (They are adjusted to take into account the wasted work.) So, you'll be able to compare the strength of different hardware, ie. you can compare a 4-core machine with a 2-core machine. The one with the better nodes-per-second is better "


Also, I remember him saying that for the versions after Rybka 2.2n2 this would be the standard way of measuring speedup efficiency on multiprocessors.

So nodes per second should give the true efficiency for Rybka!

So why do the nodes per second show only a factor of 7.5 for a Skulltrail with 8 cores at 4.8GHz versus a single-CPU Athlon at 2.2GHz?

If we compare them:
•Athlon is 1 single CPU while Skulltrail is 8 CPUs(cores). So we have a speedup factor of 8x from this.
•But also this Athlon is 2.2GHz while the octal is 4.8 GHz. So we have a speedup factor of about 2x from this.

The "nodes per second" speedup factor (which Vasik said should reflect Rybka's efficiency) is 7.5, as I said earlier.

If we now remove the second speedup factor (the different GHz values), we have:
Theoretical speedup: 1 CPU versus 8 CPUs gives 8x
Actual speedup based on "nodes per second": 1 CPU versus 8 CPUs for Rybka 3 gives 7.5/2 = 3.75x

So Rybka's speedup from 1 to 8 CPUs seems to be 3.75 for that position.
I don't understand why it's so low. What is going wrong?
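
The same estimate can be redone with the exact clock ratio instead of the rounded 2x. This is a sketch with illustrative variable names. It assumes node rate scales linearly with clock speed and that the Athlon64 and the QX9775 do the same work per clock, which is a crude assumption since they are different architectures.

```python
cores = 8
athlon_ghz = 2.2
skulltrail_ghz = 4.8
nps_ratio = 7.5  # Skulltrail kN/s divided by Athlon64 kN/s

# Remove the clock-speed advantage to isolate the gain from the extra cores.
clock_ratio = skulltrail_ghz / athlon_ghz
per_clock_speedup = nps_ratio / clock_ratio

# Share of the ideal 8x that is left after the adjustment.
efficiency = per_clock_speedup / cores

print(f"Clock ratio:       {clock_ratio:.2f}")
print(f"Per-clock speedup: {per_clock_speedup:.2f}x on {cores} cores")
print(f"Efficiency:        {efficiency:.0%}")
```

With the unrounded clock ratio the per-clock speedup comes out somewhat below the 3.75x estimated above.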
Uri
Posts: 522
Joined: Thu Dec 27, 2007 9:34 pm

Re: Rybka 3: The dark truth behind the hype

Post by Uri »

M ANSARI wrote:I would have liked to test these positions on very fast hardware ... either 8 core at 4.8ghz or Quad at 4.1ghz but they are running tournaments now. Here is result on a Quad 3.2ghz
By 8 cores at 4.8 GHz, do you mean that each core is operating at 4.8 GHz, or that the 8 cores combined yield 4.8 GHz?

Anyway, the fastest hardware on which I have played Rybka was a pair of Intel Core 2 Extreme QX9775s.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Rybka 3: The dark truth behind the hype

Post by George Tsavdaris »

Uri wrote:
M ANSARI wrote:I would have liked to test these positions on very fast hardware ... either 8 core at 4.8ghz or Quad at 4.1ghz but they are running tournaments now. Here is result on a Quad 3.2ghz
By 8 cores at 4.8 GHz, do you mean that each core is operating at 4.8 GHz, or that the 8 cores combined yield 4.8 GHz?
Each core of course is 4.8 GHz.
8 x 4.8 GHz = 38.4 GHz! Impressive right? :D
Nimzovik
Posts: 1831
Joined: Sat Jan 06, 2007 11:08 pm

Re: Rybka 3: The dark truth behind the hype

Post by Nimzovik »

Hmmm... I had thought that previous versions of Rybka gave a deliberately inaccurate node count, if I remember a previous post correctly. Is this so with Rybka 3? :?:
M ANSARI
Posts: 3734
Joined: Thu Mar 16, 2006 7:10 pm

Re: Rybka 3: The dark truth behind the hype

Post by M ANSARI »

I think a much more accurate way of testing the speedup from multiple cores is to compare the kN output at a given depth between the Quad at 3.2 GHz and the 8-core at 4.8 GHz that I posted. You will notice an almost 3x speedup at equivalent depth, which makes me think that it is actually scaling very well. If you instead measure the time to find the solution, you will find that it varies quite a lot because of the non-deterministic characteristics of MP scaling.
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Rybka 3: The dark truth behind the hype

Post by Zach Wegner »

No! There is only one way to measure speedup: time to depth. Only one. NPS is meaningless! Using this metric Toga has the best possible speedup, when it's about the worst.

The other measure that has any meaning is Elo per core.
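
As a sketch of how a time-to-depth speedup could be computed over a set of test positions: the function name and the sample times below are hypothetical, and averaging with a geometric mean is just one reasonable choice, not something prescribed in this thread.

```python
import math

def time_to_depth_speedup(single_times, multi_times):
    """Geometric mean of per-position speedups.

    single_times[i] and multi_times[i] are the seconds needed to finish
    the same fixed depth on position i with 1 thread and with N threads.
    """
    if len(single_times) != len(multi_times) or not single_times:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [s / m for s, m in zip(single_times, multi_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings in seconds, for illustration only.
one_thread = [120.0, 95.0, 210.0, 64.0]
eight_threads = [31.0, 20.0, 70.0, 12.5]
print(f"Time-to-depth speedup: {time_to_depth_speedup(one_thread, eight_threads):.2f}x")
```

Because parallel search times vary from run to run, each multi-threaded timing would ideally be an average over several runs.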

Reading this:
"This updated version of Rybka will display node counts which represent the multi-processing efficiency. (They are adjusted to take into account the wasted work.) So, you'll be able to compare the strength of different hardware, ie. you can compare a 4-core machine with a 2-core machine. The one with the better nodes-per-second is better "
...is very disheartening. I wish Vas would just use accurate node counts.

This also makes me worry about Rybka 3's speedup. He could have a terrible parallel algorithm, but he could hide it in the output. Furthermore, since he's made the claim that Rybka 3 will scale the best, now it is quite possible that he has artificially weakened the single processor version so that quad/octal results will look better. I didn't want to think that Vas would do something like this, but now I get the feeling he will...
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Rybka 3: The dark truth behind the hype

Post by Dirt »

George Tsavdaris wrote:
Dirt wrote:An MP program generally has to look at more nodes to evaluate a position than an SP program. This is because cutoff values aren't known precisely. I have read that, in order to make comparing hardware strength easier, Rybka doesn't count the extra nodes the MP program searches.

There is also a lot of randomness in the time it takes the MP program to find the critical move, so you can't project the speedup with any accuracy.
Yes, when comparing time to find critical lines, but what about when comparing nodes per second?

Here is what Vasik says about Rybka 2.2n2:
"This updated version of Rybka will display node counts which represent the multi-processing efficiency. (They are adjusted to take into account the wasted work.) So, you'll be able to compare the strength of different hardware, ie. you can compare a 4-core machine with a 2-core machine. The one with the better nodes-per-second is better "


Also, I remember him saying that for the versions after Rybka 2.2n2 this would be the standard way of measuring speedup efficiency on multiprocessors.

So nodes per second should give the true efficiency for Rybka!

So why do the nodes per second show only a factor of 7.5 for a Skulltrail with 8 cores at 4.8GHz versus a single-CPU Athlon at 2.2GHz?

If we compare them:
•Athlon is 1 single CPU while Skulltrail is 8 CPUs(cores). So we have a speedup factor of 8x from this.
•But also this Athlon is 2.2GHz while the octal is 4.8 GHz. So we have a speedup factor of about 2x from this.

The "nodes per second" speedup factor (which Vasik said should reflect Rybka's efficiency) is 7.5, as I said earlier.

If we now remove the second speedup factor (the different GHz values), we have:
Theoretical speedup: 1 CPU versus 8 CPUs gives 8x
Actual speedup based on "nodes per second": 1 CPU versus 8 CPUs for Rybka 3 gives 7.5/2 = 3.75x

So Rybka's speedup from 1 to 8 CPUs seems to be 3.75 for that position.
I don't understand why it's so low. What is going wrong?
That quote from Vas is what I remembered reading. It implies that you should not expect to see the full difference in the hardware reflected in the nps. Getting divided by two is a little worse than I would expect, given the effort Vas put into MP scaling for this release, but not outlandishly so.

Even the raw nps wouldn't scale linearly with the number of processors, for both hardware and software reasons. When you add in the inefficiency of a parallel alpha-beta algorithm it gets pretty bad. Vas is just letting you see this easier, since otherwise you would have to run a large number of positional tests to see what the speedup is.
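
A back-of-the-envelope model of how the two losses combine might look like the sketch below. The function, its parameters and the sample values are all assumptions for illustration; nothing here is taken from Rybka itself.

```python
def effective_speedup(cores, nps_efficiency, search_overhead):
    """Rough effective speedup of a parallel search.

    nps_efficiency:  fraction of ideal raw node rate achieved per core
                     (below 1.0 because of hardware and locking costs).
    search_overhead: nodes searched by the parallel program divided by
                     nodes the single-threaded program needs (1.0 or more).
    """
    if cores < 1 or not 0 < nps_efficiency <= 1 or search_overhead < 1:
        raise ValueError("arguments out of range")
    raw_nps_gain = cores * nps_efficiency
    return raw_nps_gain / search_overhead

# Hypothetical values: 8 cores, 90% raw scaling, 1.8x as many nodes searched.
print(f"{effective_speedup(8, 0.9, 1.8):.2f}x")
```

An engine that reports only the useful nodes is in effect printing the result of a calculation like this rather than the raw node rate.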

It's also possible that Vas is being too conservative in his numbers. I don't see any way for him to get the precise number of nodes wasted; I'd guess he's just using a factor that he got from testing. Maybe he's even still using the numbers appropriate for Rybka 2.3.2, which would no longer be accurate. That might be a good question for the Rybka forum.