Cluster Rybka

bob · Post by **bob** » Sat Nov 22, 2008 12:01 am

[quote="Ovyron"]Or GridChess?[/quote
And Pconnors. And a couple of others. Using a real cluster, with message passing, is a challenge. A _big_ challenge.

swami · Post by **swami** » Sat Nov 22, 2008 2:48 am

bob wrote:
Ovyron wrote:Or GridChess?[/quote
And Pconnors. And a couple of others. Using a real cluster, with message passing, is a challenge. A _big_ challenge.
What about Brutus in Graz? Hydra?

M ANSARI · Post by **M ANSARI** » Sat Nov 22, 2008 7:49 am

Although I am reading "splitting at the root" a lot I am curious to know if this means that this is simply the master doing a multi variation and throwing out the variations to independent equivalently endowed engines. I had long thought of that as being one possible way of doing a cluster ... and when manually done I find that some engines work much better than others. I guess the critical part is that the master engine that decides what the MV are must be quite accurate otherwise it might miss an important variation. When you do a multi variation analysis on some positions, the much reduced analysis sometimes finds much better moves than the single PV ... but then again I would guess that the single PV engine would look much deeper and sometimes be able to outsearch the multi PV version.

So what if you have one single PV master working passively ... then you have another identical system doing multi variations and feeding those variations out to different slaves to search them single PV. Once one of the slaves gets a hit it can then send this move out to the Master. This communication silence would only be broken if a certain pre-determined trigger is reached. With this setup you most definetely will never have the master playing worse than without the help of its slaves ... but am not sure how much stronger it will play. What is absolutely certain though is that the search tree will be dramatically more robust and should in theory play quite a bit stronger. While this probably is not the most efficient way to take advantage of all the extra cores, it does circumvent the need of trying to figure a way to overcome all the latencies and headaches and bugs of trying to synchronize massively parallel search.

In this game from Lieden ... Joker vs. Rybka cluster we have this position

[Event "28th Dutch CC"]
[Site "Leiden NED"]
[Date "2008.11.16"]
[Round "9"]
[White "Joker"]
[Black "Rybka"]
[Result "0-1"]
[ECO "B43"]
[PlyCount "158"]
[EventDate "2008.??.??"]

1. e4 c5 2. Nf3 e6 3. Nc3 a6 4. Be2 Qc7 5. O-O b5 6. d4 cxd4 7. Nxd4 Bb7 8. Bf3 Nc6 9. Nxc6 dxc6 10. e5 Qxe5 11. Re1 Qc7 12. Ne4 Rd8 13. Qe2 Be7 14. Bh5 Bc8 15. Qf3 Nf6 16. Nxf6+ Bxf6 17. Bf4 Qb7 18. Qa3 b4 19. Qb3 O-O 20. Bf3 Qb5 21. Be2 Qc5 22. Be3 Bd4 23. Bd3 a5 24. a3 Bxe3 25. Rxe3 Rd4 26. axb4 axb4 27. Rh3 g6 28. Re3 Qd6 29. Re4 Bb7 30. Rxd4 Qxd4 31. Bf1 c5 32. c3 Qf4 33. Ra7 Bd5 34. Qa4 Qd2 35. c4 Be4 36. Qa1 b3 37. Qa3 Qc2 38. Rd7 Ra8 39. Qxc5 Qxb2 40. Qe7 Rf8 41. Qb4 Qa2 42. Rd2 Ra8 43. Qc3 Qa5 44. Qb2 Bc2 45. Rxc2 bxc2 46. Qxc2 Qe1 47. Qe2 Qxe2 48. Bxe2 Ra1+ 49. Bf1 Kf8 50. h4 e5 51. f3 f5 52. Kf2 Ke7 53. Bd3 Kd6 54. h5 Ra2+ 55. Kf1 Rd2 56. Be2 gxh5 57. f4 e4 58. Bxh5 Kc5 59. g3 e3 60. Be2 Kd4 61. g4 Rxe2 62. Kxe2 fxg4 63. c5 Kxc5 64. Kxe3 Kd5 65. Kd3 h5 66. Ke3 h4 67. f5 Ke5 68. f6 Kxf6 69. Kf2 Kg5 70. Ke2 h3 71. Kf1 g3 72. Kg1 Kf4 73. Kf1 Ke3 74. Ke1 h2 75. Kd1 Kd3 76. Kc1 h1=Q+ 77. Kb2 Qb7+ 78. Kc1 Qb6 79. Kd1 Qg1#
0-1

Unfortunately I don't have a chess program loaded in the computer I am posting from, but if you look at move 61. g4

Here Rybka cluster played Rxe2! ... while there are other winning moves this particular move shows a little of how the cluster works. I have an 8 core Skulltrail running at 4.2 Ghz (which is similar to the computers used in the cluster) and the only way my Rybka 3 plays this move quickly is if I put MV mode at 2 or 3 variations. Once I do that it quickly sees that Rxe2! ends resistance much quicker. So in this case I would speculate that one of the slave Rybka's that was fed Rxe2! to analyze quickly reached a communication trigger point and fed that move to the Master which then picked up the move.

It would be great if someone could write a script or driver that could do exactly that. Force the UCI engine output to say x variations ... then have those variations fed to x motherboards with a pre set trigger communication silence. This way many people could try out different setups and experiment.

M ANSARI · Post by **M ANSARI** » Sat Nov 22, 2008 7:50 am

nt/ double post

bob · Post by **bob** » Sun Nov 23, 2008 7:53 am

M ANSARI wrote:Although I am reading "splitting at the root" a lot I am curious to know if this means that this is simply the master doing a multi variation and throwing out the variations to independent equivalently endowed engines. I had long thought of that as being one possible way of doing a cluster ... and when manually done I find that some engines work much better than others. I guess the critical part is that the master engine that decides what the MV are must be quite accurate otherwise it might miss an important variation. When you do a multi variation analysis on some positions, the much reduced analysis sometimes finds much better moves than the single PV ... but then again I would guess that the single PV engine would look much deeper and sometimes be able to outsearch the multi PV version.

So what if you have one single PV master working passively ... then you have another identical system doing multi variations and feeding those variations out to different slaves to search them single PV. Once one of the slaves gets a hit it can then send this move out to the Master. This communication silence would only be broken if a certain pre-determined trigger is reached. With this setup you most definetely will never have the master playing worse than without the help of its slaves ... but am not sure how much stronger it will play. What is absolutely certain though is that the search tree will be dramatically more robust and should in theory play quite a bit stronger. While this probably is not the most efficient way to take advantage of all the extra cores, it does circumvent the need of trying to figure a way to overcome all the latencies and headaches and bugs of trying to synchronize massively parallel search.

In this game from Lieden ... Joker vs. Rybka cluster we have this position

[Event "28th Dutch CC"]
[Site "Leiden NED"]
[Date "2008.11.16"]
[Round "9"]
[White "Joker"]
[Black "Rybka"]
[Result "0-1"]
[ECO "B43"]
[PlyCount "158"]
[EventDate "2008.??.??"]

1. e4 c5 2. Nf3 e6 3. Nc3 a6 4. Be2 Qc7 5. O-O b5 6. d4 cxd4 7. Nxd4 Bb7 8. Bf3 Nc6 9. Nxc6 dxc6 10. e5 Qxe5 11. Re1 Qc7 12. Ne4 Rd8 13. Qe2 Be7 14. Bh5 Bc8 15. Qf3 Nf6 16. Nxf6+ Bxf6 17. Bf4 Qb7 18. Qa3 b4 19. Qb3 O-O 20. Bf3 Qb5 21. Be2 Qc5 22. Be3 Bd4 23. Bd3 a5 24. a3 Bxe3 25. Rxe3 Rd4 26. axb4 axb4 27. Rh3 g6 28. Re3 Qd6 29. Re4 Bb7 30. Rxd4 Qxd4 31. Bf1 c5 32. c3 Qf4 33. Ra7 Bd5 34. Qa4 Qd2 35. c4 Be4 36. Qa1 b3 37. Qa3 Qc2 38. Rd7 Ra8 39. Qxc5 Qxb2 40. Qe7 Rf8 41. Qb4 Qa2 42. Rd2 Ra8 43. Qc3 Qa5 44. Qb2 Bc2 45. Rxc2 bxc2 46. Qxc2 Qe1 47. Qe2 Qxe2 48. Bxe2 Ra1+ 49. Bf1 Kf8 50. h4 e5 51. f3 f5 52. Kf2 Ke7 53. Bd3 Kd6 54. h5 Ra2+ 55. Kf1 Rd2 56. Be2 gxh5 57. f4 e4 58. Bxh5 Kc5 59. g3 e3 60. Be2 Kd4 61. g4 Rxe2 62. Kxe2 fxg4 63. c5 Kxc5 64. Kxe3 Kd5 65. Kd3 h5 66. Ke3 h4 67. f5 Ke5 68. f6 Kxf6 69. Kf2 Kg5 70. Ke2 h3 71. Kf1 g3 72. Kg1 Kf4 73. Kf1 Ke3 74. Ke1 h2 75. Kd1 Kd3 76. Kc1 h1=Q+ 77. Kb2 Qb7+ 78. Kc1 Qb6 79. Kd1 Qg1#
0-1

Unfortunately I don't have a chess program loaded in the computer I am posting from, but if you look at move 61. g4

Here Rybka cluster played Rxe2! ... while there are other winning moves this particular move shows a little of how the cluster works. I have an 8 core Skulltrail running at 4.2 Ghz (which is similar to the computers used in the cluster) and the only way my Rybka 3 plays this move quickly is if I put MV mode at 2 or 3 variations. Once I do that it quickly sees that Rxe2! ends resistance much quicker. So in this case I would speculate that one of the slave Rybka's that was fed Rxe2! to analyze quickly reached a communication trigger point and fed that move to the Master which then picked up the move.

It would be great if someone could write a script or driver that could do exactly that. Force the UCI engine output to say x variations ... then have those variations fed to x motherboards with a pre set trigger communication silence. This way many people could try out different setups and experiment.

What it does is divide the root move list up into N batches, and gives each batch to a different node. The nodes search just the batch of moves it was given, until time is up. You get each node searching to a different depth, which is the first problem. The other more serious issue is that this doesn't speed up the search although it does give you five PVs to look at, although they won't be the "best 5" by any measure...

M ANSARI · Post by **M ANSARI** » Sun Nov 23, 2008 8:42 am

Yes the 5 variations could change but that can be accounted for and Rybka is especially good at choosing relevant moves quite quickly. I guess you could sort change any variation that becomes out of favor with the motherboard that is giving out the variations on the fly. What is certain is that the node that the Master computer is searching will remain there and thus there will be no reduction in the strength of the original Master computer. I have looked at many of the cluster games in Lieden and this is quite apparent ... I would guess that 90% or more of the moves are the Master computer alone, thus a very small percentage of moves are changed. But in top level chess one move that gives a tiny increase in evaluation could be the difference in changin a drawn game into a won game. I don't know if that is worth 100 ELO points but in blitz that would be very easy to get I would think. As for long time control I doubt the 100 ELO gain will hold up.

Dave Gomboc · Post by **Dave Gomboc** » Tue Mar 24, 2009 6:51 am

bob wrote:I saw his output. There is one, and only one way that output can be produced. By splitting only at the root. That is a poor parallel search approach, and it is poor regardless of the program using it. It's been discussed for 30 years, and dismissed for 30 years. The data I showed explains why.

But there is absolutely _zero_ doubt about how his cluster search works. And I do mean _zero_.

I have not seen any cluster rybka output: I'm obviously coming into this thread very late. My question is if the output you saw might be explained by searching using a different evaluation function on each processor (ignoring the issue of how a move is selected from the five searches)?

Rolf · Post by **Rolf** » Tue Mar 24, 2009 6:34 pm

bob wrote:
Eizenhammer wrote:
wgarvin wrote: That was a pretty trollish post.
No
wgarvin wrote: Even if bob were a little bit biased against Rybka (and I'm not saying whether he is), the business of speedup from parallel algorithms has a lot more to do with how AlphaBeta search fundamentally works, than the specific details of any particular engine.

Dr. Hyatt been working on parallel search techniques for decades, so he has lots of experience in this area. If he looks at the output of Cluster Rybka and it clearly indicates to him that Rybka is splitting only at the root, then he is very probably correct (and someone who believes otherwise, should have to bring forward some convincing interpretation of the same data to prove their view).

And if Rybka is splitting only at the root, there is no possible way (regardless of whatever details went into its implementation) for the speedup not to suck. Splitting only at the root just doesn't work.
I understand how the chain of arguments is intended to work and just wanted to hint at the fact that it is flawed, one either understands this or not.
In the end there is the simple argument: Look, what I do does not work with crafty, so what he does with rybka cannot work either. By summing up unproved assumptions you can prove everything, and this statement is ridiculous as an argument.
Rybkas gain by the cluster may suck or not, I don't know, but crafty's results are totally independent from it: this is really easy and should not be argued about at all.
Your statement is simply wrong, and more than one has tried to explain why. Alpha/beta is _well_ understood. You can look at the Knuth/Moore paper to understand exactly how it works and why. Then you can find the only really mathematical analysis I have ever done and published in a paper in the Journal of Parallel Computing, which took the Knuth/Moore analysis and stretched it into a parallel search analysis. And splitting alpha/beta trees at the root is bad for _all_ alpha/beta tree searchers. Now if you want to offer proof that Rybka doesn't use alpha/beta, then you have a leg to stand on. But we already know it does alpha/beta, and it doesn't matter whether it is an alpha/beta tree for chess, checkers, go, othello, or you-name-it, the parallel algorithm has to conform to the restrictions imposed by alpha/beta.

Splitting at the root provides little if any speedup. I posted recent numbers. I posted such numbers in 1983. Monty Newborn posted such numbers back in the late 1970's. This is not new. It is well-understood. It just doesn't work very well. Certainly not to the +100 Elo claimed which means a factor of 4 using 5 cluster nodes. Not going to happen. Getting a factor of 2 is almost impossible with today's narrow branching factor.

So you may as well drop your misinformed argument about what works or doesn't work in Crafty doesn't mean a thing about Rybka. If you were talking about evaluation, or about search methodology, I would agree. But parallel search is parallel search, there is ample math to support this.

Bob perhaps I can explain what is going on although I dont even know what splitting at the root and above this in acluster - really is!

You are talking about your only really mathematical analysis 25 years ago and you obviously claim that even today that holds water and is believed in CC like sort of natural constants.

Is it absolutely impossible that actually such constants are different? How can you prove that?

The reasoning by Peter E. sounds better and besides personal feelings I would also say that you believe in something that cant be the same after 25 years and for different hardware.

What you are doing as a teacher that is ok, you ask "if thzere is something new then please he should step forward and explain."

But in the meantime you cant just wave hands.

Let me please repeat what I would be able to imagine, but it's really my last idea. Please explain this to me.

How sure can you be that the output of such a cluster shows today in 2009 what he's really doing? If I were Vas I would know what you know and how I could play so that you see exactly what you believe and expect to see.

Why dont you work in a more opened style where even your data cant be taken for the njudgement of new stuff?

What exactly brought you to the point that you act like those astronoms who showed what's absolutely impossible to exist?

I write that after having read the thread backwards to page 8.

mhull · Post by **mhull** » Tue Mar 24, 2009 6:57 pm

Rolf wrote: But in the meantime you cant just wave hands.

One of these days in the not too distant future, cluster crafty will play cluster rybka. Then we'll see who's been waving their hands.

bob · Post by **bob** » Tue Mar 24, 2009 6:58 pm

Dave Gomboc wrote:
bob wrote:I saw his output. There is one, and only one way that output can be produced. By splitting only at the root. That is a poor parallel search approach, and it is poor regardless of the program using it. It's been discussed for 30 years, and dismissed for 30 years. The data I showed explains why.

But there is absolutely _zero_ doubt about how his cluster search works. And I do mean _zero_.
I have not seen any cluster rybka output: I'm obviously coming into this thread very late. My question is if the output you saw might be explained by searching using a different evaluation function on each processor (ignoring the issue of how a move is selected from the five searches)?

No. based on the most obvious case where Crafty captured Rybka's queen, and Rybka had only one way to recapture. So one cluster node was kibitzing normal-looking scores, while the others were kibitzing scores of -9.0 and worse. When you split at the root and only one of the moves is good, the other nodes are obviously going to get bad scores because they are all a queen down.

Cluster Rybka

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

nt/ double post

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)