Cluster Rybka

wgarvin · Post by **wgarvin** » Thu Nov 20, 2008 3:29 am

P. Eizenhammer wrote:It took you several years to notice that fruit is a great program, but by looking at a single output of rybka you know what's going on. People will notice that you are just angry and biased, it is soo obvious ...

The simple fact remains: you can show data from crafty all day, and it proves nothing at all about rybka. But this kind of simple fact usually needs about 200 postings and some insults here in this forum and probably the help of a neutral mathematician, want to make a bet?

That was a pretty trollish post. Even if bob were a little bit biased against Rybka (and I'm not saying whether he is), the business of speedup from parallel algorithms has a lot more to do with how AlphaBeta search fundamentally works, than the specific details of any particular engine.

Dr. Hyatt been working on parallel search techniques for decades, so he has lots of experience in this area. If he looks at the output of Cluster Rybka and it clearly indicates to him that Rybka is splitting only at the root, then he is very probably correct (and someone who believes otherwise, should have to bring forward some convincing interpretation of the same data to prove their view).

And if Rybka is splitting only at the root, there is no possible way (regardless of whatever details went into its implementation) for the speedup not to suck. Splitting only at the root just doesn't work.

Eizenhammer · Post by **Eizenhammer** » Thu Nov 20, 2008 7:03 pm

wgarvin wrote: That was a pretty trollish post.

No

wgarvin wrote: Even if bob were a little bit biased against Rybka (and I'm not saying whether he is), the business of speedup from parallel algorithms has a lot more to do with how AlphaBeta search fundamentally works, than the specific details of any particular engine.

Dr. Hyatt been working on parallel search techniques for decades, so he has lots of experience in this area. If he looks at the output of Cluster Rybka and it clearly indicates to him that Rybka is splitting only at the root, then he is very probably correct (and someone who believes otherwise, should have to bring forward some convincing interpretation of the same data to prove their view).

And if Rybka is splitting only at the root, there is no possible way (regardless of whatever details went into its implementation) for the speedup not to suck. Splitting only at the root just doesn't work.

I understand how the chain of arguments is intended to work and just wanted to hint at the fact that it is flawed, one either understands this or not.
In the end there is the simple argument: Look, what I do does not work with crafty, so what he does with rybka cannot work either. By summing up unproved assumptions you can prove everything, and this statement is ridiculous as an argument.
Rybkas gain by the cluster may suck or not, I don't know, but crafty's results are totally independent from it: this is really easy and should not be argued about at all.

bob · Post by **bob** » Thu Nov 20, 2008 7:55 pm

Eizenhammer wrote:
wgarvin wrote: That was a pretty trollish post.
No
wgarvin wrote: Even if bob were a little bit biased against Rybka (and I'm not saying whether he is), the business of speedup from parallel algorithms has a lot more to do with how AlphaBeta search fundamentally works, than the specific details of any particular engine.

Dr. Hyatt been working on parallel search techniques for decades, so he has lots of experience in this area. If he looks at the output of Cluster Rybka and it clearly indicates to him that Rybka is splitting only at the root, then he is very probably correct (and someone who believes otherwise, should have to bring forward some convincing interpretation of the same data to prove their view).

And if Rybka is splitting only at the root, there is no possible way (regardless of whatever details went into its implementation) for the speedup not to suck. Splitting only at the root just doesn't work.
I understand how the chain of arguments is intended to work and just wanted to hint at the fact that it is flawed, one either understands this or not.
In the end there is the simple argument: Look, what I do does not work with crafty, so what he does with rybka cannot work either. By summing up unproved assumptions you can prove everything, and this statement is ridiculous as an argument.
Rybkas gain by the cluster may suck or not, I don't know, but crafty's results are totally independent from it: this is really easy and should not be argued about at all.

Your statement is simply wrong, and more than one has tried to explain why. Alpha/beta is _well_ understood. You can look at the Knuth/Moore paper to understand exactly how it works and why. Then you can find the only really mathematical analysis I have ever done and published in a paper in the Journal of Parallel Computing, which took the Knuth/Moore analysis and stretched it into a parallel search analysis. And splitting alpha/beta trees at the root is bad for _all_ alpha/beta tree searchers. Now if you want to offer proof that Rybka doesn't use alpha/beta, then you have a leg to stand on. But we already know it does alpha/beta, and it doesn't matter whether it is an alpha/beta tree for chess, checkers, go, othello, or you-name-it, the parallel algorithm has to conform to the restrictions imposed by alpha/beta.

Splitting at the root provides little if any speedup. I posted recent numbers. I posted such numbers in 1983. Monty Newborn posted such numbers back in the late 1970's. This is not new. It is well-understood. It just doesn't work very well. Certainly not to the +100 Elo claimed which means a factor of 4 using 5 cluster nodes. Not going to happen. Getting a factor of 2 is almost impossible with today's narrow branching factor.

So you may as well drop your misinformed argument about what works or doesn't work in Crafty doesn't mean a thing about Rybka. If you were talking about evaluation, or about search methodology, I would agree. But parallel search is parallel search, there is ample math to support this.

Mark · Post by **Mark** » Thu Nov 20, 2008 9:20 pm

Since splitting at the root doesn't help much, what's the best way to make use of a cluster? Would the best method, whatever it is, be generally the same for most programs?

bob · Post by **bob** » Thu Nov 20, 2008 11:28 pm

Mark wrote:Since splitting at the root doesn't help much, what's the best way to make use of a cluster? Would the best method, whatever it is, be generally the same for most programs?

In general, the best approach is to do just what we do for normal parallel machines, but you have to limit cluster "splitting" to something closer to the root since the inter-process communication is orders of magnitude slower (on a normal box, shared memory is essentially instantaneous communication, while on a cluster even infiniband has a latency that hurts and we can't share everything we would like to share (such as hash tables)).

DB used a two-level search, they split in the first N plies across their cluster, and in the last M plies between the chess processors... It is very much a non-trivial thing to make work, much less work well...

mhull · Post by **mhull** » Fri Nov 21, 2008 4:59 pm

bob wrote:
Mark wrote:Since splitting at the root doesn't help much, what's the best way to make use of a cluster? Would the best method, whatever it is, be generally the same for most programs?
In general, the best approach is to do just what we do for normal parallel machines, but you have to limit cluster "splitting" to something closer to the root since the inter-process communication is orders of magnitude slower (on a normal box, shared memory is essentially instantaneous communication, while on a cluster even infiniband has a latency that hurts and we can't share everything we would like to share (such as hash tables)).

DB used a two-level search, they split in the first N plies across their cluster, and in the last M plies between the chess processors... It is very much a non-trivial thing to make work, much less work well...

Besides the IBM team, have any chess projects optimally exploited a cluster to a significant degree?

bob · Post by **bob** » Fri Nov 21, 2008 5:53 pm

mhull wrote:
bob wrote:
Mark wrote:Since splitting at the root doesn't help much, what's the best way to make use of a cluster? Would the best method, whatever it is, be generally the same for most programs?
In general, the best approach is to do just what we do for normal parallel machines, but you have to limit cluster "splitting" to something closer to the root since the inter-process communication is orders of magnitude slower (on a normal box, shared memory is essentially instantaneous communication, while on a cluster even infiniband has a latency that hurts and we can't share everything we would like to share (such as hash tables)).

DB used a two-level search, they split in the first N plies across their cluster, and in the last M plies between the chess processors... It is very much a non-trivial thing to make work, much less work well...
Besides the IBM team, have any chess projects optimally exploited a cluster to a significant degree?

Sun Phoenix by Jonathan Schaeffer, although the number of nodes was limited. There were a couple of cluster competitors in previous WCCC events, although the names elude me...

mhull · Post by **mhull** » Fri Nov 21, 2008 5:57 pm

bob wrote:
mhull wrote:
bob wrote:
Mark wrote:Since splitting at the root doesn't help much, what's the best way to make use of a cluster? Would the best method, whatever it is, be generally the same for most programs?
In general, the best approach is to do just what we do for normal parallel machines, but you have to limit cluster "splitting" to something closer to the root since the inter-process communication is orders of magnitude slower (on a normal box, shared memory is essentially instantaneous communication, while on a cluster even infiniband has a latency that hurts and we can't share everything we would like to share (such as hash tables)).

DB used a two-level search, they split in the first N plies across their cluster, and in the last M plies between the chess processors... It is very much a non-trivial thing to make work, much less work well...
Besides the IBM team, have any chess projects optimally exploited a cluster to a significant degree?
Sun Phoenix by Jonathan Schaeffer, although the number of nodes was limited. There were a couple of cluster competitors in previous WCCC events, although the names elude me...

Maybe Star Socrates, by Don Dailey?

Ovyron · Post by **Ovyron** » Fri Nov 21, 2008 10:11 pm

Or GridChess?

George Tsavdaris · Post by **George Tsavdaris** » Fri Nov 21, 2008 10:31 pm

bob wrote:Elo is a measure, but not for parallel search, so I can't tell a thing about how the thing scales in terms of 1, 2 4 and 8 processors...

But isn't the measure of ELO(and especially against a wide range of engines like CCRL,CEGT plays) the most important/useful way of measuring the effectiveness of an engine from X to Y CPUs?

Ultimately the ELO(against a wide range of opponents) gain is what we are interested for.

Cluster Rybka

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka (real data)

Re: Cluster Rybka