bob wrote:
I _did_ an algorithm like that in 1983. I put it together in under two weeks and won the 1983 WCCC with it. And I _know_ how it works and how the PVs look.
Vasik Rajlich wrote:
Actually, I'd be curious to hear more about it. How did you split the root moves? Which moves got PV searches and which ones got scout (null-window) searches? What beta did you use for the scout searches?

I couldn't find anything useful in the public domain about splitting work without shared memory, except for the Cluster Toga algorithm, which wasn't really suitable for Rybka (among other things, it doesn't apply to analysis and would be useless for users).
bob wrote:
What we did back then was more dynamic than that. I had the root move list sorted just as I do today (by previous-iteration node count). Processor 1 searched the first move with a normal aspiration window; processor 2 did the same. When one of them reported back with a new best move, the alpha/beta value was shipped to the other processor, which would take a short "break", update all the bounds, and then check whether anything would now cut off where it didn't when we started. We then searched the root moves one at a time; whenever a processor finished a move, it grabbed another (dynamic load balancing) until the list was gone.

We always did an aspiration-type search in CB, where the root window was very narrow, something like +/- 0.25 pawns, for the first move. At the time we were using PVS (something from the late 70's, used by Belle and ourselves as well as others), which is (I assume) what you mean by "scout" (not quite the same as PVS, but close enough for discussion).
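As an illustration, a minimal shared-memory sketch of that kind of dynamic root-move balancing might look like the following. The names, the move count, and the stubbed search() are all invented, and a real implementation would re-search with a full window when a scout search fails high:

```c
/* Sketch of dynamic root-move load balancing: each processor grabs the
 * next unsearched root move, and a new best score is published
 * immediately so the others can tighten their bounds.  search() is a
 * stub standing in for the real tree search. */
#include <pthread.h>
#include <stdio.h>

#define NMOVES   38    /* hypothetical number of root moves */
#define NTHREADS 2

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_move  = 1;       /* move 0 is searched first, below    */
static int best_score = -32000;  /* shared bound, updated on new best  */
static int best_move  = 0;

/* Stand-in for the real search; returns a fake, deterministic score. */
static int search(int move, int alpha, int beta) {
    (void)alpha; (void)beta;
    return (move * 37) % 100 - 50;
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (next_move >= NMOVES) { pthread_mutex_unlock(&lock); return NULL; }
        int move  = next_move++;     /* grab the next root move           */
        int alpha = best_score;      /* pick up the current shared bound  */
        pthread_mutex_unlock(&lock);

        /* scout (null-window) search against the current best score;
         * on a fail-high a real searcher would redo with a full window */
        int score = search(move, alpha, alpha + 1);

        pthread_mutex_lock(&lock);
        if (score > best_score) {    /* "ship" the new bound to everyone */
            best_score = score;
            best_move  = move;
        }
        pthread_mutex_unlock(&lock);
    }
}

int main(void) {
    /* first move: normal (aspiration) window, here just the stub call */
    best_score = search(0, -32000, 32000);

    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);

    printf("best move %d, score %d\n", best_move, best_score);
    return 0;
}
```

The point of publishing the bound immediately is the one made above: a root move that would not have cut off against the old bound may now cut off cheaply against the new one.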
I should add that the Univac 1100 and the Cray were both shared-memory machines. I'm looking at cluster-based search right now because of this 70-node, 8-cores-per-node cluster we have. I can search about 20M nodes per second on one node, which gives the potential for something faster than Deep Blue (we could possibly hit 1400M nodes per second, although there are great issues to overcome). But Hsu and I did talk a lot about his parallel search, and I like his two-level approach (split at early plies among the nodes; each node splits at late plies among its local processors)...
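A sketch of what that two-level decision might look like; the thresholds and names here are invented for illustration, not anyone's actual code:

```c
/* Two-level splitting in the style described above: split across
 * cluster nodes only within a few plies of the root, where the split
 * overhead is amortized over a huge subtree; split across local cores
 * deeper in the tree. */
typedef enum { NO_SPLIT, SPLIT_ACROSS_NODES, SPLIT_ACROSS_CORES } SplitKind;

#define CLUSTER_SPLIT_MAX_PLY 3  /* message-passing splits near the root */
#define SMP_SPLIT_MIN_DEPTH   4  /* don't thread-split tiny subtrees     */

static SplitKind choose_split(int ply, int depth_remaining,
                              int idle_nodes, int idle_cores) {
    if (ply <= CLUSTER_SPLIT_MAX_PLY && idle_nodes > 0)
        return SPLIT_ACROSS_NODES;   /* expensive: goes over the network */
    if (depth_remaining >= SMP_SPLIT_MIN_DEPTH && idle_cores > 0)
        return SPLIT_ACROSS_CORES;   /* cheap: shared-memory split       */
    return NO_SPLIT;
}
```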
Vasik Rajlich wrote:
Aha, thanks. Like many things, this won't work without shared memory. Without shared memory, if 'processor A' (to use your terminology) takes move x at some iteration, it needs to take the same move at the next iteration as well.

Basically, without shared memory, you want to have the same units processing the same parts of the tree throughout the entire search. I agree with Hsu's comment - without shared memory, work needs to be split at or near the root.
ps. I use the term "scout search" to refer to any search with a zero-width window.
bob wrote:
That's what I thought (scout search).

The way I am looking at splitting on a cluster is to use a "split hash table". I store the hash signature and node ID when I split, so that the same node searches the same sub-tree each time, to take advantage of the hash, killers, etc. There are issues with this, as the next iteration can greatly alter the shape of the tree and give one node a set of moves that are much harder to search than the others, producing an imbalance. I'm working on sorting out the details for this kind of stuff as the code gets written...
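For concreteness, the split hash table might be little more than this; the sizes and names are guesses, not the actual code:

```c
/* Sketch of a "split hash table": remember which cluster node searched
 * each split position, so the same node can be handed the same subtree
 * on the next iteration and its local hash/killer data stays useful. */
#include <stdint.h>

#define SPLIT_TABLE_SIZE 4096          /* must be a power of two */

typedef struct {
    uint64_t signature;                /* hash signature of the split point */
    int      node_id;                  /* cluster node that searched it     */
} SplitEntry;

static SplitEntry split_table[SPLIT_TABLE_SIZE];

/* Record which node handled this position. */
static void split_store(uint64_t sig, int node_id) {
    SplitEntry *e = &split_table[sig & (SPLIT_TABLE_SIZE - 1)];
    e->signature = sig;
    e->node_id   = node_id;
}

/* Prefer the node that searched this position last iteration; fall back
 * to any idle node if the position has never been split before. */
static int split_choose(uint64_t sig, int idle_node) {
    const SplitEntry *e = &split_table[sig & (SPLIT_TABLE_SIZE - 1)];
    return (e->signature == sig) ? e->node_id : idle_node;
}
```

Note that nothing in this scheme rebalances work when the tree changes shape between iterations, which is exactly the imbalance problem mentioned above.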
Vasik Rajlich wrote:
That's right. You'll need to consider whether you want to change the shape of your search tree to maximize the stability of the work distribution (i.e. the degree to which the same cluster units handle the same parts of the search tree).

This is where root splitting is special. Root moves have the unique property that they are guaranteed to require searching - unlike nodes deeper in the tree, which a cutoff elsewhere can render irrelevant, every root move must be searched at least once.
Of course, root splitting alone isn't the answer.
bob wrote:
I have tried to explain that to others for years (splitting at the root). I split everywhere, but I particularly like splitting at the root (once I work my way back to the root), since there is almost zero overhead in such searches. (There is always critical hash information that might or might not be shared from one move to the next, depending on how things get done in time.)

I'm working on the "two-level" approach, although I am also considering the more complicated issues encountered on a cluster with NUMA (AMD) nodes, because even on a single node you have to deal with trying to maximize local memory accesses...
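On the NUMA point, one common tactic is to allocate each thread's private search state on the memory node that thread actually runs on. A small sketch using libnuma (the ThreadData contents are invented):

```c
/* Keep per-thread search state NUMA-local: allocate it on the memory
 * node of the CPU the thread is running on.  Uses libnuma (-lnuma). */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdlib.h>

typedef struct {
    int killers[64][2];     /* hypothetical per-thread data */
    int history[64][64];
} ThreadData;

static ThreadData *alloc_thread_data(void) {
    if (numa_available() < 0)              /* no NUMA support: plain malloc */
        return malloc(sizeof(ThreadData));
    int node = numa_node_of_cpu(sched_getcpu());
    return numa_alloc_onnode(sizeof(ThreadData), node);
}
```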
But my current cluster approach (a good way from working yet, however) is to split everywhere (although not too far from the root, due to the overhead of a cluster split even with an infiniband interconnect), but to try to be consistent about which node searches which position, so that at least the hash info each node already has will be as helpful as possible, even though this is way less efficient than a pure shared-memory "share everything" search.
I have seen reluctance from many to split at the root, because it adds another level of race conditions, since the best root move has to be updated safely...