I don't follow. When thread 2 finishes SP B, I detach it from that SP. If it is the last thread, then one of the threads would have already unsplitted every SP below that one. In my approach I assume that whenever an SP is deallocated every SP below it in the tree is already gone. I also unsplit as soon as the number of processors at an SP drops to 1. So, whenever a processor finishes working on a node, it either unsplits in the case that it is the last one (which only happens when it finishes before it gets the chance to receive the UNSPLIT message), or it detaches from every split point and enters a waiting state after telling the last processor to unsplit. I don't see any more split point usage than in your scheme, just more children on each split point. It seems that the way I do it is similar to the way you do it in Crafty, but handling the split stack and the search stack separately so that the master processor doesn't have to hold onto a split point until all the others are done.bob wrote:I don't like the stack idea although I had originally considered it. The problem is this. A single thread splits at A, then B, then C, then D, then E. And while it is working on E, the other thread working with it at B completes and there is nothing else left to do. I simply clean that up immediately, where in a stack approach you either leave it sitting around which causes wildly excessive split block usage on large systems, or you violate the stack principle and squeeze that entry out instead.
I see the same thing. Every active processor in my scheme has the exact same first split point. The only problem with mine is that I can only see the search state below the split point, because the stack actively being used for each process isn't in shared memory. But I can still see the tree of processors, and which split points each is working for...It is just an issue of data structures to me, and with my approach, it is easy to take a snapshot to see who is doing what while debugging, since any processor/thread can see every split block and dump the output as it exists in that instant... From experience (my first parallel search was done on a Univac 1100/42 in 1978) I wanted to make debugging as easy as possible, which means when something hangs, "somebody" can display the state of the tree, who is working where and for whom, etc...
By the way, this gives me ideas for my interactive debugging facility. Currently it only works with single processors, but I suppose I could change it so that at every debug break point, I could set a "pause" flag. Every child process could then copy its tree state to shared memory and wait for the flag to be cleared. I still have a long way to go with that...