threads vs processes

Post by **bob** » Fri Jul 18, 2008 5:56 pm

LoopList wrote: In order to get rid of the "board" pointer, I implemented parallel processes. The engine became 10% faster in 32-bit mode and still 5% faster in 64-bit mode. But starting and terminating multiple slave processes is considerably more complicated and uglier. Therefore the thread-based engine with the additional "board" pointer was easier to develop and cleaner to tune. Furthermore, by using the "board" pointer I could organise the "board" class with private and public members. Passing "board" via const pointer enabled further security enhancements!

And DTS-like techniques are much too complicated to base on processes: communication overhead will increase compared to multiple threads. At least, that is what my research showed me; perhaps I am wrong...

Fritz (Loop)

I don't see why there should be any difference. Anything you can do with threads, you can do with processes, including sharing memory for whatever you want.
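As an illustration of the shared-memory point, here is a minimal POSIX sketch, assuming Linux/glibc: an anonymous `MAP_SHARED` region mapped before `fork()` is visible to every child. `SharedState` and `run_shared_demo` are hypothetical names; in an engine the region would hold split points, stop flags and the like rather than toy counters.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 4

/* Hypothetical shared state; a real engine might keep split points,
   stop flags, or the transposition table here. */
typedef struct {
    volatile long nodes[MAX_PROCS];   /* one counter per child process */
} SharedState;

/* Map an anonymous shared region, fork nprocs children, let each child
   write its own slot, then return the total the parent sees (-1 on error). */
long run_shared_demo(int nprocs) {
    if (nprocs < 1 || nprocs > MAX_PROCS) return -1;

    /* MAP_SHARED (not MAP_PRIVATE) makes child writes visible to the parent. */
    SharedState *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED) return -1;
    for (int i = 0; i < MAX_PROCS; i++) s->nodes[i] = 0;

    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) break;                 /* stop spawning; reap below */
        if (pid == 0) {                     /* child process */
            s->nodes[i] = 1000L * (i + 1);  /* stand-in for search work */
            _exit(0);
        }
        started++;
    }
    for (int i = 0; i < started; i++) wait(NULL);   /* reap children */

    long total = (started == nprocs) ? 0 : -1;
    for (int i = 0; total >= 0 && i < nprocs; i++) total += s->nodes[i];
    munmap(s, sizeof *s);
    return total;
}
```

With four children the parent reads back 1000 + 2000 + 3000 + 4000 = 10000, written entirely by other processes.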

The only way I can see to eliminate the board pointer is to overuse or misuse processes, so that at every split point you create new processes to search. That's horrible for performance. I create processes (or threads) exactly once and just reuse them over and over (the process/thread pool concept) to avoid the fork()/clone() overhead, so that a process can split with itself and others and then use a different board structure after the split...
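The create-once, reuse-forever pool can be sketched with POSIX threads roughly as follows. This is a hypothetical illustration, not Crafty's code: `Pool`, `pool_run` and `sample_job` are made-up names, and `sample_job` merely stands in for searching a helper's share of a split point. Threads are created only in `pool_init` and joined only in `pool_destroy`; each `pool_run` wakes the same sleeping workers, so no `fork()` or `clone()` happens per split.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 8

/* Hypothetical pool: helpers are created once, then sleep on a condition
   variable until a split point is posted. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  work_ready;     /* broadcast when a split is posted */
    pthread_cond_t  work_done;      /* signalled when the last helper finishes */
    pthread_t       threads[MAX_WORKERS];
    int   nworkers;
    int   next_id;                  /* hands out worker ids 0..nworkers-1 */
    int   generation;               /* bumped once per posted split */
    int   pending;                  /* helpers still busy with this split */
    bool  shutdown;
    long (*job)(int worker_id, void *arg);
    void *arg;
    long  results[MAX_WORKERS];
} Pool;

static void *worker_main(void *p) {
    Pool *pool = p;
    pthread_mutex_lock(&pool->lock);
    int id = pool->next_id++;
    int seen = 0;                   /* workers exist before any split is posted */
    for (;;) {
        while (!pool->shutdown && pool->generation == seen)
            pthread_cond_wait(&pool->work_ready, &pool->lock);
        if (pool->shutdown) break;
        seen = pool->generation;
        long (*job)(int, void *) = pool->job;
        void *arg = pool->arg;
        pthread_mutex_unlock(&pool->lock);
        long r = job(id, arg);      /* "search" this helper's share */
        pthread_mutex_lock(&pool->lock);
        pool->results[id] = r;
        if (--pool->pending == 0)
            pthread_cond_signal(&pool->work_done);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

void pool_destroy(Pool *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = true;
    pthread_cond_broadcast(&pool->work_ready);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nworkers; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_cond_destroy(&pool->work_done);
    pthread_cond_destroy(&pool->work_ready);
    pthread_mutex_destroy(&pool->lock);
}

/* Create the helpers exactly once. Returns 0 on success. */
int pool_init(Pool *pool, int n) {
    if (n < 1 || n > MAX_WORKERS) return -1;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->work_ready, NULL);
    pthread_cond_init(&pool->work_done, NULL);
    pool->next_id = pool->generation = pool->pending = 0;
    pool->shutdown = false;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker_main, pool) != 0) {
            pool->nworkers = i;     /* join only the threads that exist */
            pool_destroy(pool);
            return -1;
        }
    }
    pool->nworkers = n;
    return 0;
}

/* Post one split point to every helper and wait until all are done. */
long pool_run(Pool *pool, long (*job)(int, void *), void *arg) {
    pthread_mutex_lock(&pool->lock);
    pool->job = job;
    pool->arg = arg;
    pool->pending = pool->nworkers;
    pool->generation++;
    pthread_cond_broadcast(&pool->work_ready);
    while (pool->pending > 0)
        pthread_cond_wait(&pool->work_done, &pool->lock);
    long sum = 0;
    for (int i = 0; i < pool->nworkers; i++) sum += pool->results[i];
    pthread_mutex_unlock(&pool->lock);
    return sum;
}

/* Hypothetical job: stands in for searching one helper's moves. */
long sample_job(int worker_id, void *arg) {
    return *(const long *)arg + worker_id;
}
```

Calling `pool_run` twice on a four-worker pool, with a base of 10 and then 0, reuses the same threads and returns 46 and then 6.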

The board pointer cost nothing to speak of when I first added it to Crafty at the start of the parallel search work. And in 64-bit mode, with 8 extra registers, the cost is even lower.
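For readers unfamiliar with the pattern, the "board pointer" looks roughly like this. The names (`SearchTree`, `make_move`, `unmake_move`) are hypothetical and the board is a toy 64-square array, not Crafty's actual layout. Each thread or process owns one `SearchTree`, and every routine reaches the board through `t->`, so the per-call cost is essentially one register holding `t`.

```c
#include <string.h>

#define MAXPLY 64

typedef struct { int from, to, captured; } Move;

/* Hypothetical per-thread search data. Each thread (or process) owns one,
   and every search routine reaches "the board" through this pointer. */
typedef struct {
    int  squares[64];      /* 0 = empty, otherwise a piece code */
    int  ply;              /* current depth below the root */
    Move path[MAXPLY];     /* moves made so far, needed for unmake */
    long nodes;            /* per-thread node counter */
} SearchTree;

void tree_init(SearchTree *t, const int squares[64]) {
    memcpy(t->squares, squares, sizeof t->squares);
    t->ply = 0;
    t->nodes = 0;
}

/* Toy make/unmake pair with no legality checks. */
void make_move(SearchTree *t, int from, int to) {
    Move m = { from, to, t->squares[to] };
    t->squares[to] = t->squares[from];
    t->squares[from] = 0;
    t->path[t->ply++] = m;
    t->nodes++;
}

void unmake_move(SearchTree *t) {
    Move m = t->path[--t->ply];
    t->squares[m.from] = t->squares[m.to];
    t->squares[m.to] = m.captured;
}
```

Two threads searching from the same position each get their own `SearchTree`, so a move made in one tree never disturbs the other.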

Post by **Zach Wegner** » Fri Jul 18, 2008 6:09 pm

bob wrote: The only way I can see to eliminate the board pointer is to overuse or misuse processes, so that at every split point you create new processes to search. That's horrible for performance. I create processes (or threads) exactly once and just reuse them over and over (the process/thread pool concept) to avoid the fork()/clone() overhead, so that a process can split with itself and others and then use a different board structure after the split...

Huh? I don't use a board pointer, but I do use a process pool. It's really simple, actually: the split point copies the move list from the root to the split point, and then the child processor makes those moves. Since split points are usually close to the root, both the overhead and the copying cost are low. Before that, I copied the board to the split point, and then the children copied it again into local memory. That's obviously less efficient than your approach, which is why I changed to move lists. Move lists are also friendlier when you get to message-passing architectures, if I ever do...
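A minimal sketch of the move-list split described above, under stated assumptions: `Position`, `SplitPoint`, `publish_split` and `join_split` are hypothetical names, and the board is a bare 64-square array with no side to move, castling or en passant state. The split point stores only the moves from the root to the split node; a helper copies the root position into its own memory and replays them.

```c
#include <string.h>

#define MAXPLY 64

typedef struct { int from, to; } Move;
typedef struct { int squares[64]; } Position;   /* toy board only */

/* Hypothetical split-point record: the moves from the root to the split
   node, rather than a copy of the board. */
typedef struct {
    int  nmoves;
    Move path[MAXPLY];
} SplitPoint;

static void make_move(Position *p, Move m) {
    p->squares[m.to] = p->squares[m.from];
    p->squares[m.from] = 0;
}

/* Owner side: record the path from the root to the node being split.
   Returns 0 on success, -1 if the path is too long. */
int publish_split(SplitPoint *sp, const Move *path, int nmoves) {
    if (nmoves < 0 || nmoves > MAXPLY) return -1;
    sp->nmoves = nmoves;
    memcpy(sp->path, path, (size_t)nmoves * sizeof *path);
    return 0;
}

/* Helper side: copy the root into local memory, then replay the moves. */
void join_split(Position *local, const Position *root, const SplitPoint *sp) {
    *local = *root;
    for (int i = 0; i < sp->nmoves; i++)
        make_move(local, sp->path[i]);
}
```

Because `SplitPoint` is a flat array of moves, it could also be sent as a message as-is, which matches the message-passing advantage mentioned above.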

Post by **bob** » Fri Jul 18, 2008 7:43 pm

Zach Wegner wrote:
bob wrote: The only way I can see to eliminate the board pointer is to overuse or misuse processes, so that at every split point you create new processes to search. That's horrible for performance. I create processes (or threads) exactly once and just reuse them over and over (the process/thread pool concept) to avoid the fork()/clone() overhead, so that a process can split with itself and others and then use a different board structure after the split...
Huh? I don't use a board pointer, but I do use a process pool. It's really simple, actually: the split point copies the move list from the root to the split point, and then the child processor makes those moves. Since split points are usually close to the root, both the overhead and the copying cost are low. Before that, I copied the board to the split point, and then the children copied it again into local memory. That's obviously less efficient than your approach, which is why I changed to move lists. Move lists are also friendlier when you get to message-passing architectures, if I ever do...

Apples and oranges. You are using an iterated search, which in effect uses a pointer, since every reference goes through data[ply], as mine did in Cray Blitz. Either you tie up a register with a ply index, or you tie up a register with the pointer to the local search data. I don't see how it makes any difference performance-wise. I've done it both ways and, as I said, I don't see any inherent advantage in either approach (processes or threads). My current data structures work just fine with either method. Threads are a little simpler, since everything is shared and there is no mucking around with shared memory objects, but that was a tiny bit of code in the previous version... The pointer to the local memory is not a performance impediment, as I measured in the old Crafty with and without the pointer when I did the initial parallel search there.
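To make the data[ply] style concrete, here is a sketch of an iterated negamax in which all per-node state lives in a `data[ply]` array instead of on the call stack, checked against an ordinary recursive version. The game tree (`BRANCH`, `leaf_eval`) is a made-up toy and there is no alpha-beta pruning, to keep it short. In the iterated version the compiler keeps `ply` (or `&data[ply]`) in a register; a pointer-based recursive engine keeps the pointer to its search data there instead, which is the register-for-register trade described above.

```c
#define BRANCH 3
#define MAXPLY 16
#define INF    1000000

/* Toy game tree (hypothetical): node n has children n*BRANCH+1 ..
   n*BRANCH+BRANCH; leaves get a pseudo-random score for the side to move. */
static int leaf_eval(long node) {
    return (int)((node * 7919L) % 201) - 100;
}

/* Plain recursive negamax, for comparison. */
int negamax_rec(long node, int depth) {
    if (depth == 0) return leaf_eval(node);
    int best = -INF;
    for (int i = 1; i <= BRANCH; i++) {
        int v = -negamax_rec(node * BRANCH + i, depth - 1);
        if (v > best) best = v;
    }
    return best;
}

/* Per-ply state that a recursive search would keep in its stack frame. */
typedef struct {
    long node;   /* node being searched at this ply */
    int  next;   /* next child to try, 1..BRANCH */
    int  best;   /* best score found so far at this ply */
} PlyData;

/* Iterated negamax: everything is addressed as data[ply].
   The caller must keep 0 <= depth <= MAXPLY. */
int negamax_iter(int depth) {
    PlyData data[MAXPLY + 1];
    int ply = 0;
    data[0] = (PlyData){ 0, 1, -INF };
    for (;;) {
        int score;
        if (ply == depth) {
            score = leaf_eval(data[ply].node);        /* leaf */
        } else if (data[ply].next <= BRANCH) {
            long child = data[ply].node * BRANCH + data[ply].next++;
            ply++;                                    /* "make": go down */
            data[ply] = (PlyData){ child, 1, -INF };
            continue;
        } else {
            score = data[ply].best;                   /* all children done */
        }
        if (ply == 0) return score;
        ply--;                                        /* "unmake": back up */
        if (-score > data[ply].best) data[ply].best = -score;
    }
}
```

Both functions visit the same tree in the same order, so `negamax_iter(d)` and `negamax_rec(0, d)` agree for every depth within `MAXPLY`.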

But iterated vs recursive search has nothing to do with threads vs processes. I used both in Cray Blitz and in Crafty with no problems of any kind... The only reason I converted back to threads was to be compatible with how Windows does things, so that I could make the "smpnice" facility work on both systems without some additional ugly #if defined()'s...

As for how far from the root you split, that's a matter of tuning. I split at the root, and can split all the way out to the tips. I restrict the latter a bit to avoid too much thrashing/splitting on tiny search trees, where the split overhead might be larger than the time required to search the small tree itself.
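One way to express such a restriction is a small predicate consulted before splitting. This is a hypothetical cost model, not Crafty's rule: it assumes a subtree holds roughly `ebf^depth` nodes for an effective branching factor `ebf`, and it refuses to split when that estimate does not exceed an assumed per-split overhead (in node-equivalents), when too little depth remains, or when no helper is idle. `SplitTuning` and `should_split` are illustrative names.

```c
#include <stdbool.h>

/* Hypothetical tuning knobs; the values used are illustrative only. */
typedef struct {
    double ebf;              /* assumed effective branching factor */
    double split_cost;       /* assumed cost of one split, in nodes */
    int    min_split_depth;  /* hard floor on remaining depth */
} SplitTuning;

/* Rough subtree size: ebf^depth, computed without libm. */
static double estimate_subtree_nodes(double ebf, int depth) {
    double n = 1.0;
    for (int i = 0; i < depth; i++) n *= ebf;
    return n;
}

/* Split only if someone can help and the subtree is worth the overhead. */
bool should_split(const SplitTuning *t, int remaining_depth, int idle_helpers) {
    if (idle_helpers <= 0) return false;                   /* nobody to help */
    if (remaining_depth < t->min_split_depth) return false;
    return estimate_subtree_nodes(t->ebf, remaining_depth) > t->split_cost;
}
```

With `ebf = 2`, `split_cost = 100` and `min_split_depth = 2`, a node with 7 plies left qualifies (128 > 100), while one with 4 plies left does not (16 < 100).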
