zamar wrote:
Hi Bob,
bob wrote:
#1 is not quite so simple. If you split below a split point, you still incur about the same amount of overhead. This has a sort of similarity to speculative execution in a modern CPU. Remember that the only overhead you incur is the work searched by this new thread if the split turns out to be wrong. It really doesn't matter where it happens.
You missed the point. If the thread is searching under a split point which is under an existing split point, which is under yet another existing split point, and so on (the nesting depth is called spLevel in SF), you are speculating on top of speculations on top of speculations. If any of the speculations in that chain turns out to be wrong, the work of the thread is lost. Just don't do that.
An analogy from real life:
- If an idiot helps a wise man, something good can still come of it.
- If an idiot helps an idiot who helps an idiot who helps an idiot who helps a wise man, nothing good comes of it.
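To put the compounding risk in numbers: if each split point in the chain independently has some probability p of turning out to be a genuine ALL node, a thread nested k split points deep only does useful work with probability p^k. A minimal sketch (the value of p is purely illustrative, not measured from any engine):

[code]
// Compounded speculation: a thread under k nested split points does useful
// work only if every split point in the chain was a correct guess.
#include <cmath>
#include <cstdio>

int main() {
    const double p = 0.85; // assumed per-split success probability (illustrative)
    for (int spLevel = 1; spLevel <= 5; ++spLevel)
        std::printf("spLevel %d: useful-work probability %.3f\n",
                    spLevel, std::pow(p, spLevel));
}
[/code]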
#2 is also not quite so simple. It is better to have 8 threads at a sure ALL node than 4 at that node and the other 4 somewhere else. One condition I didn't see being checked was one of the conditions I used in the DTS algorithm for Cray Blitz: "work left to be done at a split point." Even in Crafty I have been measuring "futile splits", which can happen in multiple ways:
Well, there is no such thing as a sure ALL node, but in principle I agree. An ideal splitting algorithm would take the node type and the count of already searched moves into account, since these affect the probability of a cut-off.
(a) a "late joiner" joins but does nothing because all work has already been completed or scheduled by other threads and is in progress;
SF sets a flag when there is no more work to be done (see the first sketch below).
(b) N threads split but only N-m have work to do due to a small branching factor at that node (this is particularly bad in endgames, and is why I still use a minimum thread group as I did in Cray Blitz, to avoid having too many threads at one point).
In SF this is controlled by the minimum split depth and max slaves per split point parameters (see the second sketch below).
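A hedged sketch of the late-joiner guard described above, assuming a hypothetical SplitPoint layout (the real Stockfish field names differ; this only illustrates the check):

[code]
#include <atomic>
#include <mutex>

struct SplitPoint {
    std::mutex        lock;
    int               movesLeft = 0;     // unsearched moves at this split point
    std::atomic<bool> noMoreWork{false}; // the flag zamar mentions
};

// A helper thread tries to pick up a move; returns false for a "late joiner"
// that finds everything already completed or claimed (futile split (a) above).
bool tryClaimMove(SplitPoint& sp) {
    if (sp.noMoreWork.load(std::memory_order_acquire))
        return false;                    // flag already set: don't bother joining
    std::lock_guard<std::mutex> guard(sp.lock);
    if (sp.movesLeft == 0) {
        sp.noMoreWork.store(true, std::memory_order_release);
        return false;
    }
    --sp.movesLeft;                      // "work left to be done": the DTS check
    return true;
}
[/code]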
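And a sketch of the split-decision guards, folding in the node-type/moves-searched idea from earlier in the exchange. All names and thresholds are illustrative assumptions, not actual SF or Crafty code:

[code]
enum NodeType { PV_NODE, CUT_NODE, ALL_NODE };

const int MinSplitDepth  = 4; // don't split shallow subtrees (assumed value)
const int MaxSlavesPerSP = 4; // cap helpers at one split point (assumed value)

// Decide whether creating a split point here is likely to be worthwhile.
bool worthSplitting(NodeType type, int depth, int movesSearched,
                    int slaveCount, int idleThreads) {
    if (depth < MinSplitDepth)        return false; // too little work to share
    if (slaveCount >= MaxSlavesPerSP) return false; // avoid futile split (b)
    if (idleThreads == 0)             return false; // nobody available to help
    // An expected CUT node is a bad bet; a node where several moves have
    // already failed to cut off looks increasingly like an ALL node.
    return type != CUT_NODE || movesSearched >= 3;
}
[/code]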
I understood the spLevel idea. You missed what I wrote. We are talking about a single thread. Wherever you stick it, if you are wrong, its effort will be wasted. If you stick it below one or more split points, then yes, if one of those is wrong, its effort is wasted. But the probability of one of those split points being bad is no worse than the probability of a brand new split point being bad. So no matter where you stick this thread, all you risk is wasting its effort.
When I worked on my dissertation, using up to 30 CPUs at the time, I didn't have much luck trying to take advantage of unpredictable behavior. The obvious idea of trying to split near the root is problematic, because any mis-ordering can turn an apparent ALL node into a CUT node in unpredictable ways. The only really good split point is the root, since that is a guaranteed ALL node (PV nodes are also ALL nodes).
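As a minimal illustration of why the root is the safe case: every root move must be searched, so work handed to helpers there is never speculative. The sketch below statically deals the root moves to N threads (searchMove is a hypothetical stand-in, and real engines share alpha between root moves, which this deliberately ignores):

[code]
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical stub standing in for searching one root move to some depth.
void searchMove(int moveIndex) { std::printf("searching root move %d\n", moveIndex); }

void splitAtRoot(int numMoves, int numThreads) {
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back([=] {
            for (int m = t; m < numMoves; m += numThreads)
                searchMove(m); // every root move must be searched: no speculation
        });
    for (auto& th : pool)
        th.join();
}

int main() { splitAtRoot(20, 4); }
[/code]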
But there are so many issues to deal with, and so many solutions, that doing things automatically is hard. You have to be very careful with cache management and with how you share data to avoid all sorts of cache invalidation/forwarding traffic; memory bandwidth on a single chip is limited; you don't want a thread to bounce to a different CPU, so processor affinity is important; and then there are other things, starting with NUMA and memory allocation.
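For one of those issues, a minimal Linux-specific sketch of pinning a thread to a core so it can't bounce between CPUs (glibc's pthread_setaffinity_np; other platforms need different calls):

[code]
#include <pthread.h>   // pthread_setaffinity_np (glibc; compile with -pthread)
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#include <thread>

// Pin a std::thread to a single core so the scheduler can't migrate it.
bool pinToCore(std::thread& t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}

int main() {
    std::thread worker([] { /* search loop would run here */ });
    pinToCore(worker, 0); // keep this worker on core 0
    worker.join();
}
[/code]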
I've tried repeatedly to do something automatic with regard to setting things up optimally. No luck so far, because there is too much variability in the hardware. I tested on a dual 8-core a few weeks back and it ran really well. I tested on a dual 12-core a little later and the plain NPS scaling was bad; it took a lot of tweaking to get right. I see this almost every time I run on something newer...