zullil wrote:
I did indeed leave MNTpSP=5. I also monitored CPU during some initial testing, and indeed each program was at 800% (=8 cores in use) almost always.
If Tord is still around---should I do some benchmarking/testing with larger values of MNTpSP? Crude early testing seemed to indicate that changing this parameter had little effect on nps, compared to changes in MSD. I suppose it can't hurt to try.

I'm still here, and I'll try to give a crude, not-too-technical explanation of what the two parameters do:
In a YBW search, the search at every node in the search tree always starts with a single CPU searching the first move alone. If this move refutes the move directly before it, the search at the current node stops immediately. If not, other CPUs are invited to join the work in searching the remaining moves, if they aren't already busy searching some other part of the tree. A node where the work is shared between two or more CPUs is called a "split point".
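To make that control flow concrete, here is a rough Python sketch on a toy tree. It is not the engine's code. The tree encoding (a leaf is an integer score for the side to move, an inner node is a list of children), the name ybw_search and the max_helpers argument are all made up for the illustration. It also simplifies things: the brothers at a split point do not share improved bounds with each other, so it prunes less than a real implementation would.

```python
from concurrent.futures import ThreadPoolExecutor


def ybw_search(node, alpha, beta, max_helpers=2):
    """Negamax alpha-beta with a Young Brothers Wait structure (toy sketch).

    A leaf is an int (score for the side to move); an inner node is a
    list of child nodes. Both conventions are assumptions of this sketch.
    """
    if isinstance(node, int):
        return node

    # Step 1: the first move (the "eldest brother") is searched alone.
    best = -ybw_search(node[0], -beta, -alpha, max_helpers)
    if best >= beta or len(node) == 1:
        # Refutation found (or nothing left to search): no split happens.
        return best
    alpha = max(alpha, best)

    # Step 2: this node becomes a split point. The remaining moves are
    # handed out to helper threads, all using the window known after step 1.
    def search_child(child):
        return -ybw_search(child, -beta, -alpha, max_helpers)

    with ThreadPoolExecutor(max_workers=max_helpers) as pool:
        scores = pool.map(search_child, node[1:])
        return max(best, *scores)
```

For example, ybw_search([[3, 5], [2, 9], [0, 1]], float("-inf"), float("inf")) returns 3. The root becomes a split point after its first move, while the second and third moves are each refuted by their own first reply, so no split happens below them.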
Splitting is a very expensive operation. The chess board and numerous other big chunks of data must be copied between all the CPUs cooperating at the split point. It gets even more expensive when the number of CPUs increases, because even more data must be copied.
Because splitting is so expensive, we don't want to do it unless the sub-tree below the current node is so big that we save a significant amount of work by having several CPUs working on the sub-tree. Therefore, we want to avoid splitting too close to the leaves of the tree. The "Minimum Split Depth" parameter controls how close to the leaves splitting is allowed. With the default setting of 4, the program will not try to split when the remaining depth is less than 4 plies. Setting this parameter too high or too low will hurt the performance. If it is too low, the program will spend too much time on copying and synchronization at split points. If it is too high, some of the CPUs will spend too much time being idle and waiting for work to do. It seems reasonable that a higher value is better with a higher number of CPUs, but the optimal values can only be determined by experimentation.
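A small back-of-the-envelope sketch shows why the parameter matters so much. It assumes a perfectly uniform tree, which a real alpha-beta search tree is not, and the function names may_split and eligible_nodes are invented for the example. The point is only how fast the number of nodes where splitting is allowed grows as the minimum split depth is lowered.

```python
def may_split(remaining_depth, min_split_depth=4):
    """The depth rule described above: no split closer to the leaves than this."""
    return remaining_depth >= min_split_depth


def eligible_nodes(branching, total_depth, min_split_depth):
    """Count inner nodes of a uniform tree where splitting is allowed.

    A uniform tree has branching ** (total_depth - d) nodes with d plies
    of remaining depth. Uniformity is an assumption of this sketch.
    """
    return sum(
        branching ** (total_depth - d)
        for d in range(1, total_depth + 1)
        if may_split(d, min_split_depth)
    )
```

With 3 moves per position and a 6-ply search, eligible_nodes(3, 6, 4) gives 13 candidate split points, while eligible_nodes(3, 6, 2) gives 121. Every one of those extra candidates sits on a small sub-tree, which is exactly where the copying cost is hardest to earn back.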
The meaning of the other parameter, "Maximum Number of Threads per Split Point", should now be obvious. I have no idea and no intuition about what the optimal value should be.
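In case it is not quite obvious, here is how I would sketch the cap in Python. The function name recruit_helpers and the list of idle thread ids are hypothetical, and the sketch assumes that the thread which creates the split point counts toward the limit.

```python
def recruit_helpers(idle_threads, max_threads_per_split_point):
    """Pick the idle threads that may join a new split point.

    Assumption of this sketch: the thread that owns the split point
    counts as one of the threads, so helpers get one slot fewer.
    """
    helper_slots = max(0, max_threads_per_split_point - 1)
    return idle_threads[:helper_slots]
```

Under that assumption, with 8 CPUs and a limit of 5, recruit_helpers([1, 2, 3, 4, 5, 6, 7], 5) returns [1, 2, 3, 4]. The other three idle threads stay free to help at some other split point.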
I must admit that I am a little less optimistic than Marco about how much can be achieved by simply fine-tuning these two parameters. In order to make the search really efficient on more than 4 CPUs, I think we need to make changes to the actual code.