Actually, the implementation I am using is much simpler than you are describing. The *only* communication between threads is through the various hash tables and a flag to tell the threads when to stop.
I don't know what you are talking about now. Didn't you say you divide up the root move list? The 1st processor searches the 1st, 3rd, 5th move, etc., and the second processor the 2nd, 4th, 6th, etc. Those were your words, not mine. Then I suggested that you use a lock to share the work, as that will allow any processor to work on any move. Hardly rocket science. This is what you said:
The 3rd and 4th threads are actually searching iteration depth+1, again with even/odd threads searching the even/odd moves at the root, which is why the average depth searched is a bit more than 12 in those cases. In principle I might speed the whole thing up further by having each of 3 threads search 1/3rd of the root moves and each of 4 threads searching 1/4th of the root moves, but that is more work to implement, and I haven't tried it yet.
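For reference, the even/odd split described here, and the untried 1/3rd and 1/4th variants, can be sketched as a stride over the root move list. This is only an illustration: `moves_for_thread` is a made-up helper name, and the move strings are placeholders, not anything from the actual engine.

```python
def moves_for_thread(root_moves, thread_index, n_threads):
    """Return the root moves a given thread would search.

    Hypothetical helper: thread 0 of 2 gets moves 0, 2, 4, ...,
    thread 1 of 2 gets moves 1, 3, 5, ...; with 3 or 4 threads each
    thread gets every 3rd or 4th move instead.
    """
    return root_moves[thread_index::n_threads]


root_moves = ["e4", "d4", "Nf3", "c4", "g3", "b3", "f4"]  # placeholder move list

# Two threads: the even/odd split from the post.
even_moves = moves_for_thread(root_moves, 0, 2)
odd_moves = moves_for_thread(root_moves, 1, 2)

# Three threads: each one takes roughly 1/3rd of the root moves.
thirds = [moves_for_thread(root_moves, i, 3) for i in range(3)]
```

The split is fixed before the search starts, so a thread that finishes its share early has nothing else to pick up.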
So when I start a new iteration, I send the same alpha-beta limits (+/- 15 from the previous iteration's score) and the same root move list to each thread. The threads each search independently using a normal PVS algorithm from that point onward until one of the threads either fails high or has searched all the root moves.
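A rough sketch of that iteration loop, under some loud assumptions: `search_move` stands in for the real PVS search of one root move, the shared hash tables are left out entirely, and the `results` list is only there so the sketch can report which thread finished first (the post says the only real communication is the hash tables and the stop flag).

```python
import threading

ASPIRATION_MARGIN = 15  # the +/- 15 window from the post


def aspiration_window(previous_score, margin=ASPIRATION_MARGIN):
    """Alpha-beta limits centred on the previous iteration's score."""
    return previous_score - margin, previous_score + margin


def run_iteration(root_moves, previous_score, n_threads, search_move):
    """Start n_threads independent searches over the same root move list.

    search_move(move, alpha, beta, thread_id) is a hypothetical stand-in
    for the engine's search of a single root move; it returns a score.
    """
    alpha, beta = aspiration_window(previous_score)
    stop = threading.Event()          # the flag that tells threads to stop
    results = []                      # illustration only, not in the post
    results_lock = threading.Lock()

    def worker(thread_id):
        best = alpha
        for move in root_moves:
            if stop.is_set():         # another thread already finished
                return
            score = search_move(move, best, beta, thread_id)
            if score > best:
                best = score
            if best >= beta:          # fail high ends this thread's search
                break
        # Reached only after a fail high or after all root moves are searched.
        with results_lock:
            if not stop.is_set():
                results.append((thread_id, best))
                stop.set()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]
```

In this sketch the first thread to fail high or to finish the whole list sets the flag, and the others drop out at their next root move.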
You are seriously confused, sorry. I think you should back up, read what you said before, read what you are saying now, and judge for yourself. You should really see by now how difficult it is without the YBW criteria. And please don't pretend I am doing some rocket science when I am not...
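To be concrete about the lock suggestion, here is a minimal sketch of what I mean. It assumes a single shared index into the root move list protected by one lock; `RootMovePool`, `worker` and `run` are made-up names, and the "search" just records which thread took which move.

```python
import threading


class RootMovePool:
    """Hands out root moves one at a time under a lock (hypothetical helper)."""

    def __init__(self, moves):
        self._moves = list(moves)
        self._next = 0
        self._lock = threading.Lock()

    def take(self):
        """Return the next unsearched root move, or None when none are left."""
        with self._lock:
            if self._next >= len(self._moves):
                return None
            move = self._moves[self._next]
            self._next += 1
            return move


def worker(pool, searched, searched_lock, thread_id):
    # Any thread takes whichever move is next, so no thread sits idle
    # while unsearched root moves remain.
    while True:
        move = pool.take()
        if move is None:
            break
        with searched_lock:
            searched.append((thread_id, move))  # stand-in for the real search


def run(moves, n_threads):
    pool = RootMovePool(moves)
    searched, searched_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(pool, searched, searched_lock, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return searched
```

Each root move is handed out exactly once, and which processor gets it depends only on who asks first.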