bob wrote:
Your conclusion b) is false. You could not search those nodes in any obvious way in a non-HT environment, because those nodes were searched using the 15% nps increase.
So explain how to search extra nodes in the same time on the same hardware. Emulating two threads with a single thread through interleaving isn't going to do that.
Do you REALLY think that by reducing the pruning or reductions, one can gain 5%? If so, why not just modify the reduction/pruning code to reduce or prune less in those same places where you would normally do a parallel split.
Because then you lose time: reducing/pruning less means searching more nodes on the same hardware.....
The argument is that with the highly selective searches of today, a parallel search will be somewhat less selective. Some reductions won't be triggered. This would also explain why parallel search overhead today is higher than in the past, as you insist it is.
Of course you can try to simulate this in a serial search by not triggering some reductions, but the result will just be that you search more nodes and so take more time. If the programmer has tested his reductions well, turning them off will be a bad trade-off; the reductions were not put in for nothing. But if HT turns them off "for free", or at least at lesser cost, then this is a factor that may help.
I don't like this "guesswork" approach about "the extra width or nodes might help..."
I am just offering a possible explanation for why it is not completely unthinkable that HT could work even if the nps speed increase might not completely outweigh the extra nodes searched due to parallel overhead.
One way to test it is as follows. Let an engine play itself using fixed depth searches, one side searching with 4 threads and the other side searching with 8 threads. (This should be done on a machine with at least 4 cores and HT on, because 8 threads with just 4 hardware threads might give quite different trees.) If the engine using 8 threads clearly outperforms the same engine using 4 threads, that proves that the extra nodes of the 8-thread searches add to the quality of the search.
Of course even if this test shows that 8-thread searches to the same depth are somehow of higher quality, this does not mean that HT is a win. But it would show there is some effect that should not be overlooked.
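As an aside on how to judge "clearly outperforms" in such a fixed-depth match: a likelihood-of-superiority (LOS) calculation from the win/loss counts is the usual tool. A minimal stdlib-only sketch; the match result below is invented purely for illustration:

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority: probability that the first player is
    genuinely stronger, given wins and losses (draws carry no signal).
    Uses the normal approximation 0.5 * (1 + erf((W - L) / sqrt(2(W + L))))."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Invented example: 1000 fixed-depth games, 8 threads vs 4 threads
wins, draws, losses = 290, 480, 230
print(f"LOS = {los(wins, losses):.3f}")
```

With an equal win/loss count LOS is 0.5; values above roughly 0.95 are the usual threshold for calling a result a clear win rather than noise.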
I am not convinced that the effect exists, but I am also not convinced that it does not exist.
One could always test this easily enough. Take a program that does not use spin locks, and play it against a gauntlet, first using N cores and N threads, and then again using N cores and 2N threads (note that for 2N threads I mean NO HT, just doubled threads, two per physical core). That will precisely define what the effect of the search overhead is. Now we know how much the search overhead hurts. Then one can carefully measure how much HT affects search speed (gaining some back). The net will show whether it is a gain or loss.
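That "net" can be written down as simple arithmetic: the time-to-depth ratio is the nps gain divided by the node overhead. A small sketch; the 15% nps figure comes from earlier in the thread, while the 30% overhead is purely illustrative, and note this captures time-to-depth only, not any quality effect of the extra width:

```python
def ht_net_speedup(nps_gain: float, node_overhead: float) -> float:
    """Effective time-to-depth speedup from HT.

    nps_gain:      fractional nps increase from HT (0.15 = +15%)
    node_overhead: fractional extra nodes searched with 2N threads
                   versus N threads (0.30 = +30%)
    Returns a factor: > 1.0 means HT reaches the same depth faster,
    < 1.0 means it is a net slowdown.
    """
    return (1.0 + nps_gain) / (1.0 + node_overhead)

# Illustrative numbers: +15% nps against +30% node overhead
print(ht_net_speedup(0.15, 0.30))  # below 1.0: a time-to-depth loss
```

So on these (made-up) numbers HT would be a time-to-depth loss, and any case for it would have to rest on the extra nodes improving the search quality.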
Running 8 threads on 4 physical cores without HT seems a very poor simulation of using HT.
Play a series of matches to fixed depth using 1 cpu, then 2 cpus, then 4. Since the depth is fixed, the time to complete is irrelevant, and the only change will be the search overhead. If it does help, the 4 cpu fixed depth program should perform better than the 1 cpu fixed depth program.
This test I agree with. (I wrote my test proposal before I read this.)
Still, it is beyond me why you would think that, even if the latter test shows a clear win for the 4-cpu fixed-depth program, this could be of any help for improving a serial search.
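Whichever of the two fixed-depth tests gets run, the size of the win can be quantified the same way: the match score converts to an implied Elo difference via the standard logistic model. A stdlib-only sketch; the 55% score is an invented example:

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a match score in (0, 1),
    from the standard logistic model: score = 1 / (1 + 10^(-d/400))."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Invented example: the 4-cpu fixed-depth side scores 55% overall
print(round(elo_diff(0.55), 1))  # -> 34.9
```

A 50% score maps to 0 Elo, so anything meaningfully above zero (with enough games, per the usual error bars) would indicate the wider parallel trees really are of higher quality at the same depth.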