syzygy wrote:
bob wrote:
syzygy wrote:
IQ wrote:
Assumptions:
1) HT gain does not outweigh parallel overhead. Let's say HT gains 15% and parallel overhead going from, say, 6 to 12 threads is 30%. Usually this would lead to the NON-HT version performing somewhat better than the HT version. Not arguing specific numbers here, but this is for example what Bob says.
What do you mean by "HT gains 15%"?
What do you mean by "parallel overhead is 30%"?
Define your terms...
I suppose the following definitions make sense:
- HT leads to 15% higher nps.
- doubling the number of threads leads to 30% extra nodes searched to reach the same depth.
Doubling the number of threads, everything else staying the same, is ALWAYS bad. All programmers agree on this. Only the 15% higher nps could possibly offset it. Now the argument is already over, because removing the HT hardware removes the 15% higher nps.
What is not clear about this?
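To put rough numbers on this, here is a back-of-the-envelope check in Python, using only the illustrative 15%/30% figures from above:

    # Time-to-depth with the illustrative numbers from this thread:
    # HT raises nps by 15%; doubling the threads inflates the tree by 30%.
    nps_factor  = 1.15   # nps with HT, relative to without HT
    node_factor = 1.30   # nodes needed to reach the same depth with 2x threads

    print(f"2x threads + HT nps: {node_factor / nps_factor:.2f}x time to depth")  # ~1.13x
    print(f"2x threads alone:    {node_factor:.2f}x time to depth")               # 1.30x

With the nps gain, reaching a fixed depth takes about 13% longer; without it, the full 30%. Only the nps increase can possibly pay for the extra nodes.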
Your "conclusion a)" is more or less copied from what I wrote much earlier in this thread. If 1 and 2 are both true, then a) is the explanation.
Your conclusion b) is false. You could not search those nodes in any obvious way in a non-HT environment, because those nodes were searched using the 15% nps increase.
That a 15% HT gain might offset 30% parallel overhead does NOT mean that a 0% HT gain might offset 30% parallel overhead.
Of course, if you are a very good programmer you can take H3, completely redesign the extension/reduction scheme, and release a stronger H4, but nothing in the way HT works will point you in the right direction. HT is nothing more than "double the number of threads, some increase in nps". Pretending that one could somehow simulate the effects of doubling the number of threads without the parallel overhead is not going to help.
You are wrong, as this is a basic discussion that comes up in parallel search.
What exactly is wrong? Please point to a specific statement.
I'm pretty sure we agree on what I wrote above...
In case you mean this:
syzygy wrote:Your conclusion b) is false. You could not search those nodes in any obvious way in a non-HT environment, because those nodes were searched using the 15% nps increase.
Clearly I meant you can't search those extra nodes in a non-HT environment without spending more time...
This:
Your conclusion b) is false. You could not search those nodes in any obvious way in a non-HT environment, because those nodes were searched using the 15% nps increase.
If you believe those extra nodes make a qualitative improvement in play, then they have to offset the net loss between the HT-parallel search and the sequential search. If I avoid that 30% overhead, I gain whatever Elo running 30% faster provides. That's non-trivial. I'd make a quick guess of at least 20 Elo, but maybe 25 or a little more.
With HT on, you lose overall in terms of time, so far as I have measured. That is a net loss of speed equal to the difference between the 30% overhead and whatever you can recover from HT. Suppose you get 25% of that back, if the search has a few issues that HT helps with. You are still down 5%. Do you REALLY think that by reducing the pruning or reductions one can gain 5%? If so, why not just modify the reduction/pruning code to reduce or prune less in those same places where you would normally do a parallel split?
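For a sense of scale, a speed ratio can be converted to Elo with the usual "Elo per doubling of speed" rule of thumb. The constant below (70) is an assumed value, not a measured one:

    import math

    def elo_from_speed(speed_ratio, elo_per_doubling=70.0):
        """Rule-of-thumb Elo change for a given speed ratio (k is an assumption)."""
        return elo_per_doubling * math.log2(speed_ratio)

    # Avoiding the 30% overhead entirely ~ running 1.30x faster:
    print(f"{elo_from_speed(1.30):+.1f} Elo")    # ~ +26, in line with the 20-25 guess

    # HT on: recover 25% nps but pay the 30% node overhead:
    net = 1.25 / 1.30                            # ~0.96x
    print(f"{elo_from_speed(net):+.1f} Elo")     # ~ -4, close to the 5% figure above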
I simply do not agree that there is a measurable improvement from searching a tree that is made larger in a somewhat random and unpredictable way. I've made that mistake way too many times. That is, find cases where something works well and assume it is a win, when in reality it is worse in all the unrelated cases. I've already reported a couple of those ideas previously. My "easy move" code is a zero-Elo improvement. I have used it since the 70's, as has almost everyone else. Yet careful testing/measurement has proven that it doesn't do a thing, good or bad.
I don't like this "guesswork" approach about "the extra width or nodes might help..." One could always test this easily enough. Take a program that does not use spin locks, play it against a gauntlet using N cores and N threads, and then again using N cores and 2N threads (note that by 2N threads I mean NO HT, just doubled threads, 2 per physical cpu). That will precisely define the effect of the search overhead. Now we know how much the search overhead hurts. Then one can carefully measure how much HT affects search speed (gaining some back). The net will show whether it is a gain or loss.
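A minimal sketch of that experiment, assuming cutechess-cli and a UCI engine that exposes a Threads option; the engine paths, opponent, rounds, and time control here are placeholders:

    import subprocess

    ENGINE = "./myengine"   # hypothetical engine binary with a UCI Threads option
    CORES = 6               # physical cores; HT disabled for this experiment

    def gauntlet(threads, tag):
        """Run one gauntlet with the given thread count (placeholder settings)."""
        subprocess.run([
            "cutechess-cli",
            "-engine", f"cmd={ENGINE}", f"option.Threads={threads}", f"name={tag}",
            "-engine", "cmd=./opponent",   # gauntlet opponent, also a placeholder
            "-each", "proto=uci", "tc=40/60",
            "-rounds", "500",
            "-pgnout", f"{tag}.pgn",
        ], check=True)

    gauntlet(CORES, "n-threads")        # N threads on N cores
    gauntlet(2 * CORES, "2n-threads")   # 2N threads on N cores: overhead, no extra nps
    # The Elo gap between the two runs isolates the cost of the search overhead.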
To measure this "the extra search nodes can help" claim, one can do a very simple test.
Play a series of matches to fixed depth using 1 cpu, then 2 cpus, then 4. Since the depth is fixed, the time to complete is irrelevant, and the only change will be the search overhead. If it does help, the 4-cpu fixed-depth program should perform better than the 1-cpu fixed-depth program.
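A sketch of that fixed-depth version, under the same assumptions as above (cutechess-cli's depth= limit makes wall-clock time irrelevant):

    import subprocess

    ENGINE = "./myengine"   # hypothetical engine binary
    DEPTH = 16              # fixed search depth; time to complete no longer matters

    for threads in (1, 2, 4):
        subprocess.run([
            "cutechess-cli",
            "-engine", f"cmd={ENGINE}", f"option.Threads={threads}",
            f"name=smp{threads}",
            "-engine", "cmd=./opponent", "option.Threads=1",
            "-each", "proto=uci", "tc=inf", f"depth={DEPTH}",
            "-rounds", "500",
            "-pgnout", f"depth{DEPTH}-smp{threads}.pgn",
        ], check=True)
    # If the extra (overhead) nodes help at all, the 4-thread fixed-depth run
    # should outscore the 1-thread run.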
I can run that test, although I am already almost 100% certain of the outcome, thanks to past testing...