Modern Times wrote:zullil wrote:
If the number of threads does not exceed the number of physical cores, disabling HT should be unnecessary---provided your operating system is smart enough to schedule the threads onto different physical cores. It's an OS issue, not a CPU issue.
Windows 7 and upwards should be doing this correctly, not sure about Vista.
It is likely the overhead being measured is OS related; however, running tests without an OS to prove this would be difficult.
Ray, I don't share your optimism about Windoze 7. All the tests I ran were on Win7. In fact, I experimented with setting Affinity and that is where Win7's insanity truly shows.
On my 6-core i7-3970X (HT on) I set affinity to the even-numbered logical processors (0, 2, 4, 6, 8, 10) and started a chess engine with 6 threads. Instead of the CPU load going to a solid 50%, as it does without affinity set, it went to 33.3%, because Win7 scheduled the six threads on only 4 of the 6 cores I had made available. I used affinity to force the threads onto the unused cores and back. Getting it to schedule on 5 of the 6 cores I selected wasn't too hard, but getting 6 of 6 was very difficult and random. On each test, Win7 would leave a different core unscheduled, seemingly at random.
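For anyone who wants to check the numbers, here is a rough sketch of the arithmetic behind the affinity mask and the load percentages above. It assumes 12 logical processors and that each even-numbered one sits on a different physical core (the usual Windows numbering with HT on, but an assumption all the same). The function names are just mine for illustration.

```python
def affinity_mask(cpus):
    """Return a bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def expected_load_percent(busy_logical, total_logical):
    """Overall CPU load if `busy_logical` processors are fully busy."""
    return 100.0 * busy_logical / total_logical


# Assumption: 12 logical processors, even indices are one per physical core.
even_cpus = list(range(0, 12, 2))  # 0, 2, 4, 6, 8, 10
mask = affinity_mask(even_cpus)

print(hex(mask))                               # 0x555
print(expected_load_percent(6, 12))            # 50.0 (all six selected busy)
print(round(expected_load_percent(4, 12), 1))  # 33.3 (only four busy)
```

So a reading of 33.3% is exactly what you would see if six busy threads were sharing four logical processors out of twelve.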
I ran this test because I suspected that Windoze moving the 6 threads around during analysis might have been causing the overhead and my theory was that if I could prevent it from happening the overhead would be reduced. I was never able to successfully lock the engine to the cores I selected. This prevented me from running that test with the required repetition to show significant results. Try it yourself.
It would be interesting for someone running native Linux to report what happens when setting affinity and what overhead, if any, there is with HT on versus off. (I only have Linux VMs.)
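If someone on native Linux wants to try it, a minimal sketch using Python's os.sched_setaffinity (Linux only) might look like the following. Note that Linux does not always number HT siblings as adjacent pairs, so picking the even-numbered CPUs is only an assumption here; the real sibling layout can be checked in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The helper names are hypothetical.

```python
import os


def pick_even_cpus(available):
    """Keep only the even-numbered logical CPUs from the available set."""
    return {cpu for cpu in available if cpu % 2 == 0}


def pin_current_process_to_even_cpus():
    """Pin this process to even-numbered CPUs.

    Returns the resulting affinity set, or None if the platform does not
    support sched_setaffinity or no even-numbered CPU is available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not Linux (or not supported on this platform)
    allowed = os.sched_getaffinity(0)
    target = pick_even_cpus(allowed)
    if not target:
        return None
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(pin_current_process_to_even_cpus())
```

Threads started after this call inherit the restricted CPU set, so the engine would have to be launched from the pinned process (or pinned by its own PID) for the test to mean anything.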
All that said, I still don't find a hit of 3% or lower from running HT on enough to turn off this useful feature. One program I wrote searches a puzzle space that can be cleanly divided into n sections, one per thread, with no overlap in the search space between threads. With 12 threads it ran as fast as if it were running on 7 physical cores, compared with spawning 6 threads. This obviously was not a chess program and didn't have the inherent issues of one. The calculations this program completed took weeks, so the speed increase was quite significant for me. I'll take a free extra core any day.
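To make the "free extra core" figure concrete, here is a small sketch of how a search space can be cut into non-overlapping sections and what the 7-versus-6 comparison works out to. This is not the actual puzzle program; split_range and the range 0..100 are made up for illustration.

```python
def split_range(start, stop, n):
    """Split [start, stop) into n contiguous, non-overlapping chunks."""
    total = stop - start
    base, extra = divmod(total, n)
    chunks = []
    lo = start
    for i in range(n):
        # The first `extra` chunks take one additional item each.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


# One section per thread, no overlap between sections.
sections = split_range(0, 100, 12)
print(len(sections), sections[0], sections[-1])

# 12 threads behaving like 7 cores, versus 6 threads on 6 cores.
speedup = 7 / 6
print(round(speedup, 3))  # 1.167, i.e. roughly 17% more throughput
```

On a job that runs for weeks, roughly 17% more throughput saves about a day out of every seven.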