bnculp wrote:
ouachita wrote:
This issue has turned out to be a challenge to describe in writing. Nevertheless, since my 16 core CPU allows me to enable or disable HT, my assumptions therefore would be:
ouachita wrote:
logical processors is the key. 16 real cores with HT On means 32 logical processors.
A. HT enabled:
1. Both SF and H can run on 1 thru 32 threads;
2. Both SF and H are stronger as core usage increases from 2 to 16;
3. SF may or may not be stronger while using threads 17 thru 32 (ongoing testing indicates HT to 32 does not help SF);
4. H will not benefit and will likely be adversely affected from using threads 17 thru 32 (per RH);
B. HT disabled:
1. Both SF and H can run on only 1 thru 16 cores/threads;
Larry can tell me how much of this applies to Kr.
I agree with most of this, although in my testing with HT enabled on a 4-core system, SF with 8 threads did edge out SF with 4 threads by 7 Elo. One must also be careful about drawing firm conclusions from small sample sizes. Even a match of 1000 games, which is what I have been using, has error bars of about +/- 12 Elo. Another variable in your case is having 16 real cores, and how each engine handles that kind of powerful hardware. There has been some speculation and some evidence (TCEC and Clemens tours, both with no HT or HT off) that Stockfish and Komodo perform better than Houdini on higher-powered machines. YMMV.
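For anyone who wants to check the error bars on their own match, here is a rough sketch in Python. It assumes the usual logistic Elo formula and a normal approximation for the 95% interval; the function names and the 300/400/300 result are made up for illustration and are not from any actual match in this thread. The margin depends heavily on the draw rate, so plug in your own numbers.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Uses a normal approximation on the per-game score; z=1.96 gives
    roughly a 95% interval. Assumes the score and both interval bounds
    stay strictly between 0 and 1 (not valid for lopsided or tiny matches).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return elo_from_score(score), low, high


if __name__ == "__main__":
    # Hypothetical 1000-game match: 300 wins, 400 draws, 300 losses.
    elo, low, high = elo_with_margin(300, 400, 300)
    print(f"Elo difference: {elo:+.1f} (95% interval {low:+.1f} to {high:+.1f})")
```

A higher draw rate shrinks the interval, which is why the same 1000 games can give noticeably different error bars from one engine pairing to another.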
It is possible, and even likely, that using HT to go to 32 threads on a 16-core machine does nothing for Stockfish, and the same may be true at thread counts far below 32.
Chess programs lose efficiency very quickly as you add more cores, logical cores, or threads, whatever term you wish to use.
The point is that the gains shrink as you increase the threads, whether they are real cores or logical cores. I would expect real cores to perform better because you are not splitting a real core into 2 logical cores. But the scaling problem still exists for both. It is the nature of the chess engine, a problem that has not been solved and may not be possible to solve on today's systems.
The scaling problem exists because many times the threads are searching and evaluating the exact same position, and no matter how many CPUs you have looking at the same position in the search tree, it gains you nothing. You are just spinning your wheels, so to speak. And you can only split the workload effectively so many times.
That is why you gain much more going from 1 core to 8 cores than you do going from 8 cores to 16 cores on a 16-core system.
The effect can be seen any time you use MP and add threads: 1 to 2, 2 to 4, 4 to 8, and so on.
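As a rough illustration of that diminishing return, here is a toy calculation using Amdahl's law. This is a generic model of parallel speedup, not how any of these engines actually split their search, and the 90% parallel fraction is a made-up number chosen only to show the shape of the curve.

```python
def amdahl_speedup(threads, parallel_fraction=0.9):
    """Idealized speedup under Amdahl's law.

    parallel_fraction is a hypothetical share of the work that can be
    split across threads; the rest is treated as strictly serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    previous = None
    for threads in (1, 2, 4, 8, 16, 32):
        speedup = amdahl_speedup(threads)
        if previous is None:
            gain = ""
        else:
            gain = f" (+{speedup - previous:.2f} over previous step)"
        print(f"{threads:2d} threads: {speedup:.2f}x{gain}")
        previous = speedup
```

Each doubling of threads adds less than the one before it, which is the same pattern as 1 to 2, 2 to 4, 4 to 8. Real engines also lose time to duplicated search work, which this simple model does not capture.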