FlavusSnow wrote: Bob, I appreciate your comments and certainly respect your research. I generally agree with your assessment of hyperthreading. I want to note a simple example though, citing published benchmarks of Fritz from Tom's Hardware's 2012 CPU charts:
Intel Core i7-2600K 3.4 GHz 4c/8t = 12,986 nps
We have to adjust this for speedup. Based on the DTS speedups from your publications, Bob, I'm assuming a speedup of 6.6 times, so to normalize these nps I will multiply by 6.6/8, which gives 10,713 'effective' nodes per second.
Intel Core i5-2500K 3.3 GHz 4c/4t = 10,146 nps
We have to adjust this for clock speed, so I will multiply by 3.4/3.3, which gives 10,453 nps.
We now have to get 'effective' nodes by applying a true speedup, so I will multiply by 3.7/4, which gives 9,669.
As 10,713 > 9,669 I would have to conclude that HT does indeed increase the search speed for Fritz. Now this is an 11% increase, which means maybe a ~5 Elo increase in strength, but an increase nonetheless.
This can't be conclusive for all engines. It's interesting to note that activating HT for Fritz only increases the raw nps by about 20-25%, but this seems to be enough to make it worthwhile despite the increased inefficiencies introduced by adding threads.
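The arithmetic in the quoted post can be reproduced with a short Python sketch. The nps figures and clock speeds are the ones quoted above; the 6.6x and 3.7x speedups are the post's assumptions, not measurements for Fritz, and the function name effective_nps is made up for illustration.

```python
# Figures quoted in the post (Fritz benchmark, Tom's Hardware 2012 CPU charts).
I7_2600K_NPS = 12_986  # 3.4 GHz, 4 cores / 8 threads (HT on)
I5_2500K_NPS = 10_146  # 3.3 GHz, 4 cores / 4 threads (no HT)

# Speedups ASSUMED in the post; they are not measured values for Fritz.
SPEEDUP_8_THREADS = 6.6
SPEEDUP_4_THREADS = 3.7


def effective_nps(raw_nps, speedup, threads, clock_scale=1.0):
    """Scale raw nps by a clock ratio, then by parallel efficiency (speedup / threads)."""
    return raw_nps * clock_scale * speedup / threads


clock_scale = 3.4 / 3.3  # bring the i5 up to the i7's clock speed

ht_on = effective_nps(I7_2600K_NPS, SPEEDUP_8_THREADS, 8)
ht_off = effective_nps(I5_2500K_NPS, SPEEDUP_4_THREADS, 4, clock_scale)

raw_gain = I7_2600K_NPS / (I5_2500K_NPS * clock_scale) - 1
effective_gain = ht_on / ht_off - 1

print(f"HT on : {ht_on:,.0f} effective nps")
print(f"HT off: {ht_off:,.0f} effective nps")
print(f"raw nps gain from HT      : {raw_gain:.1%}")
print(f"effective nps gain from HT: {effective_gain:.1%}")
```

The conclusion depends entirely on the two assumed speedups: lower the 8-thread figure or raise the 4-thread one and the sign of the result can flip.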
The DTS speedup numbers really have nothing to do with what Fritz, Crafty and the like do today, because (a) they don't use DTS and (b) the trees are far different (more variable) than the trees from back then.
For example, the last time I ran an 8 cpu test and posted the results, the speedup was maybe 6 (I would have to locate the results myself; I think the data is still on my ftp box for those interested, but it is a ton of Crafty log files, so it takes some time to filter through). The data fit my linear approximation pretty well:
speedup = 1 + .7 * (NCPUS - 1)
which would give 5.9...
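As a quick check, here is that approximation in Python. The function name estimated_speedup is just for illustration; the 0.7 coefficient is the one from the formula above and describes my Crafty data only, not any other engine.

```python
def estimated_speedup(ncpus):
    """Linear approximation from the post: speedup = 1 + 0.7 * (NCPUS - 1)."""
    return 1 + 0.7 * (ncpus - 1)


for ncpus in (1, 2, 4, 8):
    print(f"{ncpus} cpus -> estimated speedup {estimated_speedup(ncpus):.1f}")
```

For 8 cpus this gives the 5.9 mentioned above, and for 4 cpus it gives 3.1.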
I am going to post some data concerning search overhead, by running a ton of positions to fixed depth, using 1, 2, 4 and 8 threads. That will pretty well show the search overhead, which I have typically measured at 30%. That means if HT doesn't give you a 30% NPS increase, it will lose. At least for Crafty. I've seen nothing to suggest that Fritz's parallel search is any better, and I don't even know if it is as good as Crafty's search. Based on your numbers (25%), it would appear to be a losing idea.
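The break-even rule can be written down as a rough sketch. It assumes the extra threads from HT add the same search overhead I have measured before (30%), which is a simplification, and the function name time_to_depth_ratio is made up for illustration.

```python
def time_to_depth_ratio(nps_gain, search_overhead):
    """Time to reach a fixed depth with HT on, relative to HT off.

    nps_gain        -- fractional raw nps increase from HT (0.25 means +25%)
    search_overhead -- fractional extra nodes searched with the added threads
    A result above 1.0 means HT is slower to depth, below 1.0 means faster.
    """
    return (1 + search_overhead) / (1 + nps_gain)


# 25% nps gain (the Fritz figure quoted above) against 30% overhead (Crafty).
ratio = time_to_depth_ratio(nps_gain=0.25, search_overhead=0.30)
print(f"time to depth with HT: {ratio:.2f}x")
print("HT loses" if ratio > 1 else "HT wins or breaks even")
```

With those two inputs the tree grows faster than the node rate does, which is why 25% looks like a losing idea unless Fritz's overhead turns out to be lower.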
You can certainly compute Fritz's search overhead by doing the same test I am doing... Search a significant number of test positions (NOT tactical positions; my opening positions for cluster testing would be OK, although I would like to see some later middle-game and even endgame positions thrown in) to a fixed depth with one cpu and record the total nodes searched for all positions. Repeat using 2 cores. The 2-core run will search more nodes. The last time I tested this on Crafty, admittedly a few years ago, the increase was 30%. If the overhead averages less than your 25% NPS increase, then HT is a gain. However, I will point out that if you gain just 1 Elo, there is a risk, because doubling the number of cores certainly increases the variability of the search times, which can hurt at the wrong time.
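Once you have the node counts, the overhead calculation is simple. Here is a minimal sketch; the node counts below are made up purely to show the arithmetic and are not real Crafty or Fritz data, and the function name search_overhead is hypothetical.

```python
def search_overhead(nodes_1cpu, nodes_ncpu):
    """Fractional extra nodes searched by the parallel run.

    Both lists hold per-position node counts for the SAME positions
    searched to the SAME fixed depth.
    """
    if len(nodes_1cpu) != len(nodes_ncpu):
        raise ValueError("both runs must cover the same positions")
    return sum(nodes_ncpu) / sum(nodes_1cpu) - 1


# Hypothetical node counts for three positions (illustration only).
one_cpu = [1_000_000, 2_500_000, 1_500_000]
two_cpu = [1_300_000, 3_200_000, 2_000_000]

overhead = search_overhead(one_cpu, two_cpu)
print(f"search overhead: {overhead:.0%}")
```

Totals are summed over all positions before dividing, so a few large positions dominate the result; that matches "record the total nodes searched for all positions" above. The resulting figure is what you would compare against the 25% nps gain.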
Let me know if you run this, because I am doing it myself, and it would be interesting to see how the two compare. My numbers will follow in a day or two...
BTW, I am almost certain you will not see anything near a 3.7x speedup using 4 cores. My numbers are closer to 3.2 (or they were the last time I ran this kind of test, which was several years ago). I doubt my numbers are any better now, and just hope they haven't gone down with the more selective trees we search.