Hyperthreading and Computer Chess: Intel i5-3210M

Rebel · Post by **Rebel** » Wed Apr 24, 2013 8:38 pm

Laskos wrote:
bob. wrote:
...blah-blah...

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...
i7 2600, 4 physical cores, 1,000 games 10s + 0.2s in cutechess-cli, Houdini 3, conclusive result:

8 threads vs 4 threads
+291 =462 -247
+16 Elo points for 8 threads
LOS = 97.1%

Hope Bob comes with less blah-blah, this is already the third conclusive test on my i7 that HT gives 10-20 Elo points. Just get used to it.

Kai

Fits my own observations.

bleh-bleh
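
For readers who want to check the quoted figures, here is a minimal sketch of how an Elo difference and a likelihood of superiority (LOS) can be derived from a win/draw/loss record. It assumes the usual logistic Elo model and the common normal-approximation LOS formula that ignores draws; the thread does not say which tool or formula produced the quoted numbers, so the function names and formula choice here are assumptions.

```python
import math

def elo_from_score(wins, draws, losses):
    """Elo difference implied by the match score under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

def los(wins, losses):
    """Likelihood of superiority (normal approximation; draws are ignored)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Record quoted in the post: +291 =462 -247 (8 threads vs 4 threads)
wins, draws, losses = 291, 462, 247
print(f"Elo: {elo_from_score(wins, draws, losses):+.1f}")
print(f"LOS: {100 * los(wins, losses):.1f}%")
```

With these formulas the record gives roughly +15 Elo and a LOS of about 97.1%, in line with the quoted values. The small gap to the quoted +16 may come from rounding or from a different rating tool, which is a guess.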

bob · Post by **bob** » Wed Apr 24, 2013 9:06 pm

Rebel wrote:
Laskos wrote:
bob. wrote:
...blah-blah...

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...
i7 2600, 4 physical cores, 1,000 games 10s + 0.2s in cutechess-cli, Houdini 3, conclusive result:

8 threads vs 4 threads
+291 =462 -247
+16 Elo points for 8 threads
LOS = 97.1%

Hope Bob comes with less blah-blah, this is already the third conclusive test on my i7 that HT gives 10-20 Elo points. Just get used to it.

Kai
Fits my own observations.

bleh-bleh

And maybe ONE day a little science will work its way into these discussions.

(hint: "ONE PROGRAM"?)

And based on 1K games, with ONE program, HT is a good idea for all?

HT offers more to programs that do excessive memory accesses, particularly those that miss L1/L2 and even L3 cache, as well as to programs written in a style that suffers from significant data dependencies between successive instructions. That means the benefit varies significantly from program to program.

As I said...

syzygy · Post by **syzygy** » Wed Apr 24, 2013 9:54 pm

bob wrote:And maybe ONE day a little science will work its way into these discussions.

(hint: "ONE PROGRAM"?)

And based on 1K games, with ONE program, HT is a good idea for all?

Let's see:

bob wrote:Regardless of urban legend, I have NEVER seen one example where using hyper threading improves the performance of a chess engine. Not a single one.

bob wrote:There's something badly wrong with your testing. I can post a ton of data relative to Crafty and SMT (hyper-threading). And it has ALWAYS been worse on than off. Including the recent test on my macbook dual i7 with SMT enabled (I can't turn it off).

If it doesn't work for Crafty, it can't work for any other engine? (Well, except for "poorly implemented" engines like Houdini, I guess.)

Btw, I do agree that Mike's results are not statistically significant because of the small sample size. But is it much different for the old papers on which most "conventional knowledge" is based? Just an example: 24 test positions?

Rebel · Post by **Rebel** » Thu Apr 25, 2013 8:02 am

rbarreira wrote:
bob wrote:
Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:
Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for parallel search, it is not worth doing since the point of using multiple cores is to go deeper. Second, the threads do not interact at all the way you are doing things, while a true parallel search shares data, synchronizes with locks, etc... And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not something particularly useful.
You have several good points, but I just don't get how "making timed testing more volatile" is a problem. Randomness is a good thing in testing, surely you're not suggesting that "too much randomness" is something that the statistical elo models can't handle?

Randomness is the sole reason why we play so many games to prove that a program change is positive. The lower the randomness in your testing environment, the fewer games you need.
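
For reference, the two match times in the quoted post (1:03:42 on 4 cores, 48:56 on 8 threads) can be turned into a throughput ratio with a few lines. This is only arithmetic on the posted numbers; the helper name `to_seconds` is made up for the illustration.

```python
def to_seconds(clock):
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

four_cores = to_seconds("1:03:42")    # Match-1, 4 cores
eight_threads = to_seconds("48:56")   # Match-2, 8 threads

# Same 400 fixed-depth games in both runs, so the time ratio is the throughput ratio.
speedup = four_cores / eight_threads
time_saved = 1.0 - eight_threads / four_cores
print(f"throughput ratio: {speedup:.2f}x")
print(f"wall-clock time saved: {100 * time_saved:.1f}%")
```

That is about 1.3 times the throughput for independent single-threaded games, which says nothing by itself about a parallel search inside one engine.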

hgm · Post by **hgm** » Thu Apr 25, 2013 8:12 am

Sorry, that is nonsense. If A and B always played the same game against each other, and A happened to win it, that would not prove that A is stronger at all. It could very well be that B would win starting from every position that is not in the game. (In practice this could even occur, e.g. because the opening book of the far stronger engine B contains an error that allows a book win.)

Ricardo is right, with the caveat that one should not go to extremes: if the randomness were so high that it starts to randomly decide the result (e.g. by randomly starving processes for CPU so they would always lose on time before they could complete 20 moves), that would qualify as "too much randomness". But in typical testing conditions we are very far from this limit.
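
How the number of games feeds into the uncertainty of a match result can be sketched as follows. The sketch assumes that games are independent and uses a normal approximation for the mean score; the function names are invented for the illustration, and the sample record is the +291 =462 -247 result quoted earlier in the thread.

```python
import math

def score_stderr(wins, draws, losses):
    """Mean score and its standard error, treating games as independent."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return s, math.sqrt(var / n)

def elo(score):
    """Logistic Elo difference for a given expected score."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, draws, losses, z=1.96):
    """Approximate interval (z = 1.96 for about 95%) in Elo."""
    s, se = score_stderr(wins, draws, losses)
    return elo(s - z * se), elo(s + z * se)

low, high = elo_interval(291, 462, 247)
print(f"1000 games: {low:+.1f} .. {high:+.1f} Elo")

# Four times as many games with the same proportions halve the standard error.
_, se_1k = score_stderr(291, 462, 247)
_, se_4k = score_stderr(4 * 291, 4 * 462, 4 * 247)
print(f"stderr ratio: {se_1k / se_4k:.2f}")
```

The standard error shrinks with the square root of the number of games, so the per-game spread of results sets how many games are needed for a given margin.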

Modern Times · Post by **Modern Times** » Thu Apr 25, 2013 8:47 am

syzygy wrote:If it doesn't work for Crafty, it can't work for any other engine? (Well, except for "poorly implemented" engines like Houdini, I guess.)

Indeed. Mark Uniacke (Hiarcs) clearly considers that Hiarcs benefits from hyperthreading, because they used it in tournament play a couple of years back. Not sure if it was WCCC or another tournament, but I'm positive they used it.

IQ · Post by **IQ** » Thu Apr 25, 2013 10:03 am

Laskos wrote:
bob. wrote:
...blah-blah...

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...
i7 2600, 4 physical cores, 1,000 games 10s + 0.2s in cutechess-cli, Houdini 3, conclusive result:

8 threads vs 4 threads
+291 =462 -247
+16 Elo points for 8 threads
LOS = 97.1%

Hope Bob comes with less blah-blah, this is already the third conclusive test on my i7 that HT gives 10-20 Elo points. Just get used to it.

Kai

Let me weigh in here from the perspective of somebody who topped the playchess bullet list and was in the top 5 blitz for some time a while back when testing a 6-core 980x processor with Houdini 3. There are unfortunately some issues you should be aware of:

I. The ONLY valid test is to use one machine with hyperthreading (and turbo boost etc.) turned off in the BIOS, using all real cores, playing against another machine with hyperthreading turned on, using all threads. You cannot just play Houdini with threads equalling real cores while hyperthreading is turned on to emulate no hyperthreading, due to a long-standing race condition bug in Houdini which would sometimes drop NPS from, for example, 20 M N/s to just 2 M N/s. This would persist for a couple of moves, spoiling and losing some games (approx. 1 out of 10). This is confirmed by a number of people besides myself; there is also a thread about this somewhere on this board.

II. You are comparing apples with oranges. Usually, with hyperthreading turned off, you can overclock much higher, gaining (depending on your cooling) 30% in all situations while avoiding the search overhead of additional threads.

III. If your results held under I + II (let's for the sake of argument just assume for now that they hold under I, contrary to many of the best machine players on playchess), then Robert would have a very easy way to improve his program. This hypothetical result would imply that he could replicate the splitting, extension, and reduction behaviour in software. This would result in a new program which would behave as theoretically expected: the hyperthreading gains would be offset by the parallel overhead, while playing strength would now be higher on the non-hyperthreaded machine. A naive approach would be to skip reductions randomly with a probability of 20%^depth or something similar.

IV. Now that the cloud service on playchess enables one to offer the engine for rent, many people re-enable hyperthreading to get higher N/s and more cores displayed in the engine roster. This is pure marketing, because many people believe more is better.

I would love to see your results using two machines, with hyperthreading completely turned off in the BIOS on one of them. Until then I am unconvinced.

hgm · Post by **hgm** » Thu Apr 25, 2013 10:12 am

Can you explain why a 'race-condition bug' would care whether the cores are full cores or hyper threads? Or are you claiming that this bug would also strike when Houdini was running on a true 8-core machine (with HT off) if it only uses 4 threads there? If so, would this apply to any situation where the number of threads was smaller than the number of cores (e.g. using 3 threads on a 4-core HT-off machine)?

I don't follow the logic of your point III. What you say there sounds nonsensical. Software can never emulate extra computational power delivered by the hardware (in this case through HT).

Laskos · Post by **Laskos** » Thu Apr 25, 2013 10:16 am

IQ wrote:
Laskos wrote:
bob. wrote:
...blah-blah...

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...
i7 2600, 4 physical cores, 1,000 games 10s + 0.2s in cutechess-cli, Houdini 3, conclusive result:

8 threads vs 4 threads
+291 =462 -247
+16 Elo points for 8 threads
LOS = 97.1%

Hope Bob comes with less blah-blah, this is already the third conclusive test on my i7 that HT gives 10-20 Elo points. Just get used to it.

Kai
Let me weigh in here from the perspective of somebody who topped the playchess bullet list and was in the top 5 blitz for some time a while back when testing a 6-core 980x processor with Houdini 3. There are unfortunately some issues you should be aware of:

I. The ONLY valid test is to use one machine with hyperthreading (and turbo boost etc.) turned off in the BIOS, using all real cores, playing against another machine with hyperthreading turned on, using all threads. You cannot just play Houdini with threads equalling real cores while hyperthreading is turned on to emulate no hyperthreading, due to a long-standing race condition bug in Houdini which would sometimes drop NPS from, for example, 20 M N/s to just 2 M N/s. This would persist for a couple of moves, spoiling and losing some games (approx. 1 out of 10). This is confirmed by a number of people besides myself; there is also a thread about this somewhere on this board.

II. You are comparing apples with oranges. Usually, with hyperthreading turned off, you can overclock much higher, gaining (depending on your cooling) 30% in all situations while avoiding the search overhead of additional threads.

III. If your results held under I + II (let's for the sake of argument just assume for now that they hold under I, contrary to many of the best machine players on playchess), then Robert would have a very easy way to improve his program. This hypothetical result would imply that he could replicate the splitting, extension, and reduction behaviour in software. This would result in a new program which would behave as theoretically expected: the hyperthreading gains would be offset by the parallel overhead, while playing strength would now be higher on the non-hyperthreaded machine. A naive approach would be to skip reductions randomly with a probability of 20%^depth or something similar.

IV. Now that the cloud service on playchess enables one to offer the engine for rent, many people re-enable hyperthreading to get higher N/s and more cores displayed in the engine roster. This is pure marketing, because many people believe more is better.

I would love to see your results using two machines, with hyperthreading completely turned off in the BIOS on one of them. Until then I am unconvinced.

I don't have two identical machines, so I cannot enable HT on one and disable it on the other. I could do a Fritz benchmark with HT on and off with 4 threads, and extrapolate results, but this is pretty lousy. I was not aware that Houdini has problems dealing with 4 threads on 4 cores with HT on. Would it have problems with 8 threads on 4 cores then too?

hgm · Post by **hgm** » Thu Apr 25, 2013 10:21 am

I don't see how Houdini could perceive a situation where it is only allowed to interact with 4 of the 8 hyper threads differently from a situation where there are only 4 unsplit cores in total, no matter what bugs it might have. What goes on on the remaining HT would just be outside its 'field of view'. So if there are any problems, it seems that setting an affinity mask for the 4-threads instance to logical cores 0, 2, 4, 6 should cure them.
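
The affinity mask for logical cores 0, 2, 4, 6 can be worked out with a minimal sketch like the one below. It assumes that the two hyper threads of each physical core are numbered as adjacent pairs (0/1, 2/3, ...), which varies by operating system and should be checked on the machine in question. The helper names are hypothetical, and `os.sched_setaffinity` exists only on some platforms such as Linux, so the pinning helper does nothing elsewhere.

```python
import os

def every_other_cpu_mask(physical_cores):
    """Logical CPUs 0, 2, 4, ... and the matching bit mask.

    ASSUMPTION: hyper-thread siblings are numbered as adjacent pairs,
    so every even-numbered logical CPU sits on a different physical core.
    """
    cpus = [2 * i for i in range(physical_cores)]
    mask = sum(1 << cpu for cpu in cpus)
    return cpus, mask

def pin_current_process(cpus):
    """Restrict this process to the given logical CPUs where the OS supports it.

    Returns the set actually applied, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    wanted = set(cpus) & os.sched_getaffinity(0)  # keep only CPUs we may use
    if not wanted:
        return None
    os.sched_setaffinity(0, wanted)
    return wanted

cpus, mask = every_other_cpu_mask(4)
print(cpus, hex(mask))
```

On Linux a process started after `pin_current_process(cpus)` inherits the restriction. On Windows the same hexadecimal mask can be passed to the `start /affinity` command when launching the engine.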
