Hyperthreading and Computer Chess: Intel i5-3210M

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by geots »

bob wrote:
geots wrote:
bob wrote:
Modern Times wrote:
bob wrote: Bottom line is to NOT use hyper threading when playing chess. You don't have to disable it as new operating system process schedulers understand the issues and will make sure each thread runs on a physical core, unless you run more threads than physical cores. At that point, you start to hurt performance.
Which new operating systems are you referring to ? Do you include Windows 7 in that ?
Yes. Windows 7/8, recent linux kernels (no more than 3 years old), Mac OS x. Etc.



I thank everyone who posted here with their thoughts. Here is what I come away with: first, I always end up wishing I knew just 1/1,000,000 of what Bob does. Second, I take away from this that HT could possibly, in some situations, help a little bit, but you have to be careful, because if mishandled it could hurt as well.

But what no one has thought about, or at least no one has mentioned, is another issue I would think is important. When you are testing for CCRL, for example, we all benchmarked our computers using the "crafty benchmark" to try to get the hardware as close to identical as possible. There is no perfection there; you just do all you can. I have no idea if this is applicable, but right now I am running 2 Intel i5 4-core systems, and neither has HT. When you are always trying to remove as many hardware variables as possible in testing, it looks like a bad idea to run 2 machines with no HT and then turn around and use another system WITH HT, and then put all the results in the same batch. I would think that ideally you should run HT on all 3 (which I cannot), or run all 3 with NO HT.

Maybe I am wrong, but this is something I considered. I would be very interested to read Dr. Hyatt's response to this thought.



And best to all of you,

george
Equal hardware is not easy. HT changes things. As does turbo-boost. As does different memory sizes/speeds. As does different cache sizes. About the only way I get sanity here is that each of our clusters is made of identical nodes, so as long as I test on a single cluster, I can eliminate that single point of worry.


I agree fully. HT is not that much of a problem, because I can disable it on the new i7 6-core; then all 3 systems will have no HT. But what to do about turbo-boost on these 2 i5s and on the new i7 is about to drive me up the wall. I don't guess it matters if you have only 1 system, or if you are not trying to bench 3 machines as close together as possible, because any speed-up or slow-down from turbo-boost will affect both engines the same. But if you are trying to bench, you have got problems; I just don't know how much it throws the benchmarks off. Basically, I am lost on the issue of turbo-boost and what to do about it. What is it that we say down south: "as lost as a goose in a shitstorm!"



Best and thanks,

george
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

Mike S. wrote:
bob wrote:One has to use some level of statistical rigor to measure these things and make statements concerning whether the test shows something to be good or bad.
Yes... I'd rather have 16,000 data pairs of that quality than just 16. That is a problem, though, since I tested this non-automated. Anyway, on second thought I was not satisfied with those few samples, so I ran a test suite, six times in total: SwissTest 4 (64 positions), max. 10 seconds per position.

Dual-core i5-3210M, 2.5-2.9 GHz, 512 MB hash tables

Code: Select all

Engine                 Total time   Solved (of 64)
--------------------------------------------------
Houdini 1.5a 4T        00:02:23     56
Stockfish 100413 4T    00:03:11     55
Critter 1.6a 4T        00:03:46     54
Stockfish 100413 2T    00:03:18     53
Houdini 1.5a 2T        00:03:42     50
Critter 1.6a 2T        00:04:04     47
All 3 engines gained from hyperthreading, some more, some less. That is based on 6*64 = 384 single tests. I think it all points in the same direction.

(Critter's session file was off.)
I just finished running a test where, after 500 games, a new idea looked to be a +15 Elo improvement. After 30K games, it was a -8 Elo change.

Again, using tactical positions is NOT the way to measure Elo improvement when you are talking about parallel search. Tactical positions, by their very nature, result in poor move ordering where the key move is usually sorted late in the move list because it is not obvious. A parallel search might well pick that up a lot quicker by splitting at the root. But normal positions are a bit different.
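
To put the 500-game versus 30,000-game numbers in perspective, here is a minimal back-of-the-envelope sketch (not bob's actual test harness) of the 95% error bar on a measured Elo difference. The 50% baseline score, the 40% draw rate and the normal approximation are assumptions chosen purely for illustration.

Code: Select all

# Rough 95% confidence half-width (in Elo) of a measured Elo difference,
# as a function of the number of games played.
import math

def elo_error_95(games, score=0.5, draw_rate=0.40):
    # per-game variance of the result (1, 0.5 or 0 points per game)
    win = score - draw_rate / 2.0
    var = win + 0.25 * draw_rate - score ** 2
    sigma_score = math.sqrt(var / games)
    # slope of the logistic Elo curve at 'score' converts score error to Elo error
    d_elo_d_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * sigma_score * d_elo_d_score

for n in (500, 5000, 30000):
    print(f"{n:6d} games: +/- {elo_error_95(n):4.1f} Elo")

With those assumptions the 500-game error bar is roughly +/-24 Elo, so a +15 reading after 500 games is well inside the noise, while 30,000 games narrow it to about +/-3 Elo.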
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by Rebel »

Before the average reader gets the impression that HT is not worth much, consider the following:

I don't want to get involved in the SMP-HT discussion, but I will contribute that HT (hyper-threading) speeds up SP (single-processor) testing significantly. That is on my Intel i7 quad with HT, running Windows 7.

First I run 2 identical engines on 4 cores, 100 games each (400 games total) at fixed depth. This means every reverse game is exactly the same: same moves, same depths, same scores, same number of nodes searched. The final score is of course 200-200, each match ending 50-50.

Then, in stage 2, I run the same 400 games spread over 8 threads with HT and compare the running times of the two matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
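
What is being measured here is throughput: the same fixed amount of work run 4-wide versus 8-wide on a 4-core/8-thread CPU. Below is a minimal, engine-free sketch of the same kind of experiment, with a dummy CPU-bound function standing in for one fixed-depth game; the job count and workload size are arbitrary, and how much (if anything) the 8-worker run gains depends on the CPU and the code being run.

Code: Select all

# Wall-clock comparison of the same batch of fixed, CPU-bound jobs
# run with 4 worker processes versus 8 on a 4-core/8-thread machine.
import time
from multiprocessing import Pool

def fake_game(_):
    # deterministic, fixed amount of work standing in for one fixed-depth game
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def run_batch(workers, jobs=400):
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(fake_game, range(jobs))
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (4, 8):
        print(f"{workers} workers: {run_batch(workers):6.1f} s")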
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by geots »

Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42

Match-2 (8 threads)   48:56





It just arrived the other day, and I am setting up everything on my new Alienware AURORA R4 Intel i7 6-core system, overclocked from the factory. The first thing I did was go into the BIOS and disable hyperthreading. The advantage is that it took me all of 49 seconds to disable it and get back to my main browser, and that was when I wasn't exactly sure where the setting was, though I pretty much knew. Now that I know exactly where it is, it "might" be a stretch to say I could cut the enable/disable time in half, but close. That means I could enable it on occasion for whatever reason, and disable it again for whatever reason. In my testing, if it does not say "HT enabled", then by my default it is disabled.

Not that your thread mentioned it; I'm just saying that if one wanted accuracy as close to perfection as possible when testing across multiple benched machines, he would be wise to disable turbo-boost as well. But as I am not after perfection, maybe I will, maybe I won't...



gts
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for a parallel search it is not worth doing, since the whole point of using multiple cores is to search deeper. Second, the threads do not interact at all the way you are running things, while a true parallel search shares data, synchronizes with locks, and so on. And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not particularly useful.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by rbarreira »

bob wrote:
Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for parallel search, it is not worth doing since the point of using multiple cores is to go deeper. Second, the threads do not interact at all the way you are doing things, while a true parallel search shares data, synchronizes with locks, etc... And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not something particularly useful.
You have several good points, but I just don't get how "making timed testing more volatile" is a problem. Randomness is a good thing in testing; surely you're not suggesting that "too much randomness" is something the statistical Elo models can't handle?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

rbarreira wrote:
bob wrote:
Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for parallel search, it is not worth doing since the point of using multiple cores is to go deeper. Second, the threads do not interact at all the way you are doing things, while a true parallel search shares data, synchronizes with locks, etc... And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not something particularly useful.
You have several good points, but I just don't get how "making timed testing more volatile" is a problem. Randomness is a good thing in testing, surely you're not suggesting that "too much randomness" is something that the statistical elo models can't handle?
Randomness is NOT a good thing when you are trying to measure small Elo changes. You have already changed the program's source code; now you vary the CPU speed in a random way, which introduces more variance into the results...

Running 8 engines on 4 physical cores can be problematic when one engine is highly tuned and the other is not. The highly tuned engine will get a greater percentage of a core than the one doing excessive memory references or whatever, which can skew the results in a way you don't want...
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote:
Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for parallel search, it is not worth doing since the point of using multiple cores is to go deeper. Second, the threads do not interact at all the way you are doing things, while a true parallel search shares data, synchronizes with locks, etc... And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not something particularly useful.
You have several good points, but I just don't get how "making timed testing more volatile" is a problem. Randomness is a good thing in testing, surely you're not suggesting that "too much randomness" is something that the statistical elo models can't handle?
Randomness is NOT a good thing when you are trying to measure small Elo changes. You already changed the program's source code. Now you vary the CPU speed in a random way which introduces more variance in the results...

Running 8 engines on 4 physical cores can be problematic when one engine is highly tuned and the other is not. The highly tuned engine will get a greater percentage of a core than the one doing excessive memory references or whatever, which can skew the results in a way you don't want...
The problem with that scenario is not "too much randomness". The problem is that you're overloading the system and testing programs in a different environment than you would like.

If "too much randomness" was a bad thing in testing, then you would always test in single-threaded mode with fixed-nodes and only one starting position. Which is obviously ridiculous. If you wanted to test whether a coin is biased, would you throw it in a deterministic way?

In fact, what you want in testing is as much randomness as possible, because what you're trying to measure is how two (or more) fixed programs behave on average. The less randomness you have, the more redundancy between games, and therefore the less meaningful each played game is.
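
The redundancy point can be illustrated with a toy simulation (my numbers, not anything measured in this thread): 1000 games played independently versus 500 distinct games each counted twice. Both samples contain 1000 results, but the redundant one estimates the true score with noticeably more spread.

Code: Select all

# Toy check: redundancy between games widens the spread of the score estimate.
import random
import statistics

TRUE_SCORE = 0.52   # assumed true expected score (win-or-lose games for simplicity)
GAMES = 1000
TRIALS = 2000

def play():
    return 1.0 if random.random() < TRUE_SCORE else 0.0

def estimate(independent):
    if independent:
        results = [play() for _ in range(GAMES)]
    else:
        results = []
        for _ in range(GAMES // 2):
            r = play()
            results.extend([r, r])   # the same game counted twice
    return sum(results) / GAMES

for label, indep in (("1000 independent games ", True), ("500 games counted twice", False)):
    spread = statistics.stdev(estimate(indep) for _ in range(TRIALS))
    print(f"{label}: std dev of the estimate = {spread:.4f}")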
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by Laskos »

bob wrote:
...blah-blah...

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...
i7 2600, 4 physical cores, 1,000 games at 10s + 0.2s in cutechess-cli, Houdini 3, conclusive result:

8 threads vs 4 threads

Code: Select all

+291 =462 -247
+16 Elo points for 8 threads
LOS = 97.1%

I hope Bob comes back with less blah-blah; this is already the third conclusive test on my i7 showing that HT gives 10-20 Elo points. Just get used to it.
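
For anyone who wants to check numbers like these, a small sketch using the standard logistic Elo formula and the usual normal-approximation LOS (ignoring draws) reproduces the figures above: the Elo comes out near +15 with this particular formula (Kai's tool may use a slightly different Elo model), and the LOS matches at 97.1%.

Code: Select all

# Elo difference and likelihood of superiority (LOS) from a win/draw/loss record.
import math

def elo_and_los(wins, draws, losses):
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    elo = 400.0 * math.log10(score / (1.0 - score))          # logistic Elo model
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, los

elo, los = elo_and_los(291, 462, 247)
print(f"Elo difference: {elo:+.1f}, LOS: {los:.1%}")          # -> about +15.3, 97.1%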

Kai
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote:
Rebel wrote:Before the average reader gets the impression HT is worth not much, the following:

I don't want to be involved in the SMP-HT discussion but contribute that HT (hyper-threading) speeds-up SP (single processor) testing significantly. That is on my Intel I7 quad with HT using Windows 7.

First I run 2 equal engines on 4 cores, 100 games each (total 400 games) on fixed depth. This means every reverse game is exactly the same. Same moves, same depths, same score, same number of nodes searched, the final score is of course 200-200 each match ending in 50-50.

Then in stage-2 I run the same 400 games spread over 8 threads with HT and compare the running times of the 2 matches. Result:

Code: Select all

Match-1 (4 cores)   1:03:42
Match-2 (8 threads)   48:56
That's not quite the same thing. First, the fixed depth eliminates variability, which is fine, but for parallel search, it is not worth doing since the point of using multiple cores is to go deeper. Second, the threads do not interact at all the way you are doing things, while a true parallel search shares data, synchronizes with locks, etc... And finally, the HT overhead varies all over the place, making actual timed testing more volatile, which is not something particularly useful.
You have several good points, but I just don't get how "making timed testing more volatile" is a problem. Randomness is a good thing in testing, surely you're not suggesting that "too much randomness" is something that the statistical elo models can't handle?
Randomness is NOT a good thing when you are trying to measure small Elo changes. You already changed the program's source code. Now you vary the CPU speed in a random way which introduces more variance in the results...

Running 8 engines on 4 physical cores can be problematic when one engine is highly tuned and the other is not. The highly tuned engine will get a greater percentage of a core than the one doing excessive memory references or whatever, which can skew the results in a way you don't want...
The problem with that scenario is not "too much randomness". The problem is that you're overloading the system and testing programs in a different environment than you would like.

If "too much randomness" was a bad thing in testing, then you would always test in single-threaded mode with fixed-nodes and only one starting position. Which is obviously ridiculous. If you wanted to test whether a coin is biased, would you throw it in a deterministic way?

In fact, what you want in testing is as much randomness as possible, because what you're trying to measure is how two (or more) fixed programs behave on average. The less randomness you have, the more redundancy between games, and therefore the less meaningful each played game is.
There are various kinds of randomness. Measuring time introduces significant randomness, as I have already reported here, but this randomness is different. It is NOT a good idea to play games on effectively random hardware platforms, because each new test will be run at different hardware speeds, different speeds affect two programs in different ways, and that skews the results. For hardware, one wants as consistent a platform as possible when trying to compare two versions of a program to see which is better. Two degrees of randomness is not such a good thing...

I'd personally be happy with ZERO randomness, but that's not possible.
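
A toy simulation (all numbers assumed purely for illustration) shows the distinction being drawn here: a slowdown that hits both engines equally leaves the expected score where it was and only adds noise that more games average out, whereas a slowdown that hurts one engine more than the other shifts the measured score itself, and no number of games repairs that.

Code: Select all

# Toy illustration: symmetric slowdowns add noise, asymmetric slowdowns add bias.
import random

GAMES = 200_000
BASE_SCORE = 0.50        # assumed true score of engine A vs engine B on stable hardware
THROTTLE_PROB = 0.30     # assumed fraction of games played while the machine is slowed
UNEQUAL_PENALTY = 0.04   # assumed score cost to A when the slowdown hurts A more than B

def measured_score(slowdown_is_unequal):
    points = 0.0
    for _ in range(GAMES):
        p = BASE_SCORE
        if slowdown_is_unequal and random.random() < THROTTLE_PROB:
            p -= UNEQUAL_PENALTY      # this game is played on a platform that favours B
        points += 1.0 if random.random() < p else 0.0
    return points / GAMES

print(f"slowdown hits both engines equally : {measured_score(False):.3f}")
print(f"slowdown hurts A more than B       : {measured_score(True):.3f}")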