Hyperthreading and Computer Chess: Intel i5-3210M

syzygy · Post by **syzygy** » Sat Apr 13, 2013 2:25 am

bnemias wrote:This subject comes up every so often, and it's hard to believe people still think searching a larger tree with a small increase in NPS is beneficial. I'm not completely convinced there's a relationship between time to depth and playing strength either. But lacking any data of my own, I tend to believe Bob.

I don't know why you doubt that there is a relationship between time to depth and playing strength. For sure these are strongly correlated. The same depth in less time, all else being equal, definitely results in stronger play.

It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.

So even if time-to-depth increases with HT on, it just might be the case that overall this is outweighed by the positive impact of the larger tree.
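
To put rough numbers on this trade-off: time to depth is simply nodes searched divided by nodes per second, so HT lengthens time to depth whenever the tree grows by a larger factor than NPS does. A minimal sketch follows; the node count, the base NPS, the 25% NPS gain, and the 35% tree growth are made-up illustration values, not measurements from this thread.

```python
def time_to_depth(nodes: float, nps: float) -> float:
    """Seconds needed to search `nodes` positions at `nps` nodes per second."""
    return nodes / nps


def ht_time_ratio(nps_gain: float, tree_growth: float) -> float:
    """Time to depth with HT on, divided by time to depth with HT off.

    Both arguments are multiplicative factors (1.25 means +25%).
    A result above 1.0 means HT reaches the same depth more slowly.
    """
    return tree_growth / nps_gain


# Hypothetical baseline: 100M nodes to reach the target depth at 2M nps.
base_nodes, base_nps = 100e6, 2e6
nps_gain, tree_growth = 1.25, 1.35  # assumed effect of doubling the threads

t_off = time_to_depth(base_nodes, base_nps)
t_on = time_to_depth(base_nodes * tree_growth, base_nps * nps_gain)
ratio = ht_time_ratio(nps_gain, tree_growth)

print(f"HT off: {t_off:.1f} s, HT on: {t_on:.1f} s, ratio {ratio:.2f}")
```

Whether the extra nodes in the larger tree are worth that slowdown is exactly the open question here; the sketch only shows how the two factors combine.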

Bob's HT tests were probably limited to Crafty. I also believe they were done on a system that was not overclocked. The higher the overclock, the better the performance of HT. This is because memory latency has a bigger impact at higher clock frequencies and HT shines at hiding these latencies.

From the tests I have done myself I could not conclude that my engine benefits from HT, but I cannot say anything for engines that I have not tested.

geots · Post by **geots** » Sat Apr 13, 2013 3:33 am

bob wrote:
Modern Times wrote:
bob wrote: Bottom line is to NOT use hyper threading when playing chess. You don't have to disable it as new operating system process schedulers understand the issues and will make sure each thread runs on a physical core, unless you run more threads than physical cores. At that point, you start to hurt performance.
Which new operating systems are you referring to ? Do you include Windows 7 in that ?
Yes. Windows 7/8, recent Linux kernels (no more than 3 years old), Mac OS X, etc.

I thank everyone who posted in this thread with their thoughts. Here is what I come away with: First, I always end up wishing I knew just 1/1,000,000 of what Bob does. Second, I take away from this that HT could possibly, in some situations, help a little bit, but you have to be careful, because if mishandled it could also hurt.

But what no one has thought about, or at least no one has mentioned, is another issue that I would think is important. When, for example, you are testing for CCRL, we all benchmarked our computers using the "crafty benchmark" to try to get the hardware as close to the same as possible. There is no perfection there; you just have to do all you can. I have no idea if this is applicable, but right now I am running two Intel i5 4-core systems, and neither has HT. When you are always trying to get rid of as many hardware variables as possible in testing, it looks like it could be a bad idea to run 2 machines with no HT and then turn around and use another system WITH HT, with all the results going in the same batch. I would think that ideally, you should run HT on all 3 (which I cannot), or run all 3 with NO HT.

Maybe I am wrong, but this is something I considered. I would be very interested to read Dr. Hyatt's response to this thought.

And best to all of you,

george

shrapnel · Post by **shrapnel** » Sat Apr 13, 2013 9:46 am

"The only glitch I saw is that HT kicks to full NPS only after several seconds per move of Houdini 3, on ultra-fast controls it seems worthless indeed."
This is absolutely correct. I too have found HT takes a little time to show its effects and yes, obviously on fast time-controls is worse than useless.
But given a little time, man, it really shows its worth !
I for one have NEVER experienced any degradation of performance with HT enabled. I may not understand all the technicalities and intricacies, but what I DO know for a FACT is that enabling HT has made it possible for me to beat very strong opponents with whom I used to draw earlier !
That's all that matters to me !

geots · Post by **geots** » Sat Apr 13, 2013 11:12 am

shrapnel wrote:

"The only glitch I saw is that HT kicks to full NPS only after several seconds per move of Houdini 3, on ultra-fast controls it seems worthless indeed."
This is absolutely correct. I too have found HT takes a little time to show its effects and yes, obviously on fast time-controls is worse than useless.
But given a little time, man, it really shows its worth !
I for one have NEVER experienced any degradation of performance with HT enabled. I may not understand all the technicalities and intricacies, but what I DO know for a FACT is that enabling HT has made it possible for me to beat very strong opponents with whom I used to draw earlier !
That's all that matters to me !

So by your theory, you would have to be saying that HT basically "dumbed down" the engine so it was not playing as strongly as it normally did before you used hyperthreading. If you enable HT and can now beat opponents you used to draw, then you did experience degradation of performance with the engine. Unless you are saying HT enabled made YOU play stronger. What you are doing is making a case for testers to NEVER USE HT in their tests. No harm meant, but you need to go back to the drawing board. Your post is chaotic at best.

george

Mike S. · Post by **Mike S.** » Sat Apr 13, 2013 11:44 am

A small update, with six more data pairs:

Intel i5-3210M, 2 x 2.5 GHz (-2.9 GHz)
2 physical cores; 4 logical cores (HT)
512 M hash tables (DDR3-RAM / 800 MHz)

Engine           | P#1 depth  time(2T)  time(4T) | P#2 depth  time(2T)  time(4T)
--------------------------------------------------------------------------------
Critter 1.6a     |      20      34        20     |      19      59        34
Deep Fritz 13    |      21      48        62 ?   |      22      32        15
Houdini 1.5a     |      21      40        26     |      21     113        36 !
Rybka 2.3.2a     |      16      36        26     |      16     113       136 ?
Stockfish 100413 |      24      34        13 !   |      25      66        41
--------------------------------------------------------------------------------
+UPDATE:

Engine           | P#3 depth  time(2T)  time(4T) | P#4 depth  time(2T)  time(4T)
--------------------------------------------------------------------------------
Crafty 23.04 cbn.|      21      90        49     |      23      46        46 ?
Shredder Cl. 2012|      14      32        35 ?   |      16      60        36
Spark 1.0        |      19      55        17 !   |      22      45        30
--------------------------------------------------------------------------------
Average Time Relation total:      ~1.73:1
--------------------------------------------------------------------------------
AvTR. with (?)+(!) excluded:     ~1.68:1
------------------------------------------

positions with FEN see below

#1 r1bq1rk1/2ppbppp/p1n2n2/1p2p3/4P3/1B3N2/PPPP1PPP/RNBQR1K1 w - -
#2 5k2/6p1/2p2p2/P7/1Q6/2P1pqPP/7K/8 b - - bm c5; id Quick-19;
#3 starting position
#4 r5k1/p2r1bpp/2p2p2/8/n1P5/P5B1/5PPP/2R1RBK1 w - - bm c5; id "Mike's Test 2.2, Nr. 26";

[D]r5k1/p2r1bpp/2p2p2/8/n1P5/P5B1/5PPP/2R1RBK1 w - -
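
For anyone who wants to check the two averages, here is a minimal sketch that recomputes them from the table. It assumes the "Average Time Relation" is the arithmetic mean of the per-position ratios time(2T) / time(4T); that is a guess at the method, although it does reproduce both quoted figures. The tuple layout and variable names are only for illustration.

```python
from statistics import mean

# (engine, position, time with 2 threads, time with 4 threads, flagged)
# Times are copied from the table; `flagged` marks the (?) and (!) entries.
RESULTS = [
    ("Critter 1.6a", 1, 34, 20, False),
    ("Critter 1.6a", 2, 59, 34, False),
    ("Deep Fritz 13", 1, 48, 62, True),
    ("Deep Fritz 13", 2, 32, 15, False),
    ("Houdini 1.5a", 1, 40, 26, False),
    ("Houdini 1.5a", 2, 113, 36, True),
    ("Rybka 2.3.2a", 1, 36, 26, False),
    ("Rybka 2.3.2a", 2, 113, 136, True),
    ("Stockfish 100413", 1, 34, 13, True),
    ("Stockfish 100413", 2, 66, 41, False),
    ("Crafty 23.04", 3, 90, 49, False),
    ("Crafty 23.04", 4, 46, 46, True),
    ("Shredder Cl. 2012", 3, 32, 35, True),
    ("Shredder Cl. 2012", 4, 60, 36, False),
    ("Spark 1.0", 3, 55, 17, True),
    ("Spark 1.0", 4, 45, 30, False),
]

# Ratio above 1.0 means 4 threads reached the depth faster than 2 threads.
all_ratios = [t2 / t4 for _, _, t2, t4, _ in RESULTS]
unflagged_ratios = [t2 / t4 for _, _, t2, t4, flagged in RESULTS if not flagged]

avg_all = mean(all_ratios)
avg_unflagged = mean(unflagged_ratios)

print(f"all 16 pairs:      ~{avg_all:.2f}:1")
print(f"9 unflagged pairs: ~{avg_unflagged:.2f}:1")
```

Note that each ratio comes from a single run on a single position, so the spread between individual pairs (from below 1 to above 3) is much larger than the difference between the two averages.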

shrapnel · Post by **shrapnel** » Sat Apr 13, 2013 4:00 pm

geots wrote:
So by your theory, you would have to be saying that HT basically "dumbed down" the engine so it was not playing as strongly as it normally did before you used hyperthreading. If you enable HT and can now beat opponents you used to draw, then you did experience degradation of performance with the engine. Unless you are saying HT enabled made YOU play stronger. What you are doing is making a case for testers to NEVER USE HT in their tests. No harm meant, but you need to go back to the drawing board. Your post is chaotic at best.

george

Actually I'm saying that NOT using HT was dumbing down the Engine.
The statement in Bold is the only one you got correct.
So you think MY post is chaotic, eh ?
Wonder what others think of YOUR post ? No harm meant

bob · Post by **bob** » Sat Apr 13, 2013 9:07 pm

syzygy wrote:
bnemias wrote:This subject comes up every so often, and it's hard to believe people still think searching a larger tree with a small increase in NPS is beneficial. I'm not completely convinced there's a relationship between time to depth and playing strength either. But lacking any data of my own, I tend to believe Bob.
I don't know why you doubt that there is a relationship between time to depth and playing strength. For sure these are strongly correlated. The same depth in less time, all else being equal, definitely results in stronger play.

It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.

So even if time-to-depth increases with HT on, it just might be the case that overall this is outweighed by the positive impact of the larger tree.

Bob's HT tests were probably limited to Crafty. I also believe they were done on a system that was not overclocked. The higher the overclock, the better the performance of HT. This is because memory latency has a bigger impact at higher clock frequencies and HT shines at hiding these latencies.

From the tests I have done myself I could not conclude that my engine benefits from HT, but I cannot say anything for engines that I have not tested.

I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.

Sometimes a parallel search will find the answer much more quickly than expected, but this is generally a result of poor move ordering where the parallel search looks at the supposedly bad move sooner than the sequential search does, and quickly establishes a better bound that makes things go faster. But that is uncommon, not expected to happen regularly.

I'm likely the only person on the planet to have actually run 30K game matches with 1 cpu, then with 2, then with 4, and finally with 8. And I have found NO circumstance where time to depth suggests one thing, and actual games suggest another. That is, the speed of the parallel search is the thing that gains Elo, not some bizarre tree shape that happens regularly. Such is just "statistical noise" if the test is large enough...

The only exceptions I have seen are those where so few games are played, the statistical variance makes the results statistically insignificant.
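
To illustrate why the number of games matters so much, here is a rough sketch of the 95% error margin on a match result, expressed in Elo. It uses the standard logistic Elo formula and a normal approximation; the per-game standard deviation of 0.5 is a worst-case assumption (no draws), and the function names are only for illustration, not taken from any testing tool.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(games: int, score: float = 0.5, sigma: float = 0.5) -> float:
    """Approximate 95% half-width, in Elo, around `score` after `games` games.

    `sigma` is the assumed standard deviation of a single game result;
    0.5 is the worst case (every game decisive). Draws make it smaller.
    """
    half_width = 1.96 * sigma / math.sqrt(games)
    return elo_from_score(score + half_width) - elo_from_score(score)


for n in (100, 1000, 30000):
    print(f"{n:>6} games: about +/- {elo_margin_95(n):.1f} Elo")
```

Under these assumptions a 30,000-game match pins the result down to within a few Elo, while a 100-game match leaves a margin of tens of Elo, which is far larger than the effect being argued about.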

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...

bob · Post by **bob** » Sat Apr 13, 2013 9:12 pm

Mike S. wrote:A small update, with six more data pairs:

Intel i5-3210M, 2 x 2.5 GHz (-2.9 GHz)
2 physical cores; 4 logical cores (HT)
512 M hash tables (DDR3-RAM / 800 MHz)

Engine           | P#1 depth  time(2T)  time(4T) | P#2 depth  time(2T)  time(4T)
--------------------------------------------------------------------------------
Critter 1.6a     |      20      34        20     |      19      59        34
Deep Fritz 13    |      21      48        62 ?   |      22      32        15
Houdini 1.5a     |      21      40        26     |      21     113        36 !
Rybka 2.3.2a     |      16      36        26     |      16     113       136 ?
Stockfish 100413 |      24      34        13 !   |      25      66        41
--------------------------------------------------------------------------------
+UPDATE:

Engine           | P#3 depth  time(2T)  time(4T) | P#4 depth  time(2T)  time(4T)
--------------------------------------------------------------------------------
Crafty 23.04 cbn.|      21      90        49     |      23      46        46 ?
Shredder Cl. 2012|      14      32        35 ?   |      16      60        36
Spark 1.0        |      19      55        17 !   |      22      45        30
--------------------------------------------------------------------------------
Average Time Relation total:      ~1.73:1
--------------------------------------------------------------------------------
AvTR. with (?)+(!) excluded:     ~1.68:1
------------------------------------------

positions with FEN see below

#1 r1bq1rk1/2ppbppp/p1n2n2/1p2p3/4P3/1B3N2/PPPP1PPP/RNBQR1K1 w - -
#2 5k2/6p1/2p2p2/P7/1Q6/2P1pqPP/7K/8 b - - bm c5; id Quick-19;
#3 starting position
#4 r5k1/p2r1bpp/2p2p2/8/n1P5/P5B1/5PPP/2R1RBK1 w - - bm c5; id "Mike's Test 2.2, Nr. 26";

[D]r5k1/p2r1bpp/2p2p2/8/n1P5/P5B1/5PPP/2R1RBK1 w - -

There's something badly wrong with your testing. I can post a ton of data relative to Crafty and SMT (hyper-threading), and it has ALWAYS been worse on than off. That includes the recent test on my MacBook dual i7 with SMT enabled (I can't turn it off).

It is not clear to me what you are measuring. But I can certainly run whatever test you are running, and do it on various hardware platforms (i7, an older Nehalem box with 6 physical cores, and even an original PIV, where SMT first showed up).

I am only hoping you are not just running a couple of positions and drawing conclusions from that???

syzygy · Post by **syzygy** » Sat Apr 13, 2013 9:31 pm

bob wrote:I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.

Well, you don't have to read what I write, but I'm not sure why you still bother to answer?

syzygy · Post by **syzygy** » Sat Apr 13, 2013 9:51 pm

bob wrote:This plays right into the hands of a parallel search that by its very nature tends to do better when move ordering is sub-optimal.

Isn't it interesting that YBW is "known" to have no overhead compared to a sequential search with optimal move ordering?

I have a suspicion that with a good implementation of parallel search most search overhead is due to missed transpositions.
