Some tests with hyper threading

Howard E
Posts: 261
Joined: Wed Mar 08, 2006 8:49 pm

Some tests with hyper threading

Post by Howard E »

Computer:
Core i7-920 (modest overclock, base clock raised from 133 to 150 MHz),
so 2660 MHz becomes 3000 MHz (20 x base clock).
For single-core apps, with the motherboard's turbo option enabled,
the multiplier is 21x, so 3150 MHz.

8 GB RAM, 512 MB hash allotted to chess programs


Test:
nps count from the new-game starting position
ht = hyper-threading
T = threads
nps is in millions, except for Rybka's count

1. Rybka 2.2mpox64
ht=on   862.784
ht=off  744.450

2. Arasan 11.3
      ht=off  ht=on
1T    1.7     1.8
2T    3.2     2.9
4T    4.3     3.7
8T    -       4.6   (8T run with ht on only)

3. Bright0.4a
      ht=off  ht=on
1T    1.6     1.6
2T    3.1     3.0
4T    6.1     5.1
8T    -       7.7   (8T run with ht on only)
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Some tests with hyper threading

Post by zullil »

If I recall correctly, I think I was convinced last year that hyper-threading does not improve engine performance.

You may wish to scan this rather long thread.
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Some tests with hyper threading

Post by Dann Corbit »

It's only an interesting test when the thread count exceeds the number of physical cores for the machine.

I can't tell from your post if that is what you did.
Howard E
Posts: 261
Joined: Wed Mar 08, 2006 8:49 pm

Re: Some tests with hyper threading

Post by Howard E »

Very informative thread, thanks Louis.
Many questions from those, like myself, seeking answers.
Lots of explanation from those with computer knowledge (Bob and Gian-Carlo, to name a few):
- an nps increase with ht on is not a valid indicator of performance
- epd testing to solution time is more accurate for measuring ht on/off
(I'll do this for fun and compare solution times)

I did not see any actual posts of solution times in that thread.
Are there any implications that may make this method inaccurate?
Howard E
Posts: 261
Joined: Wed Mar 08, 2006 8:49 pm

Re: Some tests with hyper threading

Post by Howard E »

The last two tests do this:
8 threads (ht on only): 4.6M nps for Arasan 11.3
8 threads (ht on only): 7.7M nps for Bright 0.4a

There are only 4 physical cores on my machine,
so the other 4 are virtual.
But it looks like the nps count is not accurate for comparing the performance of MP-capable engines. I am coming out of the single-processor dark ages and just found this out.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some tests with hyper threading

Post by bob »

Howard E wrote:
Very informative thread, thanks Louis.
Many questions from those, like myself, seeking answers.
Lots of explanation from those with computer knowledge (Bob and Gian-Carlo, to name a few):
- an nps increase with ht on is not a valid indicator of performance
- epd testing to solution time is more accurate for measuring ht on/off
(I'll do this for fun and compare solution times)

I did not see any actual posts of solution times in that thread.
Are there any implications that may make this method inaccurate?

No. If you measure time to depth, whether to a fixed depth or to a fixed solution, the faster time is the better time. HT often improves NPS by some amount, but not by an amount that is greater than or even equal to the overhead cost of doing a parallel search. Parallel search overhead is about 30% per processor in the case of Crafty. If you don't get more than a 30% performance gain (and you won't with hyper-threading; I've spent a ton of time optimizing cache references, etc.), then hyper-threading is a loser.
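
A minimal sketch of that break-even arithmetic, assuming time to depth is simply nodes searched divided by nps. The function names are made up for illustration, and treating the 30% figure as the extra nodes searched once the additional HT threads join in is a simplification, not Crafty's actual accounting.

```python
def time_to_depth(nodes: float, nps: float) -> float:
    """Seconds needed to search `nodes` at `nps` nodes per second."""
    return nodes / nps


def ht_time_ratio(node_overhead: float, nps_gain: float) -> float:
    """Time with HT divided by time without HT, for the same depth.

    node_overhead: fractional growth of the tree (0.30 = 30% more nodes).
    nps_gain: fractional growth of nps (0.20 = 20% more nodes per second).
    A result above 1.0 means HT takes longer to reach the same depth.
    """
    return (1.0 + node_overhead) / (1.0 + nps_gain)


# Illustrative numbers only: 30% more nodes, but just 20% more nps.
ratio = ht_time_ratio(node_overhead=0.30, nps_gain=0.20)
print(f"HT time / non-HT time = {ratio:.3f}")
```

Under this model the ratio only drops below 1.0 when the nps gain exceeds the node overhead, which is the condition described above.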

I've played with the "new-and-improved" hyper-threading on the trial Nehalem box we have, and this hasn't changed at all for Crafty. I have it turned off on the box I am playing with as a result.
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: Some tests with hyper threading

Post by Allard Siemelink »

Howard E wrote:
Computer:
Core i7-920 (modest overclock, base clock raised from 133 to 150 MHz),
so 2660 MHz becomes 3000 MHz (20 x base clock).
For single-core apps, with the motherboard's turbo option enabled,
the multiplier is 21x, so 3150 MHz.

8 GB RAM, 512 MB hash allotted to chess programs


Test:
nps count from the new-game starting position
ht = hyper-threading
T = threads
nps is in millions, except for Rybka's count

1. Rybka 2.2mpox64
ht=on   862.784
ht=off  744.450

2. Arasan 11.3
      ht=off  ht=on
1T    1.7     1.8
2T    3.2     2.9
4T    4.3     3.7
8T    -       4.6   (8T run with ht on only)

3. Bright0.4a
      ht=off  ht=on
1T    1.6     1.6
2T    3.1     3.0
4T    6.1     5.1
8T    -       7.7   (8T run with ht on only)

Thanks Howard, this is useful information.
bright's scaling without hyperthreading seems pretty good: 1.6, 3.1, 6.1.
(as I only have a dual core computer, I was not sure of the 4cpu nps)
But if hyperthreading is enabled, the 4cpu nps only reaches 5.1.

It seems that bright (and arasan too?!) needs to set the processor affinity to cope with hyperthreading (to make sure each thread gets its own cpu)
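
As a rough illustration of the "one thread per physical core" idea, here is a sketch that picks one logical CPU from each core. The mapping from logical CPU to core is a hypothetical layout supplied by the caller (real numbering varies by OS and BIOS), and `one_cpu_per_core` is an invented helper, not anything bright or Arasan is known to contain.

```python
from collections import defaultdict


def one_cpu_per_core(core_of: dict[int, int]) -> list[int]:
    """Pick the lowest-numbered logical CPU on each physical core.

    core_of maps logical CPU id -> physical core id (assumed to be known).
    """
    by_core: dict[int, list[int]] = defaultdict(list)
    for cpu, core in core_of.items():
        by_core[core].append(cpu)
    return sorted(min(cpus) for cpus in by_core.values())


# Assumed layout for a 4-core HT machine: logical CPUs 2k and 2k+1 share core k.
layout = {cpu: cpu // 2 for cpu in range(8)}
search_cpus = one_cpu_per_core(layout)
print(search_cpus)
```

Search thread i would then be pinned to `search_cpus[i]` with whatever affinity call the platform offers (on Linux, Python exposes `os.sched_setaffinity`), so that no two search threads share a core.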

I haven't read the other discussion yet, but it seems to me that larger nps is better, so yes, bright would perform best (although just marginally better than 4 threads and no hyperthreading) with 8 threads and hyperthreading enabled.
Since the 8-thread (HT) nps is only about 25% better than the 4-thread (no HT) nps, you'd need to play a large number (1000s) of games to actually prove it.
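
For anyone who wants to redo the scaling arithmetic, a small sketch using the bright nps figures (in millions) from Howard's table. The `speedups` helper is just an illustrative name.

```python
def speedups(nps_by_threads: dict[int, float]) -> dict[int, float]:
    """nps relative to the single-thread figure, keyed by thread count."""
    base = nps_by_threads[1]
    return {threads: nps / base for threads, nps in nps_by_threads.items()}


# bright 0.4a, nps in millions, taken from the table in the first post
bright_ht_off = {1: 1.6, 2: 3.1, 4: 6.1}
bright_ht_on = {1: 1.6, 2: 3.0, 4: 5.1, 8: 7.7}

for threads, factor in speedups(bright_ht_off).items():
    print(f"ht off, {threads}T: {factor:.2f}x")

# Gain of 8 threads with HT over 4 threads without HT
ht_gain = bright_ht_on[8] / bright_ht_off[4] - 1.0
print(f"8T ht on vs 4T ht off: {ht_gain:.0%} more nps")
```

This gives roughly 1.94x and 3.81x for 2 and 4 threads without HT, and about a 26% nps gain for 8 threads with HT over 4 threads without. It says nothing about playing strength, only about raw node counts.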
Howard E
Posts: 261
Joined: Wed Mar 08, 2006 8:49 pm

Re: Some tests with hyper threading

Post by Howard E »

System:
Core i7-920
512 MB hash
Vista Premium (64-bit)

[d] rn4k1/pp2p1b1/4b3/q2p2Q1/2B2P2/8/P1P1K1P1/R6R w - - bm Rh8+;

Hyper-threading on, 8 threads (columns: depth, time, nodes, nps, score, line)

Arasanx-64:

15 01:16 359.329.020 4.673.286 +0.90 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Qxa6+ bxa6 g4 Nc6 Ke3 Rh8 Rxh8 Bxh8 Ke4
15 01:59 557.274.712 4.659.098 +1.37 Rh8+ Kxh8

Bright-0.4a:

15/60 00:27 276.216.516 9.942 +0.52 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Kf3 Qa6 Qb3 Qb6 g4 Qxb3+ axb3 Nc6 Ke4 Ke6
15/60 00:46 452.137.332 9.812 +3.32 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Nd7 Rxd5 Qc7 Qg6+ Kd8 Bb5 Bf8 Rxd7+ Qxd7 Bxd7 Kxd7

Crafty-23.0-win64:

16 00:18 266.160.756 14.786.708 +0.66 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Qxa6+ bxa6 Rb7 Nc6 g4 Rd8 c4 e6 c5
16 00:33 484.579.547 14.684.228 +1.21 Rh8+
16 00:34 494.459.146 14.542.916 +2.21 Rh8+
16 00:35 504.317.416 14.409.069 +2.46 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Qc5 Bxd5 Nd7 Qf7+ Kd8 Qxg7 Kc7 Be4 Rd8

Glaurung-w64:

19 00:23 178.504.115 7.520.712 +0.76 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Rb5 Qc6 Qxc6+ Nxc6 Rxb7 e5 Rd7 Bf8 Kf3 exf4 Kxf4 Ne5 Rdd1
19 00:32 244.997.383 7.633.744 +4.05 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Nd7 Rxd5 Qc7 Qg6+ Kd8 Bb5 Bf8 Rxd7+ Qxd7 Bxd7 Kxd7 Qe4 Kc7 g4 Rd8

Rybka v2.2.mp.x64:

18 00:29 30.857.016 1.069.907 +0.15 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Rb5 Qc6 Qxc6+ Nxc6
18 00:40 42.418.816 1.070.400 +2.59 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8

Toga141se-8cpu:

16/52 00:32 27.773.471 6.950.000 +0.70 Qg6 Bg4+ Qxg4 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Qxa6+ Nxa6 Rxb7 Nc5 Rb4 Rd8 g4 Kf7 Rh3 a6 f5
16/58 00:39 34.719.415 6.873.846 +1.44 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qf5+ Bf6 Rh1 Ke8 Qc8+ Qd8 Bb5+ Nc6 Qxb7 Kf8 Bxc6 Rc8 Rh5 Qc7 Qxc7 Rxc7

Hyper-threading off, 4 threads (one per core)


Arasanx-64:

15 01:19 398.333.027 4.986.018 +0.87 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Qxa6+ bxa6 g4 Nc6 Ke3 Ke6 Rh7 Bd4+ Ke4 Bc3
15 01:38 494.356.327 5.012.739 +1.37 Rh8+ Kxh8
15 01:49 545.162.866 4.975.929 +1.72 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qf5+ Bf6 Rh1 Ke8 Qc8+ Qd8 Bb5+ Nc6 Qxb7 Rc8 Bxc6+ Kf8 Rd1 Qc7 Qxc7 Rxc7

Bright-0.4a:

15/60 00:41 339.236.779 8.089 +0.48 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Qxa6+ Nxa6 Rxb7 Nc5 Rb4 Na6 Rb5 Rc8
15/63 00:51 409.528.415 7.988 +2.35 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qf5+ Bf6 Rh1 Ke8 Qc8+ Qd8 Bb5+ Nd7 Bxd7+ Kf8 Qxb7 d4 Kf2 a5

Crafty-23.0-win64:

16 00:22 278.169.475 12.644.067 +0.77 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf8
16 00:31 388.992.987 12.156.030 +1.21 Rh8+
16 00:33 405.397.607 12.284.775 +2.21 Rh8+
16 00:33 410.026.348 12.425.040 +2.22 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Qc5 Bxd5 Nd7 Qf7+ Kd8 Qxg7 Rb8 c4 Kc7

Glaurung-w64:

19 00:31 212.468.147 6.771.893 +0.76 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Rb5 Qc6 Qxc6+ Nxc6 Rxb7 e5 Rd7 Bf8 Kf3 exf4 Kxf4 Ne5 Rdd1
19 00:41 281.099.194 6.843.059 +4.11 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Nc6 Qg6+ Kd7 Rxd5+ Qxd5 Bxd5 Nd4+ Kd3 Bf6 g4 Rb8 g5 Bh8 Qe4 Rc8 Be6+ Nxe6 Qd5+ Ke8 Qxe6

Rybka v2.2.mp.x64:

18 00:18 15.670.232 876.943 +0.15 Qg6 Kf8 Qxe6 dxc4 Qc8+ Kf7 Qxc4+ Kf6 Rab1 Qa6 Rb5 Qc6 Qxc6+ Nxc6
18 00:28 24.716.056 888.979 +2.59 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8

Toga141se-4cpu:

16/56 00:27 36.875.543 5.405.686 +2.53 Rh8+ Kxh8 Qh5+ Kg8 Qe8+ Bf8 Qg6+ Bg7 Qxe6+ Kf8 Qc8+ Kf7 Qf5+ Ke8 Rd1 Qc7 Bxd5 Nd7 Qf7+ Kd8 Qxg7 Qc8 g4 a6 Qg8+ Kc7 Qg7


Summary: solution time and nodes searched

Engine               ht on                 ht off
Arasanx-64           01:59  557.274.712    01:38  494.356.327
Bright-0.4a          00:46  452.137.332    00:51  409.528.415
Crafty-23.0-win64    00:33  484.579.547    00:31  388.992.987
Glaurung-w64         00:32  244.997.383    00:41  281.099.194
Rybka v2.2.mp.x64    00:40   42.418.816    00:28   24.716.056
Toga141se-(8/4)cpu   00:39   34.719.415    00:27   36.875.543
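
To read the summary as ratios rather than raw times, here is a short sketch that converts the mm:ss solution times to seconds and divides ht on by ht off. The times are copied from the summary; `to_seconds` and the dictionary names are illustrative only, and these are single runs, so the ratios are indicative at best.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (ht on, ht off) solution times from the summary
summary = {
    "Arasanx-64": ("01:59", "01:38"),
    "Bright-0.4a": ("00:46", "00:51"),
    "Crafty-23.0-win64": ("00:33", "00:31"),
    "Glaurung-w64": ("00:32", "00:41"),
    "Rybka v2.2.mp.x64": ("00:40", "00:28"),
    "Toga141se-(8/4)cpu": ("00:39", "00:27"),
}

# Ratio below 1.0 means the ht-on run found the solution sooner.
ratios = {
    engine: to_seconds(on) / to_seconds(off)
    for engine, (on, off) in summary.items()
}
faster_with_ht = [engine for engine, ratio in ratios.items() if ratio < 1.0]

for engine, ratio in ratios.items():
    print(f"{engine:20s} on/off = {ratio:.2f}")
print("faster with ht on:", faster_with_ht)
```

On these single runs only Bright and Glaurung reach the solution sooner with ht on.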
krazyken

Re: Some tests with hyper threading

Post by krazyken »

Howard E wrote:
Summary: solution time and nodes searched

Engine               ht on                 ht off
Arasanx-64           01:59  557.274.712    01:38  494.356.327
Bright-0.4a          00:46  452.137.332    00:51  409.528.415
Crafty-23.0-win64    00:33  484.579.547    00:31  388.992.987
Glaurung-w64         00:32  244.997.383    00:41  281.099.194
Rybka v2.2.mp.x64    00:40   42.418.816    00:28   24.716.056
Toga141se-(8/4)cpu   00:39   34.719.415    00:27   36.875.543

Interesting. Do note that SMP searching is non-deterministic: if you run the exact same test again, you will get different times. It probably needs to be run many times, with the results averaged, to get a reliable comparison.
Howard E
Posts: 261
Joined: Wed Mar 08, 2006 8:49 pm

Re: Some tests with hyper threading

Post by Howard E »

Thanks, I was not aware of that. I ran a test using the same test position above, six times each for two engines, clearing hash tables each time.

Bright 0.4a

ht on   35 18 14 34 24 33   avg 26.3
ht off  25 40 16 30 23 13   avg 24.5

Glaurung 2.2

ht on   33 35 33 30 32 33   avg 32.7
ht off  41 35 39 36 34 35   avg 36.7
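
Since the runs vary so much, it may help to look at the spread next to the average. A small sketch using Python's standard `statistics` module on the six runs per setting listed above. The unit is assumed to be seconds to solution, which the post does not state.

```python
from statistics import mean, stdev

# Six runs per engine and setting, copied from the lists above
runs = {
    ("Bright 0.4a", "ht on"): [35, 18, 14, 34, 24, 33],
    ("Bright 0.4a", "ht off"): [25, 40, 16, 30, 23, 13],
    ("Glaurung 2.2", "ht on"): [33, 35, 33, 30, 32, 33],
    ("Glaurung 2.2", "ht off"): [41, 35, 39, 36, 34, 35],
}

for (engine, mode), times in runs.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    print(f"{engine:13s} {mode:6s} mean {mean(times):5.1f}  stdev {stdev(times):4.1f}")
```

For Bright the spread (a standard deviation of roughly 9 to 10) is much larger than the gap of about 2 between the two averages, so six runs cannot separate ht on from ht off. For Glaurung the gap of about 4 is larger than the spread, although six runs is still a small sample.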


[d] r4r1k/p1qn2pp/bp1bpp2/3n3N/2P1B2B/5N2/PP2QPPP/3R1RK1 w - - bm Nxg7;

This position was posted recently by Dann Corbit.

Bright 0.4a

ht on (time/depth): 1:20/15 ply, 1:26/15 ply, 13:48/18 ply, 2:08/16 ply, 1:34/15 ply, 6:23/17 ply
I did not test with ht off.