Some Notes about Hyper-Threading

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some Notes about Hyper-Threading

Post by bob »

Sedat Canbaz wrote:
bob wrote:
Sedat Canbaz wrote:Dear Robert,

By the way, do you plan to release a new, well-optimized Crafty version that supports many cores?

For example, the currently available Crafty versions support up to 8 cores, which is why I paused the Crafty 22.8 benchmarks.

And as far as I know, there are some Crafty compiles that support more than 8 cores, but unfortunately their MP scaling is not very good...
I mean the Crafty chess benchmarks do not perform well at 12 CPUs or more...

In other words, it would be great if you released a new Crafty version that supports many cores; then maybe I could resume my benchmarks with your great engine!

Best,
Sedat
Crafty supports any number of cores. It is up to the person that compiles it as to what limit they want to use. I run on ICC all the time using 12 cores, and have tested up to 64 cores... There's no limit within the software, just the compile-time option "CPUS=n". Probably everyone should use N of at least 16 today, at a minimum...
As I mentioned before, I have Crafty compiles that support more than 8 cores, but I noticed that Crafty's MP benchmarks (as far as I remember, with 12 cores) did not perform well... the benchmark results were confusing...

Actually, Crafty 22.8 with up to 8 cores is a great benchmarking tool...
But with more cores, I think it needs updating and optimizing


Best,
Sedat
I'm running it on 12 cores all the time and it seems to run and test just fine. It will do better on older processors (such as four 4-core Opteron chips rather than two 8-core chips) because of how cache is shared, but that is the only significant issue. When you get to 16 cores, you'd better be careful in programming, as the cache coherency traffic can explode quickly, whether you use MESIF or MOESI (Intel or AMD)...
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Some Notes about Hyper-Threading

Post by Sedat Canbaz »

bob wrote: Here's the problem. When you compare two programs that are VERY close in Elo, 400-500 games will NEVER be enough to find which is stronger. To get down to even the +/-4 Elo range requires 30,000 games. With SMP, the standard deviation is even larger because of the way the search behaves...
Dear Robert,

There is no doubt that you are a very good engineer, and many chess friends have studied and benefited from your great work

Just my two cents about me:
I am not new to computer chess; I have a little bit of experience in computer chess too

In other words, you are a very good engineer, whereas I am a driver, and I know the roads very well :)


And now about the current issue, 30,000 games per player:
Forget this HT test; give me any other example where we can see or compare such a number of HT games per player??

Actually, it's true that many HT games (HT ON against HT OFF) have been played on some well-known servers
But we can't call that proper HT testing, because those tests are run under different conditions; the opening books especially have a lot of influence

A few notes about my current HT test:
This is the first proper HT test aimed at showing which setup (HT ON or HT OFF) is better for chess
And I very much hope to see other testers run 30,000 HT games per player under Auto232 conditions!

By the way, I really wish to see many new 'Sedats' too, because I am planning to drop my computer chess activities in 2012
And I am quite sure that there will be other chess friends who will do better work than mine!!


Kind Regards,
Sedat
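
As a rough check on the game counts discussed above, here is a minimal Python sketch of how the error bar on a measured Elo difference shrinks with the number of games. It assumes independent games between evenly matched opponents, a 95% interval (z = 1.96), and a worst-case per-game score deviation of 0.5 (no draws); the function names and these parameter values are assumptions made for illustration, not something taken from the thread.

```python
import math


def elo_from_score(p: float) -> float:
    """Convert an expected score p (0 < p < 1) into an Elo difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_margin(games: int, sigma: float = 0.5, z: float = 1.96) -> float:
    """Approximate error bar, in Elo, around an even match.

    sigma is the assumed per-game standard deviation of the score.
    0.5 is the worst case (no draws) and is an assumption of this sketch;
    real matches with many draws have a smaller sigma.
    """
    delta = z * sigma / math.sqrt(games)  # half-width of the score interval
    return elo_from_score(0.5 + delta)


for n in (500, 30_000):
    print(f"{n:>6} games: about +/-{elo_margin(n):.1f} Elo")
```

Under these assumptions the margin at 30,000 games comes out close to the +/-4 Elo figure quoted above, while a few hundred games leave a margin of tens of Elo. Nondeterministic SMP search is not modelled here.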
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some Notes about Hyper-Threading

Post by bob »

Sedat Canbaz wrote:
bob wrote:I don't see much that is useful there. Knps is not a useful measure. Time to depth is the key, as always.
As far as I know, there is no such useful 'time to depth' chess benchmarking tool that works on all the latest hardware?!

Or maybe I am missing something??

Actually, I need a very good indicator that works accurately on all hardware (I mean, e.g., up to 12 cores, 16 cores, 32 cores...)

For example, my current Houdini 2.0 benchmark list is quite a good indicator of which processors are fastest for computer chess
http://www.sedatcanbaz.com/chess/houdin ... enchmarks/

But unfortunately, not everyone has this commercial top engine
Plus, the current benchmarking method seems to be not so easy for some chess friends
Sometimes even my tutorial does not help :)

One more thing: honestly, I like your Crafty bench program, but unfortunately it does not work properly on machines with more than 8 cores
As you know, I have created a successful Crafty benchmark list:
http://sedatchess.110mb.com/index.php?p=1_12

BTW, Axon bench was another great bench tool, but what a pity it does not work on the latest systems
http://sedatchess.110mb.com/index.php?p=1_14


Best,
Sedat
I do it all the time. The winboard protocol has a depth setting. Most programs have a command like sd=20 to cut the search off after 20 plies. Worst-case is to run 3 minute tests and look at the output to find the deepest depth that all tests reported, and then compare the times...

The Crafty benchmark is not really meant to be an SMP test. It has positions that are chosen to represent both opening and endgame positions, without regard to "typical parallel performance positions." I have a set of positions I use to measure parallel performance that come from a real game....
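
The worst-case procedure described above (run fixed-length tests, find the deepest depth every run reported, then compare the times) can be sketched in Python. This is only an illustration: the run labels and all timing numbers below are made up, and the `runs` layout (depth mapped to seconds) is an assumed format, not the output of any particular engine.

```python
def deepest_common_depth(runs):
    """Return the deepest search depth that every run reported."""
    common = set.intersection(*(set(times) for times in runs.values()))
    return max(common)


def time_to_depth(runs):
    """Return (depth, {run name: seconds needed to reach that depth})."""
    depth = deepest_common_depth(runs)
    return depth, {name: times[depth] for name, times in runs.items()}


# Hypothetical data: depth -> seconds at which that depth was completed.
runs = {
    "HT off, 6 threads": {18: 41.0, 19: 77.5, 20: 150.2, 21: 171.9},
    "HT on, 12 threads": {18: 39.8, 19: 80.1, 20: 158.7},
}

depth, times = time_to_depth(runs)
for name, seconds in times.items():
    print(f"{name}: depth {depth} reached in {seconds:.1f} s")
```

Because parallel search varies from run to run, a single comparison like this says little; it would normally be repeated over many positions and runs.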
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some Notes about Hyper-Threading

Post by bob »

Sedat Canbaz wrote:
bob wrote: Here's the problem. When you compare two programs that are VERY close in Elo, 400-500 games will NEVER be enough to find which is stronger. To get down to even the +/-4 Elo range requires 30,000 games. With SMP, the standard deviation is even larger because of the way the search behaves...
Dear Robert,

There is no doubt that you are a very good engineer, and many chess friends have studied and benefited from your great work

Just my two cents about me:
I am not new to computer chess; I have a little bit of experience in computer chess too

In other words, you are a very good engineer, whereas I am a driver, and I know the roads very well :)


And now about the current issue, 30,000 games per player:
Forget this HT test; give me any other example where we can see or compare such a number of HT games per player??

Actually, it's true that many HT games (HT ON against HT OFF) have been played on some well-known servers
But we can't call that proper HT testing, because those tests are run under different conditions; the opening books especially have a lot of influence

A few notes about my current HT test:
This is the first proper HT test aimed at showing which setup (HT ON or HT OFF) is better for chess
And I very much hope to see other testers run 30,000 HT games per player under Auto232 conditions!

By the way, I really wish to see many new 'Sedats' too, because I am planning to drop my computer chess activities in 2012
And I am quite sure that there will be other chess friends who will do better work than mine!!


Kind Regards,
Sedat
If it were interesting to me, I could run 30K HT on vs 30K HT off games in a day. But I already know what HT does to MY engine. And most likely what it does to every OTHER engine since parallel search is parallel search. For crafty, it is a net loss. Not a big one, but a loss. No need to test it in real games when I know what it does to time-to-depth with 100% accuracy...
Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Some Notes about Hyper-Threading

Post by Vinvin »

Hi Sedat, I've had access to an HT machine for a few weeks and I noticed Windows moves 1 thread across cores. Example: on 4 cores with HT, 3 threads is better than 4...
So would you mind testing 9 and 10 threads with HT ON? It could be better (time to depth) than 6 cores with HT OFF...

Thanks,
Vincent.

Sedat Canbaz wrote:Hello dear Vincent,

Your request is done...


*Test 1:HT OFF 6 Physical cores


Conditions:
----------------
i7 980X @4.33GHz
Hyper Threading Disabled
Windows XP x64 Prof
TC:60 Minutes/Game
Ponder OFF
128 MB Hashtable
Large Pages Enabled
Position Learning OFF


Solved mate in sec:
-------------------------
1st-215s
2nd-198s
3rd-111s
4th-209s
5th-50s
6th-168s
7th-150s
8th-110s
9th-56s
10th-158s
---------
-Hiarcs 13.2 HT OFF 6 physical cores solves the mate position in 142 sec on average



************************************************************

*Test 2:HT ON 12 Threads

Conditions:
----------------
i7 980X @4.33GHz
Hyper Threading Enabled
Windows XP x64 Prof
TC:60 Minutes/Game
Ponder OFF
128 MB Hashtable
Large Pages Enabled
Position Learning OFF

Solved mate in sec:
--------------------------
1st-48s
2nd-249s
3rd-98s
4th-168s
5th-206s
6th-52s
7th-209s
8th-124s
9th-97s
10th-271s
----------
-Hiarcs 13.2 HT ON 12 Threads solves the mate position in 152 sec on average



More Details:
-------------------
Hiarcs 13.2's hash tables are cleared before starting each bench
Hiarcs 13.2 has been tested with the same mate position 20 times in TOTAL
As we see again, the results are slightly in favor of HT OFF
Hiarcs 13.2 with 'Position Learning ON' solves the mate much faster
For accurate speed testing, the benchmarks are done with Position Learning OFF



Download all HT Chess Benchmarks by Hiarcs 13.2:
http://www.sedatcanbaz.com/chess/games/ht_test.rar


Best Regards,
Sedat
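
The two sets of solve times listed above can be summarized with a short Python sketch. The times are copied from the two tests; the comparison of the difference in means against its standard error is an illustrative addition and assumes the ten runs in each test are independent samples.

```python
import math
import statistics

# Solve times in seconds, copied from Test 1 (HT OFF) and Test 2 (HT ON).
ht_off = [215, 198, 111, 209, 50, 168, 150, 110, 56, 158]
ht_on = [48, 249, 98, 168, 206, 52, 209, 124, 97, 271]


def summarize(times):
    """Return (mean, sample standard deviation) of a list of times."""
    return statistics.mean(times), statistics.stdev(times)


mean_off, sd_off = summarize(ht_off)
mean_on, sd_on = summarize(ht_on)

# Standard error of the difference between the two means.
diff = mean_on - mean_off
se_diff = math.sqrt(sd_off**2 / len(ht_off) + sd_on**2 / len(ht_on))

print(f"HT OFF: mean {mean_off:.1f} s, stdev {sd_off:.1f} s")
print(f"HT ON : mean {mean_on:.1f} s, stdev {sd_on:.1f} s")
print(f"difference {diff:.1f} s, standard error {se_diff:.1f} s")
```

The run-to-run spread (individual solves range from about 50 s to over 270 s) is much larger than the gap between the two averages, so ten runs per setting cannot separate HT ON from HT OFF with any confidence.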
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Some Notes about Hyper-Threading

Post by hgm »

bob wrote:I have access to HUNDREDS of HT-enabled machines. And I have yet to find one single example where hyperthreading on provides faster time-to-depth results than hyperthreading off.
So none of the machines has an Intel Atom CPU, then? :lol:
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Some Notes about Hyper-Threading

Post by Sedat Canbaz »

Vinvin wrote:Hi Sedat, I've had access to an HT machine for a few weeks and I noticed Windows moves 1 thread across cores. Example: on 4 cores with HT, 3 threads is better than 4...
So would you mind testing 9 and 10 threads with HT ON? It could be better (time to depth) than 6 cores with HT OFF...

Thanks,
Vincent.

Hello Vincent,

Interesting...

Right now my i7 980X/i7 970 machines are busy...
But later I will try to test 9, 10 and 11 threads with HT ON

Best,
Sedat
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Some Notes about Hyper-Threading

Post by Daniel Shawul »

I think the best opportunity for better HT performance in chess tree search is using 1 or 2 physical cores with HT ON. The 30% gain from HT has a better chance of offsetting the SMP overhead in that setup than when going from 4 to 8, which will likely incur more idle time on top of increased search overhead. But I don't understand HT technology well. It seems some execution units are duplicated, but obviously not enough to call it a dual-core system. GPUs have similar multithreading, and that is their major way of avoiding pipeline stalls while waiting for memory. There, there are thousands of threads to work with, so you can hide even the stall from an uncached memory lookup.
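
The trade-off described above can be put into a toy break-even calculation in Python. The 30% throughput gain is the figure from the post; the model itself (net speedup equals throughput gain divided by extra search overhead) and the overhead values tried below are simplifying assumptions for illustration only.

```python
def net_speedup(throughput_gain: float, extra_search_overhead: float) -> float:
    """Time-to-depth speedup from enabling HT in a toy model.

    throughput_gain: fractional gain in nodes per second (0.30 means 30%).
    extra_search_overhead: fractional increase in nodes that must be
        searched to reach the same depth with twice as many threads.
    A result above 1.0 means HT helps; below 1.0 means HT hurts.
    """
    return (1.0 + throughput_gain) / (1.0 + extra_search_overhead)


# Hypothetical overhead values; only the 30% gain comes from the post.
for overhead in (0.10, 0.30, 0.50):
    print(f"overhead {overhead:.0%}: net speedup {net_speedup(0.30, overhead):.2f}x")
```

In this model HT breaks even when the extra search overhead equals the throughput gain, which is why a small thread count, where doubling the threads costs relatively little extra overhead, looks like the most favorable case.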
hgm
Posts: 28393
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Some Notes about Hyper-Threading

Post by hgm »

HT technology only duplicates the instruction pointer, so there can be two instruction streams, and then alternately feeds micro-ops from one or the other into the execution pipeline. Because instructions from the different streams never have dependencies, out-of-order execution is more effective than on a single stream: if there are no execution-ripe instructions for one stream in the re-order buffer, the execution units will simply work on instructions from the other stream. (There are no extra execution units, and this part of the CPU is not aware of the HT at all.)
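
The idea of filling one stream's stalls with work from the other stream can be illustrated with a deliberately crude Python toy. It is not a model of any real CPU: it assumes a single issue slot per cycle, in-order issue within each stream, and a fixed stall after every instruction. All names and numbers are hypothetical.

```python
def cycles_to_finish(streams):
    """Cycles needed for one shared issue slot to drain all streams.

    Each stream is a list of stall lengths: after issuing instruction k,
    that stream cannot issue again for streams[i][k] additional cycles
    (standing in for a dependency or a memory wait).
    """
    next_idx = [0] * len(streams)   # next instruction to issue per stream
    ready_at = [0] * len(streams)   # first cycle each stream may issue again
    cycle = 0
    while any(next_idx[i] < len(s) for i, s in enumerate(streams)):
        for i, s in enumerate(streams):
            if next_idx[i] < len(s) and ready_at[i] <= cycle:
                ready_at[i] = cycle + 1 + s[next_idx[i]]
                next_idx[i] += 1
                break  # only one instruction issues per cycle
        cycle += 1
    return cycle


# Hypothetical workload: 100 instructions, each followed by a 1-cycle stall.
stream = [1] * 100
one_after_another = 2 * cycles_to_finish([stream])
interleaved = cycles_to_finish([stream, stream])
print(f"two streams run back to back: {one_after_another} cycles")
print(f"two streams sharing the slot: {interleaved} cycles")
```

When the streams never stall, interleaving gains nothing in this toy, since the single issue slot is already fully used; the benefit comes entirely from cycles that would otherwise sit idle.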
Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Some Notes about Hyper-Threading

Post by Vinvin »

Sedat Canbaz wrote:
Vinvin wrote:Hi Sedat, I've had access to an HT machine for a few weeks and I noticed Windows moves 1 thread across cores. Example: on 4 cores with HT, 3 threads is better than 4...
So would you mind testing 9 and 10 threads with HT ON? It could be better (time to depth) than 6 cores with HT OFF...

Thanks,
Vincent.

Hello Vincent,

Interesting...

Right now my i7 980X/i7 970 machines are busy...
But later I will try to test 9, 10 and 11 threads with HT ON

Best,
Sedat
Great !
11 is too much IMO, 8 is even more interesting ...

In fact the reference message is this one : http://www.talkchess.com/forum/viewtopi ... 887#437887 . The one I quoted is not complete ...