Interesting machine

Posted: Tue May 17, 2016 9:37 pm
by bob
IBM just donated a dual-socket POWER8 machine (10 cores per socket) to the department. I thought I would run a quick test and was amused to see this:

[hyatt@blueblaze crafty]$ ./crafty
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v25.1

machine has 160 processors <--------------- :)

White(1):

Turns out the POWER8 has 8 hardware threads per physical core. :) 160 indeed. The only downside here is that it is about 20% slower per core than our Intel 2660 20-core box. I am looking into spin locks so I can run a 20-core test to see how it works, and even try 40 threads and beyond to see if their hyper threading offers anything beyond Intel's...
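
For reference, the 160 is just sockets x cores x hardware threads. Below is a minimal sketch in C, assuming Linux with glibc, that compares that product with what the OS reports as online. The function names are made up for illustration, and this is not necessarily how Crafty counts processors.

```c
#include <unistd.h>

/* Logical CPUs implied by the topology: sockets x cores x threads per core. */
static long expected_logical_cpus(long sockets, long cores_per_socket,
                                  long threads_per_core)
{
    return sockets * cores_per_socket * threads_per_core;
}

/* Logical CPUs the OS currently reports as online (-1 on error).
 * _SC_NPROCESSORS_ONLN is a common extension, not strict POSIX. */
static long online_logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On the box described above, `expected_logical_cpus(2, 10, 8)` gives 160, which matches the banner. The same arithmetic gives 40 for a 20-core Intel box with two threads per core.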

Re: Interesting machine

Posted: Wed May 18, 2016 7:26 am
by Michael Sherwin
Sounds like a very economical platform to develop an application that requires a huge number of threads.

Re: Interesting machine

Posted: Wed May 18, 2016 6:07 pm
by bob
Michael Sherwin wrote:Sounds like a very economical platform to develop an application that requires a huge number of threads.
Only question is, how effective is that high level of hyper threading? I have not yet had time to figure out a good lock mechanism (spin lock) as I have a bunch of details to handle as I finalize my retirement paperwork and such.

Re: Interesting machine

Posted: Wed May 18, 2016 8:16 pm
by Vinvin
bob wrote:
Michael Sherwin wrote:Sounds like a very economical platform to develop an application that requires a huge number of threads.
Only question is, how effective is that high level of hyper threading? I have not yet had time to figure out a good lock mechanism (spin lock) as I have a bunch of details to handle as I finalize my retirement paperwork and such.
The target of IBM is probably not 1 application multi-threaded but many different applications running on 1 core.

Re: Interesting machine

Posted: Wed May 18, 2016 8:28 pm
by bob
Vinvin wrote:
bob wrote:
Michael Sherwin wrote:Sounds like a very economical platform to develop an application that requires a huge number of threads.
Only question is, how effective is that high level of hyper threading? I have not yet had time to figure out a good lock mechanism (spin lock) as I have a bunch of details to handle as I finalize my retirement paperwork and such.
The target of IBM is probably not 1 application multi-threaded but many different applications running on 1 core.
Yes. Hyper threading has the same issue it has always had. I got the thing working with a spin lock for POWER8. So far, "OK". Speedup is comparable to the 20-core 2660 box, except the POWER8 at 3.4 GHz is about 20% slower than the 2660 at 2.9 GHz...

One interesting thing is this machine has 1 TB of DRAM. Have not run on a single box with this much memory previously:

[hyatt@blueblaze crafty]$ more /proc/meminfo
MemTotal: 1067485312 kB
MemFree: 1057470400 kB
MemAvailable: 1057322048 kB
Buffers: 5760 kB

Would be one hell of a machine to generate endgame tables..

Re: Interesting machine

Posted: Thu May 19, 2016 7:25 am
by smatovic
Is there any performance loss compared to big endian when running code in little endian mode on these Power8 CPUs?

--
Srdja

Re: Interesting machine

Posted: Thu May 19, 2016 7:25 pm
by bob
smatovic wrote:Is there any performance loss compared to big endian when running code in little endian mode on these Power8 CPUs?

--
Srdja
I don't have any endian-specific code, so not for me. I consistently number bits as LSB = 0, MSB = 63. Regardless of how they are stored in memory.

I am temporarily using pthread_spin_lock() in place of my asm code, but I am probably going to rewrite the ASM code in ppc assembly language to be sure it is doing what I expect. Seems to be working well, however.
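
A minimal sketch of the `pthread_spin_lock()` approach, assuming a POSIX threads implementation that provides spin locks (glibc on Linux does). The counter, the worker function, and `run_counter_test` are hypothetical names for illustration only. This is not Crafty's locking code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose pthread_spin_* under strict -std= modes */
#endif
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

static pthread_spinlock_t counter_lock;
static long shared_counter;

/* Each worker bumps the shared counter 'iters' times under the spin lock. */
static void *worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_spin_lock(&counter_lock);
        shared_counter++;
        pthread_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs up to MAX_THREADS workers and returns the final counter value,
 * or -1 if the lock could not be initialised. */
static long run_counter_test(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_counter = 0;
    if (pthread_spin_init(&counter_lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &iters) != 0)
            break;                /* join only the threads that started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_spin_destroy(&counter_lock);
    return shared_counter;
}
```

If the lock works, `run_counter_test(4, 100000)` returns 400000. A lost update would show up as a smaller number. Build with `-pthread`.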

I have noticed that transparent huge pages are not working on this box (running Linux kernel 3.10.0). Not yet sure why. I also noticed that the large page size is 16 MB rather than Intel's 2 MB norm. But it is running, so I am running my full 1 to 20 core parallel search test 4x per number of cores to get some data for parallel speedup.

I also noticed that the process scheduler is hopelessly confused on this box. Our 2660 box (with hyper threading on, which we usually disable) numbers its logical CPUs so that 0-19 are on physical cores 0-19, and 20-39 wrap around and are also on physical cores 0-19. So 0 and 20 share a physical core, etc. The IBM PPC box has 8 hyper threads per physical core, with 0 through 7 on physical core 0, 8 through 15 on physical core 1, etc. I don't yet know whether that is confusing the kernel's attempt at managing processor affinity or not. But I hard-code affinity, which solves it completely.
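
The two numbering schemes above, plus hard-coded affinity, can be sketched as follows. This assumes Linux with glibc (`sched_setaffinity()` and the `CPU_*` macros need `_GNU_SOURCE`). The helper names are hypothetical, and this is an illustration rather than Crafty's actual affinity code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede all includes to expose sched_setaffinity() */
#endif
#include <sched.h>

/* POWER8-style numbering: logical CPUs 0..7 on core 0, 8..15 on core 1, ... */
static int physical_core_blocked(int logical_cpu, int threads_per_core)
{
    return logical_cpu / threads_per_core;
}

/* Intel-style numbering: logical CPU n and n + physical_cores share a core. */
static int physical_core_wrapped(int logical_cpu, int physical_cores)
{
    return logical_cpu % physical_cores;
}

/* First hardware thread of a physical core under the blocked scheme. */
static int first_thread_of_core_blocked(int core, int threads_per_core)
{
    return core * threads_per_core;
}

/* Pin the calling thread to one logical CPU. Returns 0 on success, -1 on error. */
static int pin_self_to_cpu(int logical_cpu)
{
    cpu_set_t set;

    if (logical_cpu < 0 || logical_cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(logical_cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling thread */
}
```

To run one search thread per physical core on the blocked layout, thread `t` would call `pin_self_to_cpu(first_thread_of_core_blocked(t, 8))`, which lands threads on logical CPUs 0, 8, 16, and so on.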

I went back and re-read your post and think I misinterpreted your question. You were asking about any speed difference if the processor itself is run in big-endian or little-endian mode? No data there, yet. I had assumed it is running in big-endian mode but will have to figure out how to check. Probably have to write some code for this.
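
One way to check is a few lines of C that look at which byte of a known integer comes first in memory. This is a minimal sketch with hypothetical function names. The second helper shows why LSB = 0, MSB = 63 bit numbering done with shifts does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of an integer holds its least
 * significant byte (little endian), 0 otherwise. */
static int is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);   /* read the byte at the lowest address */
    return first_byte == 1;
}

/* Bit test with LSB = 0 and MSB = 63. Shifts act on the value, not on its
 * memory layout, so the result is the same in either endian mode. */
static int bit_is_set(uint64_t board, int bit)
{
    return (int)((board >> bit) & 1u);
}
```

Printing the result of `is_little_endian()` on the box answers the question for the running mode.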

Re: Interesting machine

Posted: Thu May 19, 2016 8:16 pm
by smatovic
You were asking about any speed difference if the processor itself is run in big-endian or little-endian mode?
Yes, this was my intended question.

I have read about Power8 CPUs with little endian mode to enable support for Nvidia CUDA GPUs... so I wonder if there is any performance difference between big/little endian mode on these machines.

--
Srdja

Re: Interesting machine

Posted: Thu May 19, 2016 11:01 pm
by bob
smatovic wrote:
You were asking about any speed difference if the processor itself is run in big-endian or little-endian mode?
Yes, this was my intended question.

I have read about Power8 CPUs with little endian mode to enable support for Nvidia CUDA GPUs... so I wonder if there is any performance difference between big/little endian mode on these machines.

--
Srdja
The DEC Alpha had the same facility and I didn't measure any difference with it. It really only affects how data is stored in memory, which likely has zero effect in terms of processor speed.

Re: Interesting machine

Posted: Fri May 20, 2016 2:06 am
by syzygy
bob wrote:Would be one hell of a machine to generate endgame tables..
Indeed...