smatovic wrote:is there any performance loss compared to big endian when running code in little endian mode on these Power8 cpus?
--
Srdja
I don't have any endian-specific code, so not for me. I consistently number bits as LSB = 0 and MSB = 63, regardless of how they are stored in memory.
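To illustrate why that numbering is endian-neutral, here is a minimal sketch (the helper names are made up for illustration, not taken from my engine). Shifts operate on the value of a 64-bit word, not on the order of its bytes in memory, so the same code gives the same answers in either mode.

```c
#include <stdint.h>

/* Bit 0 is the least significant bit, bit 63 the most significant.
   Shifts act on the integer value, not on its byte layout in memory,
   so these helpers behave identically on big- and little-endian CPUs.
   All names here are hypothetical. */
static inline uint64_t bit_mask(int n)
{
    return (uint64_t)1 << n;
}

static inline uint64_t set_bit(uint64_t word, int n)
{
    return word | bit_mask(n);
}

static inline int test_bit(uint64_t word, int n)
{
    return (int)((word >> n) & 1);
}
```

Endianness only starts to matter once a 64-bit word is read back through a byte pointer or written to a file that another machine will load.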
I am temporarily using pthread_spin_lock() in place of my asm code, but I will probably rewrite the lock in PPC assembly language to be sure it is doing what I expect. It seems to be working well, however.
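For anyone who wants to try the same substitution, a minimal sketch of the POSIX spin lock calls looks like this. The variable and function names are hypothetical, and the counter is only a stand-in for whatever shared data the lock protects.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical names: one spin lock guarding one shared counter. */
static pthread_spinlock_t search_lock;
static long shared_counter;

/* Initialize the lock for use by threads of this process only.
   Returns 0 on success, an error number otherwise. */
int lock_setup(void)
{
    return pthread_spin_init(&search_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Update shared data while holding the lock. */
void locked_increment(void)
{
    pthread_spin_lock(&search_lock);
    shared_counter++;
    pthread_spin_unlock(&search_lock);
}
```

Depending on the glibc version this needs to be built with -pthread.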
I have noticed that transparent huge pages are not working on this box (running Linux kernel 3.10.0). Not yet sure why. I also noticed that the large page size is 16 MB rather than Intel's 2 MB norm. But it is running, so I am running my full 1 to 20 core parallel search test, 4 runs per core count, to get some data for parallel speedup.
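One thing that can be checked from code is the system-wide THP setting. The sketch below assumes the standard sysfs file /sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes and marks the active one in square brackets; whether this particular box exposes that file is an assumption, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Copy the bracketed word from a line such as "always [madvise] never"
   into out. Returns 0 on success, -1 if there is no bracketed word or
   it does not fit in out. */
int selected_mode(const char *line, char *out, size_t out_size)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the active THP mode. Returns -1 if the file is missing or
   cannot be parsed (assumed path, see above). */
int thp_mode(char *out, size_t out_size)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? selected_mode(line, out, out_size) : -1;
}
```

A result of "never" or "madvise" would at least explain why an unmodified program gets no huge pages.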
I also noticed that the process scheduler is hopelessly confused on this box. Our 2660 box (with hyper-threading on, which we usually disable) numbers its logical processors so that 0-19 are on physical cores 0-19, and 20-39 wrap around and are also on physical cores 0-19. So 0 and 20 share a physical core, etc. The IBM PPC box has 8 hyper-threads per physical core, with 0 through 7 on physical core 0, 8 through 15 on physical core 1, etc. I don't yet know whether that is confusing the kernel's attempt at managing processor affinity. But I hard-code affinity, which solves it completely.
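A minimal sketch of hard-coding affinity for the PPC numbering described above might look like the following. It assumes 8 consecutive logical processors per physical core and uses the Linux-specific sched_setaffinity() call; the function names are hypothetical and this is not my actual code.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Assumed layout: 8 hardware threads per physical core, numbered
   consecutively, so core c owns logical CPUs 8c .. 8c+7. */
enum { THREADS_PER_CORE = 8 };

/* First logical CPU belonging to a physical core. */
int first_cpu_of_core(int core)
{
    return core * THREADS_PER_CORE;
}

/* Pin the calling thread to the first hardware thread of the given
   physical core. Returns 0 on success, -1 on failure with errno set
   by sched_setaffinity(). */
int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(first_cpu_of_core(core), &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

With this mapping, search thread i calls pin_to_core(i) at startup, so 20 threads land on 20 different physical cores instead of piling onto logical processors 0-19, which would be only the first three cores. On the Intel numbering the mapping would simply be the identity for the first 20 threads.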
I went back and re-read your post and think I misinterpreted your question. You were asking about any speed difference if the processor itself is run in big-endian or little-endian mode? No data there yet. I had assumed it is running in big-endian mode, but I will have to figure out how to check. I will probably have to write some code for this.
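A minimal sketch of such a check, with a hypothetical function name: store a known value in a multi-byte integer and look at which byte sits at the lowest address.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a multi-byte integer holds
   its least significant bits (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* read the byte at the lowest address */
    return first == 1;
}
```

On Linux, lscpu also prints a "Byte Order" line, which would answer the question without compiling anything.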