Cache line width

mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Cache line width

Post by mjlef »

Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Cache line width

Post by sje »

Code:

#pragma align <args>
The <args> are compiler-dependent.
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Cache line width

Post by mathmoi »

mcostalba wrote:
mathmoi wrote:Hi Marco,

On Linux you can check in /proc/cpuinfo. I asked this very question on stackoverflow.com and got some answers: http://stackoverflow.com/questions/1502 ... -size-in-c
Thanks for the link!

I didn't know that site, it is an interesting one.
Yep, I like it a lot.

It's like expertsexchange.com, but completely free. Most of the time you get a good answer to your question well within 24 hours.
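
For anyone who wants the line size from inside a program rather than by reading /proc/cpuinfo, here is a minimal sketch. It swaps in a different technique from the one mentioned above: glibc's sysconf() with _SC_LEVEL1_DCACHE_LINESIZE, which is a glibc extension and not portable, so it is guarded with #ifdef. The function name cache_line_size and the fallback parameter are my own, not from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the L1 data cache line size in bytes as reported by the
   system, or `fallback` if the system does not report a positive value.
   _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension (assumption: Linux). */
static size_t cache_line_size(size_t fallback)
{
    assert(fallback > 0);
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return fallback;
}
```

Calling cache_line_size(64) would give the reported size where available and 64 otherwise; some systems report 0 here, which is why the fallback exists.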
Jan Brouwer
Posts: 201
Joined: Thu Mar 22, 2007 7:12 pm
Location: Netherlands

Re: Cache line width

Post by Jan Brouwer »

mathmoi wrote:
mcostalba wrote:
mathmoi wrote:Hi Marco,

On Linux you can check in /proc/cpuinfo. I asked this very question on stackoverflow.com and got some answers: http://stackoverflow.com/questions/1502 ... -size-in-c
Thanks for the link!

I didn't know that site, it is an interesting one.
Yep, I like it a lot.

It's like expertsexchange.com, but completely free. Most of the time you get a good answer to your question well within 24 hours.
I highly recommend the podcasts by the founders of Stack Overflow, Joel Spolsky and Jeff Atwood, at http://itc.conversationsnetwork.org/ser ... rflow.html
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Cache line width

Post by wgarvin »

mjlef wrote:Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
Nearly all compilers align data with "natural alignment", by which I mean that 4-byte types (e.g. float, int, unsigned) will have 4-byte alignment, 8-byte types (e.g. double, long long) will have 8-byte alignment, and 2-byte types (e.g. unsigned short) will have 2-byte alignment.

Two interesting things to note about natural alignment:
(1) a naturally aligned basic type will never get "split" across two cache lines (i.e. a 4-byte float or an 8-byte double is always contained within a single 32-, 64- or 128-byte cache line; it can never straddle the boundary between two cache lines).
(2) some types of CPU can only load data that's naturally aligned (and it's the compiler's job to ensure this). x86 is the most notable exception: it is very tolerant of misaligned data, but access is still a bit slower, especially if the data splits across two cache lines, so even x86 compilers still use natural alignment for everything.
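
To make point (1) concrete, here is a small sketch of the check involved. The helper name straddles_line is made up for illustration; it just compares which cache line the first and last byte of an object fall into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an object of `size` bytes at address `addr` crosses a
   boundary between two cache lines of `line` bytes (a power of two). */
static bool straddles_line(uintptr_t addr, size_t size, size_t line)
{
    assert(size > 0 && line > 0 && (line & (line - 1)) == 0);
    /* Compare the line index of the first byte and of the last byte. */
    return (addr / line) != ((addr + size - 1) / line);
}
```

With 64-byte lines, a 4-byte value at address 60 (a multiple of 4) stays inside one line, while the same value at address 62 (not naturally aligned) spills into the next one.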


If you want more alignment than that (e.g. to keep related global variables within one cache line, or to put unrelated ones on separate lines), then for GCC and Microsoft, you could try something like this:

Code:

/* GCC (and Clang) define __GNUC__; Microsoft's compiler defines _MSC_VER. */
#if defined(__GNUC__)
#define MY_ALIGN(n)  __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define MY_ALIGN(n)  __declspec(align(n))
#endif

struct MyGlobals
{
    int x;
    int y;
};

MY_ALIGN(128) struct MyGlobals  g_globals;

MY_ALIGN(128) unsigned   g_myAlignedArray[1000];
I'm not 100% positive, but Intel's compiler probably supports the same syntax as Microsoft's so you could check for that also in the ifdef.
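
For compilers that support C11, there is also a standard spelling that avoids the per-compiler macro. This is only a sketch under that assumption: it uses alignas from <stdalign.h>, keeps the 128-byte figure and the variable names from the example above, and adds a helper is_aligned that is my own invention for checking the result.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

struct MyGlobals {
    int x;
    int y;
};

/* alignas requests at least 128-byte alignment for each object. */
alignas(128) struct MyGlobals g_globals;
alignas(128) unsigned g_myAlignedArray[1000];

/* Nonzero if p is a multiple of n, where n is a power of two. */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

Checking is_aligned(&g_globals, 128) and is_aligned(g_myAlignedArray, 128) at startup is a cheap way to confirm the toolchain honoured the request.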

Here's another useful thing, a macro that gives you the required alignment of any type (so for a structure or class type, it will tell you the alignment needed for the most strictly aligned member in it, which is usually the largest basic type, etc.):

Code:

#if defined(__GNUC__)
#define GET_ALIGN_OF(type)  __alignof__(type)
#elif defined(_MSC_VER)
#define GET_ALIGN_OF(type)  __alignof(type)
#endif
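
As a rough illustration of what such a macro reports for a structure, here is a sketch using the C11 alignof from <stdalign.h> instead of the compiler-specific keywords (an assumption: a C11 compiler). The struct Mixed and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct Mixed {
    char   tag;    /* alignment 1 */
    double value;  /* typically alignment 8, but this is ABI-dependent */
    short  count;  /* typically alignment 2 */
};

/* A struct's size is always a multiple of its alignment. */
static_assert(sizeof(struct Mixed) % alignof(struct Mixed) == 0,
              "size is a multiple of alignment");

/* Larger of two alignment values. */
static size_t max_align(size_t a, size_t b) { return a > b ? a : b; }

/* Strictest alignment among the members of struct Mixed. */
static size_t strictest_member_align(void)
{
    return max_align(alignof(char),
                     max_align(alignof(double), alignof(short)));
}
```

On mainstream ABIs, alignof(struct Mixed) comes out equal to strictest_member_align(), which is why the padding after tag appears.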
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache line width

Post by bob »

mjlef wrote:Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
All you need to do is to force the first byte to be on an address where the rightmost 6 bits are zero.

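t = malloc(size_needed + 63);

/* the mask needs an integer, so cast through uintptr_t (from <stdint.h>) */
t = (void *)(((uintptr_t)t + 63) & ~(uintptr_t)63);

and t is now on a 64-byte boundary.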
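
The arithmetic can be checked on plain integers. A minimal sketch, with round_up_64 as a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 64 (unchanged if already
   aligned). Adding 63 carries into the upper bits unless the low six
   bits were already zero; masking with ~63 then clears the low six bits. */
static uintptr_t round_up_64(uintptr_t addr)
{
    return (addr + 63) & ~(uintptr_t)63;
}
```

For example, 0x1001 becomes 0x1040, while 0x1000 stays where it is, so at most 63 bytes of the allocation are skipped.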
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Cache line width

Post by wgarvin »

bob wrote:
mjlef wrote:Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
All you need to do is to force the first byte to be on an address where the rightmost 6 bits are zero.

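t = malloc(size_needed + 63);

/* the mask needs an integer, so cast through uintptr_t (from <stdint.h>) */
t = (void *)(((uintptr_t)t + 63) & ~(uintptr_t)63);

and t is now on a 64-byte boundary.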
For heap-allocated things, that works well. (If you need to free the memory later, you need to keep the original pointer somewhere as well).
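
One common way to keep the original pointer around is to stash it just below the aligned block. The sketch below is only one possible arrangement, not anything from an existing program; the names aligned_malloc_sketch and aligned_free_sketch are hypothetical, and it does not guard against size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to `alignment` (a power of two).
   The pointer returned by malloc is stored just below the aligned
   block so that aligned_free_sketch can recover it. */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Room for the payload, worst-case padding, and the saved pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;  /* remember what malloc gave us */
    return (void *)aligned;
}

/* Free a block obtained from aligned_malloc_sketch (NULL is allowed). */
static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The cost is one extra pointer per allocation, which is negligible for a handful of large tables.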

For fixed-size, stack-allocated things, you can do a similar thing with a byte array of N+63 bytes, and a pointer into the array. Or you can try and use the compiler-specific stuff from my post above, but you would have to read the details about how each compiler handles stack alignment very carefully. (Most will align stack frames with 8-byte alignment if you use any long long or double locals; I'm not sure if it will give you more than 8-byte alignment though even if you use the macro I gave above).
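
A sketch of the byte-array variant, assuming a 64-byte line and a 256-byte scratch area; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first 64-byte-aligned address inside buf. The caller must
   make buf at least (bytes needed + 63) long so that enough room
   remains after the returned pointer. */
static void *align64_within(unsigned char *buf)
{
    return (void *)(((uintptr_t)buf + 63) & ~(uintptr_t)63);
}

/* Example: a 256-byte scratch area on the stack, aligned to 64 bytes.
   Returns the number of padding bytes skipped at the front. */
static size_t demo_stack_padding(void)
{
    unsigned char storage[256 + 63];
    unsigned char *aligned = align64_within(storage);
    assert(((uintptr_t)aligned & 63) == 0);
    aligned[0] = 0;    /* first usable byte */
    aligned[255] = 0;  /* last usable byte still lies inside storage */
    return (size_t)(aligned - storage);
}
```

Nothing needs freeing here, since the storage disappears with the stack frame; the aligned pointer must not outlive the function.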

The place where the compiler-specific stuff really shines is when you want to control the alignment of global variables (arrays or otherwise), or static member variables of a class. There is no portable way to do it without always accessing your "variables" through a pointer or reference; however, the most popular compilers do have some sort of extension that supports it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Cache line width

Post by bob »

wgarvin wrote:
bob wrote:
mjlef wrote:Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
All you need to do is to force the first byte to be on an address where the rightmost 6 bits are zero.

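t = malloc(size_needed + 63);

/* the mask needs an integer, so cast through uintptr_t (from <stdint.h>) */
t = (void *)(((uintptr_t)t + 63) & ~(uintptr_t)63);

and t is now on a 64-byte boundary.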
For heap-allocated things, that works well. (If you need to free the memory later, you need to keep the original pointer somewhere as well).

For fixed-size, stack-allocated things, you can do a similar thing with a byte array of N+63 bytes, and a pointer into the array. Or you can try and use the compiler-specific stuff from my post above, but you would have to read the details about how each compiler handles stack alignment very carefully. (Most will align stack frames with 8-byte alignment if you use any long long or double locals; I'm not sure if it will give you more than 8-byte alignment though even if you use the macro I gave above).

The place where the compiler-specific stuff really shines is when you want to control the alignment of global variables (arrays or otherwise), or static member variables of a class. There is no portable way to do it without always accessing your "variables" through a pointer or reference; however, the most popular compilers do have some sort of extension that supports it.
All the critical data in Crafty lives in a structure whose storage comes from malloc(). My split blocks are forced to a 2k page boundary, both to prevent two different split blocks from sharing a page and to give better cache alignment. Hash tables are forced to an appropriate boundary (16 bytes for the normal hash). Global data is not nearly as important for performance in Crafty as the basic tree state stuff...
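
For readers who want the same kind of boundary without doing the masking by hand, C11's aligned_alloc() is one option. This is not how Crafty does it, just a sketch under the assumption of a C11 library; struct split_block_sketch is a hypothetical stand-in padded to a multiple of the 2k boundary, since aligned_alloc expects the size to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_ALIGN 2048  /* the 2k boundary mentioned in the post */

/* Hypothetical stand-in for a per-thread search block; its size is a
   multiple of BLOCK_ALIGN so every array element starts on a boundary. */
struct split_block_sketch {
    unsigned char state[BLOCK_ALIGN];
};

/* Allocate `count` blocks, each starting on a BLOCK_ALIGN boundary.
   Returns NULL on failure; release the result with free(). */
static struct split_block_sketch *alloc_blocks(size_t count)
{
    return aligned_alloc(BLOCK_ALIGN,
                         count * sizeof(struct split_block_sketch));
}
```

Because each element is exactly BLOCK_ALIGN bytes, no two blocks in the array can share a 2k region.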