Cache line width
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Cache line width
Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Cache line width
Code:
#pragma align <args>
-
- Posts: 286
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: Cache line width
mathmoi wrote: Hi Marco, On Linux you can check /proc/cpuinfo. I asked this very question on stackoverflow.com and got some answers: http://stackoverflow.com/questions/1502 ... -size-in-c
mcostalba wrote: Thanks for the link! I didn't know that site; it is an interesting one.
Yep, I like it a lot. It's like expertsexchange.com, but completely free. Most of the time you get a good answer to your question well within 24 hours.
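For anyone who would rather ask from inside the program than read /proc/cpuinfo, here is a minimal sketch. It assumes glibc on Linux, where `sysconf` accepts the `_SC_LEVEL1_DCACHE_LINESIZE` extension; the helper name `cache_line_size` and the fallback argument are made up for illustration. Some systems report 0 or -1, so the caller supplies a default.

```c
#include <unistd.h>

/* Returns the L1 data cache line size in bytes, or `fallback` when the
   system does not report one. _SC_LEVEL1_DCACHE_LINESIZE is a glibc
   extension, so it is guarded by #ifdef. */
static long cache_line_size(long fallback)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return fallback;
}
```

Calling `cache_line_size(64)` gives either the reported size or the assumed default of 64.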
Mathieu Pagé
mathieu@mathieupage.com
-
- Posts: 201
- Joined: Thu Mar 22, 2007 7:12 pm
- Location: Netherlands
Re: Cache line width
mathmoi wrote: Yep, I like it a lot. It's like expertsexchange.com, but completely free. Most of the time you get a good answer to your question well within 24 hours.
I highly recommend the podcasts by the founders of Stack Overflow, Joel Spolsky and Jeff Atwood, at http://itc.conversationsnetwork.org/ser ... rflow.html
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Cache line width
mjlef wrote: Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
Nearly all compilers align data with "natural alignment", by which I mean that 4-byte types (e.g. float, int, unsigned) will have 4-byte alignment, 8-byte types (e.g. double, long long) will have 8-byte alignment, and 2-byte types (e.g. unsigned short) will have 2-byte alignment.
Two interesting things to note about natural alignment:
(1) A naturally aligned basic type will never get "split" across two cache lines (i.e. a 4-byte float or an 8-byte double is always contained within a single 32-, 64-, or 128-byte cache line; it can never straddle the boundary between two cache lines).
(2) Some types of CPU can only load data that's naturally aligned (and it's the compiler's job to ensure this). x86 is the most notable exception: it is very tolerant of misaligned data, but misaligned accesses are still a bit slower, especially if the data is split across two cache lines, so even x86 compilers use natural alignment for everything.
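Point (1) is just arithmetic, and a small check makes it concrete. This is only a sketch; `straddles_line` is a hypothetical helper, and the addresses are plain byte offsets rather than real pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if an object of `size` bytes starting at byte address `addr`
   touches more than one cache line of `line` bytes. */
static int straddles_line(uintptr_t addr, size_t size, size_t line)
{
    return addr / line != (addr + size - 1) / line;
}
```

With 64-byte lines, an 8-byte double at offset 56 (a multiple of 8) stays inside one line, while the same double at the misaligned offset 60 covers bytes 60..67 and so touches two lines.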
If you want more alignment than that (e.g. to keep a group of global variables together in one cache line, or to put particular variables in separate cache lines), then for GCC and Microsoft compilers you could try something like this:
Code:
#if defined(__GNUC__)
#define MY_ALIGN(n) __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define MY_ALIGN(n) __declspec(align(n))
#endif
struct MyGlobals
{
    int x;
    int y;
};
MY_ALIGN(128) MyGlobals g_globals;   /* C++; in C write "struct MyGlobals" */
MY_ALIGN(128) unsigned g_myAlignedArray[1000];
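With a C11 compiler the same thing can be written without compiler-specific macros, using `alignas` from `<stdalign.h>`. The sketch below is not from the post above: the 64-byte line size and the names `CACHE_LINE`, `g_aligned_globals`, `g_aligned_array`, and `is_aligned` are assumptions for illustration.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

struct Globals
{
    int x;
    int y;
};

/* Both objects start on a CACHE_LINE boundary. */
static alignas(CACHE_LINE) struct Globals g_aligned_globals;
static alignas(CACHE_LINE) unsigned g_aligned_array[1000];

/* Nonzero if pointer p is a multiple of n bytes. */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

`is_aligned(g_aligned_array, CACHE_LINE)` is a cheap sanity check to run once at startup if you are unsure what your toolchain does.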
Here's another useful thing, a macro that gives you the required alignment of any type (so for a structure or class type, it will tell you the alignment needed for the most strictly aligned member in it, etc.):
Code:
#if defined(__GNUC__)
#define GET_ALIGN_OF(type) __alignof__(type)
#elif defined(_MSC_VER)
#define GET_ALIGN_OF(type) __alignof(type)
#endif
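C11 also has a standard spelling for this, `alignof` from `<stdalign.h>`. A small sketch, with a made-up struct `Mixed` and helper `mixed_alignment`, showing that on common ABIs the alignment of a struct is that of its most strictly aligned member (here the double):

```c
#include <stdalign.h>
#include <stddef.h>

struct Mixed        /* hypothetical example type */
{
    char   c;
    double d;
    short  s;
};

/* Alignment the compiler requires for struct Mixed. */
static size_t mixed_alignment(void)
{
    return alignof(struct Mixed);
}
```

The size of the struct is always a multiple of this value, which is why arrays of such structs keep every element aligned.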
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Cache line width
mjlef wrote: Can someone post a chunk of sample code showing how to align an array to the cache line width? I always wonder if my program is doing multiple fetches to retrieve info and could be sped up a bit if aligned properly. Or does the compiler do this automagically for me?
All you need to do is to force the first byte to be on an address where the rightmost 6 bits are zero.
t = malloc(size_needed + 63);                        /* over-allocate by 63 bytes */
t = (void *)(((uintptr_t)t + 63) & ~(uintptr_t)63);  /* round up; uintptr_t is from <stdint.h> */
and t is now on a 64-byte boundary.
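Since free() must be given the pointer that malloc() returned, not the rounded one, a slightly fuller sketch keeps both. The struct and function names here (`aligned_block`, `alloc_aligned64`) are hypothetical and not taken from any engine.

```c
#include <stdint.h>
#include <stdlib.h>

struct aligned_block
{
    void *raw;   /* pointer returned by malloc(); pass this one to free() */
    void *ptr;   /* first 64-byte-aligned address inside the block */
};

/* Over-allocates by 63 bytes and rounds the start up to a multiple of 64.
   On allocation failure both members are NULL. */
static struct aligned_block alloc_aligned64(size_t size)
{
    struct aligned_block b = { NULL, NULL };
    b.raw = malloc(size + 63);
    if (b.raw != NULL)
        b.ptr = (void *)(((uintptr_t)b.raw + 63) & ~(uintptr_t)63);
    return b;
}
```

Use `b.ptr` for the data and call `free(b.raw)` when done.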
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Cache line width
bob wrote: All you need to do is to force the first byte to be on an address where the rightmost 6 bits are zero.
For heap-allocated things, that works well. (If you need to free the memory later, you need to keep the original pointer somewhere as well.)
For fixed-size, stack-allocated things, you can do a similar thing with a byte array of N+63 bytes, and a pointer into the array. Or you can try and use the compiler-specific stuff from my post above, but you would have to read the details about how each compiler handles stack alignment very carefully. (Most will align stack frames with 8-byte alignment if you use any long long or double locals; I'm not sure if it will give you more than 8-byte alignment though even if you use the macro I gave above).
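The byte-array trick for locals might look like the sketch below. The buffer size `BUF_BYTES` and the function name `stack_aligned_demo` are placeholders; the function only demonstrates that the rounded pointer is 64-byte aligned and that the usable region still fits inside the local array.

```c
#include <stdint.h>
#include <string.h>

#define BUF_BYTES 256   /* assumed size of the aligned region we want */

/* Returns 1 if the aligned region was set up correctly, 0 otherwise. */
static int stack_aligned_demo(void)
{
    unsigned char storage[BUF_BYTES + 63];   /* 63 bytes of slack */
    unsigned char *p =
        (unsigned char *)(((uintptr_t)storage + 63) & ~(uintptr_t)63);

    memset(p, 0, BUF_BYTES);                 /* use the aligned region */

    return ((uintptr_t)p & 63) == 0 &&
           p + BUF_BYTES <= storage + sizeof storage;
}
```

Nothing needs freeing here, but the aligned pointer is only valid until the function returns.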
The place where the compiler-specific stuff really shines is when you want to control the alignment of global variables (arrays or otherwise), or static member variables of a class. There is no portable way to do it without always accessing your "variables" through a pointer or reference; however, the most popular compilers do have some sort of extension that supports it.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Cache line width
wgarvin wrote: For heap-allocated things, that works well. (If you need to free the memory later, you need to keep the original pointer somewhere as well.)
All the critical data in Crafty is in a structure where the data for it is grabbed via malloc(). My split blocks are forced to a 2k page boundary to prevent sharing a page between two different split blocks, as well as provide better cache alignment. Hash tables are forced to an appropriate boundary (16 bytes for normal hash). Global data is not nearly as important for performance in Crafty as the basic tree state stuff...
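This is not Crafty's code, but if your C library has the C11 `aligned_alloc` function (glibc does; Microsoft's runtime does not), forcing an allocation onto a 16-byte or 2k boundary can be sketched like this. The helper name `alloc_on_boundary` is hypothetical, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates at least `size` bytes starting on an `alignment` boundary.
   `alignment` must be a power of two. The size is rounded up to a
   multiple of the alignment, which the original C11 wording requires.
   Release the result with free(). */
static void *alloc_on_boundary(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}
```

Unlike the malloc-and-round approach, the returned pointer can be passed to free() directly, so there is no second pointer to keep around.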