Zach Wegner wrote:
diep wrote:
In principle you should ALWAYS avoid accessing the same cachelines whenever possible, as you never know which entity owns that cacheline and what that entity has done with it. With local cachelines there is no such problem.
Maybe in principle, but there are plenty of exceptions where you can do reads on shared memory without fear of hitting a dirty cache line. For example, if I'm deciding whether to split or not based on the current iteration number, the chance that it is in our cache but has been modified is virtually nil. In many other instances it's simply unavoidable, short of doing without shared memory entirely.

Dead wrong. Realize how much system time in a program like Crafty goes to cache snoops on a 2-socket system. Do you want to increase that?
It's easier to keep data local and not shared. AMD uses a different, better protocol for this on 4-socket systems (which, I was told, is most likely the reason the Xeon MP hasn't been released yet, as this is tough to get bug-free). It derives from the DEC Alpha approach, which nowadays is a tad more complex than it was some years ago. That's true for both Intel and AMD.
They try all kinds of tricks to avoid shipping a coherency message from one memory controller to a remote memory controller. Let me try to put it simply.
The obvious trick is: when a cacheline is held only locally and is not in any other L3 cache, you don't need to ship any "possible update" message for that cacheline at all.
In any test you seriously measure, the effect is very significant: doing things ONLY locally at a core is ALWAYS better on multi-socket systems that are NUMA.
Nowadays that means both Intel and AMD.
NUMA is simply the cheapest way of building these machines, and by now the OSes schedule reasonably well for it (not perfectly).
So in a model with two memory controllers A and B, the best you can do is obviously to take care that a cacheline lives only at A and is not shared with cores at B.
In Diep the evaluation tables and pawn tables are local, of course: every core allocates its own tables. The hashtable also gets allocated locally but is shared across all cores. Note that the hashtable ALSO stores evaluation information.
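On Linux this kind of per-core allocation works out through the first-touch page policy: whichever thread first writes a page gets that page placed on its own NUMA node. A minimal sketch of the idea; the table size and function name are made up for illustration, and the worker is assumed to already be pinned to its core:

Code:
#include <stdlib.h>
#include <string.h>

#define EVALTABLE_BYTES (4u << 20)  /* hypothetical size: 4 MB per core */

/* Called by each worker thread AFTER it has been pinned to its core.
   malloc only reserves address space; the memset is the first touch,
   so the kernel places the pages on the toucher's local NUMA node. */
static void *alloc_local_table(void)
{
    void *table = malloc(EVALTABLE_BYTES);
    if (table != NULL)
        memset(table, 0, EVALTABLE_BYTES);
    return table;
}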
If I used shared evaluation tables, in theory that could improve the evaluation-table hit rate by 1% or so.
So if I were guaranteed fast shared memory, that would certainly be a point of improvement on single-socket machines, but the way things are moving currently it's not worth doing.
I'm not locking a single table, of course. Tim Mann's XOR trick gets used everywhere.
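For reference, the XOR trick (described by Hyatt and Mann for lockless transposition tables) stores the key XORed with the data, so a torn write by another core is detected at probe time instead of being locked against. A minimal sketch with hypothetical 64-bit entry fields:

Code:
#include <stdint.h>

typedef struct {
    uint64_t key_xor_data;  /* key ^ data, written without any lock */
    uint64_t data;
} HashEntry;

/* Store: no lock taken; a concurrent writer can interleave with us,
   but then key_xor_data and data will no longer match up. */
static void hash_store(HashEntry *e, uint64_t key, uint64_t data)
{
    e->data = data;
    e->key_xor_data = key ^ data;
}

/* Probe: recompute the XOR; if another core tore the entry apart,
   the check fails and we simply treat it as a miss. */
static int hash_probe(const HashEntry *e, uint64_t key, uint64_t *data)
{
    uint64_t d = e->data;
    if ((e->key_xor_data ^ d) != key)
        return 0;  /* corrupted or different position: miss */
    *data = d;
    return 1;
}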
In Diep, when I want to be sure that nothing else can get written into a given cacheline, I manually do something like this:
Code:
struct MovesMade {
    /* DIEP NUMA SMP */
    /* The dummy arrays pad each hot variable into its own cacheline,
       so a write to one cannot invalidate its neighbours. */
    char dummycachelineabxle[CACHELINELENGTH];
    lockvar lock_initidlelist;
    char dummycachelinefoslo[CACHELINELENGTH];
    volatile int totalinitidle;
    volatile int initidleproccies[MAXPROCESSES];
    char dummycachelinexoiws[CACHELINELENGTH];
    int nidlelists;          // number of idle lists
    volatile int uselistnr;  // process number where the idle list for this process is located
    char dummycachelineolkeh[CACHELINELENGTH];
    // tricky: everything below shares the same cacheline now
    volatile int totalidle;  // total number of CPUs that are idle
    char dummycachelineikqzw[CACHELINELENGTH];
    ...
The first variable is a lock. Of course other processes might also lock it.
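The actual lockvar type isn't shown here, but for illustration a classic test-and-test-and-set spinlock along these lines would fit, sketched with GCC's legacy atomic builtins:

Code:
typedef volatile int lockvar;

static void lock_acquire(lockvar *l)
{
    /* __sync_lock_test_and_set atomically writes 1 and returns the old
       value; spinning on a plain read in between keeps bus traffic down. */
    while (__sync_lock_test_and_set(l, 1)) {
        while (*l)
            ;
    }
}

static void lock_release(lockvar *l)
{
    __sync_lock_release(l);  /* atomically writes 0 with release semantics */
}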
Let's suppose some other process P1 tries to grab the lock from P0.
P0 is just modifying the data structure and writing to the cacheline.
That means, theoretically speaking, it is obviously possible for P1 to overwrite data that P0 just tried to modify. To avoid that, I keep a cacheline-sized gap between the variables.
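For comparison, a C11 compiler can insert those gaps itself: aligning every hot member to the cacheline size forces each one onto its own line, with the padding computed for you. A minimal sketch, assuming 64-byte cachelines and illustrative field names:

Code:
#include <stdalign.h>

#define CACHELINELENGTH 64  /* assumption: 64-byte cachelines */

struct IdleStateAligned {
    /* Each member starts on a cacheline boundary, so the compiler
       pads between them and no two members share a line. */
    alignas(CACHELINELENGTH) volatile int lock_initidlelist;
    alignas(CACHELINELENGTH) volatile int totalinitidle;
    alignas(CACHELINELENGTH) volatile int totalidle;
};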
Note the odds of this happening are rather tiny. In reality things work a tad more complexly than the above theory, so it goes wrong on fewer occasions than in theory it should.
Which is why so many crappy programs don't crash often, just seldom.
I hope that answers some questions I also saw in another thread here.