hgm wrote: The MESI protocol has taken care of this for more than a decade now. It doesn't matter at all in which order items are flushed from cache to memory. A reader will just not be allowed to access stale memory before the cache line is completely written. Snooping hardware will see to it that being in cache is, for all intents and purposes, equivalent to being in memory.
If there is a problem, it is certainly not in that area.
There are several problems. Let me give you a very simple case to think about.
We have 3 words of memory; call 'em A, B and C. I claim that no matter what order you write them in, they can show up in memory in a _different_ order. Here is one way; there are others:
First I store A, which is the first word in cache line X. Then I store B, which is the first word in cache line Y. Then I store C, which is the second word in cache line X.
Both lines eventually get evicted because they are replaced by something else. Which one gets evicted first? Who knows. But that one gets written back first. If cache line X is written back first, then A and C get updated in memory before B. If line Y is written back first, then B gets written before A and C.
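To make the layout concrete, here is a minimal C sketch of that scenario (the 64-byte line size, the gcc attribute syntax, and all names are my assumptions for illustration):

```c
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size */

/* A and C share cache line X; B sits alone on cache line Y. */
struct line_x {
    uint64_t A;                                   /* first word of X  */
    uint64_t C;                                   /* second word of X */
    char pad[CACHE_LINE - 2 * sizeof(uint64_t)];
} __attribute__((aligned(CACHE_LINE)));

struct line_y {
    uint64_t B;                                   /* first word of Y  */
    char pad[CACHE_LINE - sizeof(uint64_t)];
} __attribute__((aligned(CACHE_LINE)));

struct line_x x;
struct line_y y;

void writer(void)
{
    x.A = 1;   /* store 1: dirties line X */
    y.B = 2;   /* store 2: dirties line Y */
    x.C = 3;   /* store 3: dirties line X again */
    /* Program order is A, B, C, but write-back happens per line:
       memory sees either {A,C} then {B}, or {B} then {A,C}. */
}
```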
Look up "write-combining buffers" for lots of examples where this can be a problem. not if you use atomic locks, but fences don't solve the problem. A fence just forces all writes to be done before you proceed beyond the fence. But that is "all writes" from the processor's perspective, if you haven't yet done a write because the optimizer moved it around, then you will see perhaps wrong results.
Bottom line is quite simple. If you want to have readers and writers, and make certain that a reader can't access the data while a writer is modifying it, you need to use the classic mutex/atomic-lock critical-section approach. Any attempt to bypass the overhead of the lock is going to lead to many very subtle (and perhaps rare, but never zero) bugs requiring significant debugging time. The right way is to simply prevent the problem from happening in a safe and reliable way: either eliminate the need for the critical section (the lockless hashing idea is one example of this; see the sketch below), or else make the program immune to failure if the writes are not all completed before a reader starts accessing the data.
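For reference, here is a minimal sketch of that lockless hashing idea (the XOR trick Hyatt and Mann described for transposition tables; the struct and function names are mine):

```c
#include <stdint.h>

/* Each entry is two 64-bit words written with two independent,
   non-atomic stores.  Instead of locking, the writer stores
   key ^ data in the key slot.  A reader that catches a torn entry
   (one word old, one word new) fails the XOR check and simply
   treats the probe as a miss -- a corrupted entry is never used. */
typedef struct {
    volatile uint64_t key;    /* actually holds key ^ data */
    volatile uint64_t data;
} HashEntry;

void store_entry(HashEntry *e, uint64_t key, uint64_t data)
{
    e->key  = key ^ data;     /* two plain stores, in whatever  */
    e->data = data;           /* order the hardware feels like  */
}

int probe_entry(const HashEntry *e, uint64_t key, uint64_t *out)
{
    uint64_t d = e->data;
    uint64_t k = e->key;
    if ((k ^ d) != key)       /* torn or unrelated entry: miss */
        return 0;
    *out = d;
    return 1;
}
```

This is the "immune to failure" approach: the race still happens, but a half-written entry can never be mistaken for a valid one.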
To tell anyone to do this any other way is simply doing them a disservice and will lead to much unnecessary debugging. And indeed, sometimes these race windows are so narrow they almost never occur. But "almost never" is not "never", and leaving such a window of opportunity for failure is not the way to write reliable software.
To understand this problem, you have to step back one or two steps further than you currently are. An individual CPU already executes things out of order, so out-of-order stores are common to begin with. Then you have multiple instruction streams on multiple cores or packages, further adding to this. And then the cache introduces its own quirks by trying to optimize memory writes with the write-combining stuff. So I can't safely read something and assume that if it is unchanged, the other "associated data" is also unchanged. There are too many possible pathways that violate this.
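A tiny sketch of the pattern that breaks, with hypothetical names (the reader checks one word and infers the state of its associated data):

```c
#include <stdint.h>

uint64_t signature;   /* "if this hasn't changed..."           */
uint64_t payload;     /* "...then this hasn't either" -- wrong */

int unsafe_read(uint64_t expected, uint64_t *out)
{
    /* These two plain loads (and the writer's two plain stores) can
       each be reordered by the compiler, the CPU, or the cache, so a
       current signature can show up next to a stale payload. */
    if (signature == expected) {
        *out = payload;
        return 1;
    }
    return 0;
}
```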
There's some good info in the Linux kernel documentation (see Documentation/memory-barriers.txt), where they talk about this issue specifically and at length, and complain bitterly about how lax the x86 memory write policies really are...
As far as your MESI comment goes (really MESIF now for Intel, or MOESI for AMD), it does address some things. But the hole remains: you can't know what order things are physically executed in, nor how the cache modifies the write order of that already rearranged stream.