The only thing I need is to be sure that B is seen by other CPUs to be written later than A. This can be accomplished by
bob wrote:
For each of the following, I want to execute the two following instructions:
mov A, eax
mov B, ebx
They appear exactly in the order in my program.
The _only_ way you can _guarantee_ that the two stores are done _exactly_ in the order written is as follows:
mov A, eax
sfence
mov B, ebx
sfence
Which incurs far more overhead than the original atomic lock I mentioned.
You can't get away with this:
mov A, eax
mov B, ebx
sfence
Because all the sfence instruction guarantees is that when you execute it, the processor hangs until both of the preceding mov instructions have executed. But it does _not_ guarantee that B won't be written first. It just guarantees that before you proceed, _both_ will be written, in unknown order.
Code:
mov A, eax
sfence
mov B1, ebx
mov B2, ecx
...
mov Bn, edx
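For readers working in C rather than assembly, the store ordering above can be sketched roughly as follows. This is only an illustration under assumptions: it uses C11 `<stdatomic.h>`, a single B instead of B1..Bn, and the names `publish` and `read_if_published` are made up here. The release fence plays the role of the sfence between the store to A and the store to B.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared variables standing in for A and B. */
static atomic_int A;
static atomic_int B;

/* Writer: store A, fence, then store B (mirrors mov A / sfence / mov B). */
void publish(int a, int b) {
    atomic_store_explicit(&A, a, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* A is ordered before B */
    atomic_store_explicit(&B, b, memory_order_relaxed);
}

/* Reader: once the new B is seen, the acquire fence ensures the
 * matching A is seen as well. Returns false if B is not yet visible. */
bool read_if_published(int expected_b, int *out_a) {
    if (atomic_load_explicit(&B, memory_order_relaxed) != expected_b)
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out_a = atomic_load_explicit(&A, memory_order_relaxed);
    return true;
}
```

On x86, compilers typically emit no fence instruction for the release fence in `publish`; there it mainly keeps the compiler itself from reordering the two stores.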
I didn't notice your mentioning this.
4 pages ago hgm wrote:
Uncertainty about the order in which things are done by different processors is a different matter from uncertainty about the order in which a single CPU does things. For the latter there are the sfence and similar instructions. This does not lock the bus, as it pertains only to what happens in a single CPU.
Wrong again. You only have to put an sfence between them. And on x86 that isn't even needed.
Because I don't believe you understand what is going on at this level. Which is OK, but it is not OK to just say "sfence" solves this when it most certainly does not.
So, if you want to guarantee the order of two stores, you have to put an sfence after each.
Not if reads are >10,000 times more frequent than writes. Then even a single cycle saved on the reads would earn back more than a 5,000-cycle penalty on a single write. So: good solution...
And you completely stall the CPU while it is waiting on the write, and won't execute any other instructions out of order to fill in the idle cycles in the various pipelines. Lousy solution.
As Gian-Carlo pointed out, the manuals say differently.
Otherwise you can use a lock where you write and a lock where you read, so that a reader can't access _either_ value until the lock is released. You still need an sfence (store fence) prior to clearing the lock or you can still get zapped, since the instruction to clear the lock is a simple mov lock, 0 and the preceding stores might not both be completed prior to that.
If you don't program in assembly language, you don't get access to the sfence stuff anyway, and unless you know you need it, you have subtle timing issues that are a royal pain to deal with, because they are very hard to detect.
Out-of-order writes are an x86 characteristic. No way around them unless you want to intersperse sfence instructions throughout your code and slow it down by at least a couple of orders of magnitude or more.
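The lock-based alternative described above (a lock around the writes and around the reads, with the stores ordered before the lock is cleared) might look roughly like this in C11. It is a sketch under assumptions, not anyone's actual code: the names `pair_lock`, `write_pair` and `read_pair` are hypothetical, and the release ordering on the clear stands in for the fence before `mov lock, 0`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin lock guarding the two values A and B. */
static atomic_flag pair_lock = ATOMIC_FLAG_INIT;
static int pair_a, pair_b;

static void pair_acquire(void) {
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&pair_lock, memory_order_acquire))
        ;
}

static void pair_release(void) {
    /* Release ordering: both stores are visible before the lock reads as free. */
    atomic_flag_clear_explicit(&pair_lock, memory_order_release);
}

void write_pair(int a, int b) {
    pair_acquire();
    pair_a = a;
    pair_b = b;
    pair_release();
}

void read_pair(int *a, int *b) {
    pair_acquire();
    *a = pair_a;
    *b = pair_b;
    pair_release();
}
```

With this shape a reader never sees one value updated without the other, at the price of taking the lock on every read.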
This you will have to explain. For one, you needed the sfence to clear the atomic lock as well, so how does that make atomic locks better? (Suppose for the sake of argument that we are talking about other systems than x86, where sfences would be needed.)
That is exactly what I have been saying all along. Peterson's code has a problem on x86 without using sfence. In fact, any program that depends on A being written before B (in the above example) has a problem unless you are willing to slow things to a crawl with sfence.
As said before, write cost was not a problem, as writes were virtually never done.
That is a simple explanation of the problem I have been hitting on from day one. I have _repeatedly_ said "this is not a cache issue" because it has _nothing_ to do with cache. If cache never sees the mov A, eax write, it can't do a thing to help us. And it won't always see that write before the write to B. Hence no order guarantee and cache can't do a thing about it.
Note also that it does not matter at all what the state of A and B is with respect to other caches or even the cache on the current processor. Neither might be cached anywhere. Both might be cached as shared everywhere. The problem still persists in exactly the same way with no help from cache to solve it at all.
Again, you have the out-of-order problem unless you sfence _each_ write on the writer end. And that is worse, still, as I said. Locks don't stall the CPU, while sfence instructions halt everything.