syzygy wrote: That is a modified statement with exactly zero content. You now define "write is done" as "everybody sees the new value".
bob wrote: My original statement: ONCE the write is done, everybody gets new value. That has not changed one scintilla.
syzygy wrote: Aha! So while the write "is in progress" the old value can still be read by some CPUs while other CPUs already see the new value. Woohoo. So much for your original statement.
With my volatile example, it might miss one read due to the race, the next read will get the correct value. Once the write is done.
Originally we talked about this:
bob wrote: It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
syzygy wrote: Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.
hgm wrote: Returning the local value in the core's private cache is always fine. If that wasn't the currently valid value (because it was changed in DRAM, a shared higher cache level or some other core's private cache), it would be no longer in your cache.
What I described to HGM is completely correct. What you wrote is not. Period.
Please. You have TWO choices.
(1) just do a plain write, allowing for the fact that the write from the CPU is not done instantly, and that there is a tiny window measured in nanoseconds, where the old value is available in some caches after the cpu writes a new value.
If that is a concern:
(2) use a lock-prefix or an xchg (which automatically has a lock prefix). Then there is ZERO window to get the old value. The CPU informs the cache controller "I am fixing to write to this address, I want exclusive control." The cache controller negotiates with ALL other cache controllers to obtain exclusive control, all other controllers guarantee it has been invalidated, then the local cache controller tells the cpu "proceed". Zero window. When the CPU does the write, no other CPU will ever get anything but the newly written value.
The lock prefix has a bit of overhead, but NOTHING like the overhead involved in pthread_mutex_lock() blocking and unblocking a thread.
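For readers who want to see the two choices side by side, here is a minimal sketch in C11 atomics rather than hand-written assembly. The names work_available, publish_plain and publish_locked are made up for illustration and are not from any program discussed in this thread. The comments about which instruction gets emitted describe what GCC and Clang typically do on x86, which is an assumption about the compiler and not something the C standard promises.

```c
#include <stdatomic.h>

/* Hypothetical shared flag; the name is an assumption for illustration. */
static atomic_int work_available;

/* Choice (1): a plain write. With relaxed ordering, compilers targeting
   x86 typically emit an ordinary mov for this store. */
void publish_plain(int value)
{
    atomic_store_explicit(&work_available, value, memory_order_relaxed);
}

/* Choice (2): an atomic exchange. On x86 this is typically compiled to
   xchg, which carries an implicit lock prefix. Returns the old value. */
int publish_locked(int value)
{
    return atomic_exchange(&work_available, value);
}
```

A caller that only needs the flag to show up "soon" would use publish_plain(1); a caller that needs the read-modify-write to be indivisible would use publish_locked(1) and inspect the returned old value.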
SO, if you don't like the tiny window where the old value might be available for a very few nanoseconds, then close the window to zero with a lock prefix. Is that so hard to grasp? This has ALWAYS been about cache coherency. I don't have any problem with the stale value showing up a couple of times until my cache controller gets around to invalidating the value and then fetches the correct value from the cache that modified it. Except for atomic locks where that would be fatal. So there I use an xchg which guarantees no old value will be read by ANYONE once the cpu executes the instruction that writes the value back to the local cache.
What, exactly, is your problem here? You can do it either way. I chose to avoid the lock prefix overhead because it does not cause any difficulty in my "do I have work" spin loop. I chose to incur the overhead for the atomic spin lock outer loop because the guarantee of coherency is necessary to make it work.
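The split described above (exchange for the lock, plain reads while waiting) can be sketched roughly as follows. This is an illustration in C11 atomics under stated assumptions, not the actual code from my program: spin_lock_t, spin_trylock, spin_lock, spin_unlock, has_work and wait_for_work are hypothetical names, and the notes about xchg and mov describe typical GCC/Clang output on x86.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct { atomic_int held; } spin_lock_t;

/* One attempt: the exchange (typically xchg on x86) returns the old
   value, so exactly one thread can observe the 0 -> 1 transition. */
int spin_trylock(spin_lock_t *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

void spin_lock(spin_lock_t *l)
{
    /* Outer loop: the locked exchange decides who owns the lock. */
    while (!spin_trylock(l)) {
        /* Inner loop: wait with ordinary loads until it looks free. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) {
        }
    }
}

void spin_unlock(spin_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Hypothetical "do I have work" flag. Seeing the old value once or
   twice only costs extra iterations, so no locked instruction is used;
   an acquire load is still an ordinary mov on x86. */
static atomic_int has_work;

void wait_for_work(void)
{
    while (atomic_load_explicit(&has_work, memory_order_acquire) == 0) {
    }
}
```

The design point is that only the lock acquisition pays for the exchange. Both waiting loops tolerate a late update because they simply go around again.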
What YOU describe is not forced. Yes, if you write to A with a plain "mov $1, a", there is a tiny window where another CPU can get the old value. But if you use (say) "mov $1, %eax; xchg %eax, a" then you guarantee this will NEVER happen. The CPU doesn't execute the xchg until it has ownership of the cache block containing a, and ALL other caches have verified they have invalidated the block if they had it.
Jeez, this is NOT that complicated. Do it whichever way you want. If you insist, you will NEVER see a stale value. Exactly as I have said, over and over.