volatile?

Discussion of chess software programming and technical issues.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
With my volatile example, it might miss one read due to the race; the next read will get the correct value once the write is done.
Aha! So while the write "is in progress" the old value can still be read by some CPUs while other CPUs already see the new value. Woohoo. So much for your original statement...
My original statement: ONCE the write is done, everybody gets the new value. That has not changed one scintilla.
That is a modified statement with exactly zero content. You now define "write is done" as "everybody sees the new value".

Originally we talked about this:
bob wrote:
syzygy wrote:
hgm wrote:
Returning the local value in the core's private cache is always fine. If that wasn't the currently valid value (because it was changed in DRAM, a shared higher cache level or some other core's private cache), it would no longer be in your cache.
Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.
It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
What I described to HGM is completely correct. What you wrote is not. Period.
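The ordering rule described above (CPU1 writes A then B; CPU2 reads B then A) is the classic message-passing pattern. As a minimal sketch, assuming a C11 compiler with <stdatomic.h> and POSIX threads, it can be expressed portably like this; the names payload, ready, writer and reader are illustrative only. On x86 the release store and acquire load normally compile to plain mov instructions, so the annotations mainly stop the compiler (and weaker CPUs) from reordering the accesses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative shared state: A is the payload, B is the ready flag. */
static int payload;          /* A: ordinary data */
static atomic_int ready;     /* B: flag, zero-initialized (static storage) */

/* CPU1: write A, then write B with release ordering. */
static void *writer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* CPU2: read B with acquire ordering, then read A. Once the new B is
   seen, the new A is guaranteed to be visible as well. */
static int reader(void) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* an old B just means another iteration */
    return payload;
}

/* Runs one writer thread against the reader and returns what was read. */
static int run_message_passing(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    int seen = reader();
    pthread_join(t, NULL);
    return seen;
}
```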
Please. You have TWO choices.

(1) just do a plain write, allowing for the fact that the write from the CPU is not done instantly, and that there is a tiny window, measured in nanoseconds, where the old value is still available in some caches after the CPU writes a new value.

If that is a concern:

(2) use a lock prefix or an xchg (which automatically has a lock prefix). Then there is ZERO window to get the old value. The CPU informs the cache controller "I am fixing to write to this address, I want exclusive control." The cache controller negotiates with ALL other cache controllers to obtain exclusive control, all other controllers confirm that their copies have been invalidated, and then the local cache controller tells the CPU "proceed". Zero window. When the CPU does the write, no other CPU will ever get anything but the newly written value.

The lock prefix has a bit of overhead, but NOTHING like the overhead involved when pthread_mutex_lock() blocks and unblocks a thread.

So, if you don't like the tiny window where the old value might be available for a very few nanoseconds, then close the window completely with a lock prefix. Is that so hard to grasp? This has ALWAYS been about cache coherency. I don't have any problem with the stale value showing up a couple of times until my cache controller gets around to invalidating it and then fetches the correct value from the cache that modified it, except for atomic locks, where that would be fatal. So there I use an xchg, which guarantees no old value will be read by ANYONE once the CPU executes the instruction that writes the value back to the local cache.

What, exactly, is your problem here? You can do it either way. I chose to avoid the lock prefix overhead in my "do I have work" spin loop because a briefly stale value causes no difficulty there. I chose to incur the overhead for the outer loop of the atomic spin lock because the coherency guarantee is necessary to make it work.
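A sketch of the kind of "do I have work" spin loop described above, assuming C11 atomics and POSIX threads; struct mailbox, post_work() and wait_for_work() are hypothetical names, not the poster's actual code. The waiting thread polls with cheap relaxed loads, so a briefly stale 0 only costs another iteration, and a single acquire fence after the loop makes the handed-over data visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical mailbox through which a busy thread hands work to an idle one. */
struct mailbox {
    atomic_int has_work;   /* 0 = nothing to do, 1 = work posted */
    int split_point;       /* plain data published by has_work */
};

/* Producer: fill in the data, then publish it with a release store. */
static void post_work(struct mailbox *mb, int split_point) {
    mb->split_point = split_point;
    atomic_store_explicit(&mb->has_work, 1, memory_order_release);
}

/* Idle thread: relaxed polling; a stale 0 just costs one more pass.
   The acquire fence pairs with the release store in post_work(). */
static int wait_for_work(struct mailbox *mb) {
    while (atomic_load_explicit(&mb->has_work, memory_order_relaxed) == 0)
        ;
    atomic_thread_fence(memory_order_acquire);
    return mb->split_point;
}

static void *busy_thread(void *arg) {
    post_work(arg, 17);   /* 17 is an arbitrary example value */
    return NULL;
}

/* Demo: one thread posts work, the calling thread waits for it. */
static int run_mailbox_demo(void) {
    struct mailbox mb;
    atomic_init(&mb.has_work, 0);
    mb.split_point = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, busy_thread, &mb) != 0)
        return -1;
    int sp = wait_for_work(&mb);
    pthread_join(t, NULL);
    return sp;
}
```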

What YOU describe is not forced. It is true that if you write to a with a plain "mov $1, a" there is a tiny window where another CPU can get the old value. But if you use (say) "mov $1, %eax; xchg %eax, a", you guarantee this will NEVER happen. The CPU doesn't execute the xchg until it has ownership of the cache block containing a, and ALL other caches have verified that they have invalidated the block if they had it.
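In C11 terms the two choices above look roughly like this. It is a sketch assuming GCC or Clang targeting x86-64, where a relaxed atomic store usually compiles to a plain mov and atomic_exchange() to an xchg; the exact instructions depend on compiler and target, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Choice (1): plain store. With GCC/Clang on x86-64 this is typically a
   single "mov"; other cores may see the old value for a short time. */
static void store_plain(atomic_int *a, int v) {
    atomic_store_explicit(a, v, memory_order_relaxed);
}

/* Choice (2): exchange. On x86 this becomes "xchg", which has an implicit
   lock prefix and needs exclusive ownership of the cache line. It also
   returns the previous value, which is what a lock needs. */
static int store_xchg(atomic_int *a, int v) {
    return atomic_exchange(a, v);
}
```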

Jeez, this is NOT that complicated. Do it whichever way you want. If you insist on the locked version, you will NEVER see a stale value. Exactly as I have said, over and over.

Re: volatile?

Post by bob »

Daniel Shawul wrote:
Rémi Coulom wrote:
Daniel Shawul wrote:
There are test_and_set atomic intrinsics to achieve just that. On Windows I use the following to get spinlocks:

/* x must be a volatile LONG so the inner read is not hoisted; needs <windows.h> */
#define l_lock(x) while (InterlockedExchange((LPLONG)&(x), 1) != 0) { while ((x) != 0); }
Also I don't understand why the bool is declared atomic and not volatile instead.
I don't know about C++11, but in C++ a bool is probably atomic anyway, so what it needs is to be told that it can be modified simultaneously by different threads.
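For comparison, a portable sketch of the same test-and-test-and-set idea as the l_lock() macro above, assuming C11 <stdatomic.h> and POSIX threads for the usage demo. The type ttas_lock and its functions are hypothetical. Using atomic_int rather than a volatile variable constrains both compiler and CPU ordering, which is one common answer to the "atomic or volatile" question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-test-and-set lock, same shape as l_lock(). */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} ttas_lock;

static void ttas_acquire(ttas_lock *l) {
    /* Outer loop: atomic exchange, the InterlockedExchange()/xchg step. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0) {
        /* Inner loop: read-only spinning on the cached copy, the
           while((x) != 0); step, so the cache line is not bounced around. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Usage demo: two threads increment a shared counter under the lock. */
static ttas_lock demo_lock;   /* static storage: starts out free (0) */
static long demo_counter;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        ttas_acquire(&demo_lock);
        demo_counter++;
        ttas_release(&demo_lock);
    }
    return NULL;
}

static long run_ttas_demo(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, demo_worker, NULL);
    pthread_create(&b, NULL, demo_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_counter;
}
```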
My impression is that instead of

while((x) != 0);
It is recommended to do

while((x) != 0)
    YieldProcessor();
The inner loop is an optimization that works only on cache-coherent memory, because it spins on a cached copy of x. More here: http://www.cis.temple.edu/~ingargio/cis ... insem.html

Yielding execution to a different thread (SwitchToThread) may be slow for spinlocks, which are supposed to be fast and hold the critical section only for a short time. I use thread_yield when I run more threads than cores (e.g. an input-polling thread on clusters, or when using hyper-threading), in which case busy waiting is a waste of valuable processor time.
I think he was referring to the hardware "pause" instruction, which is only useful on a hyper-threaded system. The "pause" says "switch from this logical core to the other logical core on the same physical core, unless that core is blocked waiting on memory or whatever."
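A small sketch of a spin-wait that uses the x86 pause hint in the inner loop and falls back to yielding the time slice after a while, roughly combining the YieldProcessor() suggestion with the thread_yield case described above. It assumes GCC or Clang (for _mm_pause() from <immintrin.h>) and POSIX sched_yield(); cpu_relax(), wait_until_zero() and SPIN_LIMIT are hypothetical names, and the spin limit is arbitrary. On Windows, YieldProcessor() and SwitchToThread() play the corresponding roles.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>   /* _mm_pause() */
#endif

/* Spin-wait hint: the x86 "pause" instruction; does nothing elsewhere. */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
}

/* Arbitrary illustrative threshold before giving up the time slice. */
#define SPIN_LIMIT 1000

/* Waits until *x becomes 0: pause in the loop first, then yield the CPU
   (useful when there are more threads than cores). Returns how many
   times it yielded. */
static int wait_until_zero(atomic_int *x) {
    int spins = 0, yields = 0;
    while (atomic_load_explicit(x, memory_order_acquire) != 0) {
        if (spins < SPIN_LIMIT) {
            spins++;
            cpu_relax();
        } else {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```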
syzygy
Posts: 5978
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
Please. You have TWO choices.

(1) just do a plain write, allowing for the fact that the write from the CPU is not done instantly, and that there is a tiny window, measured in nanoseconds, where the old value is still available in some caches after the CPU writes a new value.

If that is a concern:
That is not a concern but just a FACT.

All I did was state a FACT.

Scroll up and check for yourself.

As usual, you reflexively call out "you're all wrong", a hopeless discussion ensues, it somehow dawns on you that your position is not tenable, and you start arguing some completely different point hoping that I will go along. That probably works with your students. It does not work with intelligent people.

Next time you see a discussion between adults, please just walk by.

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
Please. You have TWO choices.

(1) just do a plain write, allowing for the fact that the write from the CPU is not done instantly, and that there is a tiny window, measured in nanoseconds, where the old value is still available in some caches after the CPU writes a new value.

If that is a concern:
That is not a concern but just a FACT.

All I did was state a FACT.

Scroll up and check for yourself.

As usual, you reflexively call out "you're all wrong", a hopeless discussion ensues, it somehow dawns on you that your position is not tenable, and you start arguing some completely different point hoping that I will go along. That probably works with your students. It does not work with intelligent people.

Next time you see a discussion between adults, please just walk by.
Notice the "fact" you mention can be eliminated with an xchg or any of the other read/write instructions with a lock prefix. So it isn't a "fact" at all...
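For reference, xchg is not the only locked instruction. As a sketch assuming C11 atomics on x86, atomic_fetch_add() usually compiles to lock xadd and atomic_compare_exchange_strong() to lock cmpxchg; the helper names below are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Typically "lock xadd" on x86: adds delta and returns the old value. */
static int fetch_add_old(atomic_int *p, int delta) {
    return atomic_fetch_add(p, delta);
}

/* Typically "lock cmpxchg" on x86: takes the lock only if it is free (0). */
static bool try_lock_cas(atomic_int *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}
```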

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
"using volatile is not necessary" if you don't care about high performance. Simple as that.
Next time please do this on Twitter instead of making irrelevant statements in technical threads.
That should be your place. You merely quote standards, don't read the hardware technical references, and give advice that is poor at best, wrong at worst.

If high performance is the issue, ALL of the available tools should be considered, not just the junky stuff in the pthread library, which works, but not very efficiently. I use what works best in the given situation, and I don't run into trouble doing so.

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
Please. You have TWO choices.

(1) just do a plain write, allowing for the fact that the write from the CPU is not done instantly, and that there is a tiny window, measured in nanoseconds, where the old value is still available in some caches after the CPU writes a new value.

If that is a concern:
That is not a concern but just a FACT.

All I did was state a FACT.

Scroll up and check for yourself.

As usual, you reflexively call out "you're all wrong", a hopeless discussion ensues, it somehow dawns on you that your position is not tenable, and you start arguing some completely different point hoping that I will go along. That probably works with your students. It does not work with intelligent people.

Next time you see a discussion between adults, please just walk by.
Notice the "fact" you mention can be eliminated with an xchg or any of the other read/write instructions with a lock prefix. So it isn't a "fact" at all...
My statement: "there is no general guarantee that ...". OK, too difficult, so never mind.

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
"using volatile is not necessary" if you don't care about high performance. Simple as that.
Next time please do this on Twitter instead of making irrelevant statements in technical threads.
That should be your place. You merely quote standards, don't read the hardware technical references, and give advice that is poor at best, wrong at worst.

If high performance is the issue, ALL of the available tools should be considered, not just the junky stuff in the pthread library, which works, but not very efficiently. I use what works best in the given situation, and I don't run into trouble doing so.
High performance was simply not the issue. Read the thread up to my post... yawn.

Btw, why don't you start posting on the LKML:
https://www.kernel.org/doc/Documentatio ... armful.txt

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
"using volatile is not necessary" if you don't care about high performance. Simple as that.
Next time please do this on Twitter instead of making irrelevant statements in technical threads.
That should be your place. You merely quote standards, don't read the hardware technical references, and give advice that is poor at best, wrong at worst.

If high performance is the issue, ALL of the available tools should be considered, not just the junky stuff in the pthread library, which works, but not very efficiently. I use what works best in the given situation, and I don't run into trouble doing so.
High performance was simply not the issue. Read the thread up to my post... yawn.

Btw, why don't you start posting on the LKML:
https://www.kernel.org/doc/Documentatio ... armful.txt
I'd bet performance IS important for MOST of us in computer chess. Hence my comment. As far as the LKML goes, I post there if I have something to discuss dealing with the kernel.

Pretty funny discussion, however, since the KERNEL uses volatile for its own spin locks among other things.
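One place volatile still appears in the Linux kernel is inside accessor macros such as READ_ONCE() and WRITE_ONCE(). The sketch below uses simplified, hypothetical stand-ins (MY_READ_ONCE/MY_WRITE_ONCE), not the kernel's actual definitions, and assumes GCC or Clang for __typeof__. A volatile access forces the compiler to emit a real load or store every time but adds no CPU barrier; under ISO C11, concurrent access through it is still formally a data race, and the kernel relies on its own memory model and compiler behavior instead.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE();
   __typeof__ is a GCC/Clang extension. */
#define MY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int stop_flag;   /* an ordinary, non-volatile int */

/* Each iteration performs a real load of stop_flag, so the compiler cannot
   hoist the check out of the loop. No CPU barrier is implied. */
static long poll_until_stopped(long max_iters) {
    long n = 0;
    while (!MY_READ_ONCE(stop_flag) && n < max_iters)
        n++;
    return n;
}
```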

Re: volatile?

Post by syzygy »

bob wrote:
I'd bet performance IS important for MOST of us in computer chess.
Insofar as it was a factor in Lucas' question, the answer is that the use of volatiles in Senpai only slows down the engine. (I've tested, and #define'ing volatile away gave a very slight speedup, just 0.5%.)
Hence my comment.
The one where you argued that the compiler won't reload non-volatiles after a pthread_mutex_lock()? Right...
As far as the LKML goes, I post there if I have something to discuss dealing with the kernel.

Pretty funny discussion, however, since the KERNEL uses volatile for its own spin locks among other things.
Your insight in these matters is just amazing.

Re: volatile?

Post by bob »

syzygy wrote:
bob wrote:
I'd bet performance IS important for MOST of us in computer chess.
Insofar as it was a factor in Lucas' question, the answer is that the use of volatiles in Senpai only slows down the engine. (I've tested, and #define'ing volatile away gave a very slight speedup, just 0.5%.)
Hence my comment.
The one where you argued that the compiler won't reload non-volatiles after a pthread_mutex_lock()? Right...

Correct. If you include the pthread_mutex_lock source in the compiled program, the compiler will optimize right across it, because it can see all the side effects (or the lack thereof). And it is not JUST pthread_mutex_lock() where this is a problem. You claim the compiler has specific knowledge about that procedure. I claim it does not, and I actually looked for any reference to it in the gcc 4.7 source. Not there. Of course it doesn't optimize across ANY procedure call where it can't see the procedure source, since it doesn't know about the side effects.
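For contrast, the pattern syzygy describes looks like this: a sketch, assuming POSIX threads, in which the shared counter is deliberately not volatile and is only touched while the mutex is held. POSIX lists pthread_mutex_lock() and pthread_mutex_unlock() among the functions that synchronize memory, so the implementation (compiler plus library) must not keep the value cached across those calls. How GCC honors that when it can see the lock's implementation is exactly the point under dispute here; the sketch only shows the usage pattern, and nodes_searched and search_worker() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared data is deliberately NOT volatile; it is only touched with the
   mutex held. */
static pthread_mutex_t nodes_lock = PTHREAD_MUTEX_INITIALIZER;
static long nodes_searched;   /* hypothetical shared statistic */

static void *search_worker(void *arg) {
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&nodes_lock);
        /* POSIX requires lock/unlock to synchronize memory, so this value
           may not stay cached in a register across those calls. */
        nodes_searched++;
        pthread_mutex_unlock(&nodes_lock);
    }
    return NULL;
}

/* Starts four workers, each adding per_thread, and returns the total. */
static long run_mutex_demo(long per_thread) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, search_worker, &per_thread);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return nodes_searched;
}
```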
As far as the LKML goes, I post there if I have something to discuss dealing with the kernel.

Pretty funny discussion, however, since the KERNEL uses volatile for its own spin locks among other things.
Your insight in these matters is just amazing.
Just factual. Want me to show you the volatile ints in the kernel source for spin locks?