Questions on volatile keyword and memory barriers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on volatile keyword and memory barriers

Post by bob »

Gian-Carlo Pascutto wrote:
bob wrote: Without volatile, you end up inserting barriers at inconvenient locations and break other optimizations, where all I want to guarantee is that when I test the value, I test the most current value, not one saved in a register for what could be several minutes.
Volatile will basically inhibit optimization on all operations involving the variable. The barrier can put be there only when it's needed. This is a disadvantage of volatile, not an advantage.
How is it a disadvantage? That is _exactly_ why I use it, because it can _not_ be optimized.

But I don't want barrier() calls scattered all over my code to make it harder to read and more likely that I will forget to use one (the often-quoted "sin of omission" in parallel programming).
You can just as well make the opposite argument and say that using volatile will hide the fact that it's parallel from the code and make it more likely that you forget to lock something, whereas the barrier forces you to think it over on every access.

In Deep Sjeng I just lock, and for those few variables who need to be checked every node so locking them is too expensive (is_aborted), I use accessor functions.
I lock where appropriate. But there are cases where locking is not needed. This is not something that happens in an O/S, which is why Linux uses the barrier() approach. But in chess, I am willing to eliminate the overhead of unnecessary barriers, and accept the risk that I might just miss the flag getting set and having to wait until another node is searched before I see the change.

An operating system really doesn't have many cases where that might apply. Perhaps waiting for a buffer when all buffers are allocated, but it is pretty infrequent.

One could make the argument that the best way to share stuff is not via threads, but via the system V shared memory facility, where you put all shared data into a shared block of memory and use a pointer (I used to use shared->variable when I was using processes rather than threads) to make this obvious.

But volatile is cleaner, if you know what you are doing. Because it gives you the opportunity to avoid excessive overhead (barriers/fences thrown in the code in places where it is really not necessary) if you are careful to lock where needed.

I used to have alpha-specific lock/unlock functions in Crafty, where we had to include a barrier else the lock was not safe. So the issues are always there. But I do not ever depend on write order or read order or read/write order being done a certain way for my program to execute correctly. Which is what some of the earlier lockless approaches depended on. Volatile won't cure all evils. Neither will generic barriers. So it is a matter of taste. I personally like the volatile approach although I don't have very many such variables in Crafty since they are not needed. Looks like there are 8 or 9 total, although about half are in a structure that is replicated many times (split blocks).
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Questions on volatile keyword and memory barriers

Post by wgarvin »

oops.. double post
Last edited by wgarvin on Sat Aug 22, 2009 7:10 am, edited 1 time in total.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Questions on volatile keyword and memory barriers

Post by wgarvin »

Gian-Carlo Pascutto wrote:
bob wrote: Without volatile, you end up inserting barriers at inconvenient locations and break other optimizations, where all I want to guarantee is that when I test the value, I test the most current value, not one saved in a register for what could be several minutes.
Volatile will basically inhibit optimization on all operations involving the variable. The barrier can put be there only when it's needed. This is a disadvantage of volatile, not an advantage.
I don't see why its a disadvantage. As mentioned earlier, you can cast the address of the variable to a volatile pointer at the place where you access it. The opposite should work too (casting away volatileness to do a normal access to a volatile variable) but is arguably more hackish.

Anyways, I can't really think of a situation where inhibiting optimization on an inter-thread communication variable of this sort is a bad thing. Volatile means the thing being accessed does not behave like ordinary RAM in a single-threaded program. The value returned by reads might spontaneously change, and there might be side effects caused by a read or a write that the compiler knows nothing about. This conservative treatment seems reasonable for a marker variable (e.g. an "abort the search please" flag). If you want to do some intermediate calculations on it followed by one store, you can always do that in a non-volatile temporary of the same type and then assign it once to the volatile variable. If you want to read from the variable once and then use the value that was read in a bunch of different expressions, copying it into a temporary is a fine way to do that (and also nice and explicit).

In short, if the reading and writing of this variable is "special", then I actually want to see it in the source code: two reads means two reads. If I only want to read once, then there should only be one rvalue use of the variable in that bit of code. etc.
Gian-Carlo Pascutto wrote:
But I don't want barrier() calls scattered all over my code to make it harder to read and more likely that I will forget to use one (the often-quoted "sin of omission" in parallel programming).
You can just as well make the opposite argument and say that using volatile will hide the fact that it's parallel from the code and make it more likely that you forget to lock something, whereas the barrier forces you to think it over on every access.
That might be true if you put the volatile on the variable declaration and then access it directly. You could make things clearer by either casting it to a volatile type at the site of each access (in which case marking the variable itself as volatile becomes optional, but is probably still a good idea in case you forget the cast somewhere). You could also choose a name that will remind you of its unusual semantics (e.g. "g_VolatileAbortFlag"). Or alternatively, you could make macros or inline functions and use those to access the variable, so that it doesn't look like an "ordinary" variable (principle of least surprise!).


Anyway, a lot of this stuff comes down to personal taste. Some people don't like the syntax of the volatile casts. They are occasionally useful, but there are almost always alternate solutions. I like volatile sometimes because it requires very few lines of code for simple things like sending "please die now" requests to a thread.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on volatile keyword and memory barriers

Post by bob »

wgarvin wrote:
Gian-Carlo Pascutto wrote:
bob wrote: Without volatile, you end up inserting barriers at inconvenient locations and break other optimizations, where all I want to guarantee is that when I test the value, I test the most current value, not one saved in a register for what could be several minutes.
Volatile will basically inhibit optimization on all operations involving the variable. The barrier can put be there only when it's needed. This is a disadvantage of volatile, not an advantage.
I don't see why its a disadvantage. As mentioned earlier, you can cast the address of the variable to a volatile pointer at the place where you access it. The opposite should work too (casting away volatileness to do a normal access to a volatile variable) but is arguably more hackish.

Anyways, I can't really think of a situation where inhibiting optimization on an inter-thread communication variable of this sort is a bad thing. Volatile means the thing being accessed does not behave like ordinary RAM in a single-threaded program. The value returned by reads might spontaneously change, and there might be side effects caused by a read or a write that the compiler knows nothing about. This conservative treatment seems reasonable for a marker variable (e.g. an "abort the search please" flag). If you want to do some intermediate calculations on it followed by one store, you can always do that in a non-volatile temporary of the same type and then assign it once to the volatile variable. If you want to read from the variable once and then use the value that was read in a bunch of different expressions, copying it into a temporary is a fine way to do that (and also nice and explicit).

In short, if the reading and writing of this variable is "special", then I actually want to see it in the source code: two reads means two reads. If I only want to read once, then there should only be one rvalue use of the variable in that bit of code. etc.
Gian-Carlo Pascutto wrote:
But I don't want barrier() calls scattered all over my code to make it harder to read and more likely that I will forget to use one (the often-quoted "sin of omission" in parallel programming).
You can just as well make the opposite argument and say that using volatile will hide the fact that it's parallel from the code and make it more likely that you forget to lock something, whereas the barrier forces you to think it over on every access.
That might be true if you put the volatile on the variable declaration and then access it directly. You could make things clearer by either casting it to a volatile type at the site of each access (in which case marking the variable itself as volatile becomes optional, but is probably still a good idea in case you forget the cast somewhere). You could also choose a name that will remind you of its unusual semantics (e.g. "g_VolatileAbortFlag"). Or alternatively, you could make macros or inline functions and use those to access the variable, so that it doesn't look like an "ordinary" variable (principle of least surprise!).


Anyway, a lot of this stuff comes down to personal taste. Some people don't like the syntax of the volatile casts. They are occasionally useful, but there are almost always alternate solutions. I like volatile sometimes because it requires very few lines of code for simple things like sending "please die now" requests to a thread.
My thinking here is that C was _never_ designed to be a "wordy cobol-like" language. It was designed to be concise and clear. And IMHO volatile meets that requirement perfectly. Inserting barrier() all over the code is completely anti-C. If I know that a variable is used in a volatile way, I simply declare it so and keep on trucking. If I know that multiple threads modify a variable, or a structure, it has to be declared volatile _and_ it needs to be locked/unlocked to prevent interleaved update issues.

The only thing missing in the C standard is that a write to a volatile variable could be preceeded by a sfence/wbr/etc type instruction to make sure all previous writes are done. However, I am happy this was not done because I do not want all writes to a volatile have a fence in front of them, I would rather add that to my unlock() which is the place where the fence is important, for architectures like the alpha...

C was really intended to be one step above assembly language, and it does that surprisingly well.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Questions on volatile keyword and memory barriers

Post by diep »

Gian-Carlo Pascutto wrote:I believe that barriers can indeed replace volatile (and have better defined semantics).

You can find some very interesting discussion by Linus Torvalds about this on the Linux Kernel mailing list.
I'd argue with volatile you can atomically count at the cacheline invalidation happens 100% correct. That's not the same as that other cores also read things in the correct order. Maybe they read things as being in a specific state at the same time isn't it?

Even though extra overhead is practically never needed, in most theory you see as an extra security that locking happens, in short instructions that for sure flushes all those instructions and ram.

Nowadays cpu's can reorder instructions pretty agressive. It wouldn't be the first time that reorder mistakes do get made there. We see ever bigger pressure at manufacturers to quickly release cpu's. Verifying them for 100% to be bugfree with just memory operands must be hell.

So if you're programming software where lives could depend upon, it sure isn't enough to just use volatile, as proof that it works bugfree is weak.

However in chess we do want to risk this and go for the fastest thing which is simply go for volatile. End 2005 (december) i figured out somewhere in the parallel code of Diep a bug where 2 volatile memory operands were in wrong order getting written by accident (implementation bug). It somehow never crashed diep and after a long quest i figured out the reason for that had to be the fact that 1 instruction later i locked something else, which also would for sure flushes all this.

In short the atomic speed at which the processor functions would always write things with a delay to RAM, only the invalidation happened atomic, taking care the order never was a problem for other cores and processors hammering onto these cachelines.

In short if you have somewhere a necessity to first set a variable

A to false
and only THEN variable B to false,

So the illegal state is A== true && B == false.

In your code:

A= false;
B= false;

Maybe it is entirely possible that under some conditions you will never be able to read at another processor the 'in between' state: A == false && B == true.

At least that is what i concluded.

The interesting thing to measure is how many times a second you can flip in these manners with 2 threads a second a variable.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Questions on volatile keyword and memory barriers

Post by Zach Wegner »

diep wrote:
Gian-Carlo Pascutto wrote:I believe that barriers can indeed replace volatile (and have better defined semantics).

You can find some very interesting discussion by Linus Torvalds about this on the Linux Kernel mailing list.
I'd argue with volatile you can atomically count at the cacheline invalidation happens 100% correct. That's not the same as that other cores also read things in the correct order. Maybe they read things as being in a specific state at the same time isn't it?

Even though extra overhead is practically never needed, in most theory you see as an extra security that locking happens, in short instructions that for sure flushes all those instructions and ram.

Nowadays cpu's can reorder instructions pretty agressive. It wouldn't be the first time that reorder mistakes do get made there. We see ever bigger pressure at manufacturers to quickly release cpu's. Verifying them for 100% to be bugfree with just memory operands must be hell.

So if you're programming software where lives could depend upon, it sure isn't enough to just use volatile, as proof that it works bugfree is weak.

However in chess we do want to risk this and go for the fastest thing which is simply go for volatile. End 2005 (december) i figured out somewhere in the parallel code of Diep a bug where 2 volatile memory operands were in wrong order getting written by accident (implementation bug). It somehow never crashed diep and after a long quest i figured out the reason for that had to be the fact that 1 instruction later i locked something else, which also would for sure flushes all this.

In short the atomic speed at which the processor functions would always write things with a delay to RAM, only the invalidation happened atomic, taking care the order never was a problem for other cores and processors hammering onto these cachelines.

In short if you have somewhere a necessity to first set a variable

A to false
and only THEN variable B to false,

So the illegal state is A== true && B == false.

In your code:

A= false;
B= false;

Maybe it is entirely possible that under some conditions you will never be able to read at another processor the 'in between' state: A == false && B == true.

At least that is what i concluded.

The interesting thing to measure is how many times a second you can flip in these manners with 2 threads a second a variable.
Adding a store fence between the A=false and B=false (and without any volatiles) is guaranteed to read in the correct order. With just volatile, it should work on x86, but not on architectures with out-of-order writes.