volatile?

AlvaroBegue · Post by **AlvaroBegue** » Fri Mar 21, 2014 7:48 pm

syzygy wrote:I'm still not sure if ++ on an std::atomic type is executed atomically, but after googling a bit I think it does.

Yes, it does: http://en.cppreference.com/w/cpp/atomic ... ator_arith

syzygy · Post by **syzygy** » Fri Mar 21, 2014 9:13 pm

AlvaroBegue wrote:
syzygy wrote:I'm still not sure if ++ on an std::atomic type is executed atomically, but after googling a bit I think it does.
Yes, it does: http://en.cppreference.com/w/cpp/atomic ... ator_arith

Indeed, this makes it clear.

To come back to the "p_workers++" line from Senpai, it is still not automatically safe to replace

Code: Select all

   int volatile p_workers;

   void enter() {
      lock();
      p_workers++;
      unlock();
   }

by

Code: Select all

   std::atomic<int> volatile p_workers;

   void enter() {
      p_workers++;
   }

This is because there might be critical sections that access p_workers in a way that relies on p_workers not being changed from outside the critical section. So you have to look at all other uses of p_workers.

Code: Select all

   void leave() {

      lock();

      assert(p_workers > 0);
      p_workers--;

      if (p_workers == 0) sl_signal(*p_master);

      unlock();
   }

This assumes p_workers does not change between p_workers-- and the test for p_workers == 0. This assumption may no longer be true after removing the lock from enter().

AlvaroBegue · Post by **AlvaroBegue** » Fri Mar 21, 2014 9:16 pm

Can you not just write this?

Code: Select all

   void leave() {
      if (--p_workers == 0) sl_signal(*p_master);
   }

Rein Halbersma · Post by **Rein Halbersma** » Fri Mar 21, 2014 9:16 pm

syzygy wrote: This is because there might be critical sections that access p_workers in a way that relies on p_workers not being changed from outside the critical section. So you have to look at all other uses of p_workers.
Code: Select all
   void leave() {

      lock();

      assert(p_workers > 0);
      p_workers--;

      if (p_workers == 0) sl_signal(*p_master);

      unlock();
   }
This assumes p_workers does not change between p_workers-- and the test for p_workers == 0. This assumption may no longer be true after removing the lock from enter().

I agree with your observation, but why would you want to put extra locking around p_workers? The cleanest way would be factor out all read/write access of p_workers outside locking. And the lock should preferably use a std::lock_guard in order to be exception-safe.

bob · Post by **bob** » Fri Mar 21, 2014 10:00 pm

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote:
syzygy wrote:
bob wrote:The code looks ugly to me and is probably not safe.
If you had just read:
syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile.
Please spare us your confused "contributions" if you can't take the time to read first.
Please read what I wrote. Locks do NOT avoid the need for volatile. Not now, not ever. I might not be able to acquire the lock if someone else has it, but I can certainly keep a cached value of the variable for a long time, which can certainly be a problem...
You are wrong.

When a compiler sees a pthread_mutex_lock call (or many other kinds of API calls for that matter) it has to assume "anything can happen here, so don't make optimizations relying on things staying in the same state before and after the call".

To be more precise, at least it has to behave as if it did that (which does not allow it to mandate the use of volatile in code).
Actually this is wrong. The compiler has to assume that ANY global variable can be changed by any procedure call, including pthread_mutex_lock(). It is not doing anything specific for pthread_mutex_lock() at all. You can confirm this by downloading the library source, and sticking those two source files into your code. Now that the compiler can see what pthread_mutex_lock()/unlock() does, it notices none of your global variables are changed, and it will keep them right across the call as one would expect...
You are so wrong it isn't even funny. Your knowledge of compilers is stuck in the early 90s.

On top of that you're also contradicting yourself. First you said volatile was needed even with locks, now you say it's not.

Compilers can certainly make optimizations based on knowledge of what some function calls do.

OK, some specific points.

1. If you choose to lock EVERY shared access, you still have a problem For example, the simple spin lock in Crafty where a thread waits on work. Do you want to lock to READ the value? As well as write the value? How does that solve the problem? You acquire the lock, you read the value, you release the lock, if the value is zero, you repeat. That's known as a "loop". Without volatile, the "value" is a loop invariant, and the compiler will lift the code out of the loop and fetch it once. Now it doesn't work because the code will never see the changed value.

2. My knowledge about compilers is not "so 90s". I actually (1) tested this on gcc yesterday, and I actually looked at SOME of the compiler source to see if it had any recognition of pthread_mutex_lock built in. You seem to have either a reading problem OR a comprehension problem. I clearly pointed out that by presenting the compiler with the procedure source, it definitely CAN optimize across the procedure call. But NOT if it is buried in a library where it can't see it, or if it is compiled as a separate file that it can't see.

Try again.

bob · Post by **bob** » Fri Mar 21, 2014 10:03 pm

syzygy wrote:
bob wrote:Interestingly when I compile a simple test including the SOURCE for pthread_mutex_lock() and pthread_mutex_unlock() the compiler will maintain variable values right across the lock calls as I had originally thought, only because it can see none of the global variables are changed in the lock code, making it safe. Again, this has nothing to do with pthread_mutex_lock() itself, it is an artifact of all procedure calls, something I should have instantly realized. I'll blame it on this cough-medicine my doc has me on for flu-like symptoms...
Congratulations on using non-POSIX compliant locking primitives. POSIX tells you to #include the appropriate system files.

I propose we all just ignore Bob.

I think it better to ignore your uninformed rambling. Congratulations on not knowing a thing about what is going on here. Have you ever LOOKED at pthread.h, for example?

Of course not. Nothing magic in there telling the compiler "hey, behave differently for this function."

In the test above, I STILL had to include pthread.h, I just sucked in the lock source code so the compiler could see it. Was that too complicated for you to notice?

bob · Post by **bob** » Fri Mar 21, 2014 10:09 pm

lucasart wrote:
syzygy wrote:
lucasart wrote:Also, what about atomicity? If one thread modified a shared variable and another one modifies it at the same time. Is it possible that the bytes in memory composing that variable and up all scrambled?
If both threads write a 32-bit int to the same 4 bytes of memory within a single cacheline, then this write is guaranteed to be atomic. In other words, the end result is one of the two 32-bit ints and not a mixture of the two.

On x86-64 the same applies to 64-bit ints.

Obviously the same holds for 16-bit and 8-bit itns.
OK, that means for built-in types read/write operations are atomic. This is because built-in types have a size that divides the cash line size. Hence, in the absence of unaligned memory access (which you would really have to provoke on purpose with some ugly C-tyle reinterpretation of pointers), you are guaranteed that they don't cross a cache line.

For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection.

If I can assume that 'p_workers++' is an atomic operation, I should be able to remove the lock. I can think of two ways the compiler would translate this into assembly:
1/ incrementing directly the variable in memory (a single INC op-code)
2/ moving it into a registry, incrementing the registry, and moving it back to the memory. That three step approach wouldn't be atomic, leading to racy code without the lock protection.

Is there anything in the C++ standard that forbids 2/, and guarantees that the incrementation will be atomic? (hence allows removal of the lock). Should I define the variable p_workers as std::atomic<int> in order to get this guarantee?

Not quite. Here is why:

to do the ++ operation, you have to fetch that 64 byte block to L1, where the CPU can then read and write the data. But the cpu can read the original value, and another cache can then request that block for itself. It gets forwarded. The second cache now has the same value. Both increment and write, you get a race. You can end up with either 1 or 2 (assuming it started at zero). Nothing bogus like half of one and half of the other (for example, start at -2, you would either get -1 or 0, not -1 or 0xffff0000 or some other "torn" result.

You can use a lock prefix on the inc IIRC, to make it work, but the lock prefix has the same overhead as a regular lock.

BTW there is NO instruction that increments directly in memory. If you do an "inc x", the cpu has to fetch x, increment it, and write it back.

bob · Post by **bob** » Fri Mar 21, 2014 10:16 pm

syzygy wrote:
hgm wrote:As to the #include of the code, this still puzzles me. I can of course see that this help the compiler to se what the routines do, and thus which global variables run the risk of being changed, and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented, in this case?
If you mean #include <pthread.h>, you do not have to worry what happens below the level of the source code you have typed. If your system complies with POSIX, and you stick to the rules (i.e. #include and link in the proper way and not copy & paste from the library source), then there is no need to make variables volatile in order to prevent optimisations from introducing bugs.

In the meantime I have understood better why "volatile" and concurrency are completely orthogonal concepts. "volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.

On a POSIX-compliant system, there is a guarantee that certain primitives synchronise memory across threads (or at least work "as if" memory is synchronised at these points).

You DO realize that when I copied the pthread_mutex_lock() code I STILL had to #include <pthread.h> correct? Or did you bother thinking about it at all? You STILL need the various constants/flags that are defined. But also, have you LOOKED at pthread.h?

Feel free to show me the magic stuff that tells the compiler "hey, you can't optimize global variable accesses across these calls, just like you can't optimize them across ANY procedure call the compiler can't see."

please...

syzygy · Post by **syzygy** » Fri Mar 21, 2014 10:34 pm

AlvaroBegue wrote:Can you not just write this?
Code: Select all
   void leave() {
      if (--p_workers == 0) sl_signal(*p_master);
   }

Probably better, assuming:
1. std::atomic has inter alia the effect of volatile; and
2. semantics of (p_workers == 0) implies p_workers is only loaded once (not entirely clear, at least not from the C99 standard, see the beginning of the thread).

To know whether it is sufficient we need to know what kind of enter()/leave() sequences are possible. Is it always first a series of enter()s, then a series of leave()s? Or can we at least be sure that p_workers is not anymore incremented after it has been decremented to 0? Then things get much simpler.

syzygy · Post by **syzygy** » Fri Mar 21, 2014 10:37 pm

Rein Halbersma wrote:
syzygy wrote: This is because there might be critical sections that access p_workers in a way that relies on p_workers not being changed from outside the critical section. So you have to look at all other uses of p_workers.
Code: Select all
   void leave() {

      lock();

      assert(p_workers > 0);
      p_workers--;

      if (p_workers == 0) sl_signal(*p_master);

      unlock();
   }
This assumes p_workers does not change between p_workers-- and the test for p_workers == 0. This assumption may no longer be true after removing the lock from enter().
I agree with your observation, but why would you want to put extra locking around p_workers? The cleanest way would be factor out all read/write access of p_workers outside locking. And the lock should preferably use a std::lock_guard in order to be exception-safe.

I'm not "wanting" anything, if that's what you mean

. If I had to rewrite this code I would first try to thorougly understand what are the possible scenarios. In particular, whether something can still enter() after the last one has leave()d...

volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?