hgm wrote: Note that even a single INC mem instruction is not atomic on i386/x64. It still involves reading and then writing back the data in separate steps of the micro-architecture, and other cores could read or write that same memory location in between. Only with a LOCK prefix will accesses to that memory location by other cores be blocked between the read and the write.

As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler to see what the routines do, and thus which global variables run the risk of being changed, and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented, in this case?

Just put the "inline" keyword in front of a function definition inside a header.
volatile?
Re: volatile?
Re: volatile?
lucasart wrote: OK, that means for built-in types read/write operations are atomic. This is because built-in types have a size that divides the cache line size. Hence, in the absence of unaligned memory access (which you would really have to provoke on purpose with some ugly C-style reinterpretation of pointers), you are guaranteed that they don't cross a cache line.

Yes, but remember that this is platform-dependent. It might be true for all platforms we know, but it is not mandated by C/C++ or POSIX (actually it is not even true for all platforms we know: uint64 is a built-in type on 32-bit x86, but not atomic).

lucasart wrote: For example, I'm wondering if in this line of code:
https://github.com/lucasart/Sensei/blob ... i.cc#L5861
I can remove the lock protection, if I can assume that 'p_workers++' is an atomic operation.

It is not atomic. Whether it is implemented as a single x86 increment instruction or as separate loads and stores, it is not atomic. If you want an atomic increment, you need to use inline assembly to generate a "lock inc" instruction, or gcc intrinsics (e.g. __atomic_fetch_add), or whatever the language provides (C++11 does, and I suppose C11 does too).

lucasart wrote: Is there anything in the C++ standard that forbids 2/, and guarantees that the incrementation will be atomic?

The C++ standard allows both, and neither is atomic. The C++ standard before C++11 does not know about multithreading, so it has no notion of atomicity; it guarantees nothing at all as far as multithreading is concerned. I don't think POSIX/pthreads makes any atomicity guarantees either, but I might be wrong. Certainly it does not guarantee that p_workers++ is atomic. C++11 provides atomic primitives, but does not guarantee that a plain p_workers++ is atomic.

lucasart wrote: Should I define the variable p_workers as std::atomic<int> in order to get this guarantee?

I'm afraid that would only guarantee that reads and writes are atomic, not that ++ is atomic.
Re: volatile?
hgm wrote: As to the #include of the code, this still puzzles me. I can of course see that this helps the compiler to see what the routines do, and thus which global variables run the risk of being changed, and which are safe. But when I #include a file that really defines a routine in more than one of my source files, I usually get a 'multiply-defined symbol' linker error. How is this prevented, in this case?

If you mean #include <pthread.h>, you do not have to worry about what happens below the level of the source code you have typed. If your system complies with POSIX, and you stick to the rules (i.e. #include and link in the proper way, and do not copy & paste from the library source), then there is no need to make variables volatile in order to prevent optimisations from introducing bugs.
In the meantime I have understood better why "volatile" and concurrency are completely orthogonal concepts. "volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.
On a POSIX-compliant system, there is a guarantee that certain primitives synchronise memory across threads (or at least work "as if" memory is synchronised at these points).
- Location: Austin, TX
Re: volatile?
lucasart wrote: The reason I ask is because I was wondering what this obscure "volatile" keyword really means. I read this short article:
https://www.kernel.org/doc/Documentatio ... armful.txt
Essentially they make the point that "in properly-written code, volatile can only serve to slow things down". They say that volatile has nothing to do with concurrency, that it is almost never correct to use it, and that the lock is enough.
On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff.

It's for hardware access. When you have program memory mapped into the hardware space (i.e. memory that can be accessed by hardware registers outside the program), the volatile keyword lets you know that the value of the register could change out from under you. We use it all the time on PCB (printed circuit board) projects where we have external hardware registers that can program shared hardware/software registers and memory locations. I've never used it for a simple program-only situation, and I see no real use for it if external hardware isn't involved.
regards,
--tom
- Location: Amsterdam
- Full name: H G Muller
Re: volatile?
syzygy wrote: "volatile" forces the compiler to reload values from memory, but gives no guarantee whatsoever (at the C/C++ standard level) that what you read is the value that has been written by another thread. It would be perfectly fine if the value returned is the local value in the processor's cache. volatile does not enforce cache coherency.

Hardware automatically forces cache coherency. There is nothing the compiler has to, or can, do about that. Returning the local value in the core's private cache is always fine. If that were not the currently valid value (because it was changed in DRAM, in a shared higher cache level, or in some other core's private cache), it would no longer be in your cache.
Re: volatile?
hgm wrote: Hardware automatically forces cache coherency. There is nothing the compiler has to or can do about that.

Maybe your hardware does that, but in general it does not. Certainly the various standards do not require any automatic enforcement of cache coherency.

I believe the x86 architecture has a memory model that is so software-friendly (and hardware-unfriendly) that the pthreads library does not have to do anything special. On other architectures this is certainly different: e.g. memory writes by one thread may be observed by other threads out of order. The pthreads primitives (or those of a comparable library) on those platforms will take care of this, and the programmer will not notice anything, provided he sticks to the rules.

hgm wrote: Returning the local value in the core's private cache is always fine. If that wasn't the currently valid value (because it was changed in DRAM, a shared higher cache level or some other core's private cache), it would be no longer in your cache.

Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B, or if it retrieves the old value for B and the new value for A.
Re: volatile?
http://www.airs.com/blog/archives/154
For dealing with memory mapped hardware, volatile is exactly what you want. For most other types of code, including multi-threaded code, volatile does not help.
Using volatile does not mean that the variable is accessed atomically; no locks are used. Using volatile does not mean that other cores in a multi-core system will see the memory accesses; no cache flushes are used. While volatile writes are guaranteed to occur in the program order for the core which is executing them, there is no guarantee that any other core will see the writes in the same order. Using volatile does not imply any sort of memory barrier; the processor can and will rearrange volatile memory accesses (this will not happen for address ranges used for memory mapped hardware, but it will for ordinary memory).
Conversely, if you use the locking primitives which are part of any threading library, then you do not need to use volatile. The locking primitives will include the required memory barriers or cache flushes. They will include whatever special directives are needed to tell the compiler that memory must be stable.
- Full name: lucasart
Re: volatile?
Thank you for all the explanations. Very clear.
I've got some RTFM to do
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Re: volatile?
lucasart wrote: Thank you for all the explanations. Very clear. I've got some RTFM to do.

You're welcome.
It seems atomic increment in C++11 can be done using std::atomic_fetch_add. As far as I understand, the variable has to be declared using std::atomic for it to work.
I'm still not sure whether ++ on a std::atomic type is executed atomically, but after googling a bit I think it is.
- Location: Amsterdam
- Full name: H G Muller
Re: volatile?
syzygy wrote: Already on x86 there is no general guarantee that the value read from cache is identical to the value stored in DRAM by another thread, especially if you consider multi-socket systems. What is guaranteed (on x86, not on other architectures) is that if CPU1 writes to A and then to B, and (some small time later) CPU2 reads B and then A, it will retrieve the new value from A if it retrieved the new value from B. But it is OK if it retrieves old (cached) values for both A and B or if it retrieves the old value for B and the new value for A.

That amounts to the same thing, does it not? It only means the CPU that reads can be too early. There is no way to compare the absolute time scale on different CPUs.
If one CPU writes the location in its cache it broadcasts the address on the bus, so other CPUs invalidate any copies they might be holding. That should work also for multi-socket.
Indeed I am talking only about Intel architecture.