A note for C programmers

bob · Post by **bob** » Sat Dec 07, 2013 6:15 pm

Rein Halbersma wrote:
mar wrote: To be honest, being 100% standard compiant will become black art,
existing very large code bases will have to be rewritten completely (which will cost time and money and introduce new bugs).
I'm not sure if this is of any good. We'll see.
I don't buy that. If you stay away from the dusty corners of the language (both in C and C++) you can and usually do write compliant code. That means don't get involved in tricky bit-twiddling (shifting through sign-bits, signed overflow), manual pointer stuff (null derefence, bounds violations) or cute expressions (incrementing and assigning in one go to the same variable like i = i++; ). Could you show me an example of UB that should not already have been rewritten on grounds of readability alone?

WHAT?

Get away from pointers? All the things C was DESIGNED to make easy to use?

Go use Java and leave C ALONE. It works quite well for those of us that know what we are doing. Many years ago OSHA started to demand chain brakes on chain saws, and blade brakes on mowers. To stop the kick-back injuries from chainsaws or the mower running over you when you pull it backward and trip.

Net result? MORE chainsaw injuries because people now feel that the chain brake makes it impossible for a kickback to occur. Had a co-worker a few years ago that came in with bandages all over his face. Now has a permanent scar from forehead to chin. Chain brake did not save him. Nor will a pseudo-race-detecting compiler save anyone either.

BTW I own both a Stihl chainsaw and have owned several mowers with blade brakes. Only advantage I saw on the blade brakes on mowers was that it increased profits for the manufacturer since they wear out quickly enough.

bob · Post by **bob** » Sat Dec 07, 2013 6:17 pm

Rein Halbersma wrote:
mar wrote:
Rein Halbersma wrote:I don't buy that. If you stay away from the dusty corners of the language (both in C and C++) you can and usually do write compliant code. That means don't get involved in tricky bit-twiddling (shifting through sign-bits, signed overflow), manual pointer stuff (null derefence, bounds violations) or cute expressions (incrementing and assigning in one go to the same variable like i = i++; ). Could you show me an example of UB that should not already have been rewritten on grounds of readability alone?
Of course, what about data races? pthreads vs C11.
Don't know about C11, but in C++11, concurrent data structures should use a boost::shared_mutex<uint64_t> to implement shared hash tables. This allows multiple readers and a single writer to safely (i.e. without UB) access a global hash table from multiple threads. In this C++ book on concurrency, such an example is worked out in detail.

And it will run as slow as a dog for the effort.

bob · Post by **bob** » Sat Dec 07, 2013 6:18 pm

mar wrote:
Rein Halbersma wrote:Don't know about C11, but in C++11, concurrent data structures should use a boost::shared_mutex<uint64_t> to implement shared hash tables. This allows multiple readers and a single writer to safely (i.e. without UB) access a global hash table from multiple threads. In this C++ book on concurrency, such an example is worked out in detail.
Now I'm a bit puzzled. I just looked at boost::shared_mutex implementation and it's a plain class, not a template. It uses pthreads or WinAPI on windows.
Ronald already said that you have no guarantee that threads will be able to see the same memory according to the standard.
This bothers me as I always thought that in multi-cpu environment, you are guaranteed that a process can run on exactly one CPU and threads are serviced by individual cores of that CPU.
Is there a standard way to collect results from individual threads? You will need shared memory for that. I have never worked with a multi-CPU machine before so I may be wrong.

Ronald was wrong. In fact, the C standard is quite specific about parts of memory layout. Such as a structure.

Rein Halbersma · Post by **Rein Halbersma** » Sat Dec 07, 2013 6:20 pm

bob wrote:
Rein Halbersma wrote:
mar wrote:
Rein Halbersma wrote:I don't buy that. If you stay away from the dusty corners of the language (both in C and C++) you can and usually do write compliant code. That means don't get involved in tricky bit-twiddling (shifting through sign-bits, signed overflow), manual pointer stuff (null derefence, bounds violations) or cute expressions (incrementing and assigning in one go to the same variable like i = i++; ). Could you show me an example of UB that should not already have been rewritten on grounds of readability alone?
Of course, what about data races? pthreads vs C11.
Don't know about C11, but in C++11, concurrent data structures should use a boost::shared_mutex<uint64_t> to implement shared hash tables. This allows multiple readers and a single writer to safely (i.e. without UB) access a global hash table from multiple threads. In this C++ book on concurrency, such an example is worked out in detail.
And it will run as slow as a dog for the effort.

You should read the book. It is not using a global mutex, but one mutex per bucket.

bob · Post by **bob** » Sat Dec 07, 2013 6:21 pm

mar wrote:
Rein Halbersma wrote:Sorry, I indeed wrote the wrong class. Your data entry (or bucket of entries) can be plain Key, Value pair, supplemented with a std::mutex (per bucket or per pair). You can also use a boost::shared_mutex, which will become std::shared_mutex in C++14. WIth the mutex, you have to lock/unlock that mutex in every member function of your hash table.

I am not very proficient in multi-threaded code either, but the C++ Standard does not mandate on which core a thread is run. The OS has a lot of leeway in determining that. But that doesn't really matter, since the std::mutex does guarantee the memory consistency of simultaneous read/write attempts.
What I don't like about locking is that it pauses other threads accessing the hashtable simultaneously (the same lock in fact). I like bob's xor trick because it's completely lock-free.
What might work (and I bet others are doing it as well) would be to use a pool of spinlocks (or mutexes) indexed by n bits of the hash key. Maybe that's what you meant.
The advantage of this would be that you'd only lock if you access the same entry (actually entries with same m-n bits of the hash key) and you don't lock the whole TT.
Whether it's worth the effort (bob's solution is much simpler than that) and how much impact it would have on performance is another question (OTOH it would avoid data races completely).
I will probably try it someday.

We used the "array of locks" approach in a version of Cray Blitz. How many locks can you tolerate? Locks (on x86) still generate a ton of cache traffic as each is set and released. The overhead still is too high in general, particularly compared to the xor trick which will always work flawlessly with no overhead.

AlvaroBegue · Post by **AlvaroBegue** » Sat Dec 07, 2013 6:21 pm

I think you are taking too much comfort in your lack of imagination: You might not be able to think of a way in which a compiler might break your neat lockless hashing technique, but this doesn't mean that one doesn't exist, or won't exist in the future.

In the meantime, keep using your hack, but don't start a thread like this one when it breaks.

bob · Post by **bob** » Sat Dec 07, 2013 6:22 pm

syzygy wrote:
mar wrote:
Rein Halbersma wrote:Sorry, I indeed wrote the wrong class. Your data entry (or bucket of entries) can be plain Key, Value pair, supplemented with a std::mutex (per bucket or per pair). You can also use a boost::shared_mutex, which will become std::shared_mutex in C++14. WIth the mutex, you have to lock/unlock that mutex in every member function of your hash table.

I am not very proficient in multi-threaded code either, but the C++ Standard does not mandate on which core a thread is run. The OS has a lot of leeway in determining that. But that doesn't really matter, since the std::mutex does guarantee the memory consistency of simultaneous read/write attempts.
What I don't like about locking is that it pauses other threads accessing the hashtable simultaneously (the same lock in fact). I like bob's xor trick because it's completely lock-free.
What might work (and I bet others are doing it as well) would be to use a pool of spinlocks (or mutexes) indexed by n bits of the hash key. Maybe that's what you meant.
The advantage of this would be that you'd only lock if you access the same entry (actually entries with same m-n bits of the hash key) and you don't lock the whole TT.
Whether it's worth the effort (bob's solution is much simpler than that) and how much impact it would have on performance is another question (OTOH it would avoid data races completely).
I will probably try it someday.
Personally I will continue to rely on data race when accessing the hashtable (and I won't use the xor-trick either). I promise I will not complain when a new compiler version breaks my code.

Sorry, but not allowed. You've been carping about MY using undefined behavior for days now. Isn't this just a tad inconsistent? You use it because it works? You use it in spite of it being called "undefined behavior"?

Henk · Post by **Henk** » Sat Dec 07, 2013 6:23 pm

I don't know if my operating system Windows 8 uses elements with undefined behavior. I don't feel safe now.

If your using a new product which might use elements of undefined behavior you don't know if and how often something might go wrong.
I don't like these surprises.

mar · Post by **mar** » Sat Dec 07, 2013 6:23 pm

bob wrote:Ronald was wrong. In fact, the C standard is quite specific about parts of memory layout. Such as a structure.

Actually I misunderstood, he clarified this to me with a later reply.

bob · Post by **bob** » Sat Dec 07, 2013 6:24 pm

Rein Halbersma wrote:
bob wrote:
Rein Halbersma wrote:
mar wrote:
Rein Halbersma wrote:I don't buy that. If you stay away from the dusty corners of the language (both in C and C++) you can and usually do write compliant code. That means don't get involved in tricky bit-twiddling (shifting through sign-bits, signed overflow), manual pointer stuff (null derefence, bounds violations) or cute expressions (incrementing and assigning in one go to the same variable like i = i++; ). Could you show me an example of UB that should not already have been rewritten on grounds of readability alone?
Of course, what about data races? pthreads vs C11.
Don't know about C11, but in C++11, concurrent data structures should use a boost::shared_mutex<uint64_t> to implement shared hash tables. This allows multiple readers and a single writer to safely (i.e. without UB) access a global hash table from multiple threads. In this C++ book on concurrency, such an example is worked out in detail.
And it will run as slow as a dog for the effort.
You should read the book. It is not using a global mutex, but one mutex per bucket.

You should read my other comment. That is even worse. For every node on a 16 core box, you get 16 locks set and released. That fries cache.

A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers