A note for C programmers

bob · Post by **bob** » Sat Dec 07, 2013 4:05 am

wgarvin wrote:
bob wrote:
simon wrote:
wgarvin wrote: Okay, we get that you're still annoyed about it, although I'm surprised that two weeks have passed and you still haven't admitted yet that you're completely wrong!
You're probably the only person who is surprised.
WHAT am I "wrong" about?

Certainly not my original point, namely that Apple broke something that did not need to be broken. That Apple, in doing so, caused a lot of debugging effort on my part, and on the part of many others, unnecessarily.

I'm neither supporting OR railing against using undefined behavior if one is careful. I do believe the compiler should "do the best thing" as opposed to "let's be anal and break the code..."
I think you are wrong to the extent that you are trying to assign blame to Apple (or the glibc maintainers, or whoever). There was an explicit contract about how strcpy would behave; you and the authors of all those other broken programs that other people are whining about, were 100% in the wrong by using strcpy in an illegal manner. The behavior of strcpy when used that way has been undefined for 25 years, and there is no possible justification--none--for using it in that undefined fashion anyway.

Is it annoying to have programs broken by a change by somebody else? Sure.

Was it a dick move for them to start enforcing the requirement by adding this explicit check and abort? Perhaps. I'm actually quite sympathetic to that argument, but neither you nor I knows exactly what their motivation was and it doesn't matter, their freedom to do that was explicitly reserved by the standard 25 years ago.

So we need to apportion blame for the broken programs. You seem to think some, most, or perhaps even all of the blame belongs to glibc/Apple. If so, you are completely wrong. 100% of the blame here belongs to the programmers who didn't write code that followed the standard. In each case, that incorrect code was always a time-bomb waiting to blow up their programs. In your case you got away with it for at least 25 years, which perhaps was extremely lucky. But finally your luck ran out.

So to be clear, I think you are wrong in that you think your experience this week was somebody else's fault when an impartial review of the facts makes clear that it was 100% your fault. Only one party here was violating their obligations under the language contract. Glibc was 100% entitled to do what they did (although they will still take some flak for it because of the "dick move" angle).

When a program invokes undefined behavior according to the spec, there is no other possible outcome. The implementation can do anything it wants, no matter how ludicrous or unintuitive. The implementor might change their mind from one week to the next, and whatever happens is still 100% the programmer's fault.

As programmers we might find this to be more than a little unfair, but its the way the language works. Undefined behavior exists to preserve optimization freedoms for implementors and/or to simplify some parts of their task, and the programmers just have to avoid it as best they can. As long as we avoid invoking undefined behavior, we get to rely on the program semantics promised by the standard. But if we accidentally invoke undefined behavior then the nasal demons can and sometimes do result.

As I have repeatedly stated, I neither advocate for or against undefined behavior usage. I do my best to avoid it whenever possible, or deal with it (ala' race conditions in hashing) when the only solution to avoid the race is too expensive (acquiring/clearing a lock).

I have no problem with them changing the code if they want to fix it permanently. Torvalds suggested just equating memcpy() to memmove() and be done with it. No performance loss. No undefined behavior. Everybody is happy. An alternative is to print a warning or two at run-time, noting the risky behavior, but continuing to execute anyway. The worst case is simply breaking it, no notice, no "warning, this is deprecated and will go away or crash in future versions." or anything else.

That I do not consider acceptable, for EXACTLY the same reasons Linux quoted in the great memcpy() debate. Suppose they find a way to detect SOME race conditions. Is just "aborting" OK there? Is it REALLY a race condition (since there is no current method for a compiler to detect it reliably in the first place)? Is it OK to abort for "possible undefined behavior" conditions or must it wait for an actual condition to arise (assuming it could determine this, which it can't.). I see no good reason for aborting on undefined behavior. If they want to get cute, add the test in the PGO code. Then you can get compile-time warnings rather than unexplained execution-time crashes. There are good ways to fix a problem, and bad ways. I suppose if the cops shot every speeder in the head, eventually there would be no speeders. But it does seem to be a bit drastic.

bob · Post by **bob** » Sat Dec 07, 2013 4:07 am

Rein Halbersma wrote:
bob wrote:
Rein Halbersma wrote:
Online example using gcc -O3 (gcc 4.6.4 on Linux)
So? 4.7.3 newer than 4.6.4... perhaps they realized such an optimization is a mistake.
Every online compiler I tried, gave the same thing:

http://rextester.com/runcode (running gcc 4.7.3) infinite loop, same warning
http://coliru.stacked-crooked.com/ (gcc 4.6.4 and g++ 4.8.1), both infinite loop, same warning, also on the same site with Clang 3.4 -03 prints 32 (didn't I tell you that anything could happen?).
http://gcc.godbolt.org/ all g++ versions 4.5 through 4.9 give an infinite loop

macports gcc 4.7.3 works perfectly for me on that program. As does all the other versions I tested and posted output for.

bob · Post by **bob** » Sat Dec 07, 2013 4:11 am

syzygy wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote:
Code: Select all
#include <stdio.h> 

int main&#40;) 
&#123; 
  int i, k = 0; 

  for &#40;i = 1; i > 0; i += i&#41; 
    k++; 
  printf&#40;"k = %d\n", k&#41;; 
  return 0; 
&#125;
Running gcc -O0 will produce 31 on architectures where an int is 32-bits. However, with gcc -O2 and higher, the compiler will recognize that "i += i" yields signed overflow UB. It will then eliminate the entire expression, and further optimize this code to an infinite loop. Such a scenario is also possible to happen with your XOR trick on multicore machines. Compiler routinely optimize away UB code instructions. You need extra compiler instructions or code modifcations to get the compiler to do what you intended.
Don't know what planet you compile on. Here's that run on my macbook:

scrappy% cc -O -o tst tst.c
.scrappy% ./tst
k = 31
scrappy% cc -O2 -o tst tst.c
scrappy% ./tst
k = 31
scrappy% cc -O3 -o tst tst.c
scrappy% ./tst
k = 31

gcc version 4.7.3 (MacPorts gcc47 4.7.3_3)
Online example using gcc -O3 (gcc 4.6.4 on Linux)
So? 4.7.3 newer than 4.6.4... perhaps they realized such an optimization is a mistake.
On my machine both 4.7.2 and 4.8.1 generate an infinite loop when compiling with -O3.

I gave the compile commands for 4.7.3 (macports) on my mac. And for the other versions on various departmental machines. One version of Intel C++ produced an infinite loop. Two others worked fine. So all I can confirm for failing is one specific version of intel C++. There may be several that fail. But there are some that work. I seem to be lucky enough to have more of those that work.

bob · Post by **bob** » Sat Dec 07, 2013 4:14 am

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote: So how about stopping with the hand waving and simply explain how to break a hash probe where I get an 8 byte signature that does not go with the corresponding 8 bytes of score and stuff. Then we can talk.
No, the burden of proof is on you. You claim that the compiler has no choice but to emit a sane set of instructions that give you the expected result. However, both the C11 and C++11 Standard leave no doubt that reads and writes from different threads to the same variable constitute undefined behavior.
I made no such claim. My claim is that the compiler is required to semantically do EXACTLY what I say.
Show me what in the C99 standard requires the compiler to let two different threads write into the same memory location.
Show me explicitly where it does not.
This question is too twisted to take seriously.

Show me specifically how it can even determine that that will happen in a place where it can then wreck the code.
You did not read what I wrote, or you did not understand.

You said, quote "Show me what in the C99 standard requires the compiler to let two different threads write into the same memory location." I have asked REPEATEDLY, show me one working methodology where the compiler can even determine that such can possibly happen. Not possible. Ergo that line of the standard is just random noise since the compiler can't even recognize the condition, unlike the condition of overlapping source/destination on strcpy() or memcpy().

So, tell me how it is going to recognize that condition, so that it would then be free to do anything that could break it. Once you get past that, then feel free to show me how ANYTHING the compiler does, from writing a bit at a time to writing 128 bits at a time, will break my code (which it won't). Yes, if the compiler were to decide to abort, that would break things. But why would it abort? It can't even DETECT the condition in the first place...

bob · Post by **bob** » Sat Dec 07, 2013 4:20 am

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:I don't care what the compiler does, it can't break the lockless hashing code. It is as simple as that. If you want to even try to convince me otherwise, here's the challenge:
You certainly do care what the compiler does.

a hash entry is 16 bytes. I have to write it as 2 8-byte blocks, each of which is certainly guaranteed to be atomic in x86 since that is what is written across the bus in one cycle.
Nope, the compiler is free to implement storing an uint64 as 8x storing a byte. Nothing in the standard stops the compiler from doing just that.

The compiler most likely won't do that, but here you are depending on the compiler and not on the C language.
And if you read what I wrote
Read what you wrote, I quoted it. "I have to write it as 2 8-byte blocks, each of which is certainly guaranteed to be atomic in x86[sic] since that is what is written across the bus in one cycle."

How about the REST of that sentence you quoted:
I quoted all I needed. If you were not being precise, then that's your problem.

So what is it now. Are uint64 writes still guaranteed to be atomic by the C standard?

Do I care? No. My code works whether it is bit-wise or qword-wise. Exactly As I have repeatedly stated. But, please cite ONE reasonable explanation for why a compiler would take a uint64_t variable and do anything other than a single move instruction to write to it. And once you get past that, which has no reasonable explanation BTW, then explain how the compiler is going to detect the undetectable race condition in the first place so that it can do something odd...

The important thing is this: the C standard does not in any way prescribe how a C compiler maps C source code to the machine. You can have a 64-bit processor, but there is no guarantee that 8-byte writes are atomic. And again, there is nothing in the C99 standard that prescibes that different threads see the same global variables. This is simple enough: the C99 standard is not even aware of threads.

Are you kidding now? Memory addresses ARE addressed in the C standard. Better check up on structures. I CAN define what appears where, and all threads are guaranteed to see exactly the same data since there is only one address space managed by the operating system, NOT the compiler. If that statement were true, nothing would work. How could you access data in a buffer stored in the C library routines that you don't declare or malloc()?

We are now so far out in never-never land this can't possible be a discussion based on computer science.

Most compilers will have a sensible efficient mapping to the hardware, but if you rely on this you are completely dependent on the compiler. (Or on some other standard that the compiler implements, e.g. the phtread standard.)

I don't see what a compiler has to do with pthreads. That has always been outside the compiler (pthread library + clone() system call in linux O/S).

wgarvin · Post by **wgarvin** » Sat Dec 07, 2013 5:24 am

bob wrote:Torvalds suggested just equating memcpy() to memmove() and be done with it. No performance loss. No undefined behavior. Everybody is happy.

Um, I would be very unhappy by that change. It WOULD be a performance loss, it would introduce a mandatory branch into memcpy to detect forward vs. backward overlaps. That might be noise with a large size, but a LOT of copying of small structures and arrays is done with those functions. (I went through several hundred such occurrences in the codebase I work on at work today, testing a faster implementation of memcpy.) There's a reason memcpy and memmove are specified as two different functions with different semantics. They are heavily used by millions of C and C++ programs, with sizes ranging from 1 byte to megabytes. They need to be as efficient as possible. Memcpy can be implemented more efficiently because of the restriction that source and dest may not overlap.

bob wrote:An alternative is to print a warning or two at run-time, noting the risky behavior, but continuing to execute anyway. The worst case is simply breaking it, no notice, no "warning, this is deprecated and will go away or crash in future versions." or anything else.

That I do not consider acceptable, for EXACTLY the same reasons Linus quoted in the great memcpy() debate.

Well it was always their right according to the standard. The problem is that people like Linus think that C is supposed to be a stable, safe, secure systems programming language. But the people who control the language spec and the people who write the compilers, apparently think something else. They seem to think that maximum performance (even on synthetic benchmarks) trumps everything else, such as the usability and safety of the resulting language.

We really do need a C-like language with most of this undefined stuff fully specified, so that it would be safer and with semantics that are less surprising to the programmers (even if it costs a small bit of the potential performance). But C is not yet that language. (note: If someone decides to make that language, please provide only fixed-size integer types, and clean up the giant mess that is integer promotions)

Suppose they find a way to detect SOME race conditions. Is just "aborting" OK there?

Yes, definitely. You invoke undefined behaviour, you (might) get the nasal demons. Of course it would be idiotic for them to implement that, there would be complexity and no performance benefit from it. But the standard says they can do anything at all, including abort, if they can detect (either at compile time or at run time) that you invoked undefined behaviour. A program execution that invokes any undefined behavior has no expected semantics at all. It might fortuitously do what your intuition suggests is the "sensible" thing. Or it might not.

bob wrote:Is it REALLY a race condition (since there is no current method for a compiler to detect it reliably in the first place)?

Yes, of course it is. A race condition is a property of the program being executed, it exists whether or not the compiler is capable of detecting it.

bob wrote:Is it OK to abort for "possible undefined behavior" conditions or must it wait for an actual condition to arise (assuming it could determine this, which it can't.).

Well if the compiler can't prove that you invoked undefined behavior, then it can't abort. One way to prove that would be to detect it somehow at runtime. But it might also be able to detect it somehow statically at compile time (there are some race detector tools that are capable of doing this; Relacy is one I've heard of, and google has something they hooked up to clang too, IIRC).

bob wrote:I see no good reason for aborting on undefined behavior. If they want to get cute, add the test in the PGO code. Then you can get compile-time warnings rather than unexplained execution-time crashes. There are good ways to fix a problem, and bad ways. I suppose if the cops shot every speeder in the head, eventually there would be no speeders. But it does seem to be a bit drastic.

For sure. In this case, maybe they were planning to alter the function's behavior in a future version and they wanted to find out how many programs were incorrectly relying on the undefined behavior. I think this test-and-abort is probably better than just making the implementation change and having some programs silently do the wrong thing, because at least the programmers are put on notice that their misuse of strcpy won't behave the same in the future as it did in the past. In most cases the programmers probably weren't aware they were mis-using it, so now they will find out and have a chance to fix their programs.

Earlier in the thread, when I suggested putting a breakpoint on the abort function in the debugger, you asked "what abort function".. I was thinking of the C library function "abort". It raises SIGABRT and terminates the program abnormally. I think thats what normally happens after an assert failure, but I'm not sure. Anyway, if I saw the word "abort" on the TTY with no context, my guess would be that something had called that abort function (or raised SIGABRT, I suppose...)

mvk · Post by **mvk** » Sat Dec 07, 2013 5:33 am

wgarvin wrote:Earlier in the thread, when I suggested putting a breakpoint on the abort function in the debugger, you asked "what abort function".. I was thinking of the C library function "abort". It raises SIGABRT and terminates the program abnormally. I think thats what normally happens after an assert failure, but I'm not sure. Anyway, if I saw the word "abort" on the TTY with no context, my guess would be that something had called that abort function (or raised SIGABRT, I suppose...)

This was not necessary. The OSX library in this case already created a stack trace for the user to look at. OP didn't bother to learn how his system reports failure (or for that matter, didn't bother to familiarize himself with the language he is using, for example by reading man pages, C-FAQ or the C standard at any point in the past 25 years), and choose to let the demons fly through his nose.

wgarvin · Post by **wgarvin** » Sat Dec 07, 2013 5:39 am

bob wrote:
syzygy wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote:
Code: Select all
#include <stdio.h> 

int main&#40;) 
&#123; 
  int i, k = 0; 

  for &#40;i = 1; i > 0; i += i&#41; 
    k++; 
  printf&#40;"k = %d\n", k&#41;; 
  return 0; 
&#125;
Running gcc -O0 will produce 31 on architectures where an int is 32-bits. However, with gcc -O2 and higher, the compiler will recognize that "i += i" yields signed overflow UB. It will then eliminate the entire expression, and further optimize this code to an infinite loop. Such a scenario is also possible to happen with your XOR trick on multicore machines. Compiler routinely optimize away UB code instructions. You need extra compiler instructions or code modifcations to get the compiler to do what you intended.
Don't know what planet you compile on. Here's that run on my macbook:

scrappy% cc -O -o tst tst.c
.scrappy% ./tst
k = 31
scrappy% cc -O2 -o tst tst.c
scrappy% ./tst
k = 31
scrappy% cc -O3 -o tst tst.c
scrappy% ./tst
k = 31

gcc version 4.7.3 (MacPorts gcc47 4.7.3_3)
Online example using gcc -O3 (gcc 4.6.4 on Linux)
So? 4.7.3 newer than 4.6.4... perhaps they realized such an optimization is a mistake.
On my machine both 4.7.2 and 4.8.1 generate an infinite loop when compiling with -O3.
I gave the compile commands for 4.7.3 (macports) on my mac. And for the other versions on various departmental machines. One version of Intel C++ produced an infinite loop. Two others worked fine. So all I can confirm for failing is one specific version of intel C++. There may be several that fail. But there are some that work. I seem to be lucky enough to have more of those that work.

Why do you say that one compiler "fails" and another one "works". The program invokes undefined behavior. The language standard defines no semantics for it at all. All of those compilers are working as intended.

You seem to think you are "lucky" because you have a compiler that produces code from this snippet that matches your intuition about what the hardware would do if the compiler generated the code you imagine it would generate. But none of these compilers are obligated to generate that code (not even the ones that apparently do anyway). A future version of them might not generate that code. They might not generate that code next week, or when you compile a version of the function that has something else next to this snippet (e.g. different control flow, or whatever). In short, there's precious little you can rely on once you invoke undefined behavior, and the way these compilers treat this one code snippet doesn't really give you any guarantees about how they will treat other pieces of code. For that you have to rely on compiler-specific options, such as -fwrapv.

(And yes, I love being able to disable strict aliasing on all of the compilers I use... I hope that all major compilers continue to support that effectively forever, because there are billions of lines of code out there that don't respect the rule, and millions of programmers who don't know it well enough to follow it 100% of the time, and I am probably one of them).

Rebel · Post by **Rebel** » Sat Dec 07, 2013 10:08 am

Glad to be an ASM programmer

syzygy · Post by **syzygy** » Sat Dec 07, 2013 11:05 am

bob wrote:Please stop writing the nonsense about race conditions. The compiler can't even RECOGNIZE them, much less do anything with 'em.

Some race conditions can be detected trivially. First of all, we need to be talking C11 or talking about threads does not even make sense. In C11, just let one thread create another thread and let both threads at some point access the same variable. If there are no synchronisation primitives on the paths towards these accesses, you have a data race. C11 defines the synchronisation primitives, so the compiler can recognise their absence.

Will compilers actively look for data races? Probably not. But they can assume that they won't happen and optimise your code in ways that you certainly are not expecting.

A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers