How could a compiler break the lockless hashing method?
Moderator: Ras

bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?
bob wrote: Which I find amusing because no compiler on the planet can even recognize a race condition at compile time...
syzygy wrote: What is amusing is that you still think (or maybe don't think, but claim, for the sake of trolling) that defining something as UB only makes sense if it can be detected by the compiler.
When YOU use the definition that "UB means the compiler can do ANYTHING it chooses" then yes it is amusing. Because it can't even recognize all types of UB, and I am being a lousy programmer when I use something I know it can't detect, which means it will do exactly what I want, and I am happy. But we can't seem to get past that. You keep reciting, over and over, that if you use UB it will break, somewhere, sometime. I am 100% certain the compiler cannot even recognize UB most of the time...

wgarvin
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?
bob wrote: The problem is that "some" seem to think that "undefined behavior" means the compiler can do anything it wants. Abort the program. Ignore the operation and optimize it out. As opposed to "just do the right thing" where the race may well be perfectly acceptable with no risk at all.
That has been the status quo for more than 20 years now. It's not just "some" people who think this; all of the compiler experts and language lawyers interpret the spec that way. I have been trying to explain this to you for more than a week now, so I'm trying not to post about it anymore.

Is it stupid? Yes. But compiler writers sincerely believe they are allowed to do it, and sometimes they have a sincere and maybe-somewhat-convincing argument that it will improve performance across a large set of correct programs while maybe unfortunately breaking your incorrect program that invokes UB.

Of course you would blame those bugs on the compiler vendor, right? It couldn't possibly be the fault of programmers who didn't understand UB, maybe even refused to believe the truth about UB after weeks of explanations and examples.

But if you did decide to complain to your compiler vendor, their response is going to be "that's undefined behavior, you were warned not to do that". I suppose if they were feeling helpful they might instead tell you which optimization flags to disable to sneak the broken code through safely.
bob wrote: When YOU use the definition that "UB means the compiler can do ANYTHING it chooses" then yes it is amusing. Because it can't even recognize all types of UB, and I am being a lousy programmer when I use something I know it can't detect, which means it will do exactly what I want, and I am happy. But we can't seem to get past that. You keep reciting, over and over, that if you use UB it will break, somewhere, sometime. I am 100% certain the compiler cannot even recognize UB most of the time...
He's just trying to get you to realize you are playing with fire. You are basically arguing that compilers are too dumb to break your code even if it does hit some undefined behavior here or there. That is already not true enough to rely on today, and it's probably going to get less true over time. It's hard to quantify the actual risk, but anyone who cares about correct programs ought to just accept that they have to avoid UB. I mean, what purpose is there at all in having a language standard if programmers refuse to follow its rules or believe what it says?
The compiler vendors follow the standard, religiously. It is their bible. They might extend things here or there; they might provide (as gcc and clang do) some implementation-defined or unspecified semantics for certain things that the standard left as undefined. They don't really want to break your code just to be malicious. They do want to improve performance on the vast body of correct programs out there, and they seem to be willing to accept a bit of collateral damage among the time-bombed UB-relying programs in order to achieve that. That should worry anybody who wants the code they write to work properly in the future too, not just today.
hgm
- Posts: 28386
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller

Re: How could a compiler break the lockless hashing method?
syzygy wrote: That is not UB. (If another thread also accesses the variable and you're not using synchronisation primitives that guarantee that the two accesses are in a program-defined order, then you do have UB.)
Well, I obviously would not use such primitives, for efficiency reasons. The compiler will be able to see that. It will also see that I read and write a volatile variable non-atomically. So it could grab that as an excuse to 'optimize' away whatever essential and unambiguous task my code was specified to do.
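For reference, the scheme the thread title refers to works roughly like this (a minimal sketch; the names and table size are illustrative, not Crafty's actual code). Each entry stores key ^ data next to data, so an entry torn by a racing writer fails validation on probe and is simply ignored:

#include <stdint.h>

#define TABLE_BITS 20
#define TABLE_SIZE (1u << TABLE_BITS)

typedef struct {
    uint64_t check;   /* key XOR data */
    uint64_t data;
} HashEntry;

static HashEntry table[TABLE_SIZE];

/* Two plain 64-bit stores with no lock: under the standard this is a
   data race, hence UB; the XOR check is what makes it "work" in
   practice on real hardware. */
void hash_store(uint64_t key, uint64_t data)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    e->check = key ^ data;
    e->data  = data;
}

/* Returns 1 and fills *data on a validated hit; a torn entry makes the
   XOR test fail, so a raced write yields a hash miss, not garbage. */
int hash_probe(uint64_t key, uint64_t *data)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    uint64_t check = e->check;
    uint64_t d = e->data;
    if ((check ^ d) != key)
        return 0;
    *data = d;
    return 1;
}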
syzygy wrote: It also allows for useful optimisations. For example, because signed overflow "does not happen", the compiler "knows" that certain values are positive and does not have to sign extend when converting between integer types.
Well, that is very nice. Except that it is plain stupid of the compiler to pretend that signed overflow does not happen if it can obviously see that it happens. There doesn't seem to be any excuse to prefer fiction over obvious fact. It should not make any optimizations that are based on an assumption that is obviously false.
syzygy wrote: Anyway, maybe you don't want to use such a compiler, but maybe your code will still one day be compiled on such a compiler. So it still might be a good idea to not have overflowing signed integers in your code.
Well, that seems a problem for the people that would use such a crappy compiler. They deserve all the trouble they are asking for!
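To make the optimisation in question concrete: because signed overflow "does not happen", a compiler is allowed to fold the comparison below to a constant. A small sketch (this folding is typical of gcc/clang at -O2, but what any given compiler does depends on version and flags):

#include <stdio.h>

/* Since x + 1 "cannot" overflow, a compiler may treat "x + 1 > x" as
   always true and compile this as "return 1;". With -fwrapv it must
   instead return 0 for x == INT_MAX, because INT_MAX + 1 wraps to
   INT_MIN. */
int plus_one_is_bigger(int x)
{
    return x + 1 > x;
}

int main(void)
{
    printf("%d\n", plus_one_is_bigger(2147483647));
    return 0;
}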

bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?
bob wrote: The problem is that "some" seem to think that "undefined behavior" means the compiler can do anything it wants. Abort the program. Ignore the operation and optimize it out. As opposed to "just do the right thing" where the race may well be perfectly acceptable with no risk at all.
wgarvin wrote: That has been the status quo for more than 20 years now. It's not just "some" people who think this; all of the compiler experts and language lawyers interpret the spec that way. I have been trying to explain this to you for more than a week now, so I'm trying not to post about it anymore.
Argumentum ad nauseam.
The problem I keep coming back to is that they optimize as if no UB happens, but they do not know. They can't check. I consider that an unsafe optimization. When I was doing this stuff, that is not one I would consider unless it was defaulted off and you had to specifically enable it at your own risk. For example, -fwrapv ought to be the default, and -fno-wrapv be required to enable those overflow-sensitive optimizations. Above all else, I want a compiler to be consistent, and with overflow it is not. The "infinite loop" example can be fixed so that it works as expected by hiding the constants from the compiler so that it can't see that an overflow will occur. Then it leaves the code alone and it works; to me that represents an inconsistent optimization problem that ought not be allowed. If the compiler can see the value I am shifting, it does one thing; if it can't, it does the expected thing. Whether the current compiler guys like it or not, there are many of us who consider that a problem.
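For concreteness, here is the kind of overflow-dependent loop being argued about (a sketch; whether a particular compiler actually performs the transformation depends on its version and flags):

#include <stdio.h>

int main(void)
{
    /* With -fwrapv, i wraps to INT_MIN after 31 doublings and the loop
       terminates. Without wrapping semantics, a compiler may reason
       that doubling a positive int "cannot" overflow, conclude that
       "i > 0" is always true, and emit an infinite loop. */
    int n = 0;
    for (int i = 1; i > 0; i *= 2)
        n++;
    printf("%d iterations\n", n);
    return 0;
}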
wgarvin wrote: Is it stupid? Yes. But compiler writers sincerely believe they are allowed to do it, and sometimes they have a sincere and maybe-somewhat-convincing argument that it will improve performance across a large set of correct programs while maybe unfortunately breaking your incorrect program that invokes UB. This paper, which I linked once already in the first 30-page thread, gives a bunch of examples of real-world security vulnerabilities caused by compilers optimizing invalid UB-causing code while assuming it couldn't cause UB. Those are real bugs caused by programmers not understanding that the compiler might mangle their code because they didn't color inside the lines (or, if we want to be nice, perhaps because they did understand that but just didn't realize they were coloring outside the lines. That happens a lot too.) Either way, they colored outside the borders of the page and the compiler cropped off that part of their picture without them realizing it.
Again...
wgarvin wrote: Of course you would blame those bugs on the compiler vendor, right? It couldn't possibly be the fault of programmers who didn't understand UB, maybe even refused to believe the truth about UB after weeks of explanations and examples.
wgarvin wrote: But if you did decide to complain to your compiler vendor, their response is going to be "that's undefined behavior, you were warned not to do that". I suppose if they were feeling helpful they might instead tell you which optimization flags to disable to sneak the broken code through safely.
bob wrote: When YOU use the definition that "UB means the compiler can do ANYTHING it chooses" then yes it is amusing. Because it can't even recognize all types of UB, and I am being a lousy programmer when I use something I know it can't detect, which means it will do exactly what I want, and I am happy. But we can't seem to get past that. You keep reciting, over and over, that if you use UB it will break, somewhere, sometime. I am 100% certain the compiler cannot even recognize UB most of the time...
wgarvin wrote: He's just trying to get you to realize you are playing with fire. You are basically arguing that compilers are too dumb to break your code even if it does hit some undefined behavior here or there. That is already not true enough to rely on today, and it's probably going to get less true over time. It's hard to quantify the actual risk, but anyone who cares about correct programs ought to just accept that they have to avoid UB. I mean, what purpose is there at all in having a language standard if programmers refuse to follow its rules or believe what it says?
a = b + c. How does one avoid UB there? Will there even be any UB there? There's no incorrect programming there. No syntax violation. No semantic violation. Just a potential UB that MIGHT occur given the right data. I remember an early example from Crafty, where I had this:
int nodes_searched;
and in the middle of search
nodes_searched++;
worked well until speeds reached a point I never expected to see on a PC and Crafty overflowed 2^31-1 and suddenly printed a negative node count. But it worked just fine; it didn't cause any crash or failure or anything other than a "??" when you saw it. So was that a bug one could anticipate?
2.1 billion nodes. If you search for 60 seconds, that would be 35M nodes per second. In 1995 was that reachable? Even imaginable on a PC? In 1994 Cray Blitz was hitting 7M nodes per second on a 32-CPU Cray that sold for $70M. Poor programming? But over time, with faster CPUs, and more of them, 35M is not so fast now. Should a program crash today due to a rational decision made 20 years ago? I don't see why. It is not as though I "ignored UB in 1995". There was no possibility of it. I think at the time I was maybe hitting 3-4K nodes per second. That is almost 9000 minutes of searching at that speed, almost a week. Did I ignore UB and invite the world to crash when that overflowed, or was it reasonable to believe that was big enough? Should the compiler just do a normal overflow and wrap, or should it crash, abort, produce a random number, format my hard drive, or send demons flying out of my nose? I have a pretty strong opinion on what is "the right thing to do."
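The arithmetic checks out, and both the failure mode and the conventional fix are easy to sketch (variable names are illustrative):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* 2^31 - 1 nodes at 35M nodes/sec is exhausted in about a minute: */
    printf("%d seconds to overflow\n", 2147483647 / 35000000);  /* 61 */

    /* The 1995-era counter: "int nodes_searched; nodes_searched++;"
       past INT_MAX is signed overflow, i.e. UB, even though on the
       hardware of the day it just wrapped negative. The usual fix is
       a wider, unsigned counter: */
    uint64_t nodes_searched = 2147483647u;
    nodes_searched++;   /* well-defined: 2147483648 */
    printf("%llu nodes\n", (unsigned long long)nodes_searched);
    return 0;
}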
wgarvin wrote: The compiler vendors follow the standard, religiously. It is their bible. They might extend things here or there; they might provide (as gcc and clang do) some implementation-defined or unspecified semantics for certain things that the standard left as undefined. They don't really want to break your code just to be malicious. They do want to improve performance on the vast body of correct programs out there, and they seem to be willing to accept a bit of collateral damage among the time-bombed UB-relying programs in order to achieve that. That should worry anybody who wants the code they write to work properly in the future too, not just today.
Back to strcpy(). Did they improve performance? No, they slowed it down. They broke any program that used a specific UB they could recognize, at a significant cost in performance...
hgm
- Posts: 28386
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller

Re: How could a compiler break the lockless hashing method?
wgarvin wrote: They don't really want to break your code just to be malicious.
Then they must be incredibly incompetent. Because exiting with an error message when they detect a strcpy that would otherwise work perfectly, and for which they could see that it would work perfectly, does actually count as malicious sabotage. Just as making the result of a signed integer addition that can be seen at compile time to overflow anything other than what the adder hardware produces counts as malicious sabotage. Nothing in the standard forces them to do that; undefined behavior could mean anything, including the obviously intended thing.
That they pester you with compile-time warnings when they detect such things, OK, I can live with that. But when I want to use it as a programmer, that should be my decision.
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?
wgarvin wrote: They don't really want to break your code just to be malicious.
hgm wrote: Then they must be incredibly incompetent. Because exiting with an error message when they detect a strcpy that would otherwise work perfectly, and for which they could see that it would work perfectly, does actually count as malicious sabotage. Just as making the result of a signed integer addition that can be seen at compile time to overflow anything other than what the adder hardware produces counts as malicious sabotage. Nothing in the standard forces them to do that; undefined behavior could mean anything, including the obviously intended thing. That they pester you with compile-time warnings when they detect such things, OK, I can live with that. But when I want to use it as a programmer, that should be my decision.
And the obvious follow-up: once they have recognized overlapping buffers, why not fix it by directly calling memmove() rather than just aborting? Now everyone is happy, the UB is gone, yet nothing was broken or harmed.
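What bob proposes is easy to sketch (the wrapper name is hypothetical; no actual libc does this): once overlap is detected, or even merely possible, route the copy through memmove, whose behavior is defined for overlapping buffers.

#include <string.h>

/* strcpy has undefined behavior when dst and src overlap; memmove is
   defined for overlap, at a usually-small cost. The + 1 copies the
   terminating NUL along with the string. */
char *strcpy_overlap_safe(char *dst, const char *src)
{
    return memmove(dst, src, strlen(src) + 1);
}

Whether a library should silently repair the call like this, warn, or abort is exactly the policy question being argued in this thread.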
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?
wgarvin wrote: They don't really want to break your code just to be malicious.
hgm wrote: Then they must be incredibly incompetent. Because exiting with an error message when they detect a strcpy that would otherwise work perfectly, and for which they could see that it would work perfectly, does actually count as malicious sabotage. Just as making the result of a signed integer addition that can be seen at compile time to overflow anything other than what the adder hardware produces counts as malicious sabotage. Nothing in the standard forces them to do that; undefined behavior could mean anything, including the obviously intended thing. That they pester you with compile-time warnings when they detect such things, OK, I can live with that. But when I want to use it as a programmer, that should be my decision.
bob wrote: And the obvious follow-up: once they have recognized overlapping buffers, why not fix it by directly calling memmove() rather than just aborting? Now everyone is happy, the UB is gone, yet nothing was broken or harmed.
On the other hand, the compiler made you fix your code, so you did get some benefit out of it. Sure, it could have done better by printing a better error message.
wgarvin
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?
wgarvin wrote: They don't really want to break your code just to be malicious.
hgm wrote: Then they must be incredibly incompetent. Because exiting with an error message when they detect a strcpy that would otherwise work perfectly, and for which they could see that it would work perfectly, does actually count as malicious sabotage. Just as making the result of a signed integer addition that can be seen at compile time to overflow anything other than what the adder hardware produces counts as malicious sabotage. Nothing in the standard forces them to do that; undefined behavior could mean anything, including the obviously intended thing. That they pester you with compile-time warnings when they detect such things, OK, I can live with that. But when I want to use it as a programmer, that should be my decision.
I think you guys persistently assign the blame to the wrong source. You don't blame the programmers who wrote code that had undefined behavior and to which the language specs assign no semantics whatsoever. Instead, you blame the compilers? Just because something happens to compile and run on your current compiler does not make it a "working" program. Any program affected by the strcpy() change was by definition not a legal C or C++ program; at most it was a program in some similar-but-not-standardized language. You can say it's a lousy state of affairs and I'll agree with you, but the fact of the matter is that all of those broken programs were already broken, except for the complete fluke that the implementations they were previously compiled against happened not to trash their heap or overwrite their call stack or the like. Apple did those programmers a favor by pointing out their bug, and the change that made strcpy abort is just the last event in a series of events that made the program fail. Apple may have thrown the last pitch, but the program was set up to fail by its lazy-or-harried-or-incompetent programmer. They didn't respect the API requirements of one of the most widely-known library functions on earth. Requirements that have existed in the exact same form for decades.
Those failed programs are the direct result of programmers not knowing or caring about undefined behavior. The glibc strcpy (like the memcpy before it) completely meets its obligations under the spec, and any programs aborting because of that change are doing so only because their short-sighted and lazy programmers didn't know and follow the API. If the users don't like it when their programs stop working, maybe they should give their custom to more reliable programmers in the future!

I mean, Bob can claim until he's blue in the face that the compiler and library vendors should protect him from the consequences of UB. But they're not going to do it, partly because it's impossible in general and partly because it's not actually their responsibility according to the spec. It's our job as programmers to write a safe and correct program--one that follows the rules of the language. Not some "common sense" version of the rules that a programmer has rattling around in their head, but the actual rules in the actual spec.
hgm
- Posts: 28386
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller

Re: How could a compiler break the lockless hashing method?
wgarvin wrote: I think you guys persistently assign the blame to the wrong source. You don't blame the programmers who wrote code that had undefined behavior and to which the language specs assign no semantics whatsoever. Instead, you blame the compilers?
Not exclusively. Of course the compilers are malicious, but they share the blame with the standard, which apparently offers them more leeway than they can handle.
Note that it is debatable what "the language" actually is. There used to be a time when the operator + working on ints actually meant "perform an addition". That some standards now seem to be emerging that think they know better basically renders what used to be a useful language into something completely useless. Basically these people are hijacking the language, to destroy it. They are nothing but terrorists!
wgarvin wrote: Just because something happens to compile and run on your current compiler does not make it a "working" program. Any program affected by the strcpy() change was by definition not a legal C or C++ program; at most it was a program in some similar-but-not-standardized language.
More accurate would be to say that it was redefined to be no longer legal pseudo-C.
wgarvin wrote: You can say it's a lousy state of affairs and I'll agree with you, but the fact of the matter is that all of those broken programs were already broken, except for the complete fluke that the implementations they were previously compiled against happened not to trash their heap or overwrite their call stack or the like. Apple did those programmers a favor by pointing out their bug, and the change that made strcpy abort is just the last event in a series of events that made the program fail. Apple may have thrown the last pitch, but the program was set up to fail by its lazy-or-harried-or-incompetent programmer. They didn't respect the API requirements of one of the most widely-known library functions on earth. Requirements that have existed in the exact same form for decades.
Not really. Like with any poor but legal product, I can simply stop using it and switch to a competitive product that is of better quality. In the case of a compiler, I can use one that would allow me to do non-atomic stores without using locks when I don't care about the undefinedness the hardware would produce from this, and to add integers that I want to overflow. I cannot sue them for maliciously implementing the standard, but I can sure as hell switch to (or stick with) an implementation that follows the standard in a non-malicious way.
wgarvin wrote: Those failed programs are the direct result of programmers not knowing or caring about undefined behavior. The glibc strcpy (like the memcpy before it) completely meets its obligations under the spec, and any programs aborting because of that change are doing so only because their short-sighted and lazy programmers didn't know and follow the API. If the users don't like it when their programs stop working, maybe they should give their custom to more reliable programmers in the future!
wgarvin wrote: I mean, Bob can claim until he's blue in the face that the compiler and library vendors should protect him from the consequences of UB. But they're not going to do it, partly because it's impossible in general and partly because it's not actually their responsibility according to the spec. It's our job as programmers to write a safe and correct program--one that follows the rules of the language. Not some "common sense" version of the rules that a programmer has rattling around in their head, but the actual rules in the actual spec.
wgarvin
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?
hgm wrote: .. some standards now seem to be emerging that think they know better basically renders what used to be a useful language into something completely useless. Basically these people are hijacking the language, to destroy it. They are nothing but terrorists!
....
More accurate would be to say that it was redefined to be no longer legal pseudo-C.
It's not like this is entirely a recent phenomenon. I've pointed out repeatedly in these threads that the same exact kind of discussions were happening in comp.std.c back in 1992, when that "demons may fly out of your nose" phrase first caught on. The only significant change in the last 10 years is that the compilers are a bit smarter now. They're still hopelessly dumb in many situations, of course, and most undefined behavior will sail right by because they don't know how to recognize it (in some cases it's very-difficult-to-impossible). But once in a while they might. (Nasal demons!)

Maybe it's easier to not worry about it. If your faithful compiler (gcc or whichever) is doing OK by you, then keep using it. Just keep upgrading it until one day it does something completely reprehensible, then downgrade and stick with that one. That strategy might work well enough for years, and by then perhaps the situation will have improved and some new "safe dialect" stuff might be available: new compiler options to disable funky/dangerous stuff, better-specified refinements of the standards, etc.
I mean, a lot of programmers use all of these optimizing compilers daily, and many correct or nearly-correct programs are written and compiled with them, and mostly we get by, right? You might end up with latent UB bugs, but you could just as easily have other kinds of latent bugs too. Debugging tools exist (such as the sanitizer stuff clang and now gcc have been adding) that can help find some of the UB bugs, just like valgrind can help find memory access bugs. UB can occasionally have nasty results, but if it happens infrequently enough then maybe a rational strategy is to just carry on like always until it bites you. If it compiles and it works, the risk of current problems is minimal and it's probably only future problems you might need to worry about.
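The sanitizer workflow mentioned above, sketched (the flag is the clang/gcc spelling; the diagnostic text is approximate): build with -fsanitize=undefined and the overflow below is reported at run time instead of being silently transformed away.

/* Compile: cc -fsanitize=undefined ubsan_demo.c && ./a.out
   Typical report: "runtime error: signed integer overflow:
   2147483647 + 1 cannot be represented in type 'int'" */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    x = x + 1;          /* UB: flagged by UBSan at run time */
    printf("%d\n", x);
    return 0;
}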
The next time you catch a compiler generating something totally weird from your code, asking yourself "am I relying on undefined behavior in this code?" early in the debugging process might save a bit of wasted time.