How could a compiler break the lockless hashing method?


AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: How could a compiler break the lockless hashing method?

Post by AlvaroBegue »

bob wrote:But as an example, let's take our good friend x86.

Done the "normal way" x * 2 / 2 will never overflow. It is trivial to prevent.

mov eax, x
imul 2
idiv 2

end up with x, thanks to imul putting product low order 32 bits in eax, and the high order bits in edx. The idiv takes it right back down to x.

Of course, if they choose to get "cute" and do the sloppier form that doesn't use a register pair to prevent overflow, then they just made a poor choice. By doing the right thing, they get the right answer. If they assume x * 2 / 2 = x, they are exactly right. And all is well.
So you think compilers should minimize surprises for the user, and yet you expect these functions to produce different answers:

Code: Select all

int f1(int x) {
  return x * 2 / 2;
}

int f2(int x) {
  int y = x * 2;
  return y / 2;
}
Or do you expect them to produce the same answer as long as nobody uses `y' for anything else?

I think at this point you don't even know what your position is.
User avatar
hgm
Posts: 28386
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

syzygy wrote:You want to use undefined behavior? You get undefined behavior...
Wrong! What I want it that behavior is not undefined at all, when the hardware perfectly defines it. If there is to be any UB from an unambiguous instruction to the hardware what to do, it should be that the compiler writers get roasted over a low fire!

And indeed, I think the compiler writers are more to blame than the standards committee, although they definitely share blame. The compiler writers are malicious, while the standards committee was just naive in not realizing they had to make the standard foolproof against malicious interpretation and abuse.

The argument that this helps to optimize X*2/2 to X is a non-argument. No one would write that if they wanted it to mean X. If I ever wrote anything like that, it would be to exploit the hardware's response to overflow (such as forcing a trap), and replacing it by X would be the last thing I wanted.

As it turns out, the standard is not adequate in a world where compiler writers are malicious. It should be repaired by removing any UB under the current definition and replacing it with 'hardware-defined behavior'.
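
(As an aside, gcc and clang already let you pin this down today: -fwrapv forces wrapping semantics for signed overflow, and gcc's -ftrapv requests a run-time trap. A minimal sketch; -ftrapv support is known to be spotty, so treat the trapping variant as an assumption:)

Code: Select all

/* x * 2 / 2 with INT_MAX:
 *   gcc -O2          : may be folded to x, prints INT_MAX
 *   gcc -O2 -fwrapv  : wraps, prints -1 (INT_MAX*2 -> -2, / 2 -> -1)
 *   gcc -O2 -ftrapv  : should abort on the overflowing multiply     */
#include <limits.h>
#include <stdio.h>

int twice_halved(int x) { return x * 2 / 2; }

int main(void) {
    printf("%d\n", twice_halved(INT_MAX));
    return 0;
}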
Last edited by hgm on Wed Dec 11, 2013 9:23 am, edited 1 time in total.
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

hgm wrote: Note that it is debatable what "the language" actually is. There used to be a time when the operator + working on ints actually meant "perform an addition". That some standards now seem to be emerging that think they know better basically turns what used to be a useful language into something completely useless. Basically these people are hijacking the language, to destroy it. They are nothing but terrorists!
A quote from Regehr's blog http://blog.regehr.org/archives/970, which I would like to coin:
Regehr's Law: A sufficiently advanced compiler is indistinguishable from an adversary.
But there is a flip side to that
Corollary to Regehr's Law: A sufficiently obtuse programmer is indistinguishable from a victim.
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote: I've been programming since 1968. I've used integer overflow so many times (a linear congruential PRNG is just one example) and it has worked flawlessly. Now some compiler guys have decided that since it is undefined, even though it is an established and working programming paradigm, they can break it in any way they choose: throw the code out, make it abort, write a warning, or make demons fly out of your nose. All with something that has been used for so long it is not funny, and something that would still work PERFECTLY if the compiler guys had just left it alone.
And Michael Schumacher has been driving at 200 mph since he got out of his diapers. That doesn't make it OK for him to do that in a residential zone. Maybe in your neck of the woods you get away with it, on account of the sheriff knowing you from Saturday night drag racing. But try and do that in some village in Texas, and Chuck Norris will roundhouse kick you in the nasal passage so that demons will come flying out of said passage... :twisted:
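
(That said, the PRNG idiom bob mentions survives completely intact when written with unsigned arithmetic, where wrap-around is defined by the standard. A minimal sketch, using Knuth's MMIX constants; the names are just for illustration:)

Code: Select all

#include <stdint.h>

/* Linear congruential PRNG relying on modular arithmetic. With uint64_t
   the wrap on overflow is fully defined, so no compiler may "break" it. */
static uint64_t lcg_state = 1;   /* any seed */

static uint64_t lcg_next(void) {
    lcg_state = lcg_state * 6364136223846793005ULL
                          + 1442695040888963407ULL;
    return lcg_state;
}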
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:
wgarvin wrote:For some optimizations they don't have to know it occurs. When there's a program construct for which some possible executions have defined behavior and others have UB, they can ignore the UB possibility and just translate it into something that works correctly for the defined cases. If they always had to handle the undefined cases too they would often be forced to generate less efficient code. e.g. for a signed int X, they will translate "X*2 / 2" into "X" even though this will change the value of the expression if integer overflow occurred during the multiply expression X*2.

This was one of a bunch of examples given in this blog post by Chris Lattner that I linked, and which you claimed to have read:
Signed integer overflow: If arithmetic on an 'int' type (for example) overflows, the result is undefined. One example is that "INT_MAX+1" is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code. For example, knowing that INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true". Knowing the multiplication "cannot" overflow (because doing so would be undefined) allows optimizing "X*2/2" to "X". While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion. A more important optimization that this allows is for "<=" loops like this:

for (i = 0; i <= N; ++i) { ... }

In this loop, the compiler can assume that the loop will iterate exactly N+1 times if "i" is undefined on overflow, which allows a broad range of loop optimizations to kick in. On the other hand, if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite (which happens if N is INT_MAX) - which then disables these important loop optimizations. This particularly affects 64-bit platforms since so much code uses "int" as induction variables.

It is worth noting that unsigned overflow is guaranteed to be defined as 2's complement (wrapping) overflow, so you can always use them. The cost to making signed integer overflow defined is that these sorts of optimizations are simply lost (for example, a common symptom is a ton of sign extensions inside of loops on 64-bit targets). Both Clang and GCC accept the "-fwrapv" flag which forces the compiler to treat signed integer overflow as defined (other than divide of INT_MIN by -1).
I did read that. Had read it previously in fact.

But as an example, let's take our good friend x86.

Done the "normal way" x * 2 / 2 will never overflow. It is trivial to prevent.

mov eax, x
imul 2
idiv 2

end up with x, thanks to imul putting product low order 32 bits in eax, and the high order bits in edx. The idiv takes it right back down to x.

Of course, if they choose to get "cute" and do the sloppier form that doesn't use a register pair to prevent overflow, then they just made a poor choice. By doing the right thing, they get the right answer. If they assume x * 2 / 2 = x, they are exactly right. And all is well.
You miss the point completely.

You have been arguing all the time: signed integer overflow should wrap.

So what happens if signed integer overflow wraps? Let's take 16-bit ints to keep the numbers smaller.
x = 20000
x * 2 = -25536
x * 2 / 2 = -12768

So if you want signed int overflow to wrap, then clearly x * 2 / 2 cannot be optimised to x.

If you want signed int overflow to wrap, then preventing overflow, as you are now suggesting the compiler should do, is simply very clearly dead wrong.
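
The arithmetic is easy to check. A sketch that emulates 16-bit wrapping with int16_t (the conversion back from int is implementation-defined, but wraps on every mainstream compiler):

Code: Select all

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int16_t x = 20000;
    int16_t doubled = (int16_t)(x * 2);       /* wraps to -25536 */
    int16_t halved  = (int16_t)(doubled / 2); /* -12768          */
    printf("%d %d %d\n", x, doubled, halved); /* 20000 -25536 -12768 */
    return 0;
}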
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

bob wrote:The optimization about a + 1 > a being optimized into "1" is an unsafe one.
No. The use of a + 1 > a for signed int a is unsafe. The optimization will surely not help this single line much, but the optimizer rules in a compiler are written for an entire universe of correct code, not for the hopefully soon extinct subset of non-conforming and unmaintained code.

The whole "the compiler is malicious" line of reasoning is based on a fundamental misunderstanding. It seems that you and HG regard C as a mapping of the hardware to C programming constructs, instead of the other way around. The reason that some program constructs in C lead to undefined behavior is that the mapping of C to hardware is not one-to-one. Some things cannot be defined in a portable way because of the enormous diversity in hardware (memory models, bus-width, cache coherence etc.).
User avatar
hgm
Posts: 28386
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

Rein Halbersma wrote:The reason that some program constructs in C lead to undefined behavior is that the mapping of C to hardware is not one-to-one. Some things cannot be defined in a portable way because of the enormous diversity in hardware (memory models, bus-width, cache coherence etc.).
Yes, I understand that. And that is OK with me. But what is not OK with me is that this should be allowed as an excuse to dream up responses that no existing or conceivable hardware would ever exhibit. "I could not know whether you start counting the drawers of your filing cabinet at 0 or 1. So now that you have asked me to file something, I must put a bullet through your brain, and you have no right to complain about it, because you asked for it." I think anyone (or at least any judge) would agree that this is socially unacceptable behavior, and would put anyone using that as a defence away for murder, rather than for "assistance in a suicide". Well, this is exactly the same, so I don't see why it should be acceptable here. UB is supposed to stand for undefined behavior, not for undesired behavior!

I can see perfectly legitimate uses for expressions like if(a+6 > a). Such as testing whether a is far enough from overflow for what I want to calculate next, e.g. to prevent a sqrt() error on a negative operand. Assuming that it would be helpful in general to replace this by 6 > 0, and then by 1, borders on moronic. If a programmer had meant if(1), he would not have written the if() in the first place. It wrecks lots of useful applications of the construct, for no upside.
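
To make that concrete, a sketch of the guard described above and what an optimizing compiler is entitled to do with it (the exact behavior depends on compiler and flags):

Code: Select all

/* Intended: detect wrap-around before adding 6. Because signed overflow
   is UB, gcc and clang at -O2 typically fold (a + 6 > a) to 1 and
   compile this whole function to "return 1". */
int safe_to_add_six(int a) {
    return a + 6 > a;
}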
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: How could a compiler break the lockless hashing method?

Post by Henk »

I think that the described lockless hashing method won't scale well as the number of threads increases, because the chance of an error per hash-table entry increases.
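
For reference, the lockless hashing method under discussion (after Hyatt and Mann) stores the key XOR-ed with the data, so that a torn write by any thread is detected at probe time. A minimal sketch; the entry layout is illustrative:

Code: Select all

#include <stdint.h>

typedef struct {
    uint64_t key_xor_data;   /* position key XOR-ed with the data word */
    uint64_t data;           /* packed move/score/depth, layout free   */
} TTEntry;

/* Store without locks: a torn write leaves the two words inconsistent,
   which the XOR check in tt_probe() will then reject. */
void tt_store(TTEntry *e, uint64_t key, uint64_t data) {
    e->key_xor_data = key ^ data;
    e->data = data;
}

/* Probe: accept the entry only if it is self-consistent for this key. */
int tt_probe(const TTEntry *e, uint64_t key, uint64_t *data) {
    uint64_t d = e->data;
    if ((e->key_xor_data ^ d) == key) { *data = d; return 1; }
    return 0;
}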
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

hgm wrote: I can see perfectly legitimate uses for expressions like if(a+6 > a). Such as testing whether a is far enough from overflow for what I want to calculate next, e.g. to prevent a sqrt() error on a negative operand. Assuming that it would be helpful in general to replace this by 6 > 0, and then by 1, borders on moronic. If a programmer had meant if(1), he would not have written the if() in the first place. It wrecks lots of useful applications of the construct, for no upside.
If you want to guard against overflow, why not do it explicitly:

Code: Select all

if (a > INT_MAX - 6) 
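
Fleshed out into a self-contained form (the function name is just for illustration):

Code: Select all

#include <limits.h>

/* Explicit guard: evaluates the same condition without ever performing
   the overflowing addition, so no UB is involved and no optimizer can
   remove the test. */
int would_overflow_adding_six(int a) {
    return a > INT_MAX - 6;
}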
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: How could a compiler break the lockless hashing method?

Post by Rein Halbersma »

hgm wrote:
Rein Halbersma wrote:The reason that some program constructs in C lead to undefined behavior is that the mapping of C to hardware is not one-to-one. Some things cannot be defined in a portable way because of the enormous diversity in hardware (memory models, bus-width, cache coherence etc.).
Yes, I understand that. And that is OK with me. But what is not OK with me is that this should be allowed as an excuse to dream up responses that no existing or conceivable hardware would ever exhibit. UB is supposed to stand for undefined behavior, not for undesired behavior!
I agree with this, and so do many people on the C++ Committee. It was linked here before, but here it is again: the mailing list for the C++ Study Group on Undefined Behavior (it is likely that their insights will also be closely followed by the C Committee):
http://www.open-std.org/pipermail/ub/

This study group is not a conspiracy of malicious or naive people trying to frame veteran programmers by breaking their seasoned code. If you read their posts, I think you'll get more appreciation for the tradeoffs that language designers and compiler writers are facing. Some quotes:

On the signed integer overflow debacle
Signed integer overflow is a fairly good example here, I believe:

(1) Is a compiler diagnostic acceptable? Yes.
(2) Is a run-time abort acceptable? Yes.
(3) Is an unspecified result value acceptable? Yes.
(4) Is it acceptable that your compiler changes the behavior
of unrelated code that follows the overflow? That's very surprising.

Giving compilers latitude to choose among 1-3 (depending on the
target audience) is fine, but, in my opinion, prohibiting option 4
would be an improvement.
On the performance benefits of assuming no UB is present
Consider a program that adds a constant to a signed integer
in a loop. Under the current model, the compiler can assume
that the variable is monotonically increasing, and can therefore
eliminate comparisons, which leads to simpler loops, which leads
to vectorization, which leads to implementation in GPUs, ....
Without that assumption, the chain of optimizations disappears.
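
A sketch of the kind of loop the quote has in mind (illustrative; which optimizations actually fire varies by compiler):

Code: Select all

/* With a signed induction variable the compiler may assume i += 2 never
   wraps, so i increases monotonically, the trip count is computable, and
   the loop can be vectorized. Under -fwrapv it must allow for i wrapping
   (a possibly infinite loop), which blocks those optimizations. */
void scale_by_two(float *a, int n) {
    for (int i = 0; i < n; i += 2)
        a[i] *= 2.0f;
}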
On winning the hearts and minds of programmers
I am hoping SG12's recommendations and resolutions will be more than
just "educate programmers on undefined behavior". We have been educating
C++ programmers for more than 25 years about these issues, and I am not
quite sure we can categorically say we have won on this front -- in
fact, I think we haven't and we are not winning -- and if we did, we
haven't done it at the pace of sophistication of optimizers.