How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote: I thought someone had already pointed out that shifting a negative value left is "undefined behavior"?

But if a shift is dangerous, don't do it. Do the imul which works correctly. Or, of course, you can do a shld. BTW what do you know about (say) SAL and SHL? :)
The compiler can still emit a left shift for the multiply; the undefined behavior constrains your C source, not the generated instruction. (And strictly speaking, it is left-shifting a negative value that is undefined; right-shifting one is only implementation-defined.) Just compile the following program and see the assembly code for yourself:

Code: Select all

#include <stdio.h>

int main (void)
{
        int a;

        if (scanf ("%d", &a) != 1)      /* read one int from stdin */
                return 1;
        printf ("%d\n", a*8);           /* look for shl (or lea) in the generated assembly */
        return 0;
}
bob wrote:

As to your question, if I do that on the ARM, I will get a 32-bit value. So what? If I do it on Intel, I get the right value. How many ARMs are there compared to x86, in terms of numbers? Close enough to zero for a statistician to say "zero"? :)
2 billion CPUs in 3 months is not enough for you?

http://news.softpedia.com/news/ARM-Reco ... 6252.shtml
OK, my mistake. Then why does gcc use SAR to replace a divide, and SHL to replace a multiply, when one operand is a power of two? And why does x86 have a SAR instruction in the first place? (SHL and SAL are the same opcode, for the record; it is SAR that fills on the left with the sign bit, which makes perfect sense.) As far as 2 billion CPUs go, I assume you mean the cell-phone market as opposed to general-purpose computing platforms? How many of those are programmed in C vs. Java? Etc...

BTW, it still looks like Intel has roughly 30x more sales overall, based on their total sales by quarter.
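For what it's worth, this is easy to verify. Here is a minimal sketch (the helper names are mine, assuming gcc -O2 on x86): the multiply typically compiles to a shl or lea, while the signed divide compiles to a sar plus a small bias fixup, because a bare arithmetic shift rounds toward negative infinity where C division truncates toward zero.

Code: Select all

#include <stdio.h>

/* Illustrative helpers; compile with gcc -O2 -S and inspect the output. */
int mul8 (int x) { return x * 8; }      /* typically a single shl or lea */
int div8 (int x) { return x / 8; }      /* typically sar plus a bias fixup for negative x */

int main (void)
{
        printf ("%d %d\n", mul8 (-5), div8 (-5));       /* -40 0: division truncates toward zero */
        return 0;
}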
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote:
This difference between signed and unsigned is arbitrary. It serves no good purpose. It ADDS confusion to the programming process. And it is diametrically opposed to what the hardware does for both cases, which is to handle them exactly the same.
It might be a better world if signed integer overflow were implementation-defined rather than undefined; you'd hear no complaints from me if that were the case. But it isn't. So given that state of affairs, can you answer me why

Code: Select all

if (a + 1 > a)
is any better than

Code: Select all

if (a > INT_MAX - 1)

The former is undefined behavior; the latter is perfectly well defined, since INT_MAX - 1 cannot overflow. Given that piece of knowledge (free of charge), why not use it?
I don't use either one. If I am concerned about overflow, I don't do it; if I can deal with wrap, I do. If a+1 is < 0, or if a > INT_MAX-1, what do I do? If it wraps and I am OK with that, I won't do the test. If I do care, I might do the latter, or I might do the former. Either should work equally well if the hardware is consistent and the compiler is not off in never-never land.
How do you know if a is close to INT_MAX? By doing a test. You do "a + 1 > a", which is undefined. You could either pass -fwrapv to your compiler (which makes wrapping defined for your implementation), or use "a > INT_MAX - 1", which is well defined outright; either way, since you know your hardware, you're fine.

Doing the former test when a safe and equivalent test exists is a clear sign of stubbornness. Good luck and sweet dreams.
I understand my code. I have a GOOD idea of the ranges of the numbers I am working with. Do I make an occasional mistake? Yes, as in the node counter back in 1995. Easy enough to fix, when the problem shows up. UNLESS the compiler decides to do something screwy instead...

I've never written code like that in my life. It was given by someone else, not me. But there is no reason it should not work. I have done a * b + c where it would overflow, but I knew it would, intended for it to do so, and it worked exactly as intended. Fortunately, back then, compilers were not so malicious and didn't use such unsafe optimizations, leaving the code to do what I expressed, not what the compiler decided I wanted.
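To make the two tests concrete, here is a minimal sketch; the helper name safe_inc is hypothetical. Because INT_MAX - 1 cannot overflow, the comparison itself never invokes undefined behavior, which is the whole point of writing it that way.

Code: Select all

#include <stdio.h>
#include <limits.h>

/* Hypothetical helper: increment with a well-defined overflow check. */
int safe_inc (int a)
{
        if (a > INT_MAX - 1)    /* same as a == INT_MAX; the test itself cannot overflow */
                return a;       /* saturate instead of wrapping */
        return a + 1;
}

int main (void)
{
        printf ("%d\n", safe_inc (41));         /* 42 */
        printf ("%d\n", safe_inc (INT_MAX));    /* INT_MAX, with no undefined behavior */
        return 0;
}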
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

bob wrote:
wgarvin wrote:
rbarreira wrote:Bob, I'm starting to wonder if your account has been taken over by someone. Now you're displaying ignorance about basic C and integer numbers in one single post.
I know. It's like talking to a wall. He just thinks he knows better than everyone else, even when he's repeating the same arguments that have been refuted multiple times already in the same thread.
Sorry you feel that way. I simply have my opinion, developed over several years of compiler development. Answer me this:

If you run an ASM program on an original Intel Pentium, and then you run the same code on the Pentium Pro, which supports out-of-order execution (OOE), and you get DIFFERENT answers, would you consider that acceptable? Why or why not?
I'd be disappointed, because one of the selling points Intel used to get people to buy Pentiums was backward compatibility with 8086, 80286, 80386, and 80486 code. Nonetheless, there was the fdiv bug, leading to a limited recall of the affected parts. And there are minor differences and errata between processor generations, though Intel and AMD have mostly done a great job keeping them backward-compatible.
bob wrote: If you compile the same source with -O0 and then with -O3 and you get different answers, would you consider that acceptable? Why or why not?
Well, if my source was a correct, standard-conforming program, then it might indicate a compiler bug. But 99 times out of 100 when this happens, it's because my code has a bug in it, and in that situation it's completely acceptable. If I'm accessing out of bounds in an array, that's undefined behavior. If I've overflowed some pointer arithmetic, that's undefined behavior. If I've shifted an N-bit int by N or more bits, that's undefined behavior. I don't expect my compiler to magically translate my broken source code into a working program in such cases, so I'm not surprised when it fails to do so.
bob wrote: Finally, do your two answers match? If not, why? If both are yes, we probably can't communicate at all. Semantics are semantics. I don't want the computer hardware changing them on the fly just so it can execute things faster. If both are "no" then we seem to be in perfect agreement.
So I answered No to the first question and Yes to the second, as would any programmer who actually knows both x86 and C.
bob wrote: Because today, OOE is guaranteed to produce the exact same results as if the program were executed in order. But compilers provably do produce different results at -O0 and -O3, as already discussed. I consider that ridiculous. Even more, I consider it poor compiler development.
If your program gives different results when compiled at -O0 and -O3, then either it is not a correct program (which is extremely common), or it is depending on extensions to the language: extra semantics the compiler vendor has provided, or an extra spec such as the pthreads library semantics (this is also extremely common). In any case, as I have said several times already, only the programmer has the power to produce correct programs. If you write an incorrect program, there is no way your compiler can magically fix it for you. If you think you "know" the language better than the compiler does, and refuse to believe your program is broken, then neither I nor your compiler can help you.
bob wrote:
He gives examples from x86 code of "how compilers ought to do it" that haven't been current for almost 20 years, like his ridiculous imul / idiv example. Come on, bob! There's more than one form of imul instruction in x86 assembly, and there has been for a long time.
RTFM. Why do you think they give us the one-operand imul instruction? Because of the possibility of overflow: it gives us a way to prevent it completely across a successive multiply/divide pair. No, you don't have to use 'em. And if you don't, you will get the expected 32-bit wrap whether you do imul or mul.
Right. And your C compiler doesn't have to use 'em, because of the way the * operator is specified in C. And when it evaluates "X * 2 / 2" and that "expected 32-bit wrap" occurs, the result is not the original value of X. But because of the way the language spec is written, compilers are allowed to optimize it back down to "X", and they do.

And I recall either you or hgm objected that nobody would write "X * 2 / 2" on purpose anyway... but that point was also addressed by Chris Lattner in those blog posts I have quoted and linked here multiple times. Stupid expressions like that often result from language abstractions (macro expansion, inlining, and, with C++, templates). Also, give some thought to loop induction variables. They might not overflow at the same points as the original variables would, so the expression might have different values in those cases. But since any overflow of the original variables would be undefined behavior anyway, the compiler can safely substitute induction variables in those cases.
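A minimal sketch of that last point, under my own assumptions about what the optimizer does: with a signed index, the compiler may assume i * 4 never overflows and strength-reduce the scaled index into a pointer that steps by a fixed amount each iteration.

Code: Select all

/* Sketch: because signed overflow is undefined, the compiler may assume
   i * 4 never wraps and replace the scaled index with a stepping pointer. */
long sum_every_4th (const int *a, int n)
{
        long s = 0;
        for (int i = 0; i < n; i++)
                s += a[i * 4];  /* typically becomes *p with p += 4 each pass */
        return s;
}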

I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
bob wrote:
You can use the one you quoted, but it has severe restrictions on which registers it can use. The one compilers usually use is the 32 * 32 -> 32 bit form, because it can be used with any of the general-purpose registers. And also because they know they only need 32 bits of result, since that's how the C language has been specified, since forever. Compilers also sometimes implement the multiply using shifts and adds, and they definitely implement division by a constant using other instructions besides idiv, because idiv can be slow. All of this has been true for decades.
You make my point. They don't do it the "right" way according to the hardware; they do it the way that is compatible with hardware that doesn't have that option, even though most existing hardware today does. On most non-x86 boxes that have real register files, if you multiply x into an even-numbered register, you get an even+odd register pair for the product. If you multiply x into an odd-numbered register, you get a single-register value that will overflow much sooner than the register pair.

But ignoring that, let's take a simple example:

x = 0x40000000 on a 32-bit box, but it is passed into a function so that the compiler can't see the value.

x * 2 / 2 produces 0x40000000 as expected.

But if you use 0x40000000 * 2 / 2, broken constant folding turns that into 0xc0000000, which is not quite the same. Why does it do them differently? Shouldn't it at LEAST be consistent in mathematical operations? Does "*" imply some different operation when applied to constants vs. variables??? If so, why?
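Here is bob's example spelled out as a compilable sketch; the exact output at the marked lines depends on compiler and flags, since the overflow is undefined either way.

Code: Select all

#include <stdio.h>

int f (int x) { return x * 2 / 2; }     /* at -O2 typically folded to just x */

int main (void)
{
        int x = 0x40000000;
        printf ("%x\n", (unsigned) f (x));                  /* usually 40000000 */
        printf ("%x\n", (unsigned) (0x40000000 * 2 / 2));   /* constant folding may give c0000000 */
        return 0;
}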


Bob's stubbornness in a debate is pretty amazing. I can only imagine the cognitive dissonance it must cause to argue, and believe, such a mix of contradictory and nonsensical positions.
Feel free to cite my "nonsensical position". If you think consistency is nonsensical, I can see why we don't agree. But that doesn't mean my belief that consistency is reasonable to expect is a flawed concept.
I admit I was tempted to go through the whole thread and make a list of all the things that I considered nonsensical. I think it would be a long and tedious task, and of little value in the end.

I agree that consistency is a nice property where it's possible to have it, and that compiler writers should maybe do more than they have been doing to try to protect us from the dark corners of undefined behavior in C and C++. But you also have to acknowledge that locking down some of those undefined behaviors has costs: performance will suffer a bit. I think there is growing recognition lately that other things besides performance are pretty important too (safety, reliability, the ability to reason about a program's behavior by reading its source code, etc.), so hopefully over the next few years the overall situation will get better.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

wgarvin wrote: I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
It's actually trivial to give examples of where not using Bob's favorite instruction (one-operand imul) results in better performance. The easiest would be a situation where the compiler is already using the edx register and doesn't want it clobbered. A slightly more complicated one would be auto-vectorization of code, where the pmulld SSE instruction would be used.
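A sketch of the kind of loop rbarreira means, with a function name of my own choosing; with gcc or clang at -O3 targeting SSE4.1 or later, the multiply in the vectorized body is typically a pmulld rather than a scalar imul.

Code: Select all

/* Elementwise product; at -O3 this loop is typically auto-vectorized,
   the multiply becoming pmulld (SSE4.1) instead of scalar imul. */
void vmul (int *a, const int *b, int n)
{
        for (int i = 0; i < n; i++)
                a[i] *= b[i];
}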
Last edited by rbarreira on Wed Dec 11, 2013 5:36 pm, edited 1 time in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

hgm wrote:
wgarvin wrote: He asserts over and over that the optimizations enabled by UB aren't very useful. I would argue that if they didn't speed up at least some programs, the compiler guys wouldn't bother to implement them.
Well, this remains to be seen. Although it was really the topic of another thread, I completely fail to see who would benefit from translating a strcpy() with overlapping source and destination into abort(). It seems to hurt everyone. Not only does it break backward compatibility with programs written for older standards, it also seriously hurts everyone who religiously sticks to the new standard, because it requires wasting time on a non-trivial overlap test, including a run-time determination of the string's length, before the copy can start.

The sole motivation for making overlapping copy areas UB was to avoid the necessity for such tests. Given that the tests will now always be done anyway, it would have been much better to specify that strcpy should work no matter what. People would get more for the same performance. The Apple implementation is intentionally malicious towards everyone.

So the Apple strcpy case demonstrates that the current standard can be and will be too easily abused to inflict global and universal suffering, and needs to be rephrased to make such practices impossible.
I've pointed this out several times. But all I get back is "but it is undefined behavior according to the standard." It doesn't HAVE to be undefined. If the strings don't overlap, just do the normal copy (although there is overhead to detect the condition, of course). If they do overlap, convert strcpy(a, a+n) to memmove(a, a+n, strlen(a+n)+1), and suddenly you have done a good thing. The undefined behavior is gone, because strcpy never sees the overlapping source/dest; that case is now handled by memmove(), which allows it. No undefined behavior. No slower than the current version that tests for overlap but aborts. Everybody is happy, and ALL code becomes more robust because the undefined behavior in strcpy() has been eradicated. Instead, the current version just crashes. Makes sense to somebody, I suppose.
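A minimal sketch of what bob describes; the name my_strcpy is hypothetical, and a real libc would fold the length scan into the copy rather than paying for a separate strlen pass.

Code: Select all

#include <string.h>

/* Hypothetical overlap-tolerant strcpy: delegate to memmove, which
   explicitly permits overlapping source and destination. */
char *my_strcpy (char *dst, const char *src)
{
        return memmove (dst, src, strlen (src) + 1);    /* +1 copies the terminating NUL */
}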

Thankfully doctors observe "do nothing to make the patient WORSE." Maybe the clang guys need a year in pre-med?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote: Sorry you feel that way. I simply have my opinion, developed over several years of compiler development.
Objection, facts not in evidence. Please show us which conforming C compiler you actually developed? Or any other compiler that was used by anyone except yourself, your students or coworkers.
I didn't write a C compiler. I did work for a long time on a FORTRAN compiler that was used in production on TI supercomputers, and on a BASIC compiler (not an interpreter) that was used in production for several years, both inside academic labs and outside. I also wrote a (puke) Pascal compiler that students used to write assignments in for many years; Pascal was not available on the machine we were using (a Xerox box) at the time. Is that enough?

Pick up a good compiler book. One of the classic tests for optimizers is to produce the exact same results as the unoptimized code. C does not meet that standard. It did in the past.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:
Rein Halbersma wrote:
bob wrote: Sorry you feel that way. I simply have my opinion, developed over several years of compiler development.
Objection, facts not in evidence. Please show us which conforming C compiler you actually developed? Or any other compiler that was used by anyone except yourself, your students or coworkers.
I didn't write a C compiler. I did work for a long time on a FORTRAN compiler that was used in production on TI supercomputers, and on a BASIC compiler (not an interpreter) that was used in production for several years, both inside academic labs and outside. I also wrote a (puke) Pascal compiler that students used to write assignments in for many years; Pascal was not available on the machine we were using (a Xerox box) at the time. Is that enough?

Pick up a good compiler book. One of the classic tests for optimizers is to produce the exact same results as the unoptimized code. C does not meet that standard. It did in the past.
When exactly did C meet that standard? Give me a compiler that has a working optimizer and I can easily make programs that give different results based on the optimization level.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote: What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.
That is not a reasonable request, because not only is there different hardware out there, but even on x86 there are many different ways to calculate "x * 2":

- the compiler can use imul
- the compiler can use add
- the compiler can use shl
- the compiler can use lea
- the compiler could conceivably use SIMD (SSE) instructions, if it's multiplying several values in a loop.

Why do you assume only one of those is a valid response by the compiler?
If you notice * and / in the same expression, imul with one operand is a natural choice: "do the right thing." My students do this on class assignments without having to be told, because they want to make sure that my input won't break their code. Multiplying two numbers > 65535 is problematic, unless you then divide by something.

So, "do the right thing" as opposed to "do something." I mean, for an int multiply, it COULD use the floating point processor. Should it?
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote: What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.
That is not a reasonable request, because not only is there different hardware out there, but even on x86 there are many different ways to calculate "x * 2":

- the compiler can use imul
- the compiler can use add
- the compiler can use shl
- the compiler can use lea
- the compiler could conceivably use SIMD (SSE) instructions, if it's multiplying several values in a loop.

Why do you assume only one of those is a valid response by the compiler?
If you notice * and / in the same expression, imul with one operand is a natural choice: "do the right thing." My students do this on class assignments without having to be told, because they want to make sure that my input won't break their code. Multiplying two numbers > 65535 is problematic, unless you then divide by something.

So, "do the right thing" as opposed to "do something." I mean, for an int multiply, it COULD use the floating point processor. Should it?
So you're asking compilers to detect the presence of divisions in an expression, and use a different instruction for multiplication if that is the case?

I should stop arguing with you but this is starting to become hilarious, so I probably won't.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: How could a compiler break the lockless hashing method?

Post by AlvaroBegue »

So you think mapping `x * 2 / 2' to `x' is a valid optimization, but mapping it to `(x + x) / 2' or `(x * 2) >> 1' (with the appropriate shift that propagates the sign bit) is not. Is that what you are saying? Also, do you think it's OK for the compiler to give you different results if you store `x * 2' in a local variable and then divide that one by 2?

I really think by now you don't know what you want anymore, and there is a slight chance you might end up seeing the point of leaving things undefined and letting the compiler give you different answers for different levels of optimization, given that your program invokes undefined behavior. But I am not keeping my hopes up.