How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

wgarvin wrote:
bob wrote:Somewhere in there or in another post I had pointed out this was not "undefined behavior" as someone else had claimed. It is "implementation dependent". Much of it was obviously "in jest".
Ok, fair enough. I think I was mostly reacting to this part:
bob wrote: It is undefined if I do it, but apparently not if the compiler does it. How exactly does that work, again???
and it seemed like you were implying those two scenarios were the same or even remotely similar. But since that is obviously absurd, you would never imply that. Right??? :lol:
If the compiler says I can't do it, but it can, that would be just a TAD inconsistent, would it not? I mean, if it can use that behavior, why can't I? Why must the compiler intentionally break my attempt to use it?

As a compiler person, those two do not reconcile. If the specs for the language forbade shifting negative numbers right, that would be one thing. But if it is called "undefined behavior" then why on earth would the compiler be ok doing it, but not passing through MY request to do the same?
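To make the shift example concrete, here is a minimal sketch (the results shown assume gcc or clang on x86, where right shift of a negative value happens to be arithmetic; the C standard leaves it implementation-defined):

Code:
#include <stdio.h>

int main(void) {
    int x = -7;
    /* implementation-defined: arithmetic shift on x86 compilers gives -4 */
    printf("x >> 1 = %d\n", x >> 1);
    /* well-defined by C99: division truncates toward zero, giving -3 */
    printf("x / 2  = %d\n", x / 2);
    return 0;
}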

There are two ways to read part of the C specs.

1. What do the semantics say, and how close can I come to doing what they say OR imply on this hardware?

2. If the semantics say "undefined", then I will do anything BUT the thing that would make the statement do what the programmer obviously means and what the hardware obviously can do; since the spec says "undefined behavior", I am going to do something else entirely.

I go with #1 every last time, always have, and have NEVER gotten a complaint about compilers I worked on, other than to report bugs.
syzygy
Posts: 5714
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."
The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.
Particularly when things like signed integer overflow are clearly defined in the x86 specs.
Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

There.
Fine. Yet I am writing C on a compiler written for the x86, where I expect the program to run on x86, so I don't expect oddball things that work perfectly ON x86 to break. And the compiler COULD do exactly that. It SHOULD do exactly that. But a few keep repeating the "but undefined behavior means anything can happen" line, which is actually a false statement much of the time.
If it is only a false statement "most of the time", then obviously it is a very true statement. You don't want to have a program that works as intended only "most of the time", do you?

What a C99 compiler SHOULD do is defined by the standard.

If you need a compiler that guarantees more than C99 (or some other C standard), then it would be a good idea to find a compiler (or a set of compiler options) that provides this guarantee.
Certainly integer overflow on any hardware from the last 30 years, except for Univac, which has been dead for about that long, is not "undefined" at all.
So what is it, then? Signed overflow wraps? You have already explained that that is not what you want in the case of x * 2 / 2.

But the very basic mistake you keep making is confusing x86 semantics with C semantics.

A C programmer should only be concerned with C semantics. He can rely on what is defined by the C standard. He can not rely on anything else (unless he restricts himself to a particular compiler that offers additional guarantees). The contract between the C programmer and the C compiler is the C standard.

A C compiler for x86 is concerned with the x86 semantics. The compiler relies on what is guaranteed by Intel's manuals. The contract between the C compiler and the x86 architecture is the Intel manual.
I realize that there are those of us in software development that have reasonable interpretations of how things SHOULD be done, and there are those that lather up when they see things that mention "undefined behavior" in the spec.
You really should not be so deluded as to believe that all your "adversaries" in this thread have no good idea of what a direct mapping from C syntax to x86 assembly could look like. We all understand this very, very well. But we, the adversaries, are able to think at a level abstract enough to distinguish between this naive machine model of C and the more mathematically defined promises provided by the C standard.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

syzygy wrote:But the very basic mistake you keep making is confusing x86 semantics with C semantics.

A C programmer should only be concerned with C semantics. He can rely on what is defined by the C standard. He can not rely on anything else (unless he restricts himself to a particular compiler that offers additional guarantees). The contract between the C programmer and the C compiler is the C standard.

A C compiler for x86 is concerned with the x86 semantics. The compiler relies on what is guaranteed by Intel's manuals. The contract between the C compiler and the x86 architecture is the Intel manual.
Yeah, this is mind-boggling. It's been explained at least 3 times in this thread--although I think your explanation here is the clearest. But this is CS 101 type stuff. How is it possible for programmers to work every day in a dangerous language like C and not be aware of this stuff? "The semantics of C" are spelled out in the language spec, which is the only standard you can trust if you want to write portable, correct code. [edit: I guess standards like POSIX and pthreads are widely portable too, so you might also be fine relying on those.]


During this ongoing discussion, Bob actually argued that portability was not one of his goals, and that Crafty now targets x64 and he doesn't care about any other platforms.
But I think it's not really true; it was just him stubbornly trying to argue that his C code should have x64 semantics, as if C were a "high level assembler".
Chris Lattner, primary author of LLVM/clang wrote: This blog post (the first in a series of three) tries to explain some of these issues so that you can better understand the tradeoffs and complexities involved, and perhaps learn a few more of the dark sides of C. It turns out that C is not a "high level assembler" like many experienced C programmers (particularly folks with a low-level focus) like to think, and that C++ and Objective-C have directly inherited plenty of issues from it.
bob wrote:If the compiler says I can't do it, but it can, that would be just a TAD inconsistent, would it not? I mean, if it can use that behavior, why can't I? Why must the compiler intentionally break my attempt to use it?
Because it's a C compiler, and the code you are writing is C code, not x86 assembly code. If you want to write x86 assembly code, gas works just fine; you can then overflow signed integers to your heart's content, and the code will have the semantics you want. You apparently thought the C language would give you those semantics too, but unfortunately it doesn't.
Chris Lattner, primary author of LLVM/clang wrote: Oversized Shift Amounts: Shifting a uint32_t by 32 or more bits is undefined. My guess is that this originated because the underlying shift operations on various CPUs do different things with this: for example, X86 truncates 32-bit shift amount to 5 bits (so a shift by 32-bits is the same as a shift by 0-bits), but PowerPC truncates 32-bit shift amounts to 6 bits (so a shift by 32 produces zero). Because of these hardware differences, the behavior is completely undefined by C (thus shifting by 32-bits on PowerPC could format your hard drive, it is *not* guaranteed to produce zero). The cost of eliminating this undefined behavior is that the compiler would have to emit an extra operation (like an 'and') for variable shifts, which would make them twice as expensive on common CPUs.
(emphasis in original)
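A minimal sketch of one way to sidestep that particular undefined case: mask the count yourself, which pays the extra 'and' Lattner mentions, although compilers will usually drop it on CPUs whose shift instruction already masks the count.

Code:
#include <stdint.h>
#include <stdio.h>

/* Defined for every n, including n >= 32, because the count is masked
   before the shift; without the mask, a shift by 32 or more is UB. */
static uint32_t shl32(uint32_t x, unsigned n) {
    return x << (n & 31);
}

int main(void) {
    printf("%u\n", shl32(1u, 32));  /* prints 1: the masked count is 0 */
    return 0;
}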
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."
The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.
Particularly when things like signed integer overflow are clearly defined in the x86 specs.
Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

There.
Fine. Yet I am writing C on a compiler written for the x86, where I expect the program to run on x86, so I don't expect oddball things that work perfectly ON x86 to break. And the compiler COULD do exactly that. It SHOULD do exactly that. But a few keep repeating the "but undefined behavior means anything can happen" line, which is actually a false statement much of the time.
If it is only a false statement "most of the time", then obviously it is a very true statement. You don't want to have a program that works as intended only "most of the time", do you?
I accept this kind of nonsense all the time. On a Cray, an int is 64 bits; on a PC, it is 32. On a Cray, a long is 64 bits. On my Linux box on x86_64, a long is 64. On Windows, a long is 32. So there are already issues, and I deal with them without getting lost. I generally know the hardware characteristics of the platforms I run on, I know what their strengths are (vectors on a Cray), and when useful, I write things that will run on that machine efficiently. If it doesn't run so efficiently elsewhere, I figure out a way to compensate. I.e., we JUST got a popcnt instruction on x86, but I have been doing a popcnt operation since Crafty version 1.0 in 1994. I like having that level of control.
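The compensation pattern usually looks something like this (a hypothetical sketch, not Crafty's actual source; __builtin_popcountll is the gcc/clang intrinsic, and those compilers define __POPCNT__ when the instruction is enabled):

Code:
#include <stdint.h>
#include <stdio.h>

#if defined(__GNUC__) && defined(__POPCNT__)
static int popcount64(uint64_t x) {
    return __builtin_popcountll(x);     /* compiles to the POPCNT instruction */
}
#else
static int popcount64(uint64_t x) {
    int n = 0;
    while (x) {                         /* Kernighan's trick: clear the lowest set bit */
        x &= x - 1;
        n++;
    }
    return n;
}
#endif

int main(void) {
    printf("%d\n", popcount64(0xFF00FF00FF00FF00ULL));  /* 32 */
    return 0;
}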


What a C99 compiler SHOULD do is defined by the standard.
Unfortunately the C99 standard has more holes in it than a wheel of Swiss cheese: undefined behavior, implementation-dependent behavior, etc. A standard is usually "specific". I can't imagine any of those undefined behaviors or "do whatever you want" options in the space shuttle specs, for example.

If you need a compiler that guarantees more than C99 (or some other C standard), then it would be a good idea to find a compiler (or a set of compiler options) that provides this guarantee.
Certainly integer overflow on any hardware from the last 30 years, except for Univac, which has been dead for about that long, is not "undefined" at all.
So what is it, then? Signed overflow wraps? You have already explained that that is not what you want in the case of x * 2 / 2.
It wraps in ALL cases. The only question is, does it wrap at 32 or more bits? I'd like to see it do the best the hardware can offer. I suppose C COULD just use 16-bit values on x86, since that is all the 286 can do. But there is more in the hardware.
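For concreteness, whether the wrap is actually visible in the compiled program depends on the optimizer as well as the hardware. A minimal sketch of the standard counterexample (the results shown are what gcc and clang commonly produce, not guarantees):

Code:
#include <limits.h>
#include <stdio.h>

static int will_wrap(int x) {
    return x + 1 < x;               /* true only if the add wrapped */
}

int main(void) {
    /* At -O0 this usually prints 1: the x86 add really does wrap.
       At -O2 gcc and clang may print 0: signed overflow is undefined,
       so the comparison can be folded to false.
       Compiling with -fwrapv brings the wrapping answer back. */
    printf("%d\n", will_wrap(INT_MAX));
    return 0;
}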

But the very basic mistake you keep making is confusing x86 semantics with C semantics.

A C programmer should only be concerned with C semantics. He can rely on what is defined by the C standard. He can not rely on anything else (unless he restricts himself to a particular compiler that offers additional guarantees). The contract between the C programmer and the C compiler is the C standard.
For a naive or sloppy programmer, that is likely true. However, some of us want as much as we can get. For example, using BSF/BSR as soon as they became available on x86, rather than waiting a few years for the compiler guys to realize they were useful and add some non-portable intrinsics. I suppose I should just find a version of gcc that does what I like and save that source forever, before the authors "improve" it to the point where I consider it unusable. I've never been shy about using asm in a C program when the compiler didn't give me a way to access some hardware feature that C had no semantics for. So trying to hide behind this "I don't care about the underlying machine" sounds like something Java programmers say over and over while breathing incense. I DO care about the underlying machine.
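For what it's worth, the two routes being contrasted look roughly like this (a hypothetical sketch for gcc/clang on x86-64, not Crafty's source):

Code:
#include <stdint.h>
#include <stdio.h>

/* The builtin the compilers eventually provided; maps to BSF/TZCNT.
   Like BSF, the result is undefined when x == 0. */
static int lsb_builtin(uint64_t x) {
    return __builtin_ctzll(x);
}

/* The inline-asm route, the only option before such builtins existed. */
static int lsb_asm(uint64_t x) {
    uint64_t i;
    __asm__("bsfq %1, %0" : "=r"(i) : "rm"(x));
    return (int)i;
}

int main(void) {
    printf("%d %d\n", lsb_builtin(0x28), lsb_asm(0x28));  /* 3 3 */
    return 0;
}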

Check this out from the Wiki:
wiki wrote: In computing, C (/ˈsiː/, as in the letter C) is a general-purpose programming language initially developed by Dennis Ritchie between 1969 and 1973 at AT&T Bell Labs.[4] Like most imperative languages in the ALGOL tradition, C has facilities for structured programming and allows lexical variable scope and recursion, while a static type system prevents many unintended operations. Its design provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, most notably system software like the Unix computer operating system.
"We" never wanted C to get in the way of getting to the hardware; we wanted it to help. You seem to be on the opposite side of the street, continually repeating over and over "it is about the abstract machine, not the underlying hardware." I've been programming in C far longer than you, long enough to remember when the above was the primary reason for the popularity of the language. You probably dislike casts because they can get you into trouble. I like them because, while they can get you in trouble if you are sloppy or don't know what you are doing, they can save some effort if you use them correctly.


A C compiler for x86 is concerned with the x86 semantics. The compiler relies on what is guaranteed by Intel's manuals. The contract between the C compiler and the x86 architecture is the Intel manual.
I realize that there are those of us in software development that have reasonable interpretations of how things SHOULD be done, and there are those that lather up when they see things that mention "undefined behavior" in the spec.
You really should not be so deluded as to believe that all your "adversaries" in this thread have no good idea of what a direct mapping from C syntax to x86 assembly could look like. We all understand this very, very well. But we, the adversaries, are able to think at a level abstract enough to distinguish between this naive machine model of C and the more mathematically defined promises provided by the C standard.
See the quote from the Wiki. Many of us do NOT agree with you. But keep repeating "it is about the standard" over and over. Perhaps one day it will be.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

wgarvin wrote:
syzygy wrote:But the very basic mistake you keep making is confusing x86 semantics with C semantics.

A C programmer should only be concerned with C semantics. He can rely on what is defined by the C standard. He can not rely on anything else (unless he restricts himself to a particular compiler that offers additional guarantees). The contract between the C programmer and the C compiler is the C standard.

A C compiler for x86 is concerned with the x86 semantics. The compiler relies on what is guaranteed by Intel's manuals. The contract between the C compiler and the x86 architecture is the Intel manual.
Yeah, this is mind-boggling. It's been explained at least 3 times in this thread--although I think your explanation here is the clearest. But this is CS 101 type stuff. How is it possible for programmers to work every day in a dangerous language like C and not be aware of this stuff? "The semantics of C" are spelled out in the language spec, which is the only standard you can trust if you want to write portable, correct code. [edit: I guess standards like POSIX and pthreads are widely portable too, so you might also be fine relying on those.]

Apparently your mind is quite easy to boggle. I've already quoted this once; here it goes again. From the Wiki (and NO, I did not write it):
wiki wrote: In computing, C (/ˈsiː/, as in the letter C) is a general-purpose programming language initially developed by Dennis Ritchie between 1969 and 1973 at AT&T Bell Labs.[4] Like most imperative languages in the ALGOL tradition, C has facilities for structured programming and allows lexical variable scope and recursion, while a static type system prevents many unintended operations. Its design provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, most notably system software like the Unix computer operating system.
So it looks like MOST of the long-term C programmers have quite a different view from those of you that are apparently a product of the Java abstract-machine crowd. What kind of language would target assembly language programming? One that lets you get to the hardware to do the things the hardware is capable of, WITHOUT getting in the way and being an obstructionist. The above quote contains the basic justification for C for the past 40+ years. I will agree that the Java genre seems happy with pointerless, mindless languages that try to prevent the programmer from having any idea about the underlying hardware. Which means C gives too much freedom and has to be fixed to be more like its useless Java cousin, useless for anyone wanting to write high-performance code.



During this ongoing discussion, Bob actually argued that portability was not one of his goals, and that Crafty now targets x64 and he doesn't care about any other platforms.
But I think it's not really true; it was just him stubbornly trying to argue that his C code should have x64 semantics, as if C were a "high level assembler".
You need to get a refund on your mind-reading training. Have you looked at Crafty? Did you look at the functions to do popcnt, MSB and LSB? DO they appear to be "portable"? When I had an Alpha here, I just wrote the correct asm routines for that. When I was using a Sun SPARC, ditto. For the Cray, ditto.

Chris Lattner, primary author of LLVM/clang wrote: This blog post (the first in a series of three) tries to explain some of these issues so that you can better understand the tradeoffs and complexities involved, and perhaps learn a few more of the dark sides of C. It turns out that C is not a "high level assembler" like many experienced C programmers (particularly folks with a low-level focus) like to think, and that C++ and Objective-C have directly inherited plenty of issues from it.
bob wrote:If the compiler says I can't do it, but it can, that would be just a TAD inconsistent, would it not? I mean, if it can use that behavior, why can't I? Why must the compiler intentionally break my attempt to use it?
Because it's a C compiler, and the code you are writing is C code, not x86 assembly code. If you want to write x86 assembly code, gas works just fine; you can then overflow signed integers to your heart's content, and the code will have the semantics you want. You apparently thought the C language would give you those semantics too, but unfortunately it doesn't.
Chris Lattner, primary author of LLVM/clang wrote: Oversized Shift Amounts: Shifting a uint32_t by 32 or more bits is undefined. My guess is that this originated because the underlying shift operations on various CPUs do different things with this: for example, X86 truncates 32-bit shift amount to 5 bits (so a shift by 32-bits is the same as a shift by 0-bits), but PowerPC truncates 32-bit shift amounts to 6 bits (so a shift by 32 produces zero). Because of these hardware differences, the behavior is completely undefined by C (thus shifting by 32-bits on PowerPC could format your hard drive, it is *not* guaranteed to produce zero). The cost of eliminating this undefined behavior is that the compiler would have to emit an extra operation (like an 'and') for variable shifts, which would make them twice as expensive on common CPUs.
(emphasis in original)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

abulmo wrote:
syzygy wrote:
bob wrote:BTW the semantics are anything BUT "clearly spelled out in the standards." I don't consider "undefined behavior" to be "clearly spelled out."
The whole point of UB is that the semantics of a C program is undefined once it exhibits any kind of UB.
Particularly when things like signed integer overflow are clearly defined in the x86 specs.
Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

This distinction is important. Maybe I should repeat it.

Very relevant for the semantics of an x86 assembly program, but completely irrelevant for the semantics of a C program.

There.
It looks like the misunderstanding comes from treating the binary representation of negative numbers as the important source of undefined behaviour. Actually the problem is the possibility, offered by the C standard, of trap representations. An integer overflow can lead to a trap representation and abort the program, for example, or to anything else allowed by undefined behaviour. gcc invoked with the -ftrapv option is an example of a compiler that does exactly that.
That appears to not even work on gcc 4.7.3... I wrote a small test that overflowed multiple times, and absolutely nothing happened other than the printf() results showed the effects of the expected overflows...
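For reference, a test along these lines is the kind of thing that should trap under -ftrapv (a hypothetical reconstruction, not the actual test; the report above is that gcc 4.7.3 simply printed the wrapped values instead):

Code:
#include <limits.h>
#include <stdio.h>

int main(void) {
    /* Build with: gcc -ftrapv test.c   (file name is just illustrative)
       A working -ftrapv should abort on the signed overflows below. */
    volatile int x = INT_MAX;   /* volatile keeps the overflow out of the constant folder */
    int y = x + 1;              /* signed overflow */
    int z = x * 2;              /* signed overflow */
    printf("%d %d\n", y, z);
    return 0;
}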
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

Its design provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, most notably system software like the Unix computer operating system.
Correct. C provides constructs that map efficiently TO the target hardware. Notice what that description does not say: it does not say that every aspect of the target hardware necessarily gets such an efficient mapping from a C construct, or any mapping at all.

Example: Just because you can shift bits into and out of the "sign bit" of a register in x86 assembly language, doesn't mean you can do it portably in C with a signed int.
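A minimal sketch of the portable route for that first example: do the bit twiddling on an unsigned type, where the shifts are fully defined, and convert at the boundaries.

Code:
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    int32_t v = -1;
    uint32_t bits = (uint32_t)v;    /* well-defined conversion: 0xFFFFFFFF */
    bits <<= 1;                     /* defined shift: 0xFFFFFFFE */
    printf("0x%08" PRIX32 "\n", bits);
    /* By contrast, (v << 1) left-shifts a negative signed value,
       which the C standard leaves undefined. */
    return 0;
}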

Example: Just because loading from a zero address will trap on your x86 chip on your favorite operating system/memory model doesn't mean you can dereference a NULL pointer in C and expect it to trap at that point in the program. Dereferencing NULL is not defined to trap; it is actually UB, and may instead summon the nasal demons.
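A minimal sketch of how that bites in practice (gcc and clang are entitled to do this at -O2; whether they actually do depends on version and flags):

Code:
#include <stdio.h>

static int first_byte(const char *p) {
    int c = *p;             /* if p is NULL this is UB, not a guaranteed trap */
    if (p == NULL)          /* the optimizer may delete this test: p was already
                               dereferenced, so it "cannot" be NULL            */
        return -1;
    return c;
}

int main(void) {
    printf("%d\n", first_byte("x"));
    return 0;
}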

Example: Several C compilers recognize idioms that they can turn into a 'rotate' instruction. Which is nice and friendly of them, since C has shift operators but no rotate operator of any kind, even though nearly every machine capable of running compiled C code has both shift and rotate instructions.
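The recognized idiom is something like the following (a minimal sketch; the masked form is defined for every count, and gcc and clang typically turn it into a single rotate instruction on x86, whereas the naive (32 - n) form is UB when n is 0):

Code:
#include <inttypes.h>
#include <stdio.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << (n & 31)) | (x >> ((32 - n) & 31));
}

int main(void) {
    printf("0x%08" PRIX32 "\n", rotl32(0x80000001u, 1));  /* 0x00000003 */
    return 0;
}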

Anyway, C does provide useful constructs that map efficiently to the target hardware, but in the interests of portability and maximum performance, it also left a lot of stuff undefined. And fixing that without tanking the performance is perhaps more difficult than you make it sound.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

bob wrote: That appears to not even work on gcc 4.7.3... I wrote a small test that overflowed multiple times, and absolutely nothing happened other than the printf() results showed the effects of the expected overflows...
Broken compiler feature... bleah. -ftrapv sounds like it is not reliable on gcc.
Chris Lattner's blog, May 2011 wrote:Clang also fully supports the -ftrapv flag (not to be confused with -fwrapv) which causes signed integer overflow bugs to trap at runtime (GCC also has this flag, but it is completely unreliable/buggy in my experience).
What about -fwrapv? gcc has that too; it's supposed to just make signed integer overflows defined to wrap around like you (perhaps) want. So it inhibits some loop optimizations etc., but it gives them a defined behavior (as long as you don't mind the code not being portable to other compilers, of course). [Edit: it probably makes x * 2 / 2 emit those math ops 'as written' too, unless the compiler notices it can fold the 2 / 2 first... I guess I forgot throughout the posts about that toy example that it could just fold the constant part first, or else I'm tired and not thinking clearly. Hmm.]
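On the toy example: x * 2 / 2 parses as (x * 2) / 2, so the two constants are not adjacent and cannot simply be folded to 1. What -fwrapv changes is which result the compiler has to preserve; a minimal sketch (the result without -fwrapv is typical of gcc at -O2, not guaranteed):

Code:
#include <limits.h>
#include <stdio.h>

static int f(int x) {
    return x * 2 / 2;       /* i.e. (x * 2) / 2 */
}

int main(void) {
    /* gcc -O2 -fwrapv : prints -1  (INT_MAX * 2 wraps to -2, then -2 / 2)
       gcc -O2         : typically prints 2147483647 (folded to x)         */
    printf("%d\n", f(INT_MAX));
    return 0;
}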
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

bob wrote:So it looks like MOST of the long-term C programmers have quite a different view from those of you that are apparently a product of the Java abstract-machine crowd. What kind of language would target assembly language programming? One that lets you get to the hardware to do the things the hardware is capable of, WITHOUT getting in the way and being an obstructionist.
You keep saying "Java" and "abstract-machine" together, but the abstract machine stuff in the C language specification pre-dates the existence of Java by many years. So I'm not quite sure what your point is there. Maybe the "Java" is a reference to some kind of "dumbing down" of C? If anything, the opposite is happening--the compilers are getting cleverer and it seems to me that programmers have a greater need now to understand difficult topics like UB than they did in the past. Even though the UB stuff puts a few things out of bounds, C does still let you access a lot of the power of the underlying hardware.

I learned x86 assembly before I learned C or C++, then I learned a bit of C and then early C++ (back then, C++ was not nearly the abomination of a language that it is today). I didn't really learn Java until my first job after university, where I used it every day for about 4 years. For the last 7 years, it's been C++ on somewhat limited platforms: game consoles. I've spent countless days trying to save 100 kb of code size, or shave a few microseconds off of something that gets executed 30 times a second, or tweak some float code to use vector intrinsics just to avoid a performance-killing load-hit-store. So I appreciate the value in being able to get close to the hardware from inside a high-level language like C or C++, I really do! But portability is also something we must have. Some of my code has to work on 6 different compilers for 4 different CPU architectures, so using gcc-specific extensions or something usually won't fly. And of course it needs to be as fast as possible, and if those compilers will do something extremely clever for me, then I'll take it, even if the price is that I have to be careful to avoid >191 kinds of undefined behavior or really weird things might occur.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: How could a compiler break the lockless hashing method?

Post by wgarvin »

bob wrote:You need to get a refund on your mind-reading training. Have you looked at Crafty? Did you look at the functions to do popcnt, MSB and LSB? DO they appear to be "portable"? When I had an alpha here, I just wrote the correct asm routines for that. When I was using a sun SPARC ditto. For the Cray. Ditto.
I've looked at Crafty before, though not lately. It's a nice program, by the way, pretty readable. popcnt, MSB, etc., sure.

Earlier this evening I happened to be reading the SMP thread from 2009, when you discovered an atomicity problem with the hash; unless I am mistaken, that was the birth of the XOR hashing trick, right? You did seem to care about portability back then, at least a little bit. I suggested you could atomically store the 16-byte entries using SSE, and you rejected the idea because it wasn't portable! :lol:
bob in 2009 wrote:The problem with XMM is that I am using C and do not want to add more x86 assembly beyond the BSF/BSR code already used. And C doesn't let me work directly with the XMM registers to do this portably.
I know, I know... there was also the 32-byte pawn hash to worry about... I can't resist teasing a bit, sorry! :lol:
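The XOR hashing trick being referred to amounts to this (a minimal sketch, not Crafty's actual code): each entry stores the key XORed with its data, so an entry torn by a concurrent writer fails the signature check on probe instead of being used.

Code:
#include <stdint.h>

typedef struct {
    uint64_t key_xor_data;
    uint64_t data;
} HashEntry;

static void store_entry(HashEntry *e, uint64_t key, uint64_t data) {
    e->key_xor_data = key ^ data;   /* signature: only consistent if both
                                       words come from the same store     */
    e->data = data;
}

static int probe_entry(const HashEntry *e, uint64_t key, uint64_t *data) {
    uint64_t d = e->data;
    if ((e->key_xor_data ^ d) != key)
        return 0;                   /* empty, different position, or torn write */
    *data = d;
    return 1;
}

int main(void) {
    HashEntry e = {0, 0};
    uint64_t d;
    store_entry(&e, 0x123456789ABCDEF0ULL, 42);
    return probe_entry(&e, 0x123456789ABCDEF0ULL, &d) ? 0 : 1;
}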