How could a compiler break the lockless hashing method?

Rein Halbersma · Post by **Rein Halbersma** » Sun Dec 08, 2013 11:24 pm

syzygy wrote:
5.1.2.3 ad 2 wrote:Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
So I don't think eliminating the xor-operations is allowed by the C standard.

It doesn't have to eliminate the XOR when storing into the hash table (which indeed is a side effect), all it needs to do is to cancel this XOR against the other XOR when computing the expression inside matching_signature. In other words, the compiler could optimize matching_signature(e) to always return true, even without touching any of the side effects.

bob · Post by **bob** » Sun Dec 08, 2013 11:26 pm

Rein Halbersma wrote:
bob wrote:
syzygy wrote:
bob wrote:I think that one problem is that some here seem to think the compilers have almost god-like skills in optimizing. In reality, a good human ASM programmer can whip any optimizing compiler known or likely to be known.
No, your problem is that we are able to comprehend and accept the meaning and implications of the C standard, and that you seem to be plagued by demons and other types of UB.
As I said... we are talking about the C standard as written. The standard (new one) mentions race conditions, something it can't even detect. Seems like a really good standard to describe the behavior of something it can't even recognize???
You are confusing cause and effect here. The Standard defines behavior in general as "external appearance or action", and undefined behavior as "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements." So the cause is the nonportable program, and the effect is undefined behavior.

Nowhere does the Standard require that the nonportable or erroneous program construct has to be detected at compile time. But since no requirements are imposed, a compiler can assume that the program does not contain such constructs (signed overflow, data races, overlapping ranges in strcpy(), what have you). That assumption combined with aggressive optimizations can lead to surprising output. Such surprises are known as undefined behavior. It is not just one out of a preconceived set of reasonable outcomes, it can be any outcome.

Unfortunately _I_ didn't start the confusion. Rewind just a bit.

Integer overflow is an undefined behavior operation, correct?

Undefined behavior can be ANYTHING including demons flying out of your nose, correct? (I do not remotely accept that, but it has been often-quoted in these two discussions).

Therefore integer overflow can cause demons to fly out of your nose? Even though the hardware gladly does the add or subtract, and produces the same answer given the same two values, every time it is done?

Or are we NOW going to define "undefined behavior" in a rational way, which I believe is exactly what was intended originally, that is "since integer overflow can produce different results on different hardware that represents negative numbers in different ways, we are going to do the right thing, and do the add, and let the chips fall where they may." I believe that was the intent, and in general, it is how most compilers work. Yes, I saw the overflow loop example where some compilers optimize the overflow away leaving an infinite loop, and others optimize the loop away presumably, leaving nothing. The RIGHT thing is to just do the add, and the program magically does EXACTLY what the author did with no confusion. There are NOT multiple ways any current machine stores a negative integer, so allowing for the possibility seems silly.

For races, I have been using shared memory parallel boxes since the middle 70's, to date not one shared memory box has handled races any differently than what we see today, "let 'em race". If it is a problem "let 'em use a lock to eliminate the race." So races are not a portability issue. Races are only possible through shared things, which today leaves only memory and disks, and since in chess we don't use disks for hashing, memory is it. And the compiler has no flexibility whatsoever when I tell it to store two values in memory, other than it can defer the stores for a while if it wants, or it can move the stores up a bit, both limited only by not violating program data flow requirements. But compilers have ALWAYS done that. Many pre-load memory values at the top so that by the time you need 'em, they are ready. No side-effect that hurts there. Ditto to storing later, so that all the stores don't happen at once and create a backlog that stalls a pipe. But store it must, if I tell it to do so, unless it has access to ALL of the code and can determine that the value is not needed ever again, so that it can avoid the store. Not an issue in a large program that is compiled as multiple files.

bob · Post by **bob** » Sun Dec 08, 2013 11:28 pm

Rein Halbersma wrote:
syzygy wrote:
5.1.2.3 ad 2 wrote:Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
So I don't think eliminating the xor-operations is allowed by the C standard.
It doesn't have to eliminate the XOR when storing into the hash table (which indeed is a side effect), all it needs to do is to cancel this XOR against the other XOR when computing the expression inside matching_signature. In other words, the compiler could optimize matching_signature(e) to always return true, even without touching any of the side effects.

How? It can't see both sides of that. One is done in one block of code, the other is done in another block of code, separated by billions of instruction execution cycles...

The match and the store are not done in the same place.

Rein Halbersma · Post by **Rein Halbersma** » Sun Dec 08, 2013 11:37 pm

bob wrote: Integer overflow is an undefined behavior operation, correct?

To be precise: signed integer overflow is a nonportable program construct that leads to undefined behavior.

Undefined behavior can be ANYTHING including demons flying out of your nose, correct? (I do not remotely accept that, but it has been often-quoted in these two discussions).

The Standard imposes no requirements, no. Of course, there's the laws of physics which seem to preclude demons in general, and demons flying out your nose in particular. In other words: demons are a figure of speech, no requirement the precise term.

Therefore integer overflow can cause demons to fly out of your nose? Even though the hardware gladly does the add or subtract, and produces the same answer given the same two values, every time it is done?

If the hardware is given the chance to do the add or subtract, yes, you get the same answer every time it is done, provided you have deterministic hardware. The problem is that the Standard does not impose any requirement on a compiler to give the hardware that instruction, and we have given plenty of examples where statements like a loop increment i+=i; were optimized away.

Or are we NOW going to define "undefined behavior" in a rational way, which I believe is exactly what was intended originally, that is "since integer overflow can produce different results on different hardware that represents negative numbers in different ways, we are going to do the right thing, and do the add, and let the chips fall where they may." I believe that was the intent, and in general, it is how most compilers work.

Yes, it is the intent of a good compiler to give you the best quality of machine code possible. That includes aggressive optimizations. But guess what? Optimizations can be more successful if one can assume no nonportable program constructs are present. This manifests itself as undefined behavior. Not to annoy you, but to help you in the cases where your program is correct.

bob · Post by **bob** » Sun Dec 08, 2013 11:49 pm

Rein Halbersma wrote:
bob wrote: Integer overflow is an undefined behavior operation, correct?
To be precise: signed integer overflow is a nonportable program construct that leads to undefined behavior.

Undefined behavior can be ANYTHING including demons flying out of your nose, correct? (I do not remotely accept that, but it has been often-quoted in these two discussions).
The Standard imposes no requirements, no. Of course, there's the laws of physics which seem to preclude demons in general, and demons flying out your nose in particular. In other words: demons are a figure of speech, no requirement the precise term.

Therefore integer overflow can cause demons to fly out of your nose? Even though the hardware gladly does the add or subtract, and produces the same answer given the same two values, every time it is done?
If the hardware is given the chance to do the add or subtract, yes, you get the same answer every time it is done, provided you have deterministic hardware. The problem is that the Standard does not impose any requirement on a compiler to give the hardware that instruction, and we have given plenty of examples where statements like a loop increment i+=i; were optimized away.

Doesn't address my issue. Given this C instruction:

a = b + c;

Can the compiler optimize that away or not? I'm not saying what the values of b and c are, they are computed at run-time. So will it for ALL cases add 'em or will it separate the cases and only add when they don't overflow, otherwise skip the add altogether?

Now what about this example:

int a=0x40000000;
int b=0x40000000;
...

c = a + b;

What now? It SEES that we have an overflow. If it optimizes them away, isn't that somewhere north of "inconsistent"? Suppose instead of the above, a and b are declared/initialized where the compiler can't see 'em, and it only has the declarations available. It behaves differently? Should it?

Compilers are not supposed to be impediments to writing good code, nor are they supposed to introduce inconsistent or confusing behavior. The standards guys were just lousy at writing standards, wherever they couldn't agree they either made it undefined behavior, implementation dependent, or left up to the compiler authors.

Or are we NOW going to define "undefined behavior" in a rational way, which I believe is exactly what was intended originally, that is "since integer overflow can produce different results on different hardware that represents negative numbers in different ways, we are going to do the right thing, and do the add, and let the chips fall where they may." I believe that was the intent, and in general, it is how most compilers work.
Yes, it is the intent of a good compiler to give you the best quality of machine code possible. That includes aggressive optimizations. But guess what? Optimizations can be more successful if one can assume no nonportable program constructs are present. This manifests itself as undefined behavior. Not to annoy you, but to help you in the cases where your program is correct.

Note that race is not "non-portable". Nor is integer overflow. I've been programming in asm since 1968. I know of no existing or even past machines of any significance that did not use 2's complement, once mainstream computing went binary. 2's complement is taught as the standard in architecture books even.

As a concluding remark, cite one reasonable example where optimizing an overflow away is a good idea. At worst, an overflow will break a program by producing a bad result. Why remove that. At best, an overflow is expected and causes no problems whatsoever. Why remove that. How often can a compiler actually RECOGNIZE that an overflow will happen? Only when it has access to all the constant values and computations. Which means in a tiny fraction of one percent of the total number of programs. Pass one of those numbers above to a function via a pointer, overflow detection can not be done at compile time. There are good optimizations, and there are stupid optimizations. Even spending the time to eliminate an integer overflow seems like wasted effort with absolutely no benefit.

syzygy · Post by **syzygy** » Sun Dec 08, 2013 11:52 pm

bob wrote:
syzygy wrote:Unless I'm mistaken, the C standard allows the compiler to load a value multiple times. Since the C99 standard does not know about threads, it may load the hash entry, do the xor-trick to verify that the entry can be used, and load the hash entry again. Between the two loads the hash entry may change its content due to other threads. There you go.
Doesn't break a thing, however.

Think about it.

I can only assume you did not read.

A compiler can legally rewrite:

Code: Select all

    word1 = htable->word1;
    word2 = htable->word2 ^ word1;
    if (word2 == temp_hashkey)
      break;

as

Code: Select all

  word2 = htable->word2 ^ htable->word1;
  if (word2 == temp_hashkey)
    break;
  word1 = htable->word1;

See the problem?

It can also choose to read htable->word1 at a later point in HashProbe() instead of accessing the local variable word1.

syzygy · Post by **syzygy** » Sun Dec 08, 2013 11:55 pm

bob wrote:
syzygy wrote:
bob wrote:You can not remove XORs on volatile data. You can't remove anything. If the XORs happened back to back, with NO intervening code, you STILL can not eliminate them if the target is a volatile variable.
So is crafty's hashtable declared as volatile? I don't see it.
For hashing, no. Why?

Duh. You're saying "there is no problem for volatile data". What kind of argument is that if we are not talking about volatile data?

syzygy · Post by **syzygy** » Mon Dec 09, 2013 12:00 am

bob wrote:I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c) and around HashProbe() when it is called?

Apart from this being a non-argument given that compilers nowadays can easily optimise across compilation units, you have apparently forgotten about crafty.c:

Code: Select all

/* last modified 01/18/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   This module is designed for the most efficient compiling, as it includes  *
 *   all of the source files into one large wad so that the compiler can see   *
 *   all function calls and inline whatever is appropriate.  The includes are  *
 *   loosely ordered so that the most common functions occur first, to help    *
 *   with cache layout when the code is actually loaded.                       *
 *                                                                             *
 *******************************************************************************
 */
#include "search.c"
#include "thread.c"
#include "repeat.c"
#include "next.c"
#include "killer.c"
#include "quiesce.c"
#include "evaluate.c"
#include "movgen.c"
#include "make.c"
#include "unmake.c"
#include "hash.c"
...

Rein Halbersma · Post by **Rein Halbersma** » Mon Dec 09, 2013 12:03 am

bob wrote: As a concluding remark, cite one reasonable example where optimizing an overflow away is a good idea.

Easy: http://www.airs.com/blog/archives/120

Code: Select all

int f(int i) { int j, k = 0; for (j = i; j < i + 10; ++j) ++k; return k; }

f() should return 10 for every value of i, agreed? Well, under the assumption that signed overflow cannot occur, the entire loop will be optimized and the constant 10 will be folded into the definition of f(). Not so with -fwrapv, because then the compiler has to emit special code for i = 0x7ffffff8.

Remember, compiler writers are worrying about generating the best possible code for the entire universe of correct programs, not for those programs that use undefined behavior and hope that their sins will never catch up with them.

syzygy · Post by **syzygy** » Mon Dec 09, 2013 12:06 am

hgm wrote:If the hash table is accessed through a pointer (as is usual for allocated memory), it cannot even know if the pointer points to memory that is shared with other processes. Therefore it can never mess with the stores, and store anything other than the exact thing that the program specified.

A C99-compliant compiler does not need to care about other processes and shared memory. The standard does not know about those.

(But I do agree it must store the expected values, because C99 requires it.)

How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?