lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0
Thus a compiler that turned if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply violate the standard whenever x can be anything other than 0 or 1.
I think the author is simply wrong on this one.
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think the standard allows replacing a[x ? 1 : 0] with a[x], since in the "?:" expression "x" appears in a boolean context, while in a[x] it is used as an integer. That would change the semantics, and there is no "undefined behaviour" involved here; it is simply a matter of boolean vs. integer context. The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I am not saying the paper is bad in general (there is even a lot of interesting material in it), but this "a[x]" point seems a bit far-fetched to me.
Note also that the original quote from the first paper is slightly different; it says:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0
Thus a compiler that turned if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply violate the standard whenever x can be anything other than 0 or 1.
I think the author is simply wrong on this one.
Thanks for taking the time to look at the standard, and clarifying this point. It really did not make sense, but the author of the paper seemed to know what he was talking about, so I started to fear that he could be correct. This "optimization" would almost certainly have broken my code, and many others'.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think the standard allows replacing a[x ? 1 : 0] with a[x], since in the "?:" expression "x" appears in a boolean context, while in a[x] it is used as an integer. That would change the semantics, and there is no "undefined behaviour" involved here; it is simply a matter of boolean vs. integer context. The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I am not saying the paper is bad in general (there is even a lot of interesting material in it), but this "a[x]" point seems a bit far-fetched to me.
Note also that the original quote from the first paper is slightly different; it says:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.
(x? 1: 0)
(x != 0)
are identical.
if (x) {}
if (x != 0) {}
are identical, but that does not mean you can assume x will always be 0 or 1. I think I am missing what the author means.
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.
(x? 1: 0)
(x != 0)
are identical.
if (x) {}
if (x != 0) {}
are identical, but that does not mean you can assume x will always be 0 or 1. I think I am missing what the author means.
Miguel
The author is confused on slide 15. He seems to think that using a value other than 0 or 1 in a conditional statement is undefined. He confuses conditionals with boolean operators, which always yield 0 or 1. Not every statement on the Internet is true.
lucasart wrote:Some of the other mistakes are quite obvious. For example: who would dereference a pointer and only check it for nullity afterwards? Idem for division by zero.
if (arg2 == 0)
ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
errmsg("division by zero")));
/* No overflow is possible */
PG_RETURN_INT32((int32) arg1 / arg2);
It is clear that ereport() is intended to exit the program with a sensible error message. Depending on the kind of program it might not make sense to do anything else. The code looks quite natural and my first thought on reading it is certainly not that the check might get optimised away.
One way of fixing this is to mark the function ereport() with the attribute "noreturn". I suppose clang (which apparently does optimise the above check away) supports this attribute. But it is obviously not part of the C standard.
I guess the "standard C" way of doing this is to insert an "else".
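A sketch of the noreturn fix, using a simplified, hypothetical ereport_fail() standing in for PostgreSQL's ereport(): the C11 _Noreturn specifier (GCC and Clang also accept __attribute__((noreturn))) tells the compiler that control never returns past the error report, so the division really is unreachable when arg2 == 0 and the check cannot be optimised away.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ereport(): reports and never returns.
   _Noreturn is the C11 standard spelling of the attribute. */
_Noreturn static void ereport_fail(const char *msg) {
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
}

static int safe_div(int arg1, int arg2) {
    if (arg2 == 0)
        ereport_fail("division by zero");
    return arg1 / arg2;   /* only reached when arg2 != 0 */
}
```

The "insert an else" alternative would instead wrap the division in the else branch of the check, which achieves the same effect in strictly standard C89/C99.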
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.
(x? 1: 0)
(x != 0)
are identical.
if (x) {}
if (x != 0) {}
are identical, but that does not mean you can assume x will always be 0 or 1. I think I am missing what the author means.
Miguel
See my comments above. I think the point is that the author of the first paper probably (or hopefully) intended "cond" not as an integer variable but as a boolean expression. If "cond" were an integer variable, then "A[cond]" would change the semantics, which is not the case if you replace "cond" with "x != 0".
If he actually meant "cond" as an integer variable, then his point would indeed be wrong.
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think the standard allows replacing a[x ? 1 : 0] with a[x], since in the "?:" expression "x" appears in a boolean context, while in a[x] it is used as an integer. That would change the semantics, and there is no "undefined behaviour" involved here; it is simply a matter of boolean vs. integer context. The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I am not saying the paper is bad in general (there is even a lot of interesting material in it), but this "a[x]" point seems a bit far-fetched to me.
Note also that the original quote from the first paper is slightly different; it says:
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.
Sven
He should compile this and look at the asm:
x = !x;
It has a conditional test.
where the simpler
x ^= 1;
just flips a 0 to a 1 or vice versa.
Here's gcc 4.7.3's take on x = !x
testl %eax, %eax
sete %al
movzbl %al, %esi
as opposed to a single xor instruction.
Your x = !x is completely unrelated to our discussion. Besides, I never claimed that x != 0 or (x ? 1 : 0) were branch-less (if that's the part of my post you misread that led you to x = !x). I said that a[x] was branch-less.
The problem is that the author of the paper on dangerous optimizations made a mistake in his claim that a compiler is allowed to replace
This was never MY CLAIM. I just said that's what the paper said, and I did not believe it, although I had not read the standard to check. Martin has checked the standard, and that claim is wrong. That's all.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.