A note for C programmers

mar · Post by **mar** » Sat Nov 30, 2013 1:07 pm

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...

The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0

Thus a compiler that would turn if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply break the standard if x can ever be anything else than 0 or 1.
I think the author is simply wrong on this one.

Sven · Post by **Sven** » Sat Nov 30, 2013 1:13 pm

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...

Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:

Code: Select all

if &#40;cond&#41; &#123;
    A&#91;1&#93; = X;
&#125; else &#123;
    A&#91;0&#93; = X;
&#125;

where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven

lucasart · Post by **lucasart** » Sat Nov 30, 2013 1:14 pm

mar wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0

Thus a compiler that would turn if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply break the standard if x can ever be anything else than 0 or 1.
I think the author is simply wrong on this one.

Thanks for taking the time to look at the standard, and clarifying this point. It really did not make sense, but the author of the paper seemed to know what he was talking about, so I started to fear that he could be correct. This "optimization" would almost certainly have broken my code, and many other's.

bob · Post by **bob** » Sat Nov 30, 2013 6:37 pm

Sven Schüle wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:
Code: Select all
if &#40;cond&#41; &#123;
    A&#91;1&#93; = X;
&#125; else &#123;
    A&#91;0&#93; = X;
&#125;
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven

He should compile this and look at the asm:

x = !x;

It has a conditional test.

where the simpler

x ^= 1;

just flips a 0 to a 1 or vice-versa

Here's gcc 4.7.3's take on x = !x

testl %eax, %eax
sete %al
movzbl %al, %esi

as opposed to a single xor instruction.

michiguel · Post by **michiguel** » Sat Nov 30, 2013 7:30 pm

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...

I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel

mvk · Post by **mvk** » Sat Nov 30, 2013 8:26 pm

michiguel wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel

The author is confused on that slide 15. He seems to think that using a value other than 0 or 1 in a conditional statement is undefined. He confuses conditionals with boolean operators, that always yield 0 or 1. Not every statement on the Internet is true.

syzygy · Post by **syzygy** » Sat Nov 30, 2013 8:33 pm

lucasart wrote:Some of the other mistakes are quite obvious. For example: who would, deference a pointer, and check for nullity of the pointer after doing it? Idem for division by zero.

I wouldn't call that obvious at all.

Code: Select all

  if &#40;arg2 == 0&#41;
    ereport&#40;ERROR, &#40;errcode&#40;ERRCODE_DIVISION_BY_ZERO&#41;,
                     errmsg&#40;"division by zero")));
  /* No overflow is possible */
  PG_RETURN_INT32&#40;&#40;int32&#41; arg1 / arg2&#41;;

It is clear that ereport() is intended to exit the program with a sensible error message. Depending on the kind of program it might not make sense to do anything else. The code looks quite natural and my first thought on reading it is certainly not that the check might get optimised away.

One way of fixing this is to mark the function ereport() with the attribute "noreturn". I suppose clang (which apparently does optimise the above check away) supports this attribute. But it is obviously not part of the C standard.

I guess the "standard C" way of doing this is to insert an "else".

Sven · Post by **Sven** » Sat Nov 30, 2013 8:36 pm

michiguel wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel

See my comments above. I think the point is that the author of the first paper probably (or hopefully) did not intend to see "cond" as an integer variable but as a boolean expression. If "cond" were an integer variable then "A[cond]" would break semantics, which is not the case if you replace "cond" by "x != 0".

If he actually meant "cond" as an integer variable then his point would be wrong indeed.

Sven

mar · Post by **mar** » Sun Dec 01, 2013 12:00 am

Sven Schüle wrote:See my comments above.

They never read, they never listen, they're always right, they're never wrong. Guess who?

lucasart · Post by **lucasart** » Sun Dec 01, 2013 1:34 am

bob wrote:
Sven Schüle wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:
Code: Select all
a&#91;x ? 1 &#58; 0&#93;
a&#91;x != 0&#93;
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:
Code: Select all
if &#40;cond&#41; &#123;
    A&#91;1&#93; = X;
&#125; else &#123;
    A&#91;0&#93; = X;
&#125;
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven
He should compile this and look at the asm:

x = !x;

It has a conditional test.

where the simpler

x ^= 1;

just flips a 0 to a 1 or vice-versa

Here's gcc 4.7.3's take on x = !x

testl %eax, %eax
sete %al
movzbl %al, %esi

as opposed to a single xor instruction.

Your x = !x is completely unrelated to our discussion. Besides, I never claimed that x != 0 or (x ? 1 : 0) were branch-less (if that's the part of my post you misread that lead you to x = !x). I said that a[x] was branch-less.

The problem is that the author of the paper on dangerous optimizations made a mistake in his claim that a compiler is allowed to replace

Code: Select all

if &#40;x&#41;
  a&#91;1&#93; = 0;
else
  a&#91;0&#93; = 0;

by

Code: Select all

a&#91;x&#93; = 0

This was never MY CLAIM. I just said that's what the paper said, and I did not believe it, although I had not read the standard to check. Martin has checked the standard, and that claim is wrong. That's all.

A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers

Re: A note for C programmers