A note for C programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A note for C programmers

Post by mar »

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0

Thus a compiler that would turn if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply break the standard if x can ever be anything else than 0 or 1.
I think the author is simply wrong on this one.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A note for C programmers

Post by Sven »

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:

Code: Select all

if (cond) {
    A[1] = X;
} else {
    A[0] = X;
}
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A note for C programmers

Post by lucasart »

mar wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
The C standard says this:
1. the controlling expression of if statement shall be of scalar type
2. the statement after if executes if the expression compares unequal to 0
3. the statement after else (if present) executes if the expression compares equal to 0

Thus a compiler that would turn if (x) A[1] = y; else A[0] = y; into A[x] = y; would simply break the standard if x can ever be anything else than 0 or 1.
I think the author is simply wrong on this one.
Thanks for taking the time to look at the standard, and clarifying this point. It really did not make sense, but the author of the paper seemed to know what he was talking about, so I started to fear that he could be correct. This "optimization" would almost certainly have broken my code, and many other's.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Sven Schüle wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:

Code: Select all

if (cond) {
    A[1] = X;
} else {
    A[0] = X;
}
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven
He should compile this and look at the asm:

x = !x;

It has a conditional test. :)

where the simpler

x ^= 1;

just flips a 0 to a 1 or vice-versa

Here's gcc 4.7.3's take on x = !x

testl %eax, %eax
sete %al
movzbl %al, %esi

as opposed to a single xor instruction.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A note for C programmers

Post by michiguel »

lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

michiguel wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel
The author is confused on that slide 15. He seems to think that using a value other than 0 or 1 in a conditional statement is undefined. He confuses conditionals with boolean operators, that always yield 0 or 1. Not every statement on the Internet is true.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A note for C programmers

Post by syzygy »

lucasart wrote:Some of the other mistakes are quite obvious. For example: who would, deference a pointer, and check for nullity of the pointer after doing it? Idem for division by zero.
I wouldn't call that obvious at all.

Code: Select all

  if (arg2 == 0)
    ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
                     errmsg("division by zero")));
  /* No overflow is possible */
  PG_RETURN_INT32((int32) arg1 / arg2);
It is clear that ereport() is intended to exit the program with a sensible error message. Depending on the kind of program it might not make sense to do anything else. The code looks quite natural and my first thought on reading it is certainly not that the check might get optimised away.

One way of fixing this is to mark the function ereport() with the attribute "noreturn". I suppose clang (which apparently does optimise the above check away) supports this attribute. But it is obviously not part of the C standard.

I guess the "standard C" way of doing this is to insert an "else".
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A note for C programmers

Post by Sven »

michiguel wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
I am confused.

(x? 1: 0)

(x != 0)

are identical.

if (x) {}

if (x != 0) {}

are identical, but that does not mean you can assume x will be always 0 or 1. I thin I am missing what the author means.

Miguel
See my comments above. I think the point is that the author of the first paper probably (or hopefully) did not intend to see "cond" as an integer variable but as a boolean expression. If "cond" were an integer variable then "A[cond]" would break semantics, which is not the case if you replace "cond" by "x != 0".

If he actually meant "cond" as an integer variable then his point would be wrong indeed.

Sven
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A note for C programmers

Post by mar »

Sven Schüle wrote:See my comments above.
They never read, they never listen, they're always right, they're never wrong. Guess who? :wink:
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A note for C programmers

Post by lucasart »

bob wrote:
Sven Schüle wrote:
lucasart wrote:According to the first paper, the C standard permits even crazier optimizations. Apparently (I was not aware of it and still cannot believe it myself), these two expressions are not equivalent:

Code: Select all

a[x ? 1 : 0]
a[x != 0]
The second one is fine, and no standard compliant compiler will break this code. But the first one is dangerous. The compiler is allowed to assume that x can only be 1 or 0, hence replace it by a[x]. This is a "nice" optimization because it produces branch-less code. The problem is that when x is not 0 or 1, all hell breaks loose. The program could cause segfault (if you're lucky!) or break silently...
Where do you get this from? In a boolean context "x" means "x != 0". I don't think that the standard allows to replace a[x ? 1 : 0] by a[x] since in the "?" expression "x" is in boolean context but in a[x] it is integer. This would break semantics, and also there is nothing about "undefined behaviour" here, it is simply "boolean vs. integer context". The first paper is also not the C standard, and in my view it contains several unproven statements. Many of them may be true, and I do not say that this paper is bad in general (there is even a lot of interesting stuff in it), but this "a[x]" point seems a bit far-fetched to me.

Note also that the original quote from the first paper is slightly different, it says:

Code: Select all

if (cond) {
    A[1] = X;
} else {
    A[0] = X;
}
where "cond" is definitely meant as a boolean expression which always evaluates to 0 or 1.

Sven
He should compile this and look at the asm:

x = !x;

It has a conditional test. :)

where the simpler

x ^= 1;

just flips a 0 to a 1 or vice-versa

Here's gcc 4.7.3's take on x = !x

testl %eax, %eax
sete %al
movzbl %al, %esi

as opposed to a single xor instruction.
Your x = !x is completely unrelated to our discussion. Besides, I never claimed that x != 0 or (x ? 1 : 0) were branch-less (if that's the part of my post you misread that lead you to x = !x). I said that a[x] was branch-less.

The problem is that the author of the paper on dangerous optimizations made a mistake in his claim that a compiler is allowed to replace

Code: Select all

if (x)
  a[1] = 0;
else
  a[0] = 0;
by

Code: Select all

a[x] = 0
This was never MY CLAIM. I just said that's what the paper said, and I did not believe it, although I had not read the standard to check. Martin has checked the standard, and that claim is wrong. That's all.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.