A note for C programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A note for C programmers

Post by AlvaroBegue »

bob wrote:Does the C standard address SMP race conditions?
I am sure it does, since C11 "includes a memory model to better support multiple threads of execution": http://en.wikipedia.org/wiki/C11_%28C_s ... evision%29
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: A note for C programmers

Post by wgarvin »

bob wrote:An example:

char buf[4096];

Buf is initialized to "abc def ghi jkl" which is a 16 character string with the terminating null on the end.

strcpy(a, a+4);

simply converts that string to "def ghi jkl" with the terminating null still in place.

Before the copy, strlen(a) == 15;

n=4 (n < 15)

After the copy, strlen(a)=11;

There is nothing that can break that so long as it is copied left-to-right. Absolutely nothing. n can never be negative, and in fact can not even be zero in my code.

Again, as I have stated MANY times, the above will never misbehave unless someone decides to copy right to left. But with a null-terminated string that is not likely to happen as it is slower.
I'm going to pick on this example a bit. You seem to think there are only two ways to copy strings, "right to left" and "left to right", but that's not true at all. Most implementations of memcpy, memmove and yes, even strcpy copy multiple bytes at a time, sometimes 16 or more. Also, one common implementation of strcpy is

Code: Select all

memcpy&#40;dest,src,strlen&#40;src&#41;+1&#41;;
Agner Fog implements it that way, for example. Both his strlen and his memcpy are generally faster than the ones found in most C library implementations, so it wouldn't surprise me if the resulting strcpy was faster too.

But all of this is pointless nitpicking. The language spec for strcpy specifically disallowed overlapping copies so that implementors would have more freedom to optimize their implementations. The way someone implemented strcpy in some C library code you looked at a few decades ago doesn't really have anything to do with what modern implementations of strcpy do, or what the language says you can assume about it.

In summary: You're still trying to justify a (IMO) completely unjustifiable practice of relying on undocumented behavior of strcpy -- behavior that plenty of strcpy implementations don't even have.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: A note for C programmers

Post by abulmo »

bob wrote:The man page is written in English. Not C. As is the C standard..
The standard is written in LEGALESE English. You know, this language that can define white as black and reciprocally. Being a native English speaker does not help much to understand it.
Richard
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote:
I use NO "atomic instructions" in the xor trick. I depend on NO specific hardware behavior. It can do in-order writes as the PC does, it can do out-of-order writes as the Dec Alpha does. NOTHING will break this unless the compiler guys suddenly decide that XOR means something beside "exclusive OR." I don't care whether you do the XOR with an XOR instruction, or you use adds and subtracts. Irrelevant. Same result. Works every time.

I am DETECTING the problem the undefined behavior MUST cause, on those rare occasions where it happens. I am not DEPENDING on any specific behavior whatsoever.

You should look at the lockless hashing paper. It explains it. No atomic locks, no tricks, works perfectly.
I read it, I understood it, and I liked it. Unfortunately, since C11, it also constitutes undefined behavior.

http://www.open-std.org/jtc1/sc22/wg14/ ... /n1548.pdf
5.1.2.4 Multi-threaded executions and data races
25 The execution of a program contains a data race if it contains two con&#64258;icting actions in
different threads, at least one of which is not atomic, and neither happens before the
other. Any such data race results in unde&#64257;ned behavior.
Whether you believe it or not, your XOR trick matches this clause in the Standard. It is undefined behavior. The "neither happens before the other" clause is the key here.
simon
Posts: 50
Joined: Sun Mar 20, 2011 6:42 pm

Re: A note for C programmers

Post by simon »

wgarvin wrote: Okay, we get that you're still annoyed about it, although I'm surprised that two weeks have passed and you still haven't admitted yet that you're completely wrong! :lol:
You're probably the only person who is surprised. :lol:
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

AlvaroBegue wrote:
bob wrote:You are simply not reading what I have written. Let me try once more. There is ABSOLUTELY no way my use of strcpy() will break.
I think you are the one not reading. I posted this link: http://stackoverflow.com/questions/1293 ... mplemented

It's not a long thread. In it, someone actually gets his code to behave inappropriately when changing compilers, even though all the same guarantees you list are satisfied, and the reason for the code breaking seems to be an optimization the compiler did, assuming the buffers don't overlap.
Who is interested in gcc here? GCC works. Every version I have used since Crafty was started in 1995. This is about Apple breaking THEIR library.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

wgarvin wrote:
bob wrote:An example:

char buf[4096];

Buf is initialized to "abc def ghi jkl" which is a 16 character string with the terminating null on the end.

strcpy(a, a+4);

simply converts that string to "def ghi jkl" with the terminating null still in place.

Before the copy, strlen(a) == 15;

n=4 (n < 15)

After the copy, strlen(a)=11;

There is nothing that can break that so long as it is copied left-to-right. Absolutely nothing. n can never be negative, and in fact can not even be zero in my code.

Again, as I have stated MANY times, the above will never misbehave unless someone decides to copy right to left. But with a null-terminated string that is not likely to happen as it is slower.
I'm going to pick on this example a bit. You seem to think there are only two ways to copy strings, "right to left" and "left to right", but that's not true at all. Most implementations of memcpy, memmove and yes, even strcpy copy multiple bytes at a time, sometimes 16 or more. Also, one common implementation of strcpy is

Code: Select all

memcpy&#40;dest,src,strlen&#40;src&#41;+1&#41;;
Agner Fog implements it that way, for example. Both his strlen and his memcpy are generally faster than the ones found in most C library implementations, so it wouldn't surprise me if the resulting strcpy was faster too.

But all of this is pointless nitpicking. The language spec for strcpy specifically disallowed overlapping copies so that implementors would have more freedom to optimize their implementations. The way someone implemented strcpy in some C library code you looked at a few decades ago doesn't really have anything to do with what modern implementations of strcpy do, or what the language says you can assume about it.

In summary: You're still trying to justify a (IMO) completely unjustifiable practice of relying on undocumented behavior of strcpy -- behavior that plenty of strcpy implementations don't even have.
No, what I am doing is complaining about a vendor intentionally breaking something that could have either been fixed to work correctly, or left alone. As a result, thousands of pieces of working software no longer work. Does that seem reasonable? Let's just bash all of those that have used this behavior, because they are not supposed to be doing so. I don't consider that reasonable. Fix it? Sure. Trivial to make strcpy use memmove(). Leave it alone? fine. Doesn't break tons of currently working software. But intentionally break it? That's the best choice?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A note for C programmers

Post by bob »

Rein Halbersma wrote:
bob wrote:
I use NO "atomic instructions" in the xor trick. I depend on NO specific hardware behavior. It can do in-order writes as the PC does, it can do out-of-order writes as the Dec Alpha does. NOTHING will break this unless the compiler guys suddenly decide that XOR means something beside "exclusive OR." I don't care whether you do the XOR with an XOR instruction, or you use adds and subtracts. Irrelevant. Same result. Works every time.

I am DETECTING the problem the undefined behavior MUST cause, on those rare occasions where it happens. I am not DEPENDING on any specific behavior whatsoever.

You should look at the lockless hashing paper. It explains it. No atomic locks, no tricks, works perfectly.
I read it, I understood it, and I liked it. Unfortunately, since C11, it also constitutes undefined behavior.

http://www.open-std.org/jtc1/sc22/wg14/ ... /n1548.pdf
5.1.2.4 Multi-threaded executions and data races
25 The execution of a program contains a data race if it contains two con&#64258;icting actions in
different threads, at least one of which is not atomic, and neither happens before the
other. Any such data race results in unde&#64257;ned behavior.
Whether you believe it or not, your XOR trick matches this clause in the Standard. It is undefined behavior. The "neither happens before the other" clause is the key here.
Sure it does. So what should they do, make the program abort? In the simple case of the above, where thread one writes to X at the same time thread two writes, there are exactly two possible outcomes, one of them writes last and that is what is stored. Which writes last is not defined. NO other behavior is reasonable, however, because these are two non-synchronized memory writes. 8 byte writes on the PC ARE atomic, so one thread can't write one bit, then another can write the second bit. But IF that happens, my lockless hashing works perfectly, still. There is no possible behavior you can cite here that will break it. Hence whatever happens is fine, it will work. Just so SOMETHING happens. Undefined does NOT imply "maybe we won't write anything at all" or "maybe we will write, but to a completely different memory address."

The undefined behavior in this case is which is written first. I don't care. Either can be written first without breaking my xor hash trick. The only thing that should NOT happen is for the vendor to do something stupid just because he recognizes there's a race. If you know what you are doing, the race doesn't hurt a thing.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A note for C programmers

Post by mvk »

bob wrote:There is ABSOLUTELY no way my use of strcpy() will break.
Puhlease..... Crafty is full of buffer overflows. They are all over the place.

Image

Ok, lets see:

Code: Select all

crafty `python -c 'print 5000*"x"'`
unable to open book file &#91;/opt/local/share/crafty/book.bin&#93; for "write".
learning is disabled
found computer opening book file &#91;/opt/local/share/crafty/bookc.bin&#93;.
Abort trap&#58; 6
Or, on Linux:

Code: Select all

./crafty-235-64-ja `python 5000*"x"'` 
EPD Kit revision date&#58; 1996.04.21
unable to open book file &#91;./books.bin&#93;.
&#40;info&#41; command line option "xxxxxxxxxxxx
&#91;... snip ...&#93;
xxxxxx"
Segmentation fault
Mind that on Debian this crap was patched downstream, in 2003...! But not upstream of course, because mr-know-it-all doesn't make bugs. It is ALWAYS somebody else's fault. Fact is, Crafty's use of strcpy(), and buffers in general, is not something to be proud of, and Apple in this case does a nice service to many of their customers to try out weed out programs written in this style from the eco system.

P.S.: This thread, and the counterpart on open-chess, reminds me of one of the writing by one of my professors:
EWD wrote:"We shall do a much better programming job, provided that we approach the task with a full appreciation of its tremendous difficulty, provided that we stick to modest and elegant programming languages, provided that we respect the intrinsic limitations of the human mind and approach the task as Very Humble Programmers."
By that standard, you have a long way to go.
Last edited by mvk on Thu Dec 05, 2013 11:55 pm, edited 3 times in total.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A note for C programmers

Post by Rein Halbersma »

bob wrote: Sure it does. So what should they do, make the program abort? In the simple case of the above, where thread one writes to X at the same time thread two writes, there are exactly two possible outcomes, one of them writes last and that is what is stored. Which writes last is not defined. NO other behavior is reasonable, however, because these are two non-synchronized memory writes. 8 byte writes on the PC ARE atomic, so one thread can't write one bit, then another can write the second bit. But IF that happens, my lockless hashing works perfectly, still. There is no possible behavior you can cite here that will break it. Hence whatever happens is fine, it will work. Just so SOMETHING happens. Undefined does NOT imply "maybe we won't write anything at all" or "maybe we will write, but to a completely different memory address."

The undefined behavior in this case is which is written first. I don't care. Either can be written first without breaking my xor hash trick. The only thing that should NOT happen is for the vendor to do something stupid just because he recognizes there's a race. If you know what you are doing, the race doesn't hurt a thing.
The proper way to do this is to learn about the C11 atomics and relaxed memory orders as pointed out by Peter Österlund. There are a lot of exotic architectures out there with deep memory / cache hierarchies where the compiler might not map to the instructions you think it will, but instead entirely eliminate pieces of code it can prove to be undefined behavior.