GCC, strict aliasing, C99

Big fat note 28/02/10: A deeper analysis of the standard forced me to change my opinion on this particular topic after writing this blog entry. In other words, I was wrong (about several things) in this blog post. If you want info on C99 aliasing, don’t read this post; read the other one instead.

I recently wrote about how MySQL was breaking the C99 strict aliasing rules. On a related topic, there’s an interesting discussion thread which recently started on the GCC developers mailing list. It started with a post by Joshua Haberman in which he questions GCC’s interpretation of C99’s so-called strict aliasing rules. It’s an interesting point that he raised.

Essentially, the strict aliasing rules say that you must access memory (an “object”) only through references (“lvalue expressions”) of compatible types, or through references of certain other types (such as character types). Joshua points out that one of the other types allowed is an aggregate or union type containing a member of the original type. On first examination this seems quite natural; if you alter a whole structure (or union) object, then a value previously read through a pointer to one of its members should not remain cached in a register (for instance). However, it has the implication that accessing a variable by first casting its address to a structure or union pointer type is allowed, even though that is probably a rare case.
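A minimal sketch of the kind of access in question (the struct and variable names here are mine, not from the thread):

```c
#include <stdio.h>

struct wrap { int i; };

int main(void)
{
    int x = 42;

    /* C99 6.5p7 permits access through "an aggregate or union type
       that includes one of the aforementioned types among its
       members". Read literally, that appears to allow this access
       (alignment permitting), even though GCC assumes it cannot
       happen. */
    struct wrap *w = (struct wrap *)&x;
    printf("%d\n", w->i);
    return 0;
}
```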

The ongoing discussion, though, is hilarious:

Richard Guenther, in the first reply, complains that Joshua is “literally interpreting” the standard. This is ridiculous – surely the standard is meant to be interpreted literally; that’s why it is written in such precise (and mind-numbingly verbose) language (but – keep reading).

Andrew Haley, in the next reply, insists (contrary to Richard) that GCC complies with the standard as worded, without even attempting to explain how Joshua (and Richard) might have gotten it wrong. He then repeatedly claims that Joshua’s understanding of the standard is incorrect, using irrelevant sections of the standard to try to prove this. After Erik Trulsson steps in to support Andrew, Andrew then has the gall to call Joshua stubborn! (Incidentally, Erik’s argument is irrelevant – because the standard does define what happens when you dereference a valid pointer, and the rules for pointer conversion in 6.3.2.3 at least suggest that the pointer would be valid in the example, as long as alignment requirements were met).

[Update: the defect report mentioned in the next paragraph isn’t really so relevant; I managed to conflate two separate issues when writing this entry; see clarification in the comments below. My main point still stands].

The discussion continues on, and a relevant defect report (for the standard) is mentioned. The standard committee seem to have come to the wrong conclusion on this defect; that is, they say that example 1 in the defect report invokes undefined behavior, when a plain reading of the standard makes it very clear that this is not the case (they do acknowledge the need for a wording change). This would seem to mean that the committee actually thinks the standard is wrong. That’s fine except that… it’s a standard. I mean, you’ve put it out there; that’s then set. You can revise it later, but to turn around and say “oh no, we didn’t mean that, no, ignore the wording, you’ve got to follow the spirit…” is just ridiculous. The whole point of having a standard is to be perfectly clear, and to lay these things out without any ambiguity.

To put the issue in layman’s terms, the standard – as it is worded – allows re-using allocated memory to contain different types. That is, you can allocate some memory using malloc(), store something in it, use it, and then when you’re done with it you can store something else (even of an incompatible type) and re-use the memory. The argument on the GCC discussion list is that this should not be allowed (and in fact, GCC currently optimizes as if it is not allowed) because it interferes with optimization. [Not correct; GCC allows this, though the C99 committee seem to think it needn’t].
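For example, here’s a minimal sketch of the kind of re-use the wording permits:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Allocate enough room for either type. */
    void *mem = malloc(sizeof(double) > sizeof(int)
                       ? sizeof(double) : sizeof(int));
    if (mem == NULL)
        return 1;

    double *d = mem;
    *d = 2.5;                /* use the memory as a double ...   */
    printf("%f\n", *d);

    int *i = mem;            /* ... then re-use it as an int;    */
    *i = 42;                 /* 6.5p6's effective-type rules for */
    printf("%d\n", *i);      /* allocated storage permit this.   */

    free(mem);
    return 0;
}
```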

It’s true that the standard as written disallows certain useful optimizations. Unless the compiler can prove for certain that two pointers don’t alias, it has to assume that a read from one pointer cannot be moved past a write to another, even if the types are incompatible. So, the reordering of R(p)/W(q) to W(q)/R(p) would not be allowed – though the reverse would be. Also, it doesn’t prevent the compiler from caching the result of pointer reads in registers if the only intermediate writes are through incompatible types – so, it cannot be said that the aliasing rules are pointless, even as they are presently written.
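To make the R(p)/W(q) notation concrete, here’s a contrived sketch (the function name is made up):

```c
/* p and q have incompatible types, but under the wording discussed
   above they may still refer to the same (re-used) allocated
   storage. */
int read_then_write(int *p, float *q)
{
    int r = *p;   /* R(p) */
    *q = 1.0f;    /* W(q) */
    /* Moving W(q) above R(p) would change r whenever p and q
       overlap, so the compiler cannot swap them, incompatible
       types notwithstanding. */
    return r;
}
```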

What really gets me is that the committee (in their conclusion on the defect report) seem to agree with the GCC interpretation. To turn around now and say that everyone who wrote a program based on the wording of the standard got it wrong is crazy, particularly when there is a clear reason why you might want to rely on the original wording (so you could re-use allocated memory).

(I mean, sure; the standard might not say or imply what it was intended to. If so, fix it in a later standard – C1X; and to be fair to the committee, maybe that’s what they’re intending. But don’t rewrite history. As a programmer I need to be able to rely on what the standard says at the time that I read it).

As far as I’m concerned, this really is a bug in GCC. I mean, it does not follow the standard as written. Following the intention of the standard authors is irrelevant.


fork/exec is forked up

I don’t know why it’s taken me so long to realise this, but the whole fork/exec shebang is screwed. There doesn’t seem to be any simple standards-conformant way (or even a generally portable way) to execute another process in parallel and be certain that the exec() call was successful. The problem is, once you’ve fork()d and then successfully exec()d, there is no way to communicate to the parent process that the exec() was successful. If the exec() fails then the child can communicate with the parent (via a signal, for instance), but there is no equivalent way to signal success – the only way the parent can be sure of exec() success is to wait() for the child process to finish (and check that there is no failure indication), and that of course is not parallel execution.

If you have a working vfork() then you can communicate with the parent process using a shared memory location (any variable declared “volatile” should do the trick, bearing in mind that “volatile” is not particularly well defined), although doing so is not standards-conformant (see http://opengroup.org/onlinepubs/007908775/xsh/vfork.html for instance). By “working” vfork() I mean a vfork() which 1. causes the address space to be shared between the child and parent process (rather than copying the address space as fork() does), and 2. causes execution of the parent process to be suspended until the child either successfully exec()s or _exit()s. The second requirement is necessary to avoid the race condition whereby the parent process checks the status variable before the child process has actually tried to exec().

Even the vfork() solution, if it can be used, is reliant on the compiler doing the right thing both in terms of volatile access and in terms of vfork() generally. (Note that the “right thing” in terms of volatile access is the generally accepted definition, not the one in the C standard. Also, the fact that vfork()’s correct operation really requires compiler support is something that I’m not willing to go into right now, other than to say that the compiler is normally allowed to assume that “if (vfork()) { } else { }” will follow one branch and not the other – which is not the case here, as both branches will be executed in the same address space).
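For illustration, here’s a sketch of the (non-conformant) status-variable approach, assuming a “working” vfork() in the sense above; spawn_vfork is a made-up name:

```c
#include <sys/types.h>
#include <unistd.h>

static volatile int exec_failed;

/* Returns the child pid, or -1 if vfork() or exec() failed. */
pid_t spawn_vfork(const char *path, char *const argv[])
{
    exec_failed = 0;
    pid_t pid = vfork();
    if (pid == 0) {
        execv(path, argv);
        exec_failed = 1;  /* visible to the parent: shared address space */
        _exit(127);
    }
    /* A "working" vfork() suspends us until the child has exec()d or
       _exit()ed, so exec_failed is settled by the time we get here. */
    return (pid == -1 || exec_failed) ? -1 : pid;
}
```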

The only other solution I can think of is to use pipe() to create a pipe, set the write end to be close-on-exec, then fork() (or vfork()), exec(), and write something (perhaps errno) to the pipe if the exec() fails (before calling _exit()). The parent process can read from the pipe and will get an immediate end-of-input if the exec() succeeds, or some data if the exec() failed. This seems like it should be fairly portable, as long as FD_CLOEXEC is available, but I haven’t actually tried it out yet.
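Here’s roughly what I have in mind – an untested sketch, with spawn_pipe being a made-up name:

```c
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

/* Returns the child pid, or -1 with errno set if fork() or the
   child's exec() failed. */
pid_t spawn_pipe(const char *path, char *const argv[])
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    fcntl(pfd[1], F_SETFD, FD_CLOEXEC);  /* write end closes on exec */

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (pid == 0) {                      /* child */
        close(pfd[0]);
        execv(path, argv);
        int err = errno;                 /* exec failed; report why */
        write(pfd[1], &err, sizeof err);
        _exit(127);
    }

    close(pfd[1]);                       /* parent */
    int err;
    ssize_t n;
    do {
        n = read(pfd[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(pfd[0]);

    if (n > 0) {                         /* got an errno: exec failed */
        errno = err;
        return -1;
    }
    return pid;                          /* end-of-input: exec succeeded */
}
```

Note that on exec() failure the parent should still wait() for the (now exited) child, so as not to leave a zombie around.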


Update 8/1/2009: The latter solution really requires fork() instead of vfork(), because vfork() might suspend the parent and the write by the child might then block, causing a deadlock. (In practice a pipe is always going to have enough of a buffer for this not to happen). POSIX doesn’t even let a vfork()d child call write(), anyway.

Update 13/11/2015: Also worth mentioning is the possibility for priority inversion to occur if the child process is to be run at a lower priority and a third process is running at some priority level between the other two. The parent, which is running at the highest priority, should receive notification of fork/exec status as soon as possible; however, the child (running at the lowest priority) will not be scheduled to send this notification while the third process is runnable.

This can be avoided only by not waiting for exec to succeed or fail, but instead processing the exec status (data coming over the pipe, or pipe closure) asynchronously. (Update 17/11/2015: or by, for example, using ptrace() to stop the child after exec() so that its priority can be set at that point; this wouldn’t work, however, if the child was setuid [unless the parent was root], since you can’t change the priority of a process you don’t own).


Update 18/9/2012: There’s a reasonably portable solution to this whole mess in the form of posix_spawn. (Update again 11/11/2015 – no. posix_spawn allows various errors to result in the child process exiting with a status of 127, so implementations may exhibit the exact same issue as the fork/exec combination. I somehow missed this earlier).


TL;DR – it’s not easy to determine immediately and without blocking whether a fork/exec pair of calls succeeded. If the child process is to run at the same priority as the parent or higher, you can do it with a CLOEXEC pipe trick; if the child process is to run at a lower priority, the pipe trick suffers from potential priority inversion problems.

HTTP specification

In the HTTP file-upload spec (RFC 1867) it says:

3.3 use of multipart/form-data

The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilisticly.)

Probabilisticly? Who wrote this shit? If you choose a boundary probabilistically then, given enough usage, you will almost certainly, at some point in time, pick a boundary which occurs naturally in the data (though one alternative, choosing a boundary deterministically after scanning through the data, is not too appealing either; there’s a sketch of it below). This sort of thing just shouldn’t be allowed to happen (even if the probability is low) because it can easily be prevented. There are two techniques perfectly suited to avoiding the problem:

  1. Mandate a Content-length header in each part (which obviates the need for a boundary tag anyway) OR
  2. Use content transfer encoding (escaping) so that the boundary cannot possibly occur in the encoded content

Neither of these techniques would be particularly difficult to implement, or costly in terms of processing time or bandwidth (considering that Content-length for the entire body will generally need to be calculated anyway). The first seems to be allowed by current standards, but it isn’t recommended anywhere and certainly isn’t mandated (and it doesn’t really solve the problem unless the standards are updated to say that it does – since, arguably, the boundary tag should be recognized wherever it occurs, even partway through a body part according to that body part’s known content length). The second has the problem that the only defined encoding which would be suitable is base64, and that incurs a 33% bandwidth overhead.
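For what it’s worth, the “deterministic” alternative mentioned earlier is easy enough to sketch (naive and unoptimized, with made-up names):

```c
#include <string.h>

/* Extend a candidate boundary until it occurs nowhere in the data.
   Returns the boundary length, or 0 if buf is too small. */
size_t choose_boundary(const char *data, size_t datalen,
                       char *buf, size_t bufmax)
{
    if (bufmax < sizeof "----boundary")
        return 0;
    size_t blen = strlen(strcpy(buf, "----boundary"));

    for (;;) {
        int found = 0;
        for (size_t i = 0; i + blen <= datalen; i++) {
            if (memcmp(data + i, buf, blen) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return blen;       /* boundary cannot occur in the data */
        if (blen + 1 >= bufmax)
            return 0;          /* out of room */
        buf[blen++] = 'x';     /* extend and re-scan */
        buf[blen] = '\0';
    }
}
```

The obvious objection – and the reason it isn’t too appealing – is that it requires scanning (possibly re-scanning) the entire body before transmission can begin.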

It’s really annoying that this sort of stupid blunder can make it into a standard which is so widely used. At least it seems it can’t lead to any security vulnerabilities (I think) but I pity the poor sucker whose file-upload-via-POST is failing due to a shoddy standard which says it’s ok that a boundary tag could possibly occur within the content.

ISO C99

I’m not sure when it happened but it looks like the C99 standard is now publicly available (for free download) in PDF form. Make sure to get the version which incorporates corrections TC1 and TC2. From the standard:

6.2.6.1 General – Representation of types

paragraph 3:

“Values stored in unsigned bit-fields and objects of type unsigned char shall be represented by a pure binary notation”.

A footnote tells us that a “pure binary notation” is:

“A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps the bit with the highest position. … A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to (2^CHAR_BIT) – 1.”

What the… ??

Let’s go through that footnote bit by bit, no pun intended:

A positional representation for integers that uses the binary digits 0 and 1 … ok, that’s fairly straightforward

… in which the values represented by successive bits are additive … I guess that means that to get the value as a whole, you add the values represented by the bits. Just like you do in any binary number. Although this is a fairly convoluted way of saying it.

… begin with 1… Does this mean that the first bit position has a value of 1? Or that the value for any bit position is one before it is multiplied by some power of 2? The latter is mathematically redundant so I guess it must be the former. Ok.

… and are multiplied by successive integral powers of 2 … Yes, ok, each bit is worth its face value (0 or 1) multiplied by “successive powers of 2”; the “begin with 1” beforehand means that 1 is the first power of 2 (it is 2^0), the next would be 2 (2^1), then 4, 8, 16 and so on. Again, there must be a better way to say this.

… except perhaps the bit with the highest position. WTF!!?? So the highest bit position can be worth something completely different. This might make sense for representation of signed values in 2’s complement, but this was specifically referencing an unsigned type. Also:

A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to (2^CHAR_BIT) – 1.

Do the math. If there are CHAR_BIT bits, the highest bit position is (CHAR_BIT – 1) if we number them starting from 0. Each bit except that one is worth 2^position, so together the lower bits can represent values up to 2^(CHAR_BIT–1) – 1; yet we’re told the type as a whole must range up to 2^CHAR_BIT – 1. The highest bit must therefore be worth exactly 2^(CHAR_BIT–1), which is just its ordinary positional value. Why then specifically exclude it from this requirement?
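Here’s a trivial program that does the math (just a worked check of the argument above):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Bits 0 .. CHAR_BIT-2, each worth 2^position, sum to
       2^(CHAR_BIT-1) - 1.  To reach the required maximum of
       2^CHAR_BIT - 1, the highest bit must be worth exactly
       2^(CHAR_BIT-1): its ordinary positional value.  The "except
       perhaps" escape clause buys nothing for unsigned char. */
    unsigned long lower = (1UL << (CHAR_BIT - 1)) - 1;
    unsigned long max   = (1UL << CHAR_BIT) - 1;
    printf("lower bits sum to %lu; required max is %lu;\n"
           "highest bit must be worth %lu (= 2^%d)\n",
           lower, max, max - lower, CHAR_BIT - 1);
    return 0;
}
```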