GCC, strict aliasing, C99

Big fat note 28/02/10: A deeper analysis of the standard forced me to change my opinion on this particular topic after writing this blog entry. In other words, I was wrong (about several things) in this blog post. If you want info on C99 aliasing, Don’t read this post; read the other one instead.

I recently wrote about how Mysql was breaking the C99 strict aliasing rules. In a related topic, there’s an interesting discussion thread which recently started on the GCC developers mailing list. It started with a post by Joshua Haberman where he questions the interpretation of C99′s so-called strict aliasing rules in GCC. It’s an interesting point that he raised.

Essentially, the strict aliasing rules say that you must access memory (“object”) only by references (“lvalue expressions”) to compatible types, or by references to certain other types (such as character types). Joshua points out that one of the other types allowed is an aggregate or union type containing a member of the original type. On first examination this seems quite natural; if you alter a whole structure (or union) object then any pointer to one of its members should not have its value cached in a register (for instance). However, it has the implication that accessing a variable by first casting it to a structure or union is allowed, even though it is probably a rare case.

The ongoing discussion, though, is hilarious:

Richard Guenther, in the first reply, complains that Joshua is “literally interpreting” the standard. This is ridiculous – surely the standard is meant to be literally interpreted; that’s why it is written in such precise (and mind-numbingly verbose) language (but – keep reading).

Andrew Haley, in the next reply, insists (contrary to Richard) that GCC complies with the standard as worded and doesn’t even attempt to explain how Joshua (and Richard) might have gotten it wrong. He then repeatedly attempts to claim that Joshua’s understanding of the standard is incorrect, using irrelevant sections of the standard to try and prove this. After Erik Trulsson steps in to support Andrew, Andrew then has the gall to call Joshua stubborn! (Incidentally, Erik’s argument is irrelevant – because the standard does define what happens when you dereference a valid pointer, and the rules for pointer conversion in 6.2.3.2 at least suggest that pointer would be valid in the example, as long as alignment requirements were met).

[Update: the defect report mentioned in the next paragraph isn't really so relevant; I managed to conflate two separate issues when writing this entry; see clarification in the comments below. My main point still stands].

The discussion continues on and a relevant defect report (for the standard) is mentioned. The standard committee seem to have come to the wrong conclusion on this defect; that is, they say that example 1 in the defect report invokes undefined behavior when reading the standard makes it very clear that this is not the case (they do acknowledge the need for a wording change). This would seem to mean that the committee actually thinks the standard is wrong. That’s fine except that… it’s a standard. I mean, you’ve put it out there; that’s then set. You can revise it later, but to turn around and say “oh no, we didn’t mean that, no, ignore the wording, you’ve got to follow the spirit…” is just ridiculous. The whole point of having a standard is to be perfectly clear, and lay these things out without any ambiguity.

To put the issue in layman terms, the standard – as it is worded – allows re-using allocated memory to contain different types. That is, you can allocate some memory using malloc(), store something in it, use it, and then when you’re done with it you can store something else (even with an incompatible type) and re-use the memory. The argument on the GCC discussion list is that this should not be allowed (and in fact, GCC currently optimizes as if it is not allowed) because it interferes with optimization. [not correct; GCC allows this, though the C99 committee seem to think it needn't].

It’s true that the standard as written disallows certain useful optimizations. Unless the compiler can prove for certain that two pointers don’t alias, it has to assume that a read from one pointer cannot be moved past a write to another, even if the types are incompatible. So, the reordering of R(p)/W(q) to W(q)/R(p) would not be allowed – though the reverse would be. Also, it doesn’t prevent the compiler from caching the result of pointer reads in registers if the only intermediate writes are through incompatible types – so, it cannot be said that the aliasing rules are pointless, even as they are presently written.

What really gets me is that the committee (in their conclusion on the defect report) seem to agree with the GCC interpretation. To turn around now and say that every one who wrote a program based on the wording of the standard got it wrong is crazy, particularly when there is a clear reason why you might want to rely on the original wording (so you could re-use allocated memory).

(I mean sure; the standard might not say or imply what it was intended to. If so, fix it in a later standard – C1X; and to be fair to the committee, maybe that’s what they’re intending. But don’t re-write history. As a programmer I need to be able to rely on what the standard says at the time that I read it).

As far as I’m concerned, this really is a bug in GCC. I mean, it does not follow the standard as written. Following the intention of the standard authors is irrelevant.

About these ads

3 Responses to GCC, strict aliasing, C99

  1. Richard Guenther says:

    Actually

    “To put the issue in layman terms, the standard – as it is worded – allows re-using allocated memory to contain different types. That is, you can allocate some memory using malloc(), store something in it, use it, and then when you’re done with it you can store something else (even with an incompatible type) and re-use the memory. The argument on the GCC discussion list is that this should not be allowed (and in fact, GCC currently optimizes as if it is not allowed) because it interferes with optimization.”

    GCC never disallowed this as anonymous memory does not
    have declared type. What you cannot do (and what is
    optimized) is

    ptr = malloc ();
    (int *)ptr = 1;
    return (float *)ptr;

    because the effective type of the memory is that of the
    last store (int) and float is not a valid lvalue type
    to access it. Re-using the memory by

    (float *)ptr = 0.;

    is perfectly valid and properly handled.

    • davmac says:

      Richard, thanks for responding. I’m sure you’re correct in this case and I got myself slightly confused when writing the post. The actual way in which GCC doesn’t comply with the standard is the way that Joshua Haberman brought up in the initial post.

      (I even begin to understand Andrew Haley’s point, though I think he is wrong. Although the standard doesn’t specify that converting one pointer to another necessarily yields a valid pointer, we can know for a particular compiler/architecture that it does so; also the rules for conversion to/from character pointers imply that it is at least possible to convert from “A*” to “char *” and then to “B *” and be sure of having a valid pointer, as long as alignment requirements are met).

      I still think there is a GCC bug; even if the standard is “wrong” it is only so by later re-interpretation. My point is that the standard should be taken literally. Once we start interpreting it in any other way, we no longer have a standard.

      • davmac says:

        * Except, as now noted at the top of the post, I’ve reconsidered my position. The problem is really that the C99 standard is particularly badly worded.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: