POSIX timer APIs are borked

I’m currently working on Dasynq, an event loop library in C++ (which is not yet in a state of being ready for use by external projects, though the functionality it currently exposes does work correctly as far as I know). It has come to the point where I want to add timer functionality, and this has been frustratingly tricky, mostly due to horribly designed APIs.

There are a few basic requirements to set out before I start:

  • There are essentially two types of timer – relative and absolute. I either want the timer to expire some given interval from now, or I want it to expire at some specific (“wall clock”) time. In the latter case, if the system time is changed the timeout should be suitably adjusted. (Example: if I set an alarm for 04:00, and the system change is changed by the user from 03:25 to 04:15, the alarm should expire immediately).
  • I want to be able to be sure that I can use a timer, at some point in the future. That is, I need to be able to allocate timers in advance (without necessarily arming them immediately) or at least to be able to re-set an existing timer to a specified timeout. I should be able to avoid the situation where I need a timer, which I knew I would need in advance, but am unable to create one due to resource limits / exhaustion.
  • I need a reasonable level of resolution. Timers should be usable for everything from running weekly tasks to animation timing.

With the above points in mind, let’s take a look at what POSIX provides.

POSIX timer APIs

First, the most basic timer-like call provided by POSIX is the alarm(…) function. It has second granularity, which rules it out immediately.

Then, there’s setitimer(…). This isn’t a particularly nice interface and delivers timer expiry via a signal. There is only one timer (well, there is one timer for each of several of different kinds of clock), which means that to allow for multiple timers to be managed we need to essentially multiplex the single timer; by itself this isn’t such a huge problem, but other limitations of the API make it fundamentally difficult, and the API is pretty broken to begin with in several ways.

The first problem with setitimer is that the interval timers deliver timeout events via signals, which is awkward, especially since setitimer itself is not async-signal-safe, meaning you can’t call it from within the signal handler to set the next desired timeout; if you want to multiplex multiple timeouts over the interval timer interface, you’re forced to turn asynchronous event notifications into synchronous events (which of course is what libraries like Dasynq are all about, so by itself this isn’t a huge problem).

The next problem with setitimer is that it only allows setting a relative timeout. If I want to get a timer notification at an absolute time, I need to get the current clock time (clock_gettime), calculate the time remaining until that time, and then set the timer. Not allowing an absolute timeout means that setting an alarm for a wall-clock time is pretty much impossible – since if the system time is changed by the user, the timer’s timeout interval won’t be adjusted. However, there’s a more subtle issue here: time might elapse between the calculation and setting the timer – the process could be preempted just after calculating the interval, and in unusual cases might not be scheduled again for a significant period of time. By the time it finally arms the timer, the interval is significantly incorrect. The only work to work around this (that I can think of, other than pretending that the problem doesn’t exist) is to check the clock time immediately after setting the interval, to make sure that it’s within a certain window of tolerance of the original measurement – and if not, to re-calculate the interval and reset the timer.

The safety check just described requires a minimum of two clock_gettime calls (which generally means two calls into the kernel) just for setting a timer. If multiple timers are being managed over the top of a single interval timer, that’s going to mean that two clock_gettime calls are required on each timer expiry.

Generalised POSIX timer interface

The real-time POSIX extensions also define timer_create, which appears to solve some of the problems above:

  • It allows the creation of multiple independent timers
  • It allows for specifying absolute (as well as relative) timeouts, and when using the realtime clock the timeout interval will be adjusted appropriately (for absolute timeouts) if the system time is altered

However, notification is still either via a signal, or via a thread (SIGEV_THREAD); the latter is problematic for implementations, because there is usually no way to detect notification failure if a thread cannot be created due to resource limits, and because it requires userspace support; on Linux you need to link with -lrt (and thus also pull in the pthreads library) to use timer_create etc, even if you don’t use SIGEV_THREAD. On OpenBSD the situation is worse – the realtime extensions are generally not supported, and create_timer et al are not available at all.

Even using timers with SIGEV_SIGNAL notifications is less than ideal. Using such timers in different threads requires cooperation between all threads, to either choose different signal numbers for notification or otherwise to have a common signal handler somehow suitably dispatch notifications to the correct thread.

Non-POSIX solutions

On Linux, timerfds (timerfd_create et al) provide an apparently sane solution to the whole messy problem – it supports multiple timers, supports absolute and relative timeouts, and delivers events by file handle notifications (can be used with select/poll/epoll etc). A large number of timers could be feasibly multiplexed over a single timerfd, which is good from a resource management perspective.

On OpenBSD (and various other BSDs) there is the option of using kqueue timers. This supports multiple timers, and neatly solves the problem of notification; however it only allows relative timeouts, and the timeout/interval cannot be changed, meaning that multiple timers cannot be multiplexed over a single kqueue timer. Even worse, it is not possible to pre-allocate timers; once created, they begin countdown immediately. This makes it impossible to discover resource allocation failure until the point that the timer is actually needed.

In Conclusion

The POSIX timer APIs are awkward and clunky. The setitimer functions support only limited use cases.  On the other hand, the generalised interface would be difficult to use in a library (since it either requires signal handling or multi-threading). On Linux, the timerfd interface is an ideal substitute. On other systems the general timer interface can be used, with some caveats and trade-offs, but it is not always available; on systems where it is not, and there is no system-specific replacement, the only option for wall-clock timers is to assume that the system clock does not change (other than by the usual tick) while the system is running.

Advertisements

5 Gripes with C++

First of all, let me say that I actually like C++ as a programming language. This makes me a rarity among my associates, but in terms of a systems programming language it is, in my opinion, currently strides ahead of any existing alternative (especially C). But that’s enough of that; this post isn’t about how great C++ is; this post is in fact about a few of the things that I don’t like about C++. Here are they are in order of most to less annoying:

1. The stupid “empty base class optimization”

This is that thing where, if you have some class A that is empty (contains no data members and no virtual functions, thus not requiring a vtable and not, theoretically, requiring any space at all), then you discover that it’s not really empty because if you include it as a member in some other class, it will take up space.

class A { };
class B { };
class C { A a; B b; };

Now, sizeof(A)? Yeah, not 0. It will come out as 1. Same with sizeof(B) (which should not be surprising). And sizeof(C) is 2, which again is not surprising. How about if we change the definition of C though:

class C : A { B b; };

Now we get sizeof(C) = 1. You see, it turns out that objects of the same type are required to have distinct addresses – specifically: “Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses”, and sizeof() any class type must be at least 1, but because A and B are different types, and a special exception in the C++ language spec (C++11 1.8 para 5) that “Base class subobjects may have zero size”, it is now possible to locate the A (base class) subobject and the B (member) object at the same location, and the overall size of the derived class is reduced.

In a language with such fantastic meta-programming capabilities, where empty classes often serve as a way of containing a set of type traits for use in a template, this is significant. (A cheap example: C++ container types are templates with an element type and an allocator type; the container contains a member that is of the allocator type. Often, the allocator is an empty object, since it has no state; for example it just allocates memory using malloc()/free()).

So, ok, there is a trick to optimize size of objects by using inheritance, as shown above. In standard library implementations, this trick tends to be used heavily, because it can have a significant impact; implementations of std::pair and std::tuple, for instance, will generally use it to collapse empty members to zero size. That seems like a good idea, so why am I calling it stupid?

Because it shouldn’t be necessary.

The problem is that applying it disfigures the structure of your types. You end up inheriting from some type just because you want to make use of the empty base-class optimization, and your code becomes a right mess where accessing what should have been a member is now done instead by casting the “this” pointer… to make matters worse, you have to be careful when you use it that the potentially “empty” class really is empty, since if it has virtual methods you run the risk of accidentally overriding them.

There might be some good reasons to ensure that objects of the same type are always allocated at different addresses, but those reasons often don’t apply to the sort of classes that tend to be empty. It would be so easy, so very easy, to have some attribute (either on the type, or on members, or even both) saying that “this (object/type) does not need a unique address”, but for years we’ve instead had to perform acrobatics with our code to make use of what should be a simple and straightforward optimization.

2. Broken encapsulation model

So “private inheritance is for is-implemented-in-terms-of” and “public inheritance is for is-a relationships” are claims you may have heard at some point or other. I have no beef with how public inheritance works, but private inheritance is another kettle of fish.

Essentially private inheritance of some class X says, “I will be implemented via X. I will not be seen as an X to outside observers, however, I may pass myself of as an X when I deem it necessary to do so”. This is I suppose good for things like listener interfaces, where you want to receive events from another source (and so you need to inherit the event-listener base class) but you don’t want to expose the listener methods elsewhere. You still need to override some of the base class methods (otherwise, you could’ve used composition instead of inheritance: that is, have a member of type X, rather than privately inheriting from X).

Right, so what’s the problem? The problem is that it is still possible to override virtual private methods, including methods which are private by virtue of private inheritance by a class further up the hierarchy. If you have a class A, and a class B that privately inherits A, and then a class C that inherits B (publicly or privately), C shouldn’t know or care about B’s relationship to A, right? But it so happens that if you accidentally name a method (with an appropriate signature) the same as a method from A, you will now override that method and suitably screw up everything. That’s the problem: private inheritance is not private enough. Although, to be honest, I could envisage other changes to the language that could do away with the need for private inheritance altogether, which brings me to my next point.

3. Container object from member subobject is non-standard

Suppose I have an object of type A with a member, b, of type B. Further suppose that I have a pointer to the member b; maybe even it is a “this” pointer, because I am implementing a method in the B class. Now, if I know my B object is a singular member of an A container object, I should be able to convert a pointer-to-B to a pointer-to-A which points at the container object easily enough, right? Something like:

char * c = reinterpret_cast<char *> b_ptr;
A * a_ptr = reinterpret_cast<A *>(c - offsetof(A,b));

Easy, right? Now… hmm… I know C++’s private inheritance actually breaks encapsulation principles (see above), but could I use this little trick to overcome that problem? Let’s say I want A to “privately inherit” from some class C. Instead of using actual private inheritance in A, I use inheritance (public or private, doesn’t matter) in my member class B, and I make “B b;” a private member of A. This truly hides the relationship between A and C, since there’s no way I could subclass A and accidentally override one of C’s methods. If the overridden method (which is now in B) needs to access any of A’s data or methods, that’s fine, I can use the method above to do so; it’s a little ugly, but it works… right?

Well, yeah, it does work; it’s just that it’s not standard. “offsetof” is only required to work for plain-old-data types (which among other restrictions don’t contain any virtual methods, or any members that do). This amazingly-useful-in-the-real-world technique isn’t actually required to work by the language (in fact it explicitly classifies it as “undefined”).

The standards-compliant alternative of having an explicit pointer member in the sub-object which points to the containing object works but has a runtime cost. So, you’re faced with a choice: leaky encapsulation via private inheritance, or runtime penalty due to unnecessary extra pointer storage.

What I’d really like to see is a straightforward syntax which directly supported this technique, instead of having to jump through reinterpret_cast/offsetof-hoops to use it (only to be then warned by the compiler that your code is non-compliant). It would be easy enough to do this in such a way that it delivered the expected performance gain in real-world compilers while still behaving correctly in theoretical compilers which store objects via hashtables or something equally daft.

4. No proper mixins

What C++ programmers call “the mixin pattern” is inheritance-of-template-parameter, a technique that is occasionally useful to augment a class via another “mixin” class (usually designed for the purpose). So for example if I have:

class A<T> : public T { /* ... */ };

… then I can “mix in” any class that I like, causing the resulting template instantiation to include its methods. The main problem with this approach is that the mixed-in class is unable to call any methods from the class it is mixed into; it is, after not, not a true mix-in – it’s just plain old public inheritance, and that’s a one way street. The most direct way to work around this is to declare virtual methods in the mixin class which will then be overridden in the target class, but this has a performance overhead and also has the unfortunate effect, potentially, of allowing these methods to be accidentally overridden in subclasses of A<T>.

So, it would be sorta nice if there were real mixins – where I could just declare mixin classes specially, and then pull them into another class via some declaration (or even just overload the inheritance syntax). Obviously this would probably require the whole source of the mixin to be included in a header, but that’s already the case with templates anyway. The mixin classes would somehow need to declare members that they expect the mixed-to (or other mixed-in) class to provide.

5. There should be more flexibility in dealing with inherited members

We’re now scraping the bottom of the barrel a little, as the four points above are the main gripes I have with C++; but 5 is a nice round number.

Basically my complaint here is that names are fixed in the base class and can’t be changed in the derived class. If I have a class A with virtual method m and I publicly derive from A in another class B, then in B the method is also called m, and if I want to override it I have to use the same name, m, throughout the entire class hierarchy from that point. If I’m desperate enough I could implement a new method f which just delegated to m, and I could even make m final at the same time so that everyone’s forced to override f instead from that point, but of course there’s a runtime overhead.

Why can’t I just rename methods? Why can’t I say, “from this point in the hierarchy on, method m will now be called f”? (Or more accurately: method f overrides method m).

It seems like a small thing, but occasionally I’ve wanted something like this. There are other related issues: I can shadow a base class non-virtual method, why can’t I shadow a final method? How am I supposed to deal with multiple base classes declaring same-name same-signature methods that I need to override separately in a derived class (especially considering I need all the help I can get if I’m forced to use multiple inheritance, right?) Why can’t I remove a base-class method from visibility (causing it to be shadowed rather than overridden in further derived classes)? And of course, why can a class override a base class private method at all? (eh-hmm broken encapsulation model).

Conclusion

That about rounds it out. 5 things about C++ that I would like to see improved. Just throwing it out there… who knows, maybe someone on the committee will pay attention… pretty please?

OpenGL spec for glDrawRangeElementsBaseVertex is rubbish

The title says it all: the spec for the glDrawRangeElementsBaseVertex function is rubbish.

glDrawRangeElementsBaseVertex is a restricted form of glDrawElementsBaseVertex.

Ok, but:

mode, start, end, count and basevertex match the corresponding arguments to glDrawElementsBaseVertex, with the additional constraint that all values in the array indices must lie between start and end, inclusive, prior to adding basevertex.

glDrawElementsBaseVertex doesn’t have a start or end argument. Perhaps the above should say “mode, count, type, indices and basevertex”, since type and indices seem to have the same meaning for both functions?

Index values lying outside the range [start, end] are treated in the same way as glDrawElementsBaseVertex

But… you just said that all the index values must be inside that range. Perhaps substitute “outside” with “inside” to make this sentence make sense?

Does no-one proof-read this stuff? bug submitted.

Update: so it turns out that ‘in the same way as glDrawElementsBaseVertex’ is supposed to mean ‘in an implementation-defined manner consistent with how similarly out-of-range indices are treated by glDrawElementsBaseVertex’. I feel like the wording could be much clearer but I’m not going to argue this one. The parameter specifications are clearly incorrect and this should be fixed in a revision.

Why is there no decent, simple, structured text format?

So I want a structured text format usable for configuration files and data interchange. My key requirements can be boiled down to:

  • Syntax allows concise expression (this alone rules out XML)
  • Simple to parse (this also rules out XML)
  • Suitable for human “consumption” (reading, editing). To some degree, this rules out XML.

As you can see, XML is definitely out. But oddly enough, I’m struggling to find anything I’m really happy with.

Oh yeah, I know, I know, JSON, right? I have these main problems with JSON:

1. Excessive quotation marks required

So for a simple set of key-values I need something like:

{
    "key1" : "overquoted",
    "key2" : "the perils of javascript"
}

I mean this isn’t crazy bad, but why are the quotes even necessary? I mean wouldn’t it be nice if I could instead write:

{
    key1 : overquoted,
    key2 : "the perils of javascript"
}

Given that, at the point key1 and key2 appear, an alphabetical character may not otherwise legitimately be present, what would be the harm in allowing this? (sure, if you want spaces or punctuation in your key, then you should need to quote it, but not otherwise). Similarly for values. It would be nice if those unnecessary quotes weren’t actually required.

2. No comments

This one really irks me. For a configuration file, comments are pretty much mandatory. Douglas Crockford gives an explanation for why there are no comments in JSON (why he removed comments from the spec, in fact), and it sucks. Basically: people weren’t using comments the way I wanted them to, so I removed comments. Yeah, I think we just hit ludicrous speed. There are so many things wrong with this argument I barely know where to begin. At the very outset, anyone using comments as parsing directives was going to need a custom parser anyway – what they were dealing with wasn’t plain JSON. So actually changing JSON does not affect those people; they will continue to use their custom parsers. In fact all you do by removing comments is make the standard less useful generally. The follow up:

Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

… is equally ridiculous. I mean sure I could strip comments before handing off the the parser, but then my original data isn’t actually JSON, is it? And so interoperability is destroyed anyway, because I can no longer use any standard-JSON tools on my configuration files.

3. Unclear semantics

The current RFC for JSON has, in my opinion, a lot of guff about implementations that just doesn’t belong in a specification. Problematically, this discussion could be seen to legitimise limitations of implementations. Take for example:

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable.

What does that mean, exactly? That it’s allowed that objects with non-unique names cause software to behave “unpredictably”? This is meant to be a specification imposing requirements on implementations, but it’s hard to glean from this text precisely what the requirements are, in particular because (just above that):

The names within an object SHOULD be unique.

That’s SHOULD, not MUST (see RFC 2119); which implies that non-unique names are in fact permissible and must be handled by an implementation. I wonder how many JSON libraries will properly represent such an object… not many, I’d guess.

So How About YAML?

YAML does solve the problems that I identified with JSON above. It doesn’t require superfluous quotation marks, it’s clear about map semantics, and it allows comments. On the other hand, its specification is quite large. Part of the complexity comes from the concept of tags which are a way of identifying types. While the YAML core specification (“failsafe schema”) deals only with maps, sequences and strings, it allows for explicitly tagging any value as a particular type identified by a “tag”. The schema notionally defines how tags are mapped to actual types and allows for specifying rules for determining the type of otherwise untagged ‘plain scalar’ (roughly: unquoted string) values. So for instance the JSON schema – which makes YAML a superset of JSON – maps sequences of digits to an integer type rather than the string type. The fact that different schemas yield different semantics, and that arbitrary types (which a given implementation may not know how to handle) can be assigned to values, in my opinion reduces YAML’s value as an interchange format.

(For instance, a tool which merges two YAML maps needs to know whether 123 and “123” are the same or not. If using the failsafe schema, they are strings and are the same; if using the JSON schema, one is a number and they are not the same).

In fact, the whole notion of schemas leads to the question of whether it is really up to the text format to decide what type plain nodes really are. Certainly, maps and sequences have a distinct type and are usually unambiguous – even YAML doesn’t allow a schema to re-define those – and are enough to represent any data structure (in fact, just sequences would be enough for this). I also think it’s worth while having a standard quoting mechanism for strings, and this is necessary to be able to disambiguate scalar values from structures in some cases. But beyond that, to me it seems best just to let the application determine how to interpret each scalar string (and it can potentially use regular expressions for this, as YAML schemas do), but that for purposes of document structure scalars are always just strings. This is essentially what the YAML failsafe scheme does (and it even allows disambiguating quoted strings from unquoted strings, since the latter will be tagged with the ‘?’ unknown type).

It’s worth noting that YAML can handle recursive structures – sequences or maps that contain themselves as members either directly or indirectly. This isn’t useful for my needs but it could be for some applications. On the other hand, I imagine that it greatly complicates implementation of parsers, and could be used for attacks on poorly coded applications (it could be used to create unbounded recursion leading to stack overflow).

Or TOML?

TOML is a relative newcomer on the scene of simple structured text formats. It has a fixed set of supported types rather than allowing schemas as YAML does, and generally aims to be a simpler format; on the other hand it is much closer to YAML in syntax than JSON and so is much easier to read and edit by hand.

Among the supported types are the standard map / sequence / string, but also integer, float, boolean and date-time. This seems fine, but again I’m uncertain that having more just than the basic “string” scalar type is really necessary. On the other hand having these types properly standardised is unlikely to cause any harm.

I think the one downside to TOML is the ungainly syntax for sequences of maps – it requires double-square brackets with the name of the sequence repeated for each element:

[[sequencename]]
key1 = value1
key2 = value2
[[sequencename]]
key1 = value1
key2 = value2

Nested maps are also a bit verbose, requiring the parent map name to be given as a prefix to the child map name:

[parent]
[parent.child]
key1 = value1  # this key and all following keys are in the child map

The top level node of a TOML structure, if I understand correctly, must always be a map, since you specify key-value pairs. This is probably not a huge concern for my purposes but is certainly a limitation of the format. Once you’ve opened a map (“table” in TOML parlance) there’s also no way to close it, it seems, other than by opening another table.

I think the occasional ugliness of the syntax, together with the immaturity of the format, are deal breakers.

And so the winner…

… Is probably YAML, at this stage, with the failsafe schema, although the potential for recursive structures makes me a little uneasy and it’d be nicer if I didn’t have to explicitly choose a schema. It’s also a shame that the spec is so wordy and complex, but the syntax itself is nice enough I think and seems like a better fit than either JSON or TOML.

Custom memory allocation is not possible in standard C

I’ve recently been perusing the C’11 standard final draft, mostly hoping to find some resolutions to the various inconsistencies and problems I’ve noted previously with the C99 standard (with no real success). In particular I read section 7.22.3 (C11; 7.20.3 in C99), which discusses the malloc family of functions:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated …
This little snippet is interesting because it hints at the intended usage of the methods. We call malloc (for instance), which returns a “void *”, and then we assign it to some other type of pointer and use that pointer to access the object that we allocated. What I begin to wonder about is how well this conversion of pointers is defined.
malloc returns a “void *”. When we assign a “void *” value to a pointer of another type, C99’s 6.5.16.1p2s says the value of the “void *” is “… converted to the type of the assignment expression and replaces the value stored in the object designated by …” the assignee. Fine; so what happens during the conversion? First, 6.3.2.3p1:
A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
Note with care “the result shall compare equal to the original pointer”; 6.5.9p6 tells us what this means:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function …
(Plus a couple of other cases, to do with arrays). What’s interesting here is the stipulation that the pointers must compare equal, not that they are the same in other regards – a bit more on this later. What’s also particularly interesting is the notion that the situations for which pointers, after conversion, must compare equal and therefore point at the same object are quite limited. Other than 6.3.2.3p1 (as quoted earlier) we have only 6.3.2.3p7:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
It’s not even made explicit that conversion from a “T *” (for some type T) to “char *” results in a pointer which must compare equal to the original, i.e. it’s not perfectly clear that the “lowest addressed byte” of 6.3.2.3p7 is the same thing as a “subobject at its beginning” as per 6.5.9p6 (though this case it seems a reasonable assumption). What’s particularly interesting is that conversion from a “void *” to some “T *” is not particularly well-defined. All that we have is 6.3.2.3p1, which says that the conversion is permissible, and that if inverted the result will compare equal with the original pointer (No no no! It’s the other way around – converting from T* to void* and back will compare equal; from void* to T* and back need not). Take the following code for example:
int a = 5;      // 1
void *va = &a;  // 2
float *fa = va; // 3
void *vb = fa;  // 4
float *fb = vb; // 5
int *pa = vb;   // 6

Now answer the following questions, assuming that the alignment requirements for all involved types are met (except when answering the first question). Write down your answers:

1. Is it possible for line (3) to produce undefined behavior, if the alignment requirements for ‘float’ are more stringent than that for ‘int’? (assume that line 3 does not produce undefined behavior for the following questions).

2. Would (fa == va) necessarily be true immediately after (3)?

3. Would (vb == va) necessarily be true immediately after (4)?

4. Would (fb == fa) necessarily be true immediately after (5)?

5. Would (pa == &a) necessarily be true immediately after (6)?

Here are my own answers, with discussion:

1. No, because 6.3.2.3p1 allows conversion from a void pointer to any type of pointer. The conversion is not necessarily allowed in both directions, however. 6.3.2.3p7 does not come into play because a pointer to void is not a pointer to an object type.

Yes, I think, because 6.3.2.3p7 comes into play. This does open the question of what the purpose of 6.3.2.3p1 is, however, since it appears to be completely encompassed by 6.3.2.3p7. (One could perhaps be forgiven for thinking that 6.3.2.3p1 is trying to remove the alignment requirements in the case of conversion from the “void *” type).

2. No. Nowhere does it state that these pointers must compare equal.

3. No.  As per costeau’s comment below, the transition from void-pointer-to-object-pointer-to-void-pointer is not guaranteed to yield a pointer equal to the original.

Yes. We converted a “void *” (va) to a “float *” and back again, so by 6.3.2.3p1 (and p7)  the result must compare equal with the original. This shouldn’t be surprising.

4. Yes, by 6.3.2.3.p7 p1. Again, this shouldn’t be surprising.

5. No.This is the really disturbing result.Although va and vb must compare equal (and therefore point at the same object), no further requirements are placed on the similarity of their behavior. (actually there is no requirement that va and vb compare equal! I have really goofed this up). Thus, converting them both to “int *” might yield different pointer values which do not compare equal (and which do not point at the same object!). Putting it another way, although va and vb must compare equal (wrong…), they need not be the same value!

Edit: A much better example, because I got that so wrong before:

int a = 5;
void *ap = &a;
void *bp = &a;

Note that it is not necessarily true that ap == bp, but it is true that (int *)ap == (int *)bp.

More generally, take this example:

char *a = malloc(sizeof(int));
void *r = a;
float *f = r;

Note that there is no guarantee now that f addresses the same object as does a (and indeed, there’s no guarantee that r holds a pointer equal to the one that was returned by the call to malloc!). So, if we were able to obtain a pointer to a chunk of memory (i.e. an object) there’s actually no way we could use it, unless it happened to be the right pointer type to begin with.

I really, really, do not like the answer for 5, but I can’t find a way to come to the opposite conclusion based on the standard as worded. Note that in general, conversion from “void *” to another type of pointer – in fact, conversion of any “T *” to another pointer type other than “char *” – is not very well defined. Interestingly, the only thing in the standard that lets us use the result of the malloc() function is the documentation for malloc() itself (as quoted earlier). But creating a custom, general-purpose memory allocation (malloc-like facility) is just not possible. You simply can’t return a usable “void *” value unless you derived it from a pointer to the desired type in the first place. Am I missing something?

What “restrict” really means

I recently stumbled across a web page claiming to be “Demystifying The Restrict Keyword“, a claim which I consider dubious at best. Of the semantics of C99’s restrict, the page (written by a Mike Acton) says:

[when declaring a restrict-qualified pointer] I promise that the pointer declared along with the restrict qualifier is not aliased. I certify that writes through this pointer will not effect the values read through any other pointer available in the same context which is also declared as restricted.

Err, wrong. Firstly, a pointer declared with restrict is allowed to have aliases, it’s just that restrictions are placed on how those aliases are used or created. The C99 standard as always makes use of completely inaccessible language to describe the semantics of “restrict” – see 6.7.3.1, “Formal definition of restrict” – but in this case we can wade through the cruft and extract some actual meaning. The key point is that if a value pointed at by a restrict-qualified pointer is accessed through that pointer in any way, and is also modified through some pointer expression, then the latter pointer expression must be “based on” the restrict-qualified pointer. “Based on” means what it sounds like it means – essentially, that the value of the expression depends on the value of the pointer on which it is based. Here’s a counter-example to the restriction on aliasing expressed by Mike Acton above:

int * restrict p = ...;
int a = *p;   // here we have used 'p' for access
int * q = p;  // q aliases p
*q = 4;       // q is "based on" p, so this is allowed

The second part – “I certify that writes through this pointer will not effect the values read through any other pointer available in the same context which is also declared as restricted” – is also wrong (and not just grammatically). What restrict means is (and I’m simplifying just a little) that – if the restrict-qualified pointer is used to access an object, and the object is modified through any pointer, then the latter pointer and all other pointers (regardless of whether they are qualified with restrict or not) which are used to access the object will be “based on” the restrict-qualified pointer. Because we are allowed (with some restrictions) to assign the value of one restrict-qualified pointer to another, we can easily devise another counter-example to Mike Acton’s contract:

int * restrict p = ...;
{
    int * restrict q = p;  // q aliases p, both are restricted
    // Note that both q and p are "available in [this] context"
    *q = 4;  // this writes through q and affects the value read through p,
             // and yet is perfectly legal! (because q is based on p).
}
(*p);  // note the read through p must occur outside the block above, because
       // p is not based on q.

“restrict” does have some tricky semantics and it’s a shame that someone would “demystify” them so misleadingly. It could be a grave mistake to think that a non-restrict-qualified pointer can always safely alias a restrict-qualified pointer for instance.

It doesn’t quite stop there; the page goes on with a length example of using restrict to improve optimisation of a small function. The solution presented goes too far, assigning the address of each member of three different objects of the same structure type to different restrict-qualified pointers before performing calculations on them. This shouldn’t really be necessary; qualifying the structure pointers themselves with restrict should suffice, since the compiler should already know that different members of the same struct cannot legally alias (however, I’ve done some testing with gcc 4.6.3 and I’m not sure that this is the case with this compiler). Also, I’m fairly certain the solution invokes undefined behaviour via the odd and unnecessary pointer arithmetic involving a ‘stride’; A better version follows:

void
move( vector3* restrict velocity, 
      vector3* restrict position, 
      vector3* restrict acceleration, 
      float             time_step,  
      size_t            count)
{

  for (size_t i=0;i<count;i++)
  {
    float* restrict acceleration_x = &acceleration[i].x;
    float* restrict velocity_x     = &velocity[i].x;
    float* restrict position_x     = &position[i].x;
    float* restrict acceleration_y = &acceleration[i].y;
    float* restrict velocity_y     = &velocity[i].y;
    float* restrict position_y     = &position[i].y;
    float* restrict acceleration_z = &acceleration[i].z;
    float* restrict velocity_z     = &velocity[i].z;
    float* restrict position_z     = &position[i].z;
    
    const float ax  = *acceleration_x;
    const float ay  = *acceleration_y;
    const float az  = *acceleration_z;
    const float vx  = *velocity_x;
    const float vy  = *velocity_y;
    const float vz  = *velocity_z;
    const float px  = *position_x;
    const float py  = *position_y;
    const float pz  = *position_z;

    const float nvx = vx + ( ax * time_step );
    const float nvy = vy + ( ay * time_step );
    const float nvz = vz + ( az * time_step );
    const float npx = px + ( vx * time_step );
    const float npy = py + ( vy * time_step );
    const float npz = pz + ( vz * time_step );

    *velocity_x   = nvx;
    *velocity_y   = nvy;
    *velocity_z   = nvz;
    *position_x   = npx;
    *position_y   = npy;
    *position_z   = npz;
  }
}

From my tests this was by far the most performant (on a core-i7 based machine) with gcc (compilation options: -std=c99 -O3 -march=core2). When using the clang compiler, there wasn’t much variation in results between any version, including the naive version which didn’t use restrict at all (a slight performance benefit can be noticed when simply introducing restrict qualification into the parameter declarations, but no further benefit comes from using the code above, and only a negligible benefit comes from Mike Acton’s code which I believe invokes undefined behaviour anyway).

 

C99 and type-punning – making sense of a broken language specification

Formerly: C99 revisited

[None of the issues discussed here have, to my knowledge, been addressed in the much newer C11 standard. I have edited this document from time to time, when I discovered relevant parts of the standard or made new revelations].

My previous posts (here and elsewhere) about C99’s so-called aliasing rules made statements about what those rules really mean that some people don’t agree with. It’s worth delving into the issue again, with critical reference to the actual standard. The rules are complex. Having now examined the standard thoroughly, I think my original take was not 100% correct, but I’m also left with the strong impression that the standard document has serious issues.

Status Quo

There are a few attempts to explain C99 aliasing in various places on the web. If you’re new to the issue, here’s a brief summary: C99 essentially requires that memory (an “object” in C99 parlance) is accessed via a consistent pointer type, so for instance if you access something via an “int *” (i.e. an lvalue of type “int”) then you shouldn’t then attempt to read it via a “float *”. The relevant section of the standard is 6.5 paragraph 7. There are some exceptions listed there to the general rule; importantly, you are always allowed to accesss memory via a character type (eg char *) and via memcpy() and memmove() (although even this can be problematic, as per later discussion). It is also allowed to access an object via an lvalue whose type is an aggregate type (structure or union) with a member whose type is that of the object being accessed. This was discussed in my previous blog entry; it essentially would seem to allows access to an int value (for example) via a pointer to a union with an int member, the contentious issue being whether or not the value must actually be a member of a union object (more discussion will follow).

GCC relaxes the aliasing rules in one sense, that is, access to the same object is allowed via two incompatible types so long as the types are members of a union and that access is, in both cases, done through the union. This extension is widely used in open-source code, and though it is not strictly conformant with C99 [nor C11] according to the current wording it is most likely intended to be supported of the standard (see the note on TC3, to follow). I will discuss it further below. Another way to access an object as another type is to copy its value to a variable of the other type via a “char *” or memcpy() (if the value is to be modified, the modification can then be copied back in the same manner, with significant caveats). Certainly this might seem less efficient, but a clever compiler should be able to recognize what is being attempted and optimize it accordingly (although at the time of writing, GCC doesn’t always do this very well).

Analysing the standard

6.5p7 states that “objects can have their stored value accessed only by an lvalue expression” that has one of various types. Note that the word “expression” in this paragraph is redundant seeing as an lvalue is defined (in 6.3.2.1p1) as being a kind of expression. It’s also not completely certain how an expression can access a stored value; it seems likely that what is meant here is that evaluation of an lvalue can cause access to a stored value.

Clearly 6.5p7 regulates access to objects; so what is an object? The definition in the standard is that an object is a region of data storage, the contents of which can represent one or more values; it also notes that objects may have an associated type. Also, 6.2.5p20 states that a structure type describes a set of member objects. There is an implication here that objects can overlap, that is, a structure object overlaps its various member objects.

The fact that access is allowed through an aggregate type (containing a member with a compatible type) is really interesting, as it appears to give us a clue as to how “access” is to be interpreted in the case of overlapping objects. Section 3.1 defines access as reading or modifying the value of an object, however it is not clear if accessing a member object of an aggregate type also should be considered an access of the aggregate object, and vice-versa.

For example: in an expression such as “a.b”, what is the type of the lvalue through which access occurs? Is it:

  1. The type of “a” (let’s call it “struct A”) or
  2. The type of “b” (let’s call it “B”) or
  3. There are two accesses, one via the type of “a” and one via the type of “b”

Also, there’s the question of which object or objects are accessed. Is it:

  1. The object representing the “b” member of the aggregate or
  2. The aggregate object “a” or
  3. Both the aggregate object and the member object or
  4. The aggregate object and all its member objects, recursively.

Unfortunately the definition of “access” provided is very weak. For guidance we look elsewhere in the standard. In 6.5.2.3, member access is defined (as a postfix operator), and specifies that the value is that of the member and that the lvalue type is that of the member; but it is not clear if the access is actually to the aggregate/union object (and the member value is then extracted from this), or is directly to the member, or whether there are two distinct accesses (one to the containing object and one to the member object). None of these would be ruled out by 6.5p7.

What about if we copy a structure directly however? For example, in the expression “s1 = s2”, where s1 and s2 are of the same union type, there is no explicit member access. It could however easily be argued that a struct object access also constitutes access to each of the struct members, and that is presumably why 6.5p7 allows access to an object via an expression which as “an aggregate type … containing one of the aforementioned types”. (In fact it seems justifiable to assume than an expression such as “a.b”, where “a” is a struct type, accesses both the container object and the member object, where the lvalue for access to the container object is “a”, and that it actually causes access to all member objects when evaluated).

Interestingly enough, though, there seems to be an assumption that access to a member object by itself does not access a containing object. For instance, consider the following:

struct A {
    int a;
    int b;
} s_a;
s_a.a = 1;
s_a.b = 2;
int *p = &s.a;
*p = 3;

I have a lot of trouble believing that the code sample above could have undefined behaviour, however the only way that this code’s behaviour could be defined according to 6.5p7 is if accessing a member object does not also constitute access to a containing aggregate object. It is unfortunate that the definition of access seems to preclude this – since an access is to “modify the value of an object”, and surely modifying an aggregate member value also modifies the value of the aggregate object?

Anyway, we have reached what seems to be a consistent understanding. Accessing an object does not constitute access to the containing object; however accessing a container object also accesses all contained members (bearing in mind that a union only ever “contains” one member at a time, as will be elaborated on below). This both allows access to a structure member via a pointer to the member type, and justifies the allowance for accessing a member object via the containing aggregate type. With this interpretation of “access” in mind, 6.5p7 seems quite reasonable.

Trap representations.

6.2.6.1 discusses trap representations. Essentially a “trap representation” is an invalid value which causes undefined behavior when it is accessed or generated. Paragraph 6 contains this interesting snippet:

The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

The purpose of this seems to be to allow access to structures (and unions) which may have, for instance, uninitialized fields. Unfortunately, this doesn’t work if we assume, as we did before in order to reconcile with 6.5p7, that access to an object also constitutes access to all contained objects. Thus, even if the structure value itself is not a trap representation, accessing a structure object with a field whose value was a trap representation would still cause undefined behavior according to immediately preceding paragraph (6.2.6.1p5):

If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

It is unfortunate that the C99 standard is not clearer on the relationship between overlapping (aggregate and union) objects. I can see no good way to reconcile the text of the standard with the current consensus understanding of the language. About the best we can do is say that accessing a member object does not constitute access to the containing object, nor vice versa; this however makes part of 6.5p7 redundant. (We could try to work around this by instead saying that only reading an aggregate object does not access its members, but that writing to one does; however, there doesn’t seem to be any text that would justify such a differentiation).

What about unions?

Union types presumably exist as a convenient way to allocate storage suitable for one of various object types, but they are also commonly used (as discussed above) for accessing the contents of an object (with an effective type) via a different type (“type punning”). For instance I can store a value into a float member of the union and then access the int member of the union to interpret the representation as an int. Or… can I? Although GCC allows this (so long as all accesses occur through the union type) and many open-source programs make use of it, the standard states that the value of a union member other than the last one stored is unspecified (Annex J.1) – specifically:

The following are unspecified:

  • The value of a union member other than the last one stored into

This is underneath the heading “J.1 Unspecified Behavior”. Since “unspecified behavior” and “unspecified values” are different things, it’s not really clear which is intended here, however an “an unspecified value cannot be a trap representation” (3.17.3). The TC3 amendments add a footnote which allows type-punning via a union, but notes that “This might be a trap representation.” [apparently Annex J was forgotten. It was, however, amended in the later C11 standard, removing the above point from the list].

(Note that the Annexes are not considered normative, that is, they supplement the main text, but they should not impose additional requirements. In this case the annex claims that there is no requirement on the value of inactive union members. The footnote added in the amendment would appear to change that, however, footnotes aren’t normative either. This implies that either that the footnote is incorrect, or that the statement in Annex J was incorrect even before the addition of the footnote.)

Now, 6.2.6.1p5 states that accessing a value that is a trap representation results in undefined behaviour. However, if we apply the same rules that we inferred for struct types, then accessing a union access all its members… and, if any one them has a trap representation, then the behaviour is undefined! Presumably, this is not intended, and in fact the standard does imply that only one member is accessed when accessing the union (as will be discussed).

What specifically does the standard have to say about unions?

  • 6.2.5p20, which tells us that a union has “an overlapping set of member objects”
  • Footnote 37 from 6.2.5p22, which says that “an object with union type can contain only one member at a time”
  • 6.2.6.1p6, which says that a union object may not be a trap representation, although its member objects may be
  • 6.5.2.3, which says that the value of an expression using the union member access operators (“.” or “->”) is “The value is that of the named member” (and the footnote added in TC3, mentioned below).
  • 6.7.2.1p14, which says that “the value of at most one of the members can be stored in a union object at any time”
  • Annex J.1 says that “the value of a union member other than the last one stored into” is unspecified [removed in C11].
  • In TC3, a footnote is added which specifies that accessing a member of a union other than the last one stored causes “the object representation” to be re-interpreted in the new type and specifically refers to this as “type-punning”. This conflicts to some degree with Annex J.1.

Note that footnote 82, added in C99’s Technical Corrigendum #3, seems to allow type punning through a union. However, being a footnote, it is non-normative, meaning that it should only clarify the regular text; unfortunately, in this case, it seems to actually contradict the normative text. Specifically, (6.7.2.1) “the value of at most one of the members can be stored in a union object at any time” together with (6.5.2.3) “The value is that of the named member” strongly implies that reading the value of a “non-active” union member should be impossible.

Defect Report 236

It’s worth referring to C99 defect report 236 at this point. The report includes two examples of code with aliasing concerns and asks whether they are conforming. I agree with the response for example 2 (it is impossible to store to the member of a union when that member isn’t active, and the union is the declared and therefore the effective type; therefore the store must be via either the active type [i.e. a pointer to the active member] or the union type). Example 1 on the other hand is clearly conforming; yet, the committee response is that both programs “invoke undefined behaviour”. It is acknowledged that the wording of the standard requires changing and yet I see no changes to 6.5p7 in any later drafts.

Note that the ability to change the effective type of an object (which has no declared type) by storing a value, as implied by 6.5p7, is useful for re-using allocated storage; that is, it becomes possible to store a value of one type and later use the same storage for a value of another type. This is a potentially worthwhile usage. I have been informed by a GCC developer that GCC supports this usage and my testing indicates that it does (version 4.8.3). The suggested change in DR236 disallows such usage.

It seems odd that the DR236 is closed, with a resolution that appears to conflict with the present wording of the standard, yet no amendment was ever made.

Note that the current wording doesn’t disallow all type-based alias analysis, and arguably doesn’t prevent most useful type-based alias analysis. For instance, it is still possible to re-order read operations.

Update 30-May-2014: More recently, I’ve stumbled on the N1520 proposal, titled “Fixing the rules for type-based aliasing“, which is interesting as it gives some insight into what at least one ISO committee member believes is the intended effect of the standard. However, it’s dated 2010, it has not (as far as I know) been implemented, and it fails to address several shortcomings in the current wording (the “one problem” mentioned above, and exactly how/when objects come into existence; the document itself acknowledges this second issue, but does not address it). At one point the author appears to be highly confused:

Basically, if it is kept in mind that an lvalue expression (as mentioned in the introductory clause of 6.5p7) with aggregate or union type implies access of all of its member subobjects (recursively), and that accessing a subobject of an aggregate or union constitutes an access to the containing aggregate or union, it seems to me that the fifth bullet is not necessary, and could just be deleted.

As best as I can tell, the purpose of the fifth bullet is specifically to allow this access (i.e. it allows aggregate object access to access all member objects). If you removed the fifth bullet, then such access would no longer be allowed, and if you assume – as N1520 suggests – that accessing an aggregate object also implies access to the subobjects, then you would no longer be allowed to access any aggregate object because 6.5p7 would now prevent the implied access to the subobjects.

So, can an object be accessed by first casting to an aggregate or union type?

The question brought up by Joshua Haberman on the GCC developers’ mailing list (which I discussed previously) was whether a variable value can be accessed by first casting its address to a union pointer where the union contained the original type as a member. 6.5p7 would, if one is liberal in interpretation, seem to allow this; the only two legitimate arguments against this were that alignment requirements need to be met (it was pointed out that this could be ensured by other means) and that it might not be valid to dereference a pointer obtained by performing such a cast. This latter argument was brought up by Andrew Haley. Initially I was skeptical, but to be fair we’d need to look at relevant portions of the standard.

6.5.3.2p4 states that if an invalid value has been assigned to a pointer which is dereferenced, the behaviour is undefined. Given that this seems almost tautological, it is strange that it explicitly requires that the invalid value has been assigned rather than just that the value is invalid. For instance, if I obtain a pointer via a cast (or some other operation) then surely the pointer can be invalid without an assignment of that invalid value ever having taken place, and surely its dereference should still result in undefined behaviour. (Really, in this case the behaviour is still undefined by virtue of it… well, not being defined).

6.3.2.3 states all the ways in which pointers can be converted to different types. Amongst other things, it ensures that we can convert a “T *” (for any type T) to a “void *” and back and guarantee that the “result will compare equal” to the original pointer. 6.5.9p5 adds that pointers which compare equal will point to the same object (including an object and a subobject at its beginning). 6.3.2.3p7 allows us to convert to any pointer type (not just “void *”) and back, yielding an equal pointer, so long as alignment requirements are met. It does not however say what the resulting pointer actually points at, except in the case of the converted-to type being “a pointer to character type” where the result points at the first byte of the object. 6.7.2.1p13 suggests that converting between a pointer to a struct type and a pointer to the type of its first member yields a pointer to the first member (i.e. the ‘expected’ semantics) and vice versa (the precise wording is “suitable converted”, an operation which is not actually defined).

That is enough information to think that we should be able to take a pointer to a structure’s first member (named “m”, of type “M”) and convert it to a pointer to the structure itself, safely, and dereference that pointer. However, we can only do this if an object of the structure type actually exists, since otherwise the result of the pointer conversion is not specified.

Aside: why does 6.5.9p6 assert that two pointers can compare equal if one points “one past the end of an array object” and the other points to “the start address of a different array object” [which happens to immediately follow the first in the address space]? Why isn’t it enough to assert that it point at the start address of any other object at that location? As worded, this would force compilers not to allocate non-array objects just beyond the end of an array, except that it could probably be argued that two pointers in this situation “point to the same object” anyway). (ah – there is already a defect report #215 about this, though Technical Corregendum 3 doesn’t fix it).

So, with a strict interpretation of the standard, you cannot in general dereference a “T *” if there is no “T” at the pointed-to location, even if the members of T are there. This is due primarily to the pointer conversion not being defined if we attempt to produce a “T *” for a location where there is no “T” object, which is likely a surprising result to many who assume that this is instead a result of the strict-aliasing rules in §6.5.

I believe that this is what led Josh Haberman eventually (see his comment on this post, below) to decide that 6.5p7 is not normative. However, this is erroneous, since it is still in fact possible to create a pointer to an object which is of the wrong type:

void * m = malloc(sizeof(int) + sizeof(float));
// object '*m' has no effective type at this point
int * i = m;
*i = 5;   // effective type is now int (6.5p6)
float * f = m;
*f = 5.0;  // effective type is now float (6.5p6)
// i still points to the object (which is now of type float)
int a = *i; // this access is not allowed due to 6.5p7

6.5p7 prevents this kind of aliasing.

On the other hand, although the result of a pointer conversion between a pointer to a member type and a pointer to the containing structure type is only specified if a structure object actually exists, in practice (for most common architectures and compilers) we know exactly what the result of such a conversion will be – it will be a pointer that points to the same location but having a different type. This can even be proved by asserting that the two pointers are equal after the conversion (such a comparison isn’t actually allowed in C99, because it limits the types of pointers that can be compared with each other; however, compilers generally lift this restriction and allow comparison of pointers of arbitrary type).

With this in mind, we could cast a pointer-to-first-member to a pointer-to-containing-structure and then 6.5p7 would not appear to prevent accessing the first member via the structure pointer. If the member were then accessed via the ‘->’ operator we could consult 6.5.2.3p4:

A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. […]

Now, this could (and I suggest should) be interpreted as requiring that the pointer does point to an existing object of the structure type, since otherwise the named member cannot exist. (Possibly, it also causes access to this structure object). I think this is the strongest argument for why you cannot ‘cast into existence’ a structure or union object, even though the result of the pointer conversion is known (if implementation-defined).

Consensus Interpretation on unions (GCC and LLVM)

In general problems arise can arise when compiling with GCC and where an access to a struct member occurs where the there is no struct object. That is to say, if we do:

a = p->m;

… then this is seen as constituting an access to the object pointed to by ‘p’ rather than just the member object ‘m’, and the access to this object is bound by the strict aliasing rules.

Making sense of the union rules is difficult. The intent of footnote 82 is clearly to allow type-punning through a union, but if the compiler must assume that two arbitrary pointers may alias due to pointing at different members of the same union, then pretty much all type-based alias analysis (TBAA) goes out the window and the strict-aliasing rules (C99 6.5p7) are pointless. The GCC documentation for the -fstrict-aliasing switch clearly states that:

type-punning is allowed, provided the memory is accessed through the union type

I.e. you can use a union for type punning but access via both types must be “through the union type”, i.e. they must be performed as member accesses. LLVM seems to follow the same restriction (shown by testing behavior with some simple test cases; LLVM 3.6.1).

6.5.2.3 contains a paragraph seemingly legitimising a certain limited form of type punning, where structure types having a “common initial sequence” are punned via a union. C99 requires that the union declaration be visible, implying that (unlike the GCC requirement) access need not be through the union type (since if access were required to be through the union type, then the union type would necessarily be visible and it would not be necessary to require this explicitly). However, GCC does not appear to support even this limited form of type punning unless accesses are via the union type. (It’s worth noting that it’s not clear whether de-referencing a pointer to an inactive union member is legal anyway – does the member object actually exist? Would attempting such result instead in accessing the active member, thereby potentially violating the aliasing rules?).

Type punning via ‘char *’ and ‘memcpy’

6.5p7 does allow access to any object via a “char *” pointer and via the “memcpy” and “memmove” standard library functions. Naively, you might think this was sufficient to allow type punning (and it does allow a limited form). For instance, suppose we have a pointer (“fp”) to a “float” value which we want to reinterpret as an “int”:

int r;
memcpy(&r, fp, sizeof(int));

There is little doubt that the above code does not invoke undefined behavior, assuming that sizeof(float) >= sizeof(int), though the value of r is not specified and presumably may be a trap value. If we modify the value of “r” we should in theory then be able to copy the representation back to the original location:

r = ...;
memcpy(fp, &r, sizeof(int));

Again this should be fine. The problem is that we then attempt to read the floating point value (assume we can guarantee somehow that it is not a trap representation):

float f = *fp;

Surprisingly, this last line could fall foul of the strict aliasing rules (6.5). Whether this is the case depends on whether the “*fp” object has a declared type or not. According to the standard, copying an “int” actually alters the effective type of the destination object if it has no declared type. In the case that “fp” points to a dynamically allocated object (i.e. allocated via malloc()) then it doesn’t have a declared type. This makes it very difficult to perform type punning on any object with no declared type. We could copy via a temporary with the correct declared type:

r = ...;
float ftemp;
memcpy(&ftemp, &r, sizeof(int));
memcpy(fp, &ftemp, sizeof(int));
float f = *fp;

Seeing as we now copy from an object with a declared type of “float”, the effective type of “*fp” remains (or becomes) “float”. However, this is extremely clunky, and it requires us to know what the expected type of the object (if we have a “float *”, it seems reasonable to expect that the object it points to is a “float”, but what about if the original pointer is a “void *” or cannot otherwise reliably be assumed to point at a particular type?)

The whole “declared type” issue is also clouded by the existence aggregate types. If the destination object is a member object (i.e. has declared type) inside a dynamically allocated aggregate object  (no declared type), what happens? (Consider especially the case of unions, or structures having a single member). Again the definition of “access” becomes important, and the text in the standard is insufficient.

Conclusion

I have analysed the C99 standard in regards to aliasing reasonably thoroughly, more so than any other web resource that I’m aware of. The standard is unfortunately very unclear and we are forced to make various assumptions if we want to interpret it in anything like the common understanding (as implemented by compilers). Unresolved issues include:

  • the proper definition of, and lifetime of, objects; In particular:
    • Does a union member object exist when it is not the active member (i.e. not the last assigned-to member?) If it does, what is the effect of accessing it; does it have a specified value? Is behaviour defined if the object is modified?
  • what is meant (in 6.5p7) when it refers to accessing the “stored value” of an object, in terms of how this contrasts with accessing the object itself.
  • whether or not access to an object implies access to subobjects; and if so, what happens when the subobjects hold trap representations (i.e. what in this case is the significance of the requirement that structure/union objects have no trap representation)
  • whether or not access to an object implies access to any containing objects; and if so, why such access is forbidden by 6.5p7
  • If a value is copied (as per 6.5p7) using memcpy/memmove/character array, then what determines “the object from which the value is copied” if the region copied matches more than one object (such as a struct or union with a single member occupying the entire aggregate).
  • Is it really intended that copying a value (as per 6.5p7) into an object with no declared type always changes the effective type of the object to that of the source object? (If so, this effectively precludes type punning in a dynamically allocated object).

I’d like to see these issues addressed in a future version of the standard.