Lies, damned lies, and Rails documentation

I was recently playing with Rails again. A while back a site I was working on had an issue where the parameters from an uploaded form were all coming in as file objects – even the parts that didn’t correspond to uploaded files. The client in this case was some Java software using the Apache Httpclient library. We’d only recently upgraded Rails, and it turned out we were seeing the problem described here:

http://blogs.sun.com/mandy/entry/rack_and_content_type_fo

The problem was easily worked around. This led me on a hunt, however, for some documentation on how exactly you are meant to handle form uploads in Rails. About the only vaguely official-looking documentation I can find at this point is the Action Controller Overview guide, which has this to say:

Note that parameter values are always strings; Rails makes no attempt to guess or cast the type

Clearly baloney. If one of the form parts is a file upload, then what you get is an object with several methods defined including “read” to return the contents of the file as a string, as well as things like “original_filename” which tell you the filename as specified by the uploader. None of this is documented anywhere.

It’s this “we don’t need to document it” attitude that really annoys me about Rails. Sure, there’s all this magical stuff you can do with it and it has all these built-in goodies that can save you hours of coding just by using magic variable names (except for those times of course when you have to spend hours debugging, because you used one of those magic variable names by mistake and without realising). But there’s just no reliability with the API. The params hash for instance – what exactly am I allowed to do with a value I pull from the hash? Can I be sure that it will either be a string or one of these mysterious file upload objects? This isn’t the only omission in the documentation; for instance what does the “session” method in ActionController::Request actually return? What am I allowed to call on it?

This is the real strength of languages with strong typing – programs written in them are, to a certain degree, self-documenting. If I know what type a method actually returns then I know for certain what methods I can call. No nasty surprises when the framework developers suddenly decide to make it return a different kind of object!

Edit 27/8/2010: Another major strength of statically-typed languages that I forgot to mention is of course the ability to refactor code with the assistance of automated tools. As discussed in this blog post by Cedric (and I love the title).

Advertisements

Schwartz did not. Innocent until proven otherwise.

OSNews is running an article with this headline:

De Icaza: Sun’s Schwartz Pitched Google Lawsuit to Oracle

Seriously. De Icaza is saying that Schwartz had actually pitched the idea of a a Google lawsuit to Oracle; but he has nothing to back it up.

His report has been confirmed by James Gosling …

No it hasn’t. James Gosling’s blog post says nothing about J. Schwartz whatsoever. In this community I’d expect a bit more constraint. Icaza is full of crap in this. All you have to do is read James’ blog post. He, repeat, says nothing about Schwartz. Why is De Icaza pointing fingers?

Icaza specifically says:

So now we know that Jonathan shopped Sun with a big “Sue Google” sign. So much for his visionary patent defense against Apple and of course this jewel: …

Ehh, wrong. Icaza, you really are an idiot. Read James’ friggin’ post. OSNews, you are also idiots for reporting this.

Update 2012-05-07: So Google’s star witness in Oracle vs Google is… Jonathan Schwartz. So much for De Icaza’s credibility.

Updated C compiler benchmarks: llvm-2.7, libfirm-1.18.1

Since I did my previous C compiler benchmark, there have been a few new software releases, including Gcc 4.5, llvm 2.7 (and 2.6) and libfirm 1.18 (and 1.17). Here are results from benchmarking these compilers:


Most important things: Gcc version is now 4.4.4, though this hasn’t made much difference to the results. llvm-2.7 is benchmarked both with the “native” clang frontend, and the gcc-4.2 based frontend; note the difference in bm1, 3 and 6.

In the case of LLVM, results have improved significantly in bm2 and bm5.

For LibFirm, results have improved dramatically for bm5, but are also significantly worse for bm3.

Gcc is still (subjectively) the overall winner, though it does get beaten in a few of the benchmarks (bm2,6,7) by LLVM and even in a couple (bm5,7) by LibFirm.

Gcc 4.5 has been released but has yet to be benchmarked. Edit 13/7/2010: Gcc 4.5 benchmarked, and the results are so similar to those for Gcc 4.4.4 that it’s not worth reporting them separately.

There are three new tests:

bm8 tests the compiler’s ability to remove trivial loops (loops with empty bodies, but a large number of iterations). The loops are implemented with “goto” rather than a C loop structure, and there is an inner and outer loop. No tested compiler manages to  remove the loops.

bm9 tests the compiler’s ability to break up a loop into two separate loops, when doing so avoids an efficiency problem caused by a potential aliasing pointer. The function in the benchmark takes two arguments: an int array and an int pointer “p”. The inner part of the loop uses the value pointed at by “p” as part of an expression assigned to each array element in turn. Because “p” might alias one of the array elements, the stored value must be re-loaded each iteration, unless the compiler breaks the loop into two. None of the tested compilers perform this optimization.

bm10 tests that a shift-left of a value inside a large loop can be reduced to a constant (due to having shifted all the bits out). None of the tested compilers manage this.

It’s disappointing that some optimizations which potentially yield huge benefit are not implemented in any of the tested free compilers; however, GCC 4.5 has been released and is yet to be tested (edit 13/7/2010: tested, no surprises). Also, GCC includes some options for loop optimization which are not turned on at any optimization level, due to them requiring external libraries at build time; it would be interesting to see whether these would make any difference in any of the tests (13/7/2010. They don’t).

Further update 4/8/2010: I’ve just tested with gcc 4.5.1. No very noticable differences were seen from the performance of 4.4.4, however, I am running a new kernel and this seems to slightly affect the numbers for 4.4.4; for instance, bm3 comes down to about the 0.162 mark, more-or-less even with LLVM.

Floating point comparisons, by someone who doesn’t understand math.

A link to the following page was posted on Slashdot today:

http://floating-point-gui.de/errors/comparison/

Although the author asks Slashdot for comments (urgh) I’m not willing to comment there as whatever I will say will be lost in the noise. (yep, there are some clever people who post to Slashdot, it’s just hard to see them behind all the trolls and idiots). I would just send the author an email but they have for some reason neglected to include their name and contact details anywhere.

Anywya, the assertion is made that you shouldn’t compare floating point values using an epsilon, specifically:

if( Math.abs(a-b) < 0.00001) // wrong - don't do this

The reasoning however is difficult to follow:

This is a bad way to do it because a fixed epsilon chosen because it “looks small” could actually be way too large when the numbers being compared are very small as well.

Geez. If both numbers are small, then they are in fact close together in value and the above comparison is perfectly valid. It is in fact the whole point of the comparison. You don’t need to take the magnitude of the values into account unless, well, you have a good reason for it. And in general, if the magnitude is significant, you don’t want to be doing an equality comparison anyway. The key is not to have a dynamic epsilon (which is ultimately what the convoluted piece-of-filth code the author then presents ultimately, er, fails to achieve) but rather to have an epsilon value which is meaningful in the context of the comparison (i.e. how much of a difference in value are you willing to tolerate, taking error into account?).

Yes, larger magnitudes have less precision, but that doesn’t necessarily mean you want to widen the range of what is considered equal. Somtimes, exactly what you want in such cases is for the comparison to always fail (unless of course the values are actually identical).

Incidentally if you did want to take magnitude into account, it would probably be much simpler to do something like:

float d = a / b;
if (d > 0.999 && d < 1.001) {
    // near enough to equal
}

… though of course you may (depending on language and environment, and specifically divide-by-zero behaviour) still need to handle the case of “b” being 0 (most likely by considering the values unequal, unless “a” is also zero).

The point is, consider your needs when doing floating point comparison. It’s true that you’ll rarely want to compare floats for exact equality, but exactly how you test for “inexact equality” is dependent on the situation. Think about what you are trying to achieve, and don’t believe what some monkey’s web page says.

C99 and type-punning – making sense of a broken language specification

Formerly: C99 revisited

[None of the issues discussed here have, to my knowledge, been addressed in the much newer C11 standard. I have edited this document from time to time, when I discovered relevant parts of the standard or made new revelations].

My previous posts (here and elsewhere) about C99’s so-called aliasing rules made statements about what those rules really mean that some people don’t agree with. It’s worth delving into the issue again, with critical reference to the actual standard. The rules are complex. Having now examined the standard thoroughly, I think my original take was not 100% correct, but I’m also left with the strong impression that the standard document has serious issues.

Status Quo

There are a few attempts to explain C99 aliasing in various places on the web. If you’re new to the issue, here’s a brief summary: C99 essentially requires that memory (an “object” in C99 parlance) is accessed via a consistent pointer type, so for instance if you access something via an “int *” (i.e. an lvalue of type “int”) then you shouldn’t then attempt to read it via a “float *”. The relevant section of the standard is 6.5 paragraph 7. There are some exceptions listed there to the general rule; importantly, you are always allowed to accesss memory via a character type (eg char *) and via memcpy() and memmove() (although even this can be problematic, as per later discussion). It is also allowed to access an object via an lvalue whose type is an aggregate type (structure or union) with a member whose type is that of the object being accessed. This was discussed in my previous blog entry; it essentially would seem to allows access to an int value (for example) via a pointer to a union with an int member, the contentious issue being whether or not the value must actually be a member of a union object (more discussion will follow).

GCC relaxes the aliasing rules in one sense, that is, access to the same object is allowed via two incompatible types so long as the types are members of a union and that access is, in both cases, done through the union. This extension is widely used in open-source code, and though it is not strictly conformant with C99 [nor C11] according to the current wording it is most likely intended to be supported of the standard (see the note on TC3, to follow). I will discuss it further below. Another way to access an object as another type is to copy its value to a variable of the other type via a “char *” or memcpy() (if the value is to be modified, the modification can then be copied back in the same manner, with significant caveats). Certainly this might seem less efficient, but a clever compiler should be able to recognize what is being attempted and optimize it accordingly (although at the time of writing, GCC doesn’t always do this very well).

Analysing the standard

6.5p7 states that “objects can have their stored value accessed only by an lvalue expression” that has one of various types. Note that the word “expression” in this paragraph is redundant seeing as an lvalue is defined (in 6.3.2.1p1) as being a kind of expression. It’s also not completely certain how an expression can access a stored value; it seems likely that what is meant here is that evaluation of an lvalue can cause access to a stored value.

Clearly 6.5p7 regulates access to objects; so what is an object? The definition in the standard is that an object is a region of data storage, the contents of which can represent one or more values; it also notes that objects may have an associated type. Also, 6.2.5p20 states that a structure type describes a set of member objects. There is an implication here that objects can overlap, that is, a structure object overlaps its various member objects.

The fact that access is allowed through an aggregate type (containing a member with a compatible type) is really interesting, as it appears to give us a clue as to how “access” is to be interpreted in the case of overlapping objects. Section 3.1 defines access as reading or modifying the value of an object, however it is not clear if accessing a member object of an aggregate type also should be considered an access of the aggregate object, and vice-versa.

For example: in an expression such as “a.b”, what is the type of the lvalue through which access occurs? Is it:

  1. The type of “a” (let’s call it “struct A”) or
  2. The type of “b” (let’s call it “B”) or
  3. There are two accesses, one via the type of “a” and one via the type of “b”

Also, there’s the question of which object or objects are accessed. Is it:

  1. The object representing the “b” member of the aggregate or
  2. The aggregate object “a” or
  3. Both the aggregate object and the member object or
  4. The aggregate object and all its member objects, recursively.

Unfortunately the definition of “access” provided is very weak. For guidance we look elsewhere in the standard. In 6.5.2.3, member access is defined (as a postfix operator), and specifies that the value is that of the member and that the lvalue type is that of the member; but it is not clear if the access is actually to the aggregate/union object (and the member value is then extracted from this), or is directly to the member, or whether there are two distinct accesses (one to the containing object and one to the member object). None of these would be ruled out by 6.5p7.

What about if we copy a structure directly however? For example, in the expression “s1 = s2”, where s1 and s2 are of the same union type, there is no explicit member access. It could however easily be argued that a struct object access also constitutes access to each of the struct members, and that is presumably why 6.5p7 allows access to an object via an expression which as “an aggregate type … containing one of the aforementioned types”. (In fact it seems justifiable to assume than an expression such as “a.b”, where “a” is a struct type, accesses both the container object and the member object, where the lvalue for access to the container object is “a”, and that it actually causes access to all member objects when evaluated).

Interestingly enough, though, there seems to be an assumption that access to a member object by itself does not access a containing object. For instance, consider the following:

struct A {
    int a;
    int b;
} s_a;
s_a.a = 1;
s_a.b = 2;
int *p = &s.a;
*p = 3;

I have a lot of trouble believing that the code sample above could have undefined behaviour, however the only way that this code’s behaviour could be defined according to 6.5p7 is if accessing a member object does not also constitute access to a containing aggregate object. It is unfortunate that the definition of access seems to preclude this – since an access is to “modify the value of an object”, and surely modifying an aggregate member value also modifies the value of the aggregate object?

Anyway, we have reached what seems to be a consistent understanding. Accessing an object does not constitute access to the containing object; however accessing a container object also accesses all contained members (bearing in mind that a union only ever “contains” one member at a time, as will be elaborated on below). This both allows access to a structure member via a pointer to the member type, and justifies the allowance for accessing a member object via the containing aggregate type. With this interpretation of “access” in mind, 6.5p7 seems quite reasonable.

Trap representations.

6.2.6.1 discusses trap representations. Essentially a “trap representation” is an invalid value which causes undefined behavior when it is accessed or generated. Paragraph 6 contains this interesting snippet:

The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

The purpose of this seems to be to allow access to structures (and unions) which may have, for instance, uninitialized fields. Unfortunately, this doesn’t work if we assume, as we did before in order to reconcile with 6.5p7, that access to an object also constitutes access to all contained objects. Thus, even if the structure value itself is not a trap representation, accessing a structure object with a field whose value was a trap representation would still cause undefined behavior according to immediately preceding paragraph (6.2.6.1p5):

If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

It is unfortunate that the C99 standard is not clearer on the relationship between overlapping (aggregate and union) objects. I can see no good way to reconcile the text of the standard with the current consensus understanding of the language. About the best we can do is say that accessing a member object does not constitute access to the containing object, nor vice versa; this however makes part of 6.5p7 redundant. (We could try to work around this by instead saying that only reading an aggregate object does not access its members, but that writing to one does; however, there doesn’t seem to be any text that would justify such a differentiation).

What about unions?

Union types presumably exist as a convenient way to allocate storage suitable for one of various object types, but they are also commonly used (as discussed above) for accessing the contents of an object (with an effective type) via a different type (“type punning”). For instance I can store a value into a float member of the union and then access the int member of the union to interpret the representation as an int. Or… can I? Although GCC allows this (so long as all accesses occur through the union type) and many open-source programs make use of it, the standard states that the value of a union member other than the last one stored is unspecified (Annex J.1) – specifically:

The following are unspecified:

  • The value of a union member other than the last one stored into

This is underneath the heading “J.1 Unspecified Behavior”. Since “unspecified behavior” and “unspecified values” are different things, it’s not really clear which is intended here, however an “an unspecified value cannot be a trap representation” (3.17.3). The TC3 amendments add a footnote which allows type-punning via a union, but notes that “This might be a trap representation.” [apparently Annex J was forgotten. It was, however, amended in the later C11 standard, removing the above point from the list].

(Note that the Annexes are not considered normative, that is, they supplement the main text, but they should not impose additional requirements. In this case the annex claims that there is no requirement on the value of inactive union members. The footnote added in the amendment would appear to change that, however, footnotes aren’t normative either. This implies that either that the footnote is incorrect, or that the statement in Annex J was incorrect even before the addition of the footnote.)

Now, 6.2.6.1p5 states that accessing a value that is a trap representation results in undefined behaviour. However, if we apply the same rules that we inferred for struct types, then accessing a union access all its members… and, if any one them has a trap representation, then the behaviour is undefined! Presumably, this is not intended, and in fact the standard does imply that only one member is accessed when accessing the union (as will be discussed).

What specifically does the standard have to say about unions?

  • 6.2.5p20, which tells us that a union has “an overlapping set of member objects”
  • Footnote 37 from 6.2.5p22, which says that “an object with union type can contain only one member at a time”
  • 6.2.6.1p6, which says that a union object may not be a trap representation, although its member objects may be
  • 6.5.2.3, which says that the value of an expression using the union member access operators (“.” or “->”) is “The value is that of the named member” (and the footnote added in TC3, mentioned below).
  • 6.7.2.1p14, which says that “the value of at most one of the members can be stored in a union object at any time”
  • Annex J.1 says that “the value of a union member other than the last one stored into” is unspecified [removed in C11].
  • In TC3, a footnote is added which specifies that accessing a member of a union other than the last one stored causes “the object representation” to be re-interpreted in the new type and specifically refers to this as “type-punning”. This conflicts to some degree with Annex J.1.

Note that footnote 82, added in C99’s Technical Corrigendum #3, seems to allow type punning through a union. However, being a footnote, it is non-normative, meaning that it should only clarify the regular text; unfortunately, in this case, it seems to actually contradict the normative text. Specifically, (6.7.2.1) “the value of at most one of the members can be stored in a union object at any time” together with (6.5.2.3) “The value is that of the named member” strongly implies that reading the value of a “non-active” union member should be impossible.

Defect Report 236

It’s worth referring to C99 defect report 236 at this point. The report includes two examples of code with aliasing concerns and asks whether they are conforming. I agree with the response for example 2 (it is impossible to store to the member of a union when that member isn’t active, and the union is the declared and therefore the effective type; therefore the store must be via either the active type [i.e. a pointer to the active member] or the union type). Example 1 on the other hand is clearly conforming; yet, the committee response is that both programs “invoke undefined behaviour”. It is acknowledged that the wording of the standard requires changing and yet I see no changes to 6.5p7 in any later drafts.

Note that the ability to change the effective type of an object (which has no declared type) by storing a value, as implied by 6.5p7, is useful for re-using allocated storage; that is, it becomes possible to store a value of one type and later use the same storage for a value of another type. This is a potentially worthwhile usage. I have been informed by a GCC developer that GCC supports this usage and my testing indicates that it does (version 4.8.3). The suggested change in DR236 disallows such usage.

It seems odd that the DR236 is closed, with a resolution that appears to conflict with the present wording of the standard, yet no amendment was ever made.

Note that the current wording doesn’t disallow all type-based alias analysis, and arguably doesn’t prevent most useful type-based alias analysis. For instance, it is still possible to re-order read operations.

Update 30-May-2014: More recently, I’ve stumbled on the N1520 proposal, titled “Fixing the rules for type-based aliasing“, which is interesting as it gives some insight into what at least one ISO committee member believes is the intended effect of the standard. However, it’s dated 2010, it has not (as far as I know) been implemented, and it fails to address several shortcomings in the current wording (the “one problem” mentioned above, and exactly how/when objects come into existence; the document itself acknowledges this second issue, but does not address it). At one point the author appears to be highly confused:

Basically, if it is kept in mind that an lvalue expression (as mentioned in the introductory clause of 6.5p7) with aggregate or union type implies access of all of its member subobjects (recursively), and that accessing a subobject of an aggregate or union constitutes an access to the containing aggregate or union, it seems to me that the fifth bullet is not necessary, and could just be deleted.

As best as I can tell, the purpose of the fifth bullet is specifically to allow this access (i.e. it allows aggregate object access to access all member objects). If you removed the fifth bullet, then such access would no longer be allowed, and if you assume – as N1520 suggests – that accessing an aggregate object also implies access to the subobjects, then you would no longer be allowed to access any aggregate object because 6.5p7 would now prevent the implied access to the subobjects.

So, can an object be accessed by first casting to an aggregate or union type?

The question brought up by Joshua Haberman on the GCC developers’ mailing list (which I discussed previously) was whether a variable value can be accessed by first casting its address to a union pointer where the union contained the original type as a member. 6.5p7 would, if one is liberal in interpretation, seem to allow this; the only two legitimate arguments against this were that alignment requirements need to be met (it was pointed out that this could be ensured by other means) and that it might not be valid to dereference a pointer obtained by performing such a cast. This latter argument was brought up by Andrew Haley. Initially I was skeptical, but to be fair we’d need to look at relevant portions of the standard.

6.5.3.2p4 states that if an invalid value has been assigned to a pointer which is dereferenced, the behaviour is undefined. Given that this seems almost tautological, it is strange that it explicitly requires that the invalid value has been assigned rather than just that the value is invalid. For instance, if I obtain a pointer via a cast (or some other operation) then surely the pointer can be invalid without an assignment of that invalid value ever having taken place, and surely its dereference should still result in undefined behaviour. (Really, in this case the behaviour is still undefined by virtue of it… well, not being defined).

6.3.2.3 states all the ways in which pointers can be converted to different types. Amongst other things, it ensures that we can convert a “T *” (for any type T) to a “void *” and back and guarantee that the “result will compare equal” to the original pointer. 6.5.9p5 adds that pointers which compare equal will point to the same object (including an object and a subobject at its beginning). 6.3.2.3p7 allows us to convert to any pointer type (not just “void *”) and back, yielding an equal pointer, so long as alignment requirements are met. It does not however say what the resulting pointer actually points at, except in the case of the converted-to type being “a pointer to character type” where the result points at the first byte of the object. 6.7.2.1p13 suggests that converting between a pointer to a struct type and a pointer to the type of its first member yields a pointer to the first member (i.e. the ‘expected’ semantics) and vice versa (the precise wording is “suitable converted”, an operation which is not actually defined).

That is enough information to think that we should be able to take a pointer to a structure’s first member (named “m”, of type “M”) and convert it to a pointer to the structure itself, safely, and dereference that pointer. However, we can only do this if an object of the structure type actually exists, since otherwise the result of the pointer conversion is not specified.

Aside: why does 6.5.9p6 assert that two pointers can compare equal if one points “one past the end of an array object” and the other points to “the start address of a different array object” [which happens to immediately follow the first in the address space]? Why isn’t it enough to assert that it point at the start address of any other object at that location? As worded, this would force compilers not to allocate non-array objects just beyond the end of an array, except that it could probably be argued that two pointers in this situation “point to the same object” anyway). (ah – there is already a defect report #215 about this, though Technical Corregendum 3 doesn’t fix it).

So, with a strict interpretation of the standard, you cannot in general dereference a “T *” if there is no “T” at the pointed-to location, even if the members of T are there. This is due primarily to the pointer conversion not being defined if we attempt to produce a “T *” for a location where there is no “T” object, which is likely a surprising result to many who assume that this is instead a result of the strict-aliasing rules in §6.5.

I believe that this is what led Josh Haberman eventually (see his comment on this post, below) to decide that 6.5p7 is not normative. However, this is erroneous, since it is still in fact possible to create a pointer to an object which is of the wrong type:

void * m = malloc(sizeof(int) + sizeof(float));
// object '*m' has no effective type at this point
int * i = m;
*i = 5;   // effective type is now int (6.5p6)
float * f = m;
*f = 5.0;  // effective type is now float (6.5p6)
// i still points to the object (which is now of type float)
int a = *i; // this access is not allowed due to 6.5p7

6.5p7 prevents this kind of aliasing.

On the other hand, although the result of a pointer conversion between a pointer to a member type and a pointer to the containing structure type is only specified if a structure object actually exists, in practice (for most common architectures and compilers) we know exactly what the result of such a conversion will be – it will be a pointer that points to the same location but having a different type. This can even be proved by asserting that the two pointers are equal after the conversion (such a comparison isn’t actually allowed in C99, because it limits the types of pointers that can be compared with each other; however, compilers generally lift this restriction and allow comparison of pointers of arbitrary type).

With this in mind, we could cast a pointer-to-first-member to a pointer-to-containing-structure and then 6.5p7 would not appear to prevent accessing the first member via the structure pointer. If the member were then accessed via the ‘->’ operator we could consult 6.5.2.3p4:

A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. […]

Now, this could (and I suggest should) be interpreted as requiring that the pointer does point to an existing object of the structure type, since otherwise the named member cannot exist. (Possibly, it also causes access to this structure object). I think this is the strongest argument for why you cannot ‘cast into existence’ a structure or union object, even though the result of the pointer conversion is known (if implementation-defined).

Consensus Interpretation on unions (GCC and LLVM)

In general problems arise can arise when compiling with GCC and where an access to a struct member occurs where the there is no struct object. That is to say, if we do:

a = p->m;

… then this is seen as constituting an access to the object pointed to by ‘p’ rather than just the member object ‘m’, and the access to this object is bound by the strict aliasing rules.

Making sense of the union rules is difficult. The intent of footnote 82 is clearly to allow type-punning through a union, but if the compiler must assume that two arbitrary pointers may alias due to pointing at different members of the same union, then pretty much all type-based alias analysis (TBAA) goes out the window and the strict-aliasing rules (C99 6.5p7) are pointless. The GCC documentation for the -fstrict-aliasing switch clearly states that:

type-punning is allowed, provided the memory is accessed through the union type

I.e. you can use a union for type punning but access via both types must be “through the union type”, i.e. they must be performed as member accesses. LLVM seems to follow the same restriction (shown by testing behavior with some simple test cases; LLVM 3.6.1).

6.5.2.3 contains a paragraph seemingly legitimising a certain limited form of type punning, where structure types having a “common initial sequence” are punned via a union. C99 requires that the union declaration be visible, implying that (unlike the GCC requirement) access need not be through the union type (since if access were required to be through the union type, then the union type would necessarily be visible and it would not be necessary to require this explicitly). However, GCC does not appear to support even this limited form of type punning unless accesses are via the union type. (It’s worth noting that it’s not clear whether de-referencing a pointer to an inactive union member is legal anyway – does the member object actually exist? Would attempting such result instead in accessing the active member, thereby potentially violating the aliasing rules?).

Type punning via ‘char *’ and ‘memcpy’

6.5p7 does allow access to any object via a “char *” pointer and via the “memcpy” and “memmove” standard library functions. Naively, you might think this was sufficient to allow type punning (and it does allow a limited form). For instance, suppose we have a pointer (“fp”) to a “float” value which we want to reinterpret as an “int”:

int r;
memcpy(&r, fp, sizeof(int));

There is little doubt that the above code does not invoke undefined behavior, assuming that sizeof(float) >= sizeof(int), though the value of r is not specified and presumably may be a trap value. If we modify the value of “r” we should in theory then be able to copy the representation back to the original location:

r = ...;
memcpy(fp, &r, sizeof(int));

Again this should be fine. The problem is that we then attempt to read the floating point value (assume we can guarantee somehow that it is not a trap representation):

float f = *fp;

Surprisingly, this last line could fall foul of the strict aliasing rules (6.5). Whether this is the case depends on whether the “*fp” object has a declared type or not. According to the standard, copying an “int” actually alters the effective type of the destination object if it has no declared type. In the case that “fp” points to a dynamically allocated object (i.e. allocated via malloc()) then it doesn’t have a declared type. This makes it very difficult to perform type punning on any object with no declared type. We could copy via a temporary with the correct declared type:

r = ...;
float ftemp;
memcpy(&ftemp, &r, sizeof(int));
memcpy(fp, &ftemp, sizeof(int));
float f = *fp;

Seeing as we now copy from an object with a declared type of “float”, the effective type of “*fp” remains (or becomes) “float”. However, this is extremely clunky, and it requires us to know what the expected type of the object (if we have a “float *”, it seems reasonable to expect that the object it points to is a “float”, but what about if the original pointer is a “void *” or cannot otherwise reliably be assumed to point at a particular type?)

The whole “declared type” issue is also clouded by the existence aggregate types. If the destination object is a member object (i.e. has declared type) inside a dynamically allocated aggregate object  (no declared type), what happens? (Consider especially the case of unions, or structures having a single member). Again the definition of “access” becomes important, and the text in the standard is insufficient.

Conclusion

I have analysed the C99 standard in regards to aliasing reasonably thoroughly, more so than any other web resource that I’m aware of. The standard is unfortunately very unclear and we are forced to make various assumptions if we want to interpret it in anything like the common understanding (as implemented by compilers). Unresolved issues include:

  • the proper definition of, and lifetime of, objects; In particular:
    • Does a union member object exist when it is not the active member (i.e. not the last assigned-to member?) If it does, what is the effect of accessing it; does it have a specified value? Is behaviour defined if the object is modified?
  • what is meant (in 6.5p7) when it refers to accessing the “stored value” of an object, in terms of how this contrasts with accessing the object itself.
  • whether or not access to an object implies access to subobjects; and if so, what happens when the subobjects hold trap representations (i.e. what in this case is the significance of the requirement that structure/union objects have no trap representation)
  • whether or not access to an object implies access to any containing objects; and if so, why such access is forbidden by 6.5p7
  • If a value is copied (as per 6.5p7) using memcpy/memmove/character array, then what determines “the object from which the value is copied” if the region copied matches more than one object (such as a struct or union with a single member occupying the entire aggregate).
  • Is it really intended that copying a value (as per 6.5p7) into an object with no declared type always changes the effective type of the object to that of the source object? (If so, this effectively precludes type punning in a dynamically allocated object).

I’d like to see these issues addressed in a future version of the standard.

GCC, strict aliasing, C99

Big fat note 28/02/10: A deeper analysis of the standard forced me to change my opinion on this particular topic after writing this blog entry. In other words, I was wrong (about several things) in this blog post. If you want info on C99 aliasing, Don’t read this post; read the other one instead.

I recently wrote about how Mysql was breaking the C99 strict aliasing rules. In a related topic, there’s an interesting discussion thread which recently started on the GCC developers mailing list. It started with a post by Joshua Haberman where he questions the interpretation of C99’s so-called strict aliasing rules in GCC. It’s an interesting point that he raised.

Essentially, the strict aliasing rules say that you must access memory (“object”) only by references (“lvalue expressions”) to compatible types, or by references to certain other types (such as character types). Joshua points out that one of the other types allowed is an aggregate or union type containing a member of the original type. On first examination this seems quite natural; if you alter a whole structure (or union) object then any pointer to one of its members should not have its value cached in a register (for instance). However, it has the implication that accessing a variable by first casting it to a structure or union is allowed, even though it is probably a rare case.

The ongoing discussion, though, is hilarious:

Richard Guenther, in the first reply, complains that Joshua is “literally interpreting” the standard. This is ridiculous – surely the standard is meant to be literally interpreted; that’s why it is written in such precise (and mind-numbingly verbose) language (but – keep reading).

Andrew Haley, in the next reply, insists (contrary to Richard) that GCC complies with the standard as worded and doesn’t even attempt to explain how Joshua (and Richard) might have gotten it wrong. He then repeatedly attempts to claim that Joshua’s understanding of the standard is incorrect, using irrelevant sections of the standard to try and prove this. After Erik Trulsson steps in to support Andrew, Andrew then has the gall to call Joshua stubborn! (Incidentally, Erik’s argument is irrelevant – because the standard does define what happens when you dereference a valid pointer, and the rules for pointer conversion in 6.2.3.2 at least suggest that pointer would be valid in the example, as long as alignment requirements were met).

[Update: the defect report mentioned in the next paragraph isn’t really so relevant; I managed to conflate two separate issues when writing this entry; see clarification in the comments below. My main point still stands].

The discussion continues on and a relevant defect report (for the standard) is mentioned. The standard committee seem to have come to the wrong conclusion on this defect; that is, they say that example 1 in the defect report invokes undefined behavior when reading the standard makes it very clear that this is not the case (they do acknowledge the need for a wording change). This would seem to mean that the committee actually thinks the standard is wrong. That’s fine except that… it’s a standard. I mean, you’ve put it out there; that’s then set. You can revise it later, but to turn around and say “oh no, we didn’t mean that, no, ignore the wording, you’ve got to follow the spirit…” is just ridiculous. The whole point of having a standard is to be perfectly clear, and lay these things out without any ambiguity.

To put the issue in layman terms, the standard – as it is worded – allows re-using allocated memory to contain different types. That is, you can allocate some memory using malloc(), store something in it, use it, and then when you’re done with it you can store something else (even with an incompatible type) and re-use the memory. The argument on the GCC discussion list is that this should not be allowed (and in fact, GCC currently optimizes as if it is not allowed) because it interferes with optimization. [not correct; GCC allows this, though the C99 committee seem to think it needn’t].

It’s true that the standard as written disallows certain useful optimizations. Unless the compiler can prove for certain that two pointers don’t alias, it has to assume that a read from one pointer cannot be moved past a write to another, even if the types are incompatible. So, the reordering of R(p)/W(q) to W(q)/R(p) would not be allowed – though the reverse would be. Also, it doesn’t prevent the compiler from caching the result of pointer reads in registers if the only intermediate writes are through incompatible types – so, it cannot be said that the aliasing rules are pointless, even as they are presently written.

What really gets me is that the committee (in their conclusion on the defect report) seem to agree with the GCC interpretation. To turn around now and say that every one who wrote a program based on the wording of the standard got it wrong is crazy, particularly when there is a clear reason why you might want to rely on the original wording (so you could re-use allocated memory).

(I mean sure; the standard might not say or imply what it was intended to. If so, fix it in a later standard – C1X; and to be fair to the committee, maybe that’s what they’re intending. But don’t re-write history. As a programmer I need to be able to rely on what the standard says at the time that I read it).

As far as I’m concerned, this really is a bug in GCC. I mean, it does not follow the standard as written. Following the intention of the standard authors is irrelevant.

Mysql, and C99 aliasing rules. A detective story.

When running mythfrontend after upgrading Mysql (to 5.1.40) I got:

Driver error was [2/1210]:
QMYSQL3: Unable to execute statement
Database error was:
Incorrect arguments to mysql_stmt_execute

After a bit of Google-ing I found a Myth bug ticket which seems to refer to the same (or a very similar) problem. The problem is apparently due to a Mysql bug and has something to do with an invalid cast and gcc optimizations, which makes it easy enough to work around – except of course that the bug was apparently fixed many versions ago.

However, conveniently, changing my configure options for Mysql (from -O3 to -O1 in my CFLAGS and CXXFLAGS) does make the problem go away. I suspected alias analysis to be the problematic optimization, so I checked whether “-O3 -fno-strict-aliasing” was enough – and it was. I resolved to track down the bug.

I started by compiling Mysql twice, in two different trees – one with my regular options and one with -fno-strict-aliasing. Then, I copied object files from the first tree to the second and repeated “make install” until the problem appeared (actually, I had to copy the .o file, as well as the stupidly hidden .libs/*.o file that libtool creates, and then touch the *.lo file; the build system is broken enough that just changing an .o file won’t cause the library to rebuilt). Fortunately I found the right file first try: it was libmysql.o (or rather, libmysql.c).

Unfortunately, libmysql.c is over  5000 lines long, and I didn’t know which function was causing the problem, so I had no idea where to look in the code. I proceeded by compiling the file both with and without “-fno-strict-aliasing” and with the “-save-temps” option so I could grab the generated assembly (i.e. libmysql.s) for each case. Then I did a manual binary search by hand-editing functions from one file into the other, assembling it and re-building the library until I was able to find which function had the problem. It turned out to be “cli_stmt_execute”. At that point I decided to look through the generated assembly and see whether I could spot where it didn’t seem to match up with what was expected.  First, though, I got the size of the function down by marking various other static function that it was calling with __attribute__((noinline)), each time re-testing to make sure that the problem still occurred.

Anyway, enough with the suspense. Here is the problematic line of code (2553 in libmysql.c):

    for (param= stmt->params; param < param_end; param++)
        store_param_type((char**) &net->write_pos, param);

The problem of course is the cast “(char **)” which casts to an incompatible type, thus the value net->write_pos is accessed as a “char *” when it is declared as a “uchar *” (“uchar” is a typedef for “unsigned char”). Now “char” and “unsigned char” are compatible types, but “char *” and “unsigned char *” are not. That means, if you access some value via a “char *” reference then you must always do so and never through any incompatible reference including “unsigned char *”. Doing so breaks the strict aliasing rules.

In this case, store_param_type is actually meant to modify net->writepos:

    static void store_param_type(char **pos, MYSQL_BIND *param)
    {
        uint typecode= param->buffer_type | (param->is_unsigned ? 32768 : 0);
        int2store(*pos, typecode);
        *pos+= 2;
    }

But, as gcc inlines the call, it effectively sees the “*pos+=2” as updating a location that must be different to the source location (net->write_pos) seeing as the types are incompatible, in other words, it effectively sees “*pos = *otherpos + 2” (where otherpos is an “unsigned char **”). Then, it sees that *otherpos must be constant for the duration of the loop, seeing as a value of a compatible type is not written. This means it can defer the statement and collapse its multiple instances (one per iteration of the loop) to a single instance. The result? kaboom.

The solution? Change the type of the “pos” parameter for store_param_type() from “char **” to “unsigned char **” (and remove the cast from the call in cli_stmt_execute()). The function isn’t used anywhere else so this makes sense even if it wasn’t causing a bug.

And the lesson to be learnt from this: be very careful with pointer casts. Avoid them whenever possible. And learn, I mean really learn the aliasing rules. There are way too many pieces of software out there written by people who do not understand them and the result is a buggy software.

Mysql bug filed.