C99/C11 errata, etc

An unofficial collection of errata and general problems with the C99 and C11 standards.

{C99,C11} 3.1 Weak definition; inconsistent usage of term

It is probably a bad idea to redefine a word with such common usage as “access”. The definition here is not very clear:

to read or modify the value of an object

Is the access then to the value, or to the object? I.e. is it correct to say that I “access an object” to read or modify its value, or do I “access the value of an object”? Paragraph 4 suggests that the former is correct, but 6.5p7 talks about “access to the stored value of an object”.

Paragraph 4 also implies that an expression accesses an object (in “expressions that are not evaluated do not access objects”). Surely object access only occurs during the evaluation of an expression (it is not the expression that accesses the object, regardless of whether the expression is evaluated or not; rather, it is that evaluation of an expression can cause access to an object, and saying that “an expression does not access an object” is redundant). But this is mostly a nit-pick.

{C99} 5.1.2.3p5: Nonsensical requirements (fixed in C11)

The paragraph reads:

When the processing of the abstract machine is interrupted by receipt of a signal, only the values of objects as of the previous sequence point may be relied on. Objects that may be modified between the previous sequence point and the next sequence point need not have received their correct values yet.

However, 7.14.1.1p5 restricts signal handlers to performing write access on objects of static storage duration (specifically, to assigning values to objects declared as volatile sig_atomic_t). A signal handler cannot access objects of automatic storage duration (other than those declared in the signal handler itself or in functions that it calls) because there is no means for it to do so. Because it can neither read objects of static storage duration nor access preexisting objects of automatic storage duration, there is no way for it to reach objects of allocated storage duration either. Therefore, the above paragraph makes no sense; it’s impossible to “rely on” a value that you cannot access anyway.

{C99,C11} 6.2.5p20: Incomplete description of union semantics

“A union type describes an overlapping nonempty set of member objects” (nit-pick: what is meant by “overlapping”?). However, footnote 37 (which, admittedly, being a footnote, is not normative) claims that “an object with union type can only contain one member at a time”, which seems contradictory. Note that 6.7.2.1 (C99 and C11) is probably more correct with “The value of at most one of the members can be stored in a union object at any time”. What is not clear is how a member comes to be stored in the union; i.e. is it possible to take a pointer to a union member and store through it even if that member is not the active one? Such a store either stores to the member and makes the other members inactive, or it is an aliasing violation.
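To make the question concrete, here is a minimal sketch. The union and function names are made up for illustration, and the comments describe the open question rather than settle it.

```c
/* Hypothetical union, used only for illustration. */
union number {
    int   i;
    float f;
};

/* Stores through a pointer to a member that is not the one last stored to. */
float store_via_member_pointer(union number *u)
{
    float *fp = &u->f;  /* pointer to member f */
    u->i = 1;           /* i is now the member whose value is stored */
    *fp = 2.5f;         /* does this make f the stored member, or is it an aliasing violation? */
    return u->f;        /* read back through the union itself */
}
```

On the implementations I would expect people to try this on, the function returns 2.5, but the wording quoted above does not obviously say that it must.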

{C99,C11} 6.3.2.1 Confusion between expressions and values which are the result of expression evaluation.

From p1, “An lvalue is an expression …” – ok. An lvalue can “designate an object when it is evaluated” – also ok. But then in p2: “Except when it is the operand of [various], an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue)“. To perform this conversion could require evaluation of the lvalue, due to “… designate an object when it is evaluated“. For example, in the expression:

a[5+5]

The expression ‘5 + 5’ would need to be evaluated before the “designated object” could be determined and thus the conversion from p2 performed (note that while ‘a’ has array type, the expression as a whole does not). After the conversion, the lvalue apparently becomes a value and therefore is no longer an expression, but this is inconsistent with various sub-chapters of 6.5 (see 6.5.x below) which seem to suppose that operands are expressions (unless we suppose that an already evaluated value/object designation is also an expression, though there is nothing to explicitly support this).

More importantly, it’s not clear when the conversion in 6.3.2.1 must be applied. The intention is presumably that the conversion need not be applied if the lvalue would not otherwise need to be evaluated, or putting it another way, the conversion is applied to every lvalue which is evaluated and which appears in an appropriate context (as listed in 6.3.2.1). But this is not stated.
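A small sketch of the distinction, using the sizeof case from the exception list in p2. The function name is hypothetical; the array is not a variable length array, so the sizeof operand is not evaluated.

```c
#include <stddef.h>

/* Returns the value of i after one unevaluated and one evaluated use of a[i++]. */
int evaluations_performed(void)
{
    int a[16] = {0};
    int i = 0;

    size_t n = sizeof a[i++];  /* operand of sizeof: the lvalue is not evaluated, i stays 0 */
    int v = a[i++];            /* ordinary context: the lvalue is evaluated and converted, i becomes 1 */

    (void) n;
    (void) v;
    return i;
}
```

Only the second use designates an object and reads its stored value, which is the behavior the presumed intention would give.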

{C99,C11} 6.3.2.3 Pointer conversion is not very well defined

It doesn’t seem like pointer conversion is defined well enough to allow that casting from a “void *” to another pointer type necessarily points at the same object – specifically it doesn’t seem to require that the resulting pointer need “compare equal” with the original pointer. This would make malloc etc. unusable (except that malloc specifically allows that its return value can be assigned to a pointer of a different type and will then still point at the allocated object, kind of; see 7.20.3); it certainly seems to preclude development of custom memory management routines.

Similarly, performing a cast from one pointer type to another (eg. a “struct A *” to a “struct B *”) does not, from 6.3.2.3 alone, guarantee that the resulting pointer will point at the same object, even if alignment requirements are met, since this is not explicitly required (except in limited cases as per 6.7.2.1p13). On the other hand if the result of conversion is not specified, it’s not clear how the “resulting pointer […] correctly aligned” requirement in paragraph 7 is at all in the control of the programmer (perhaps the requirement is meant to apply to the operand rather than the result).

The best-defined of all pointer conversions seems to be that to the “char *” type. In this case the result “points to the lowest addressed byte of the object”. It’s unclear however whether we can convert a pointer to an object into a pointer to a sub-object at its beginning, and back, by first casting to ‘char *’. Eg:

struct a { int a1; int a2; };
struct b { struct a b1; int b2; };

struct b bb;
struct a * aptr = &bb.b1;
char * cptr = (char *) aptr;
struct b * bptr = (struct b *) cptr;  // XXX
char * cptr2 = (char *) bptr;

… we know that cptr and cptr2 must compare equal since ‘when converted back again, the result shall compare equal to the original pointer.’ Can we be certain that bptr and aptr point into the same object (either the ‘bb’ object, or its first member)? This is intuitively correct but not, it seems, mandated, unless ‘suitably converted’ also covers an intermediate conversion to another type (or at least to ‘char *’ type).
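The example can be wrapped in a function to see what an actual implementation does. This is only a sketch: the function name is hypothetical, and a result of 1 shows what a particular compiler happens to do, not what 6.3.2.3 requires.

```c
struct a { int a1; int a2; };
struct b { struct a b1; int b2; };

/* Returns 1 if every comparison that intuitively should hold does hold. */
int round_trip_matches(void)
{
    struct b bb = { { 1, 2 }, 3 };
    struct a *aptr = &bb.b1;
    char *cptr = (char *) aptr;
    struct b *bptr = (struct b *) cptr;    /* the conversion marked XXX */
    char *cptr2 = (char *) bptr;

    return cptr == cptr2                   /* char * round trip */
        && bptr == &bb                     /* intuitive, but is it mandated? */
        && (void *) bptr == (void *) aptr; /* same address after conversion to void * */
}
```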

Note that it is not mandated that pointers which “compare equal” must still “compare equal” when they are both subject to the same type conversion (even though they must necessarily “compare equal” again if converted back to the original type). So for instance:

int a = 0;
void * ap = &a;
void * bp = &a;
// It is NOT guaranteed that ap == bp, nor that &a == ap or &a == bp!!

However, comparison of void pointers with other pointers is allowed (6.5.9), which only makes sense if it can be done meaningfully. It is strange that the specification seemingly goes to great lengths to avoid saying that a converted pointer generally points to the same object (instead using the “cast to and back will compare equal” strategy in various cases), yet also contains paragraphs which are pointless unless this is the case.

Missing definitions for “address” and “points to”

The term “address” does not appear to be properly defined anywhere in the standard. The expression “points to” is only indirectly (and vaguely) defined.

{C99} 6.5.x Confusion as to whether operands are expressions or values, and when/if evaluation of operands must be performed

(The issue explained here spans several subchapters).

For example, evaluating the expression ‘5*3 + b’ requires that ‘5*3’ and ‘b’ are first evaluated and replaced with their respective values. However, this requirement is, quite critically, never explicated (though C11 in p1 adds “The value computations of the operands of an operator are sequenced before the value computation of the result of the operator“, which arguably suffices).

See for example 6.5.6p4 and p5 regarding the ‘+’ operator:

If both operands have arithmetic type, the usual arithmetic conversions are performed on
them. The result of the binary + operator is the sum of the operands.

If the operands are expressions, what does it mean to sum them? What is presumably intended is either that the value of the operands be summed (which would require evaluation of the operands), or otherwise that the operands have first been converted to their respective values (by means of evaluation), and so the operands are in fact values and not expressions. Consider on the other hand the “conditional” operator, 6.5.15p4:

The first operand is evaluated; there is a sequence point after its evaluation. The second operand is evaluated only if the first compares unequal to 0; the third operand is evaluated only if the first compares equal to 0; the result is the value of the second or third operand (whichever is evaluated), converted to the type described below.

Here, it has been made explicit that the operands must (or must not be) evaluated, and also that the result is the value of the chosen operand and not the operand itself. This is inconsistent. It should be stated that evaluating an operator expression requires first evaluating the operands, for those operators where this is true.
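The difference in what gets evaluated can be observed with a call counter. This is a sketch with hypothetical helper names; it shows the behavior everyone expects, which is the behavior the text for ‘+’ leaves implicit.

```c
int calls;  /* number of times counted() has been called */

int counted(int v)
{
    calls++;
    return v;
}

/* Both operands of + are evaluated: always 2 calls. */
int sum_call_count(void)
{
    calls = 0;
    int r = counted(1) + counted(2);
    (void) r;
    return calls;
}

/* Only one of the second and third operands of ?: is evaluated: always 1 call. */
int conditional_call_count(int cond)
{
    calls = 0;
    int r = cond ? counted(10) : counted(20);
    (void) r;
    return calls;
}
```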

{C11,C99} 6.5p6 Unintended(?) differentiation between heap-allocated objects and objects that are variables (have declared type)?

It is stated that “if a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type” of the object becomes the type that was copied from. This differs from the case where the destination object does have a declared type, and it is not clear what the rationale behind the difference is. It disallows, for instance, allocating space for an int via malloc(), copying a float’s byte representation into it, and then reading it via a pointer-to-int, because the effective type of the object is now “float”. The same thing would be allowed if the space were instead obtained by declaring a local variable, since this has a declared type, and copying the float value therefore does not change its type.
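The two cases side by side, as a sketch. The function names are hypothetical, and the code assumes sizeof(int) >= sizeof(float) (checked at run time) and an int representation without padding bits, neither of which the standard guarantees.

```c
#include <stdlib.h>
#include <string.h>

/* Destination has declared type int: its effective type stays int. */
int declared_object_case(void)
{
    float f = 1.0f;
    int i = 0;
    if (sizeof i < sizeof f) return -1;   /* assumption not met */
    memcpy(&i, &f, sizeof f);
    int copy = i;                         /* reading i as an int is permitted */
    return memcmp(&copy, &f, sizeof f) == 0;
}

/* Destination has no declared type: its effective type becomes float. */
int allocated_object_case(void)
{
    float f = 1.0f;
    int *p = malloc(sizeof *p);
    if (p == NULL || sizeof *p < sizeof f) { free(p); return -1; }
    memcpy(p, &f, sizeof f);
    /* Reading *p as an int here is the access that 6.5p6 appears to disallow,
       so the bytes are compared instead. */
    int same = memcmp(p, &f, sizeof f) == 0;
    free(p);
    return same;
}
```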

It is not clear what it means to “copy a value” (in “If a value is copied into an object …”). For one thing (nit-pick) this doesn’t make logical sense (“value” is defined as “precise meaning of the contents of an object”, and it is hard to see how a meaning can be copied); it should probably instead say “if the contents [or perhaps representation] of an object are copied into another object”. Also, what if only part of the source object is copied? What if the source object is copied to a non-zero offset within the destination object? Given that a ‘char *’ can point into multiple (overlapping) objects, such as a member object of a struct object, what is the type being copied from in this case?

Not possible to dynamically allocate array?

Is it possible to create an array in an allocated (malloc’d) region, without copying an existing array? 6.5p6 implies that it is not; if I store individual array elements, the effective type of the object can at best be the element type. (See 6.7.2.1{C99p17,C11p18} however – it is possible to use a structure type with a flexible array member). It seems unlikely that this was intended.
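The usual idiom looks like the sketch below (the function name is hypothetical). Each store gives one element the effective type int, yet on the reading above there is never an object whose effective type is an array of int, even though the indexing relies on pointer arithmetic within an array.

```c
#include <stdlib.h>

/* Fills an allocated region element by element and sums it; returns -1 on allocation failure. */
int sum_allocated_elements(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL) return -1;

    for (size_t k = 0; k < n; k++)
        p[k] = (int) k;        /* effective type of this element becomes int */

    int sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += p[k];           /* p[k] is *(p + k): arithmetic over an "array" that was never created */

    free(p);
    return sum;
}
```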

{C99,C11} 6.5p7 Multiple problems

(For an in-depth discussion see this entry.)

Nitpick: This paragraph implies that access is to the “stored value” of an object, rather than the object itself; this is contradictory to 3.1p4.

Nitpick: The word “expression” in “lvalue expression” is redundant, as an lvalue is defined to be a kind of expression (6.3.2.1p1).

Note that “An object shall have its stored value accessed only by an lvalue expression that …” should be read as meaning that expressions which are not lvalues, or which do not satisfy the other given constraints, cannot access the stored value of an object (which is strictly the correct grammatical interpretation), rather than that the following restrictions apply only to accesses that occur using lvalues.

The entire text “An object shall have its stored value accessed only by an lvalue expression that has one of the following types:” presumably is supposed to mean that “If an lvalue is evaluated, and the lvalue designates an object (as per 6.3.2.1), then the lvalue shall have one of the following types:”. This is a bit of a nit-pick, but it’s not clear how an expression (which, by 6.5p1 “is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof“) can itself access a value – rather, it is the evaluation of the expression that causes access. The expression designates an object (6.5p1), it should not now be referred to as accessing an object. Furthermore an expression can contain sub-expressions and the consensus understanding is that the sub-expressions can access objects with a type different than that of the containing expression.

In the bulleted list, access is allowed by “an aggregate or union type that includes one of the aforementioned types among its members” presumably because accessing an aggregate (array or structure) object is considered to access all its member objects. However, if one of the member values was a trap representation, then by 6.2.6.1p5 the behavior of such an access would be undefined. This is in contradiction with the spirit of 6.2.6.1p6, which says that “The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation“; that is, it hardly matters whether structure object values may or may not themselves be trap representations if accessing such a value might still invoke undefined behavior.

By the definition of access (3.1) it seems implied that accessing (and in particular storing to) a structure member implies access to the structure value itself, since to modify the value of the member object necessarily would also modify the value of the structure object; however, this is disallowed by 6.5p7. This is a clear case of error or omission. I would suggest that:

A. Definition of “access” is amended to make clear that access to an object does not also access the containing object, and vice versa; and
B. The ‘aggregate or union type’ clause is removed from 6.5p7
C. Aliasing restrictions, if they really need to be specified at all, be specified separately (and restore the ‘aggregate or union type’ clause for their specification only).
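The member-access problem that motivates these suggestions fits in a few lines. This is a sketch with hypothetical names; nobody doubts that the code is meant to be valid.

```c
struct pair { int x; int y; };

int store_member_through_pointer(void)
{
    struct pair s = { 1, 2 };
    int *px = &s.x;

    /* The lvalue *px has type int. The store modifies the value of s.x, and
       therefore also the value of s, whose type is struct pair. If that counts
       as an access to s, it is made by an lvalue of a type 6.5p7 does not list. */
    *px = 5;

    return s.x + s.y;
}
```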

6.5.2.3{C99p5,C11p6} “Common initial sequence” shenanigans

In “it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible” – what is meant by inspect? (It’s fairly intuitive, but not formally defined). It’s worth noting that this paragraph makes certain translations of some code invalid depending solely on the visibility of a completed union type, and that GCC (4.8.4) does not honor this requirement (it doesn’t allow inspection of the “common initial sequence” even when the union declaration is visible, unless the inspection is performed via the union object i.e. same rules as for type punning).
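A sketch of the uncontroversial form of inspection, performed through the union object itself. The type and function names are hypothetical.

```c
struct circle { int kind; double radius; };
struct square { int kind; double side; };

/* The complete union type is visible from here on. */
union shape {
    struct circle c;
    struct square s;
};

/* Inspects the common initial part through the union object. */
int kind_via_union(const union shape *sh)
{
    return sh->s.kind;
}

int inspect_kind(void)
{
    union shape sh;
    sh.c.kind = 1;        /* c is the member that is stored */
    sh.c.radius = 2.0;
    return kind_via_union(&sh);   /* reads s.kind, part of the common initial sequence */
}
```

The contested case is the same read done through a plain ‘struct square *’ obtained from ‘&sh.s’ in a function that never mentions the union, which is the form the GCC behavior described above concerns.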

6.7.2.1{C99p13/14,C11p15/16} It is not defined what “suitably converted” means.

(In “A pointer to a structure object, suitably converted, points to its initial member”).

It’s probably safe to assume that in this sentence ‘suitably converted’ means ‘when converted to a suitable type’ and that 6.7.2.1p13 is meant to restrict the result of such conversion rather than be non-normative; however, this should really be explicit, and seems to belong in 6.3.2.3 rather than here.
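Under that assumption, the sentence is about conversions like the one in this sketch (names hypothetical):

```c
struct header { int tag; int length; };

/* Returns 1 if the converted pointer designates the initial member. */
int initial_member_via_conversion(void)
{
    struct header h = { 42, 8 };
    int *first = (int *) &h;   /* presumably what "suitably converted" means here */

    return first == &h.tag && *first == 42;
}
```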

{C99,C11} 6.7.3p6 Unlimited license for implementation to redefine “access” to volatile objects?

The wording allows that the implementation may define what “constitutes access” to a volatile object:

What constitutes an access to an object that has volatile-qualified type is implementation-defined

It is not clear what is really meant by this nor why it is necessary. Surely “what constitutes access” to an object (volatile or otherwise) is defined by 3.1 (terms and definitions: access), where it states that access is an execution-time action “to read or modify the value of an object” (and notes that “expressions that are not evaluated do not access objects”).

5.1.2.3 states that “Accessing a volatile object …” is a side effect, which is a change “in the state of the execution environment”. In p3, “An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)”, which implies that 6.7.3p6 does not allow implementations to define whether or not expressions need to be evaluated. This leaves the possibility that 6.7.3p6 only allows for a definition of what kinds of expression (and/or in what context) actually necessitate that access be performed when they are evaluated.

The implementation-defined behavior for GCC is that an expression-statement “*p;”, where ‘p’ is a pointer to a volatile-qualified type, may or may not cause access depending on the type of ‘*p’ (scalar types yes, most other types no). This implies that the GCC folks believe that 6.7.3p6 allows for defining what kinds of expression, and/or in what context, actually necessitate that access be performed when they are evaluated. This seems to be an invalid interpretation however, because:

6.5.3.2p4: “The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object.”

and

6.3.2.1p2: “Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue)”

Which taken together and in further conjunction with 5.1.2.3p3, imply that an expression-statement “*p;” must require access to be performed because:

  1. (6.5.3.2p4) The result of the expression is an lvalue and
  2. (6.3.2.1p2) This lvalue must be converted to the value stored in the designated object and
  3. (5.1.2.3p3) The expression must be evaluated, since it has the side-effect of accessing a volatile object (necessarily, due to point 2).

Putting it another way, it doesn’t make sense to mandate that expressions causing access to a volatile object must be evaluated even if their value is not used, if it is entirely up to the implementation as to what expressions cause access to a volatile object.

I suspect what was really intended is that it should be implementation defined as to what effect access to a volatile should have, in terms of the implementation, along the lines of “any single access to a volatile shall be represented in the translated program by a single read of or write to the memory address used to contain the volatile object”. In other words, they are trying to address the problem that their execution model relies on an “abstract machine” so that “volatile” essentially has no meaning except on an actual, concrete system.
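For reference, the construct under discussion looks like this. It is a sketch: the object name is hypothetical, and whether the marked statement produces a read in the generated code is exactly what 6.7.3p6 leaves to the implementation.

```c
volatile int device_register;   /* hypothetical stand-in for a memory-mapped register */

int read_and_discard(void)
{
    volatile int *p = &device_register;

    device_register = 7;   /* a write access */
    *p;                    /* expression-statement: by the argument above, this must read the object */
    return *p;             /* a read whose value is actually used */
}
```

The return value cannot distinguish the two interpretations; only the generated code (or real hardware behind the address) shows whether the discarded read happened.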
