Sakura 2.4.0 build failure

March 18, 2011

I was just trying to build the Sakura terminal emulator. More-or-less following the instructions exactly, after running “cmake .”, I then ran “make” at which point I got the following:

Unknown option: u
make[2]: *** [CMakeFiles/man] Error 1
make[1]: *** [CMakeFiles/man.dir/all] Error 2
make: *** [all] Error 2

Yeah, it’s not exactly the most helpful output, is it? This is what I hate about CMake and non-standard build systems in general. Eventually I figured out the problem is in the file “CMakeFiles/man.dir/build.make” where the target “CMakeFiles/man:” calls the “pod2man” utility (from that almighty piece-of-crap scripting language, perl) with a “-u” argument which “pod2man” doesn’t seem to recognize (perl 5.8.8). Removing that makes the build complete.

 


Schwartz did not. Innocent until proven otherwise.

August 14, 2010

OSNews is running an article with this headline:

De Icaza: Sun’s Schwartz Pitched Google Lawsuit to Oracle

Seriously. De Icaza is saying that Schwartz had actually pitched the idea of a a Google lawsuit to Oracle; but he has nothing to back it up.

His report has been confirmed by James Gosling …

No it hasn’t. James Gosling’s blog post says nothing about J. Schwartz whatsoever. In this community I’d expect a bit more constraint. Icaza is full of crap in this. All you have to do is read James’ blog post. He, repeat, says nothing about Schwartz. Why is De Icaza pointing fingers?

Icaza specifically says:

So now we know that Jonathan shopped Sun with a big “Sue Google” sign. So much for his visionary patent defense against Apple and of course this jewel: …

Ehh, wrong. Icaza, you really are an idiot. Read James’ friggin’ post. OSNews, you are also idiots for reporting this.

Update 2012-05-07: So Google’s star witness in Oracle vs Google is… Jonathan Schwartz. So much for De Icaza’s credibility.


C99 revisited

February 26, 2010

[Edited again, 2012-02-17. Simplified argument].

My previous posts (here and elsewhere) about C99′s so-called aliasing rules made statements about what those rules really mean that some people don’t agree with. It’s worth delving into the issue again, with critical reference to the actual standard. The rules are complex. Having examined the standard thoroughly, I now believe my original take was wrong (although the arguments made by others against it were mostly incorrect).

Status Quo

There are a few attempts to explain C99 aliasing in various places on the web. If you’re new to the issue, here’s a brief summary: C99 essentially requires that memory (an “object” in C99 parlance) is accessed via a consistent pointer type, so for instance if you access something via an “int *” (i.e. an lvalue of type “int”) then you shouldn’t then attempt to read it via a “float *”. The relevant section of the standard is 6.5 paragraph 7. There are some exceptions listed there to the general rule; importantly, you are always allowed to accesss memory via a character type (eg char *) and via memcpy() and memmove(). The contentious part is that you can access an object via an lvalue whose type is an aggregate type (structure or union) with a member whose type is that of the object being accessed. This was discussed in my previous blog entry; it essentially allows access to an int value (for example) via a pointer to a union with an int member, the contentious issue being whether or not the value must actually be a member of the union in question (more discussion will follow).

GCC relaxes the aliasing rules in one sense, that is, access to the same object is allowed via two incompatible types so long as the types are members of a union and that access is, in both cases, done through the union. This extension is widely used in open-source code, however I must stress this technique is not strictly conformant with C99 and is not necessarily portable to other, fully C99 compliant compilers; I will discuss it further below. Another way to access an object as another type is to copy its value to a variable of the other type via a “char *” or memcpy() (if the value is to be modified, the modification can then be copied back in the same manner). Certainly this might seem less efficient, but a clever compiler should be able to recognize what is being attempted and optimize it accordingly (at the time of writing, GCC doesn’t do this very well).

Analysing the standard

6.5p7 states that “objects can have their stored value accessed only by an lvalue expression” that has one of various types. Note that the word “expression” in this paragraph is redundant seeing as an lvalue is defined (in 6.3.2.1p1) as being a kind of expression.

Aside: One problem with the statement in 6.5p7 is that expressions can be compound, that is, an expression can be made up of other expressions (as in “expression1 + expression2″) and in this case the subexpression might be an lvalue of an appropriate type, but the compound expression might not be (and indeed might not even be an lvalue). Arguably, the compound expression still accesses the object and thus breaks the aliasing rules; however, it is doubtful that there exists any non-trivial program which abides by this interpretation, seeing as it seems to make writing meaningful programs very difficult. We have to assume then that if an object is “accessed by an expression” it does not mean the object is accessed also by a compound expression containing that expression, which seems unintuitive but which is not (as far as I can tell) directly contradicted anywhere in the standard.

Clearly 6.5p7 regulates acess to objects; so what is an object? The definition in the standard is that an object is a region of data storage, the contents of which can represent one or more values; it also notes that objects may have an associated type. Also, 6.2.5p20 states that a structure type describes a set of member objects. Although it is never explicated, there is certainly an implication here that objects can overlap, that is, a structure object overlaps its various member objects.

The fact that access is allowed through an aggregate type (containing a member with a compatible type) is really interesting, as it appears to give us a clue as to how “access” is to be interpreted in the case of overlapping objects. Section 3.1 defines access as reading or modifying the value of an object, however it is not clear if accessing a member object of an aggregate type also should be considered an access of the aggregate object, and vice-versa.

For example: in an expression such as “a.b”, what is the type of the lvalue through which access occurs? Is it:

  1. The type of “a” (let’s call it “struct A”) or
  2. The type of “b” (let’s call it “B”) or
  3. There are two accesses, one via the type of “a” and one via the type of “b”

For guidance we look elsewhere in the standard. In 6.5.2.3, member access is defined (as a postfix operator). The type of such an expression is certainly the type of the member, so this seems like it is an access of the member object  via its type – which is clearly allowed.

What about if we copy a structure directly however? For example, in the expression “s1 = s2″, where s1 and s2 are of the same union type, there is no explicit member access. It could however easily be argued that a struct object access also constitutes access to each of the struct members, and that is presumably why 6.5p7 allows access to an object via an expression which as “an aggregate type … containing one of the aforementioned types”. (it is also arguable that an expression such as “a.b”, where “a” is a struct type, access both the container object and the member object).

Interestingly enough, though, there seems to be an assumption that access to a member object by itself does not access a containing object. For instance, consider the following:

struct A {
    int a;
    int b;
} s_a;
s_a.a = 1;
s_a.b = 2;
int *p = &s.a;
*p = 3;

I have a lot of trouble believing that the code sample above could have undefined behaviour, however the only way that this code’s behaviour could be defined according to 6.5p7 is if accessing a member object does not also constitute access to a containing aggregate object. It is unfortunate that the definition of access seems to preclude this – since an access is to “modify the value of an object”, and surely modifying an aggregate member value also modifies the value of the aggregate object?

It’s worth noting that for the case of indirect member access (e.g. “(&a)->b”) the standard is a little more clear as to what is being accessed: In this case the value is that of “the named member of the object to which the first expression points” (i.e. it is the “b” member of the “a” object).

Anyway, we have reached what seems to be a reasonable understanding. Accessing an object does not constitute access to the containing object; however accessing a container object also accesses all contained members (bearing in mind that a union only ever “contains” one member at a time, as will be elaborated on below). With this interpretation of “access” in mind, 6.5p7 seems quite reasonable.

There’s just one problem.

6.2.6.1 discusses trap representations. Essentially a “trap representation” is an invalid value which causes undefined behavior when it is accessed or generated. Paragraph 6 contains this interesting snippet:

The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

The purpose of this seems to be to allow access to structures (and unions) which may have, for instance, uninitialized fields. Unfortunately, this doesn’t work if we assume, as we did before in order to reconcile with 6.5p7, that access to an object also constitutes access to all contained objects. Thus, even if the structure value itself is not a trap representation, accessing a structure object with a field whose value was a trap representation would still cause undefined behavior (according to p5).

It is unfortunate that the C99 standard is not clearer on the relationship between overlapping (aggregate and union) objects. I can see no good way to reconcile the text of the standard with the current consensus understanding of the language. About the best we can do is say that accessing a member object does not constitute access to the containing object, nor vice versa; this however makes part of 6.5p7 redundant.

What about unions?

Union types presumably exist as a convenient way to allocate storage suitable for one of various object types, but they are also commonly used (as discussed above) for accessing the contents of an object (with an effective type) via a different type (“type punning”). For instance I can store a value into a float member of the union and then access the int member of the union to interpret the representation as an int. Or… can I? Although GCC allows this (so long as all accesses occur through the union type) and many open-source programs make use of it, the standard states that the value of a union member other than the last one stored is unspecified (Annex J.1).

For instance, 6.2.6.1p5 states that accessing a value that is a trap representation results in undefined behaviour. However, if we apply the same rules that we inferred for struct types, then accessing a union access all its members… now, if any one them has a trap representation, then the behaviour is undefined! Presumably, this is not intended, and in fact the standard does imply that only one member is accessed when accessing the union (as will be discussed).

What specifically does the standard have to say about unions?

  • 6.2.5p20, which tells us that a union has “an overlapping set of member objects”
  • Footnote 37 from 6.2.5p22, which says that “an object with union type can contain only one member at a time”
  • 6.2.6.1p6, which says that a union object may not be a trap representation, although its member objects may be
  • 6.7.2.1p14, which says that “the value of at most one of the members can be stored in a union object at any time”
  • Annex J.1 says that “the value of a union member other than the last one stored into” is unspecified.
  • In TC3, a footnote is added which specifies that accessing a member of a union other than the last one stored causes “the object representation” to be re-interpreted in the new type and specifically refers to this as “type-punning”. This seems to contradict Annex J.1.

The note in Annex J.1 seems to indicate that type punning via a union is not portable, but this is directly contradicted in Technical Corregendum 3.

Defect Report 236

It’s worth referring to C99 defect report 236 at this point. The report includes two examples of code with aliasing concerns and asks whether they are conforming. I agree with the response for example 2 (it is impossible to store to the member of a union when that member isn’t active, and the union is the declared and therefore the effective type; therefore the store must be via either the active type [i.e. a pointer to the active member] or the union type). Example 1 on the other hand is clearly conforming; yet, the committee response is that both programs “invoke undefined behaviour”. It is acknowledged that the wording of the standard requires changing and yet I see no changes to 6.5p7 in any later drafts.

Note that the ability to change the effective type of an object (which has no declared type) by storing a value, as implied by 6.5p7, is useful for re-using allocated storage; that is, it becomes possible to store a value of one type and later use the same storage for a value of another type. This is a potentially worthwhile usage. I have been informed by a GCC developer that GCC supports this usage.

It is certainly odd that the DR236 is closed, and yet the response clearly contradicts the standard, yet there is no resolution. Without such a resolution it can only be assumed that the response was drafted and was neglected to be amended when the report was closed. To take any other standpoint is to stand in contradiction of the wording of the standard.

Note also that the wording doesn’t disallow all type-based alias analysis, and arguably doesn’t prevent most useful type-based alias analysis. For instance, it is still possible to re-order read operations.

Can an object be accessed by first casting to an aggregate or union type?

The question brought up by Joshua Haberman on the GCC developers’ mailing list (which I discussed previously) was whether a variable value can be accessed by first casting its address to a union pointer where the union contained the original type as a member. 6.5p7 would seem to allow this; the only two legitimate arguments against this were that alignment requirements need to be met (it was pointed out that this could be ensured by other means) and that it might not be valid to dereference a pointer obtained by performing such a cast. This latter argument was brought up by Andrew Haley. Initially I was skeptical, but to be fair we must consult the standard document and see what it says in regards to pointer conversion and dereferencing.

6.5.3.2p4 states that if an invalid value has been assigned to a pointer which is dereferenced, the behaviour is undefined. Given that this seems almost tautological, it is strange that it explicitly requires that the invalid value has been assigned rather than just that the value is invalid. For instance, if I obtain a pointer via a cast (or some other operation) then surely the pointer can be invalid without an assignment of that invalid value ever having taken place, and surely its dereference should still result in undefined behaviour. (Really, in this case the behaviour is still undefined by virtue of it… well, not being defined).

6.3.2.3 states all the ways in which pointers can be converted to different types. Amongst other things, it ensures that we can convert a “T *” (for any type T) to a “void *” and back and guarentee that the “result will compare equal” to the original pointer. 6.5.9p5 adds that pointers which compare equal will point to the same object (including an object and a subobject at its beginning). 6.3.2.3p7 allows us to convert to any pointer type (not just “void *”) and back, yielding an equal pointer, so long as alignment requirements are met. It does not however say what the resulting pointer actually points at, except in the case of the converted-to type being “a pointer to character type” where the result points at the first byte of the object. 6.7.2.1p13 suggests that converting between a pointer to a struct type and a pointer to the type of its first member yields a pointer to the first member (i.e. the ‘expected’ semantics) and vice versa.

That is enough information to think that we should be able to take a pointer to a structure’s first member (named “m”, of type “M”) and convert it to a pointer to the structure itself, safely, and dereference that pointer.

Aside: why does 6.5.9p6 assert that two pointers can compare equal if one points “one past the end of an array object” and the other points to “the start address of a different array object” [which happens to immediately follow the first in the address space]? Why isn’t it enough to assert that it point at the start address of any other object at that location? As worded, this would force compilers not to allocate non-array objects just beyond the end of an array, except that it could probably be argued that two pointers in this situation “point to the same object” anyway). (ah – there is already a defect report #215 about this, though Technical Corregendum 3 doesn’t fix it).

Having determined that we can correctly convert a pointer-to-member into a pointer-to-structure, the next question is: what happens if we do so, and dereference the result, when there isn’t actually a structure object there? (replace “structure” with “union” as you please). This is essentially what Joshua Haberman was asking about. Considering all the above, I find I must reconsider my original position: that is, you cannot dereference a “T *” if there is no “T” at the pointed-to location, even if the members of T are there. This is essentially due to definition: dereferencing a pointer accesses the value of an object, which is clearly not possible if there is no object.

Conclusion

I have analysed the C99 standard in regards to aliasing reasonably thoroughly, more so than any other web resource that I’m aware of. The standard is unfortunately very unclear and we are forced to make various assumptions if we want to interpret it in anything like the common understanding (as implemented by compilers). Hopefully, the next revision of the standard will improve. Come on guys – get it right!


Mysql, and C99 aliasing rules. A detective story.

October 25, 2009

When running mythfrontend after upgrading Mysql (to 5.1.40) I got:

Driver error was [2/1210]:
QMYSQL3: Unable to execute statement
Database error was:
Incorrect arguments to mysql_stmt_execute

After a bit of Google-ing I found a Myth bug ticket which seems to refer to the same (or a very similar) problem. The problem is apparently due to a Mysql bug and has something to do with an invalid cast and gcc optimizations, which makes it easy enough to work around – except of course that the bug was apparently fixed many versions ago.

However, conveniently, changing my configure options for Mysql (from -O3 to -O1 in my CFLAGS and CXXFLAGS) does make the problem go away. I suspected alias analysis to be the problematic optimization, so I checked whether “-O3 -fno-strict-aliasing” was enough – and it was. I resolved to track down the bug.

I started by compiling Mysql twice, in two different trees – one with my regular options and one with -fno-strict-aliasing. Then, I copied object files from the first tree to the second and repeated “make install” until the problem appeared (actually, I had to copy the .o file, as well as the stupidly hidden .libs/*.o file that libtool creates, and then touch the *.lo file; the build system is broken enough that just changing an .o file won’t cause the library to rebuilt). Fortunately I found the right file first try: it was libmysql.o (or rather, libmysql.c).

Unfortunately, libmysql.c is over  5000 lines long, and I didn’t know which function was causing the problem, so I had no idea where to look in the code. I proceeded by compiling the file both with and without “-fno-strict-aliasing” and with the “-save-temps” option so I could grab the generated assembly (i.e. libmysql.s) for each case. Then I did a manual binary search by hand-editing functions from one file into the other, assembling it and re-building the library until I was able to find which function had the problem. It turned out to be “cli_stmt_execute”. At that point I decided to look through the generated assembly and see whether I could spot where it didn’t seem to match up with what was expected.  First, though, I got the size of the function down by marking various other static function that it was calling with __attribute__((noinline)), each time re-testing to make sure that the problem still occurred.

Anyway, enough with the suspense. Here is the problematic line of code (2553 in libmysql.c):

    for (param= stmt->params; param < param_end; param++)
        store_param_type((char**) &net->write_pos, param);

The problem of course is the cast “(char **)” which casts to an incompatible type, thus the value net->write_pos is accessed as a “char *” when it is declared as a “uchar *” (“uchar” is a typedef for “unsigned char”). Now “char” and “unsigned char” are compatible types, but “char *” and “unsigned char *” are not. That means, if you access some value via a “char *” reference then you must always do so and never through any incompatible reference including “unsigned char *”. Doing so breaks the strict aliasing rules.

In this case, store_param_type is actually meant to modify net->writepos:

    static void store_param_type(char **pos, MYSQL_BIND *param)
    {
        uint typecode= param->buffer_type | (param->is_unsigned ? 32768 : 0);
        int2store(*pos, typecode);
        *pos+= 2;
    }

But, as gcc inlines the call, it effectively sees the “*pos+=2″ as updating a location that must be different to the source location (net->write_pos) seeing as the types are incompatible, in other words, it effectively sees “*pos = *otherpos + 2″ (where otherpos is an “unsigned char **”). Then, it sees that *otherpos must be constant for the duration of the loop, seeing as a value of a compatible type is not written. This means it can defer the statement and collapse its multiple instances (one per iteration of the loop) to a single instance. The result? kaboom.

The solution? Change the type of the “pos” parameter for store_param_type() from “char **” to “unsigned char **” (and remove the cast from the call in cli_stmt_execute()). The function isn’t used anywhere else so this makes sense even if it wasn’t causing a bug.

And the lesson to be learnt from this: be very careful with pointer casts. Avoid them whenever possible. And learn, I mean really learn the aliasing rules. There are way too many pieces of software out there written by people who do not understand them and the result is a buggy software.

Mysql bug filed.


POSIX write()

October 3, 2009

Take a look at:

http://www.opengroup.org/onlinepubs/000095399/functions/write.html

What a mess – different requirements for regular files, pipes, and “other devices supporting non-blocking operation”.  For pipes, there are reasons for this (atomic writes), but I think they should have been abstracted out (why can’t other devices have atomic writes? Why isn’t there a general mechanism to determine maximum size of an atomic write for a given file descriptor?).

I also notice, and I think this is particularly stupid, that if write() is interrupted by a signal before transferring any data it returns -1 rather than 0. If it transfers some data before being interrupted, it returns the amount of transferred data. Why make a special case out of 0?!! This forces increased complexity in the application, which can not assume that the return from write is equal to the number of bytes actually written, and for which -1 is in almost any other case an abortive error.

Unfortunately there is no discussion of the topic I was most interested in: atomicity/ordering of reads and writes to regular files. Consider:

  1. Process A issues a “read()” call to read a particular part of a file. The block device is currently busy so the request is queued.
  2. Process B issues a “write()” request which writes to the same part of the file as process A requested.

The question is now: can the data written by process B be returned to process A, or must the data that was in the file at the time of the read() call being issued? Also, is it allowed that process B might see part of the data that process A wrote, and part of what was in the file at the time of the read request?

Read the rest of this entry »


C Compiler Benchmarks

April 28, 2009

Edit 1/5/09: I’ve kind of re-done this. I wasn’t happy with it and some of the results were incorrect. I’ve also added two new benchmark programs since the first version of this post.

Well gcc 4.4.0 has been released now (with a few bugs, of course…) and I thought it’d be interesting to do some benchmarks, especially as there are now a few open source compilers out there (gcc, llvm, libfirm/cparser, pcc, ack; I’m not so interested in the latter two just now). Now I don’t have access to SPEC so I cooked up just a few little programs specifically designed to test a compiler’s ability to perform certain types of optimization. Eventually, I might add some more tests and a proper harness; for now, here are the results (smaller numbers are better):

Results

Results

Thankfully, it looks like gcc 4.4.0 performs relatively well (compared to other compilers) in most cases. Noticable exceptions are bm5 (where gcc-3.4.6 does better), bm3 and bm6 (where llvm-2.5 does better, especially in bm6).

llvm-2.5 does pull ahead in a few of the tests, but fails dismally in the 4th, which tests how well the compiler manages a naive memcpy implementation. llvm-2.5 generated code is more than four times slower than gcc generated code in this case, which is ridiculously bad. If it wasn’t for this result, llvm would be looking like the better compiler.

Sadly, none of the compilers I tested did particularly well at optimising bm2. In theory bm2 run times could be almost idential to bm3 times, but no compiler yet performs the necessary optimizations. This is a shame because the potential speedup in this case is obviously huge.

It’s surprising how badly most compilers do at bm4, also.

So, nothing too exciting in these results, but it’s been an interesting exercise. I’ll post benchmark source code soon (ok, probably I won’t. So just yell if you want it).

Read the rest of this entry »


Follow

Get every new post delivered to your Inbox.