Updated C compiler benchmarks, Feb 2012

With several new compiler releases it’s time to update my compiler benchmark results (last round of results here, description of the benchmark programs here). Note that the tests were run on a different machine this time and the number of iterations was tweaked, so the numerical results aren’t comparable with the earlier ones.

Without further ado:

[benchmark results chart]

So, what’s interesting in this set of results? Generally, LLVM and GCC now compete seriously for dominance. Notably, GCC canes LLVM in bm4, where GCC appears to generate very good “memmove” code whereas LLVM’s is much more concise but apparently also much slower; GCC also wins in bm8 (trivial loop removal – though neither compiler actually performs this optimization, GCC apparently generates faster code for the loops). On the other hand, LLVM beats GCC quite handily in bm6 (a common subexpression elimination problem).
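
For illustration, here’s the kind of overlapping copy loop where good “memmove”-style code generation pays off – a hypothetical sketch, not the actual bm4 source:

    void shift_down(int a[], int n)
    {
        /* Overlapping element-by-element copy: semantically a memmove.
           Good generated code moves the whole block efficiently; naive
           code copies one element per iteration. */
        for (int i = 0; i < n - 1; i++)
            a[i] = a[i + 1];
    }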

GCC 4.6.2 improves quite a bit over 4.5.2 in bm5 (essentially a common subexpression refactoring test). However, it’s slightly worse in bm3, and for some reason there is a huge drop in performance in the bm7 test (stack placement of a returned structure).
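
As a rough illustration of what bm7 exercises (hypothetical code, not the actual benchmark source): a function returning a structure by value, which a good compiler constructs directly in the caller’s stack slot rather than building locally and copying on return.

    struct vec {
        double v[16];
    };

    struct vec make_vec(double seed)
    {
        struct vec r;
        for (int i = 0; i < 16; i++)
            r.v[i] = seed * i;
        return r;   /* ideally "r" lives in the caller's slot: no copy here */
    }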

The Firm suite, surprisingly, mostly loses ground with 1.20.0 and remains uncompetitive.

Edit 25/03/12: I’ve noticed a flaw in bm6, which when corrected causes GCC to perform much worse – about 0.5 seconds rather than the 0.288 reported above.

Evince installation documentation

I just upgraded Evince on my system, but initial attempts to run it were unsuccessful; trying to start it produced the following message:

(evince:22841): GLib-GIO-CRITICAL **: Settings schema ‘org.gnome.Evince.Default’ is not installed

I have no idea what this means, so in the interim I try upgrading ‘glib’. This mostly goes without hiccups (except that newer versions of glib apparently have a dependency on libffi, for reasons which are tentatively explained in the glib README); however, I get the same error message when I try to start Evince. After some Googling, I discover the necessity of ‘compiling the schemas’:

# glib-compile-schemas /usr/share/glib-2.0/schemas

No mention at all of this in the Evince build documentation. It looks like the schema might be compiled automatically if you “make install” Evince without setting DESTDIR, but frankly, the necessity of compiling the schema should be documented.

Be Careful with Simplistic Locking

(or: Re-entrant mutexes considered harmful)

A common method of protecting data from race conditions in an object-oriented system is to use a simple mutex. The pattern goes something like this (a code sketch follows the list):

  1. Call some access or modification method on an object
  2. On entry to the method, the mutex (associated with the object) is obtained
  3. … method does its stuff
  4. Mutex is released
  5. Method returns
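
In POSIX-threads terms the pattern looks something like this – a minimal sketch, with hypothetical names:

    #include <pthread.h>

    struct account {
        pthread_mutex_t lock;   /* mutex associated with the object;
                                   initialise with pthread_mutex_init() */
        long balance;
    };

    void account_deposit(struct account *a, long amount)
    {
        pthread_mutex_lock(&a->lock);     /* obtained on entry */
        a->balance += amount;             /* ... does its stuff ... */
        pthread_mutex_unlock(&a->lock);   /* released before return */
    }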

Java, for instance, provides built-in support for this particular sequence with the “synchronized” keyword. A method tagged as “synchronized” automatically obtains the mutex on entry and releases it on return, which removes a lot of the burden from the programmer, who otherwise has to make sure that the mutex is properly released on every code path (including exceptions). This works fine in a lot of cases. Java’s mutexes are also re-entrant, meaning that one synchronized method can call another on the same object without causing instant deadlock, even though they share the same mutex (the mutex is associated with the object, not the method). However, “synchronized” is not suitable as a general-purpose mutex, and in fact its re-entrancy can lead to bad design.
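
Outside of Java this re-entrant behaviour has to be requested explicitly; for example, POSIX mutexes can be made recursive via an attribute. A minimal sketch:

    #include <pthread.h>

    /* Initialise a mutex that, like a Java monitor, may be re-acquired
       by the thread already holding it. (PTHREAD_MUTEX_RECURSIVE may
       require a feature-test macro such as _XOPEN_SOURCE on some
       systems.) */
    void init_recursive_mutex(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }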

Consider a design which uses the listener pattern. That is, some class has a modifier method which modifies data and notifies a number of listeners of the change (via some listener interface). How could such a design be made thread-safe? It might be tempting to mark the modifier method as synchronized, and in many cases this would work fine. If a listener needs to access or modify data, it can do so, and the re-entrant nature of “synchronized” means no deadlock will occur. However, this approach is a bad design, for subtle reasons which basically boil down to this: a mutex should not necessarily prevent data access or modification by threads which do not hold it.
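
To make the tempting approach concrete, here is roughly its shape – again a POSIX-threads sketch with hypothetical names:

    #include <pthread.h>

    #define MAX_LISTENERS 8

    struct model {
        pthread_mutex_t lock;   /* initialise with pthread_mutex_init() */
        int value;
        void (*listeners[MAX_LISTENERS])(struct model *);
        int nlisteners;
    };

    void model_set(struct model *m, int v)
    {
        pthread_mutex_lock(&m->lock);
        m->value = v;
        for (int i = 0; i < m->nlisteners; i++)
            m->listeners[i](m);     /* callbacks run with the lock held! */
        pthread_mutex_unlock(&m->lock);
    }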

Huh, you may be thinking, what’s this guy smoking? Preventing other threads from touching the data is precisely what a mutex is for. But that’s not exactly what a mutex is for: a mutex is a mechanism that allows threads to co-ordinate access to data (or some other resource) in order to avoid race conditions; it is not specifically meant to limit access to the single thread holding the mutex. In particular, if the thread holding the mutex is itself blocked, waiting for a second thread to complete some task, and the second thread is operating under the assumption that this is the case, then it should be fine for the second thread to access the resource protected by the mutex – clearly there can be no race condition, since the only other thread with any right to access the resource (i.e. the thread that holds the mutex) is blocked.

If you think this argument is absurd, consider two cases in Java where this situation can come about:

  1. An arbitrary thread needs to modify the data (the “model”) of a Swing component, synchronously, from within a listener callback which is invoked while a mutex protecting an object “O” is held. To modify the model safely, the listener must use EventQueue.invokeAndWait() or equivalent. The code invoked on the dispatch thread also attempts to access the object “O”. Bang – deadlock.
  2. Again, a listener callback is invoked with a mutex held. This time the listener calls a method on a remote object (in a second Java VM) via RMI. The remote invocation calls back into the first VM and attempts to obtain the (supposedly re-entrant) mutex; however, due to the RMI implementation, it is now running on a different thread in the first VM. Bang – deadlock.

The obvious workaround – invoking the listener callback(s) outside of the synchronized block, i.e. without the mutex held – does work, but it leaves the possibility that the data is modified between the modification event and the listeners being notified, which can be undesirable. The best solution from a design point of view is to move the acquisition and release of the mutex outside of the responsibility of the object being protected by it. Thus the sequence given at the start would be changed to the following (see the sketch after the list):

  1. Acquire mutex
  2. Invoke method
  3. … method does its thing
  4. Method returns
  5. Release mutex
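
Returning to the earlier C sketch, locking is now the caller’s responsibility rather than the object’s:

    /* Hypothetical variant of the modifier which assumes its caller
       already holds the lock. */
    void model_set_unlocked(struct model *m, int v);

    void update_model(struct model *m, int v)
    {
        pthread_mutex_lock(&m->lock);     /* 1. acquire mutex       */
        model_set_unlocked(m, v);         /* 2-4. invoke the method */
        pthread_mutex_unlock(&m->lock);   /* 5. release mutex       */
    }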

Also, it should be assumed (even enforced) that the mutex is not re-entrant, to allow for the case where the executing thread is not the one which actually acquired the mutex. This is more complicated, and places more burden on the programmer – which is not desirable, especially when dealing with concurrency – but it is a better overall design. For one thing, it allows a solution to the two problems outlined above; for another, it separates concerns – why should an object worry about synchronisation when it may or may not be used in a multi-threaded system?
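
One way to obtain a lock with exactly these properties – non-re-entrant, and releasable by a thread other than the acquirer – is a binary semaphore. A POSIX sketch:

    #include <semaphore.h>

    sem_t lock;   /* binary semaphore used as a non-re-entrant mutex */

    void lock_setup(void)
    {
        sem_init(&lock, 0, 1);   /* exactly one permit */
    }

    void do_protected_work(void)
    {
        sem_wait(&lock);   /* a second sem_wait here would deadlock */
        /* ... protected work; any thread, not just this one, may
           legally release the lock with sem_post ... */
        sem_post(&lock);
    }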

Ideally there’d be some language support for this sort of design too. It would be nice if there were a way to declare that a specific object should only be touched while an associated mutex is held, and to have static analysis determine when such a rule is being broken. Unfortunately such things are not yet readily available/usable, at least not as far as I’m aware.

In conclusion: be careful when using “synchronized” or re-entrant mutexes; consider using non-re-entrant mutexes instead.

Lies, damned lies, and Rails documentation

I was recently playing with Rails again. A while back, a site I was working on had an issue where the parameters from an uploaded (multipart) form were all coming in as file objects – even the parts that didn’t correspond to uploaded files. The client in this case was some Java software using the Apache HttpClient library. We’d only recently upgraded Rails, and it turned out we were seeing the problem described here:

http://blogs.sun.com/mandy/entry/rack_and_content_type_fo

The problem was easily worked around. This led me on a hunt, however, for some documentation on how exactly you are meant to handle form uploads in Rails. About the only vaguely official-looking documentation I can find at this point is the Action Controller Overview guide, which has this to say:

Note that parameter values are always strings; Rails makes no attempt to guess or cast the type

Clearly baloney. If one of the form parts is a file upload, then what you get is an object with several methods defined, including “read”, which returns the contents of the file as a string, and “original_filename”, which tells you the filename as specified by the uploader. None of this is documented anywhere.

It’s this “we don’t need to document it” attitude that really annoys me about Rails. Sure, there’s all this magical stuff you can do with it, and it has all these built-in goodies that can save you hours of coding just by using magic variable names (except for those times, of course, when you have to spend hours debugging because you used one of those magic variable names by mistake and without realising). But there’s just no reliability in the API. The params hash, for instance – what exactly am I allowed to do with a value I pull from the hash? Can I be sure that it will be either a string or one of these mysterious file upload objects? This isn’t the only omission in the documentation; for instance, what does the “session” method in ActionController::Request actually return? What am I allowed to call on it?

This is the real strength of statically-typed languages – programs written in them are, to a certain degree, self-documenting. If I know what type a method actually returns, then I know for certain what methods I can call on the result. No nasty surprises when the framework developers suddenly decide to make it return a different kind of object!

Edit 27/8/2010: Another major strength of statically-typed languages that I forgot to mention is, of course, the ability to refactor code with the assistance of automated tools, as discussed in this blog post by Cedric (and I love the title).

Schwartz did not. Innocent until proven otherwise.

OSNews is running an article with this headline:

De Icaza: Sun’s Schwartz Pitched Google Lawsuit to Oracle

Seriously. De Icaza is saying that Schwartz actually pitched the idea of a Google lawsuit to Oracle; but he has nothing to back it up.

His report has been confirmed by James Gosling …

No, it hasn’t. James Gosling’s blog post says nothing about Schwartz whatsoever. I’d expect a bit more restraint in this community. De Icaza is full of crap on this one. All you have to do is read James’ blog post. He says nothing – repeat, nothing – about Schwartz. Why is De Icaza pointing fingers?

De Icaza specifically says:

So now we know that Jonathan shopped Sun with a big “Sue Google” sign. So much for his visionary patent defense against Apple and of course this jewel: …

Ehh, wrong. De Icaza, you really are an idiot. Read James’ friggin’ post. OSNews, you are also idiots for reporting this.

Update 2012-05-07: So Google’s star witness in Oracle vs Google is… Jonathan Schwartz. So much for De Icaza’s credibility.

Updated C compiler benchmarks: llvm-2.7, libfirm-1.18.1

Since I did my previous C compiler benchmark, there have been a few new software releases, including Gcc 4.5, LLVM 2.7 (and 2.6) and LibFirm 1.18 (and 1.17). Here are results from benchmarking these compilers:

[benchmark results chart]

The most important things: the Gcc version is now 4.4.4, though this hasn’t made much difference to the results. LLVM 2.7 is benchmarked both with the “native” clang frontend and with the gcc-4.2-based frontend; note the difference in bm1, bm3 and bm6.

In the case of LLVM, results have improved significantly in bm2 and bm5.

For LibFirm, results have improved dramatically for bm5, but are significantly worse for bm3.

Gcc is still (subjectively) the overall winner, though it does get beaten in a few of the benchmarks (bm2, bm6, bm7) by LLVM and even in a couple (bm5, bm7) by LibFirm.

Gcc 4.5 has been released but has yet to be benchmarked. Edit 13/7/2010: Gcc 4.5 benchmarked, and the results are so similar to those for Gcc 4.4.4 that it’s not worth reporting them separately.

There are three new tests:

bm8 tests the compiler’s ability to remove trivial loops (loops with empty bodies, but a large number of iterations). The loops are implemented with “goto” rather than a C loop structure, and there is an inner and an outer loop. No tested compiler manages to remove the loops.
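
A hypothetical sketch of the shape of code bm8 tests (not the actual benchmark source):

    /* Nested empty loops built from gotos. The final value of "i" is a
       compile-time constant, so the whole function could be reduced to
       "return 100000" - though, as noted above, no tested compiler
       manages this. */
    long bm8_like(void)
    {
        long i = 0, j;
    outer:
        j = 0;
    inner:
        j++;
        if (j < 100000) goto inner;
        i++;
        if (i < 100000) goto outer;
        return i;
    }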

bm9 tests the compiler’s ability to break up a loop into two separate loops, when doing so avoids an efficiency problem caused by a potentially aliasing pointer. The function in the benchmark takes two arguments: an int array and an int pointer “p”. The inner part of the loop uses the value pointed at by “p” as part of an expression assigned to each array element in turn. Because “p” might alias one of the array elements, the value it points to must be re-loaded on each iteration, unless the compiler breaks the loop into two. None of the tested compilers perform this optimization.
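
A hypothetical sketch of the bm9 situation (not the actual source):

    /* Because "p" may point into "a", the value *p must be re-read on
       every iteration; splitting (or versioning) the loop on an
       aliasing check would let the load be hoisted instead. */
    void bm9_like(int *a, int n, int *p)
    {
        for (int i = 0; i < n; i++)
            a[i] = *p + i;
    }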

bm10 tests whether a value repeatedly shifted left inside a large loop can be reduced to a constant (once all the bits have been shifted out, the result is always zero). None of the tested compilers manage this.
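
And a hypothetical sketch of the bm10 idea, assuming a 32-bit unsigned int:

    /* After 32 iterations every bit of "x" has been shifted out, so for
       any large iteration count the loop is equivalent to "x = 0". */
    unsigned bm10_like(unsigned x)
    {
        for (long i = 0; i < 100000000L; i++)
            x <<= 1;
        return x;
    }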

It’s disappointing that some optimizations which potentially yield huge benefits are not implemented in any of the tested free compilers; however, GCC 4.5 has been released and is yet to be tested (edit 13/7/2010: tested, no surprises). Also, GCC includes some options for loop optimization which are not turned on at any optimization level, because they require external libraries at build time; it would be interesting to see whether these would make any difference in any of the tests (edit 13/7/2010: they don’t).

Further update 4/8/2010: I’ve just tested with gcc 4.5.1. No very noticeable differences from the performance of 4.4.4 were seen; however, I am running a new kernel and this seems to slightly affect the numbers for 4.4.4; for instance, bm3 comes down to about the 0.162 mark, more-or-less even with LLVM.