C Compiler Benchmarks

Edit 1/5/09: I’ve kind of re-done this. I wasn’t happy with it and some of the results were incorrect. I’ve also added two new benchmark programs since the first version of this post.

Well, gcc 4.4.0 has been released now (with a few bugs, of course…) and I thought it’d be interesting to do some benchmarks, especially as there are now a few open source compilers out there (gcc, llvm, libfirm/cparser, pcc, ack; I’m not so interested in the latter two just now). I don’t have access to SPEC, so I cooked up just a few little programs specifically designed to test a compiler’s ability to perform certain types of optimization. Eventually, I might add some more tests and a proper harness; for now, here are the results (smaller numbers are better):

[Results chart: run times for each benchmark under each compiler; smaller is better]

Thankfully, it looks like gcc 4.4.0 performs relatively well (compared to the other compilers) in most cases. Noticeable exceptions are bm5 (where gcc-3.4.6 does better), and bm3 and bm6 (where llvm-2.5 does better, especially in bm6).

llvm-2.5 does pull ahead in a few of the tests, but fails dismally in the 4th, which tests how well the compiler handles a naive memcpy implementation. Code generated by llvm-2.5 is more than four times slower than gcc’s in this case, which is ridiculously bad. If it weren’t for this result, llvm would be looking like the better compiler.
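By a “naive memcpy” I mean the obvious byte-at-a-time loop – something like the following sketch (not the actual benchmark source):

#include <stddef.h>

/* The obvious byte-at-a-time copy. A good compiler can vectorise
 * this, copy a word at a time, or recognise it as memcpy outright. */
void *naive_memcpy(void *dest, const void *src, size_t n)
{
    char *d = dest;
    const char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}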

Sadly, none of the compilers I tested did particularly well at optimising bm2. In theory bm2 run times could be almost identical to bm3 times, but no compiler yet performs the necessary optimizations. This is a shame because the potential speedup in this case is obviously huge.

It’s surprising how badly most compilers do at bm4, also.

So, nothing too exciting in these results, but it’s been an interesting exercise. I’ll post benchmark source code soon (ok, probably I won’t. So just yell if you want it).


fork/exec is forked up

I don’t know why it’s taken me so long to realise this, but the whole fork/exec shebang is screwed. There doesn’t seem to be any simple standards-conformant way (or even a generally portable way) to execute another process in parallel and be certain that the exec() call was successful. The problem is, once you’ve fork()d and then successfully exec()d, you can’t communicate with the parent process to inform it that the exec() was successful. If the exec() fails then you can communicate with the parent (via a signal, for instance), but you can’t signal success – the only way the parent can be sure of exec() success is to wait() for the child process to finish (and check that there is no failure indication), and that of course is not parallel execution.
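To make the problem concrete, here’s a minimal sketch of the usual pattern (spawn_naive and the choice of execvp() are mine, for illustration):

#include <unistd.h>

/* The classic pattern. If execvp() fails in the child, the parent has
 * no immediate way of knowing: fork() itself succeeded, and the child's
 * _exit(127) status is only visible via wait(), which means giving up
 * on parallel execution. */
pid_t spawn_naive(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);    /* exec() failed, but the parent can't see why */
    }
    return pid;        /* -1 only if fork() itself failed */
}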

If you have a working vfork() then you can communicate with the parent process using a shared memory location (any variable declared “volatile” should do the trick, bearing in mind that “volatile” is not particularly well defined), although doing so is not standards conformant (see http://opengroup.org/onlinepubs/007908775/xsh/vfork.html for instance). By “working” vfork() I mean a vfork() which (1) causes the address space to be shared between the child and parent process (rather than copying the address space, as fork() does), and (2) causes execution of the parent process to be suspended until the child either successfully exec()s or _exit()s. The second requirement is necessary to avoid the race condition whereby the parent process checks the status variable before the child process has actually tried to exec().

Even the vfork() solution, if it can be used, is reliant on the compiler doing the right thing both in terms of volatile access, and in terms of vfork() generally. (Note that “right thing” in terms of volatile access is the generally accepted definition, not the one in the C standard. Also, the fact that vfork()’s correct operation really requires compiler support is something that I’m not willing to go into right now, other than to say that the compiler is normally allowed to assume that “if (vfork()) { } else { }” will follow one branch and not the other – which is not the case here, since both branches will be executed, by the parent and child respectively, in the same address space).
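In code, the vfork() approach looks something like this sketch (non-conformant, as discussed, and assuming a “working” vfork(); spawn_vfork is my name for it):

#include <unistd.h>
#include <signal.h>

/* Non-conformant sketch: relies on a "working" vfork() that shares the
 * address space and suspends the parent until the child exec()s or
 * _exit()s. POSIX does not actually permit the child to modify this. */
static volatile sig_atomic_t exec_failed;

pid_t spawn_vfork(char *const argv[])
{
    exec_failed = 0;
    pid_t pid = vfork();
    if (pid == 0) {
        execvp(argv[0], argv);
        exec_failed = 1;   /* visible to the parent: shared address space */
        _exit(127);
    }
    /* With a working vfork(), the parent resumes only once the child
     * has exec()d or _exit()ed, so there is no race on exec_failed. */
    if (pid > 0 && exec_failed)
        return -1;
    return pid;
}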

The only other solution I can think of is to use pipe() to create a pipe, set the write end to be close-on-exec, then fork() (or vfork()), exec(), and write something (perhaps errno) to the pipe if the exec() fails (before calling _exit()). The parent process can read from the pipe and will get an immediate end-of-input if the exec() succeeds, or some data if the exec() failed. This seems like it should be fairly portable, as long as FD_CLOEXEC is available, but I haven’t actually tried it out yet.
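Here’s a sketch of what I have in mind (untested; it uses fork() rather than vfork() – see the update below – and execvp() and the name spawn_pipe are just for illustration):

#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/wait.h>

/* Untested sketch of the pipe trick. The parent blocks on read() only
 * until the child either exec()s successfully (the write end is closed
 * by close-on-exec, giving immediate EOF) or fails (an errno arrives). */
pid_t spawn_pipe(char *const argv[], int *exec_errno)
{
    int pfd[2];
    *exec_errno = 0;
    if (pipe(pfd) == -1)
        return -1;
    fcntl(pfd[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid == 0) {
        close(pfd[0]);
        execvp(argv[0], argv);
        int err = errno;
        (void)write(pfd[1], &err, sizeof err);  /* report the failure */
        _exit(127);
    }

    close(pfd[1]);
    if (pid > 0) {
        /* EOF leaves *exec_errno as 0; on failure the child's errno arrives */
        (void)read(pfd[0], exec_errno, sizeof *exec_errno);
    }
    close(pfd[0]);
    if (pid > 0 && *exec_errno) {
        waitpid(pid, NULL, 0);   /* reap the already-dead child */
        return -1;
    }
    return pid;
}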


Update 8/1/2009: The latter solution really requires fork() instead of vfork(), because vfork() might suspend the parent and the write by the child might then block, causing a deadlock. (In practice a pipe is always going to have enough of a buffer for this not to happen). POSIX doesn’t even let a vfork()d child call write(), anyway.

Update 13/11/2015: Also worth mentioning is the possibility for priority inversion to occur if the child process is to be run at a lower priority and a third process is running at some priority level between the other two. The parent, which is running at the highest priority, should receive notification of fork/exec status as soon as possible; however, the child (running at the lowest priority) will not send this notification while the third process is scheduled.

This can be avoided only by not waiting for exec() to succeed or fail, but instead processing the exec() status (data coming over the pipe, or pipe closure) asynchronously. (Update 17/11/2015: or by, e.g., using ptrace() to stop the child after exec() so that its priority can be set at that point; this wouldn’t work, however, if the child was setuid [unless the parent was root], since you can’t change the priority of a process you don’t own.)


Update 18/9/2012: There’s a reasonably portable solution to this whole mess in the form of posix_spawn. (Update again 11/11/2015 – no. posix_spawn allows various errors to result in the child process exiting with a status of 127, so implementations may exhibit the exact same issue as the fork/exec combination. I somehow missed this earlier).
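For reference, the call itself is simple enough (a minimal sketch; spawn_posix is my name, and error handling is elided):

#include <spawn.h>
#include <sys/types.h>

extern char **environ;

/* Minimal posix_spawn() usage. Note: per the 11/11/2015 update above,
 * exec-stage failures may still only show up as a child exit status of
 * 127, so this doesn't fully solve the detection problem. */
int spawn_posix(pid_t *pid, char *const argv[])
{
    /* returns 0 on success, an error number on (some) failures */
    return posix_spawnp(pid, argv[0], NULL, NULL, argv, environ);
}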


TL;DR – it’s not easy to determine immediately and without blocking whether a fork/exec pair of calls succeeded. If the child process is to run at the same priority as the parent, or higher, you can do it with a CLOEXEC pipe trick; if the child process is to run at a lower priority, the pipe trick suffers from potential priority inversion problems.

Using $* in shell scripts is Nearly Always Wrong

(Bourne and Bash) Shell script writers bandy ‘$*’ about like it’s a stick of celery, but the truth is $* is nearly always the wrong thing to use. (For the clueless, $* expands to all the “positional parameters”, that is, all the parameters to the current script or function; it’s useful in cases where you want to pass the argument values for those parameters on to another program).

So why is $* wrong? Because it is evaluated before word splitting. In other words, if any of the arguments to your script had a space or other whitespace in it, then that one argument will become two (or more) when $* is evaluated. The following shell script can be used to verify this:

#!/bin/sh
# Call this script “mytouch”
touch $*

… Now try to “mytouch” a file with a space in its name (yeah, well done, you just created two files with one command).

Does quoting it work? Will “$*” solve your problems? It will fix the above example, right up to the point where you try to “mytouch” two or more files at once – then you’ll see the fallacy (hint: “$*” always expands to a single “word”).

So what is the correct solution? It’s simple: use “$@” instead. Note, that’s $@, with quotes around it. Hence “$@”. This specifically expands to one word for each actual argument. It works. Hallelujah. (And while I’m blathering on about shell scripts, note that any unquoted variable expansion is also Nearly Always Wrong, for pretty much the same reasons).
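For completeness, here’s the fixed version of the earlier script:

#!/bin/sh
# Call this script “mytouch” – “$@” expands to one word per argument,
# preserving any whitespace within each argument.
touch "$@"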

The Joy of Wireless Networking with Linux

I just bought a cheap D-Link DWL-G120 “AirPlus” wireless adapter which connects to the PC via USB. I was hoping to either set it in AP (“Master”) mode and use it as an access point (so I can connect to my server from my MacBook wirelessly) or, failing that, in Ad-hoc mode; however, it seems that the prism (p54usb) driver in Linux supports neither mode. Not that this is actually documented anywhere or anything, no, that would be too helpful! You can actually set Ad-hoc mode using iwconfig, but trying to bring up the interface (with “ifconfig wlan0 up”) then generates:

SIOCSIFFLAGS: Operation not supported

Needless to say, that’s not very helpful, and there’s no dmesg output giving any clue as to what the problem is. Your humble narrator spent several hours sifting through kernel source code before he discovered where the problem was (in drivers/net/wireless/p54/p54common.c, the p54_add_interface() function refuses anything but managed mode; no trivial fix).

I found a patch which apparently adds support for both Master and Ad-hoc modes, and though I managed to apply it against my 2.6.26.3 kernel tree, it didn’t work, and it caused a kernel panic when I was messing around with it (so, yeah, not good really). It didn’t apply cleanly, so I assume there might be some kernel tree out there with working Master mode for the p54, but if there is I can’t find it. The patch doesn’t seem to have been applied in the wireless-next tree. Looks like I’ll just have to find a wireless router for the time being.

Links:

http://wireless.kernel.org/en/users/Drivers/p54 (yeah, ok, here it says that only Station mode (= Managed mode) works).

Edit 2008-09-01: Hmm. I now have a linksys router running (as an AP) and still can’t connect to it using the D-Link. Can’t be bothered to debug it just now, but I suspect the driver just Does Not Work at all (though it can list various networks it finds on the airwaves, so it works partially at least).

Re-thinking the website sign-up

There are a bunch of websites where you have to sign up as a user, a process which normally requires that you hand over an email address; the site validates the address by sending you an email containing some sort of secret token (a URL) that you need in order to actually activate the account. As it happens, I’m implementing such a feature at the moment. Assuming these sites aren’t actually harvesting the addresses for purposes of on-selling them or spamming them directly, it’s usually because they need a way to do things like:

  • prevent a single person from creating an unlimited number of accounts
  • prevent all but the most clever bots from creating accounts
  • permanently ban troublesome users

All of these are perfectly valid concerns. Most sites that I am aware of require you to provide an email address when you create your account, at which time you also choose your username and password (and provide whatever other details are deemed necessary). However, there are problems with this approach, and I think there’s a better way to do it.
