What “restrict” really means

I recently stumbled across a web page claiming to be “Demystifying The Restrict Keyword“, a claim which I consider dubious at best. Of the semantics of C99’s restrict, the page (written by a Mike Acton) says:

[when declaring a restrict-qualified pointer] I promise that the pointer declared along with the restrict qualifier is not aliased. I certify that writes through this pointer will not effect the values read through any other pointer available in the same context which is also declared as restricted.

Err, wrong. Firstly, a pointer declared with restrict is allowed to have aliases, it’s just that restrictions are placed on how those aliases are used or created. The C99 standard as always makes use of completely inaccessible language to describe the semantics of “restrict” – see 6.7.3.1, “Formal definition of restrict” – but in this case we can wade through the cruft and extract some actual meaning. The key point is that if a value pointed at by a restrict-qualified pointer is accessed through that pointer in any way, and is also modified through some pointer expression, then the latter pointer expression must be “based on” the restrict-qualified pointer. “Based on” means what it sounds like it means – essentially, that the value of the expression depends on the value of the pointer on which it is based. Here’s a counter-example to the restriction on aliasing expressed by Mike Acton above:

int * restrict p = ...;
int a = *p;   // here we have used 'p' for access
int * q = p;  // q aliases p
*q = 4;       // q is "based on" p, so this is allowed

The second part – “I certify that writes through this pointer will not effect the values read through any other pointer available in the same context which is also declared as restricted” – is also wrong (and not just grammatically). What restrict means is (and I’m simplifying just a little) that – if the restrict-qualified pointer is used to access an object, and the object is modified through any pointer, then the latter pointer and all other pointers (regardless of whether they are qualified with restrict or not) which are used to access the object will be “based on” the restrict-qualified pointer. Because we are allowed (with some restrictions) to assign the value of one restrict-qualified pointer to another, we can easily devise another counter-example to Mike Acton’s contract:

int * restrict p = ...;
{
    int * restrict q = p;  // q aliases p, both are restricted
    // Note that both q and p are "available in [this] context"
    *q = 4;  // this writes through q and affects the value read through p,
             // and yet is perfectly legal! (because q is based on p).
}
(*p);  // note the read through p must occur outside the block above, because
       // p is not based on q.

“restrict” does have some tricky semantics and it’s a shame that someone would “demystify” them so misleadingly. It could be a grave mistake to think that a non-restrict-qualified pointer can always safely alias a restrict-qualified pointer for instance.

It doesn’t quite stop there; the page goes on with a length example of using restrict to improve optimisation of a small function. The solution presented goes too far, assigning the address of each member of three different objects of the same structure type to different restrict-qualified pointers before performing calculations on them. This shouldn’t really be necessary; qualifying the structure pointers themselves with restrict should suffice, since the compiler should already know that different members of the same struct cannot legally alias (however, I’ve done some testing with gcc 4.6.3 and I’m not sure that this is the case with this compiler). Also, I’m fairly certain the solution invokes undefined behaviour via the odd and unnecessary pointer arithmetic involving a ‘stride’; A better version follows:

void
move( vector3* restrict velocity, 
      vector3* restrict position, 
      vector3* restrict acceleration, 
      float             time_step,  
      size_t            count)
{

  for (size_t i=0;i<count;i++)
  {
    float* restrict acceleration_x = &acceleration[i].x;
    float* restrict velocity_x     = &velocity[i].x;
    float* restrict position_x     = &position[i].x;
    float* restrict acceleration_y = &acceleration[i].y;
    float* restrict velocity_y     = &velocity[i].y;
    float* restrict position_y     = &position[i].y;
    float* restrict acceleration_z = &acceleration[i].z;
    float* restrict velocity_z     = &velocity[i].z;
    float* restrict position_z     = &position[i].z;
    
    const float ax  = *acceleration_x;
    const float ay  = *acceleration_y;
    const float az  = *acceleration_z;
    const float vx  = *velocity_x;
    const float vy  = *velocity_y;
    const float vz  = *velocity_z;
    const float px  = *position_x;
    const float py  = *position_y;
    const float pz  = *position_z;

    const float nvx = vx + ( ax * time_step );
    const float nvy = vy + ( ay * time_step );
    const float nvz = vz + ( az * time_step );
    const float npx = px + ( vx * time_step );
    const float npy = py + ( vy * time_step );
    const float npz = pz + ( vz * time_step );

    *velocity_x   = nvx;
    *velocity_y   = nvy;
    *velocity_z   = nvz;
    *position_x   = npx;
    *position_y   = npy;
    *position_z   = npz;
  }
}

From my tests this was by far the most performant (on a core-i7 based machine) with gcc (compilation options: -std=c99 -O3 -march=core2). When using the clang compiler, there wasn’t much variation in results between any version, including the naive version which didn’t use restrict at all (a slight performance benefit can be noticed when simply introducing restrict qualification into the parameter declarations, but no further benefit comes from using the code above, and only a negligible benefit comes from Mike Acton’s code which I believe invokes undefined behaviour anyway).

 

Advertisements

Cinnamon (again) and hacky non-bugfixes

After getting frustrated with Ubuntu’s Unity interface to the point of wanting to bludgeon myself to death with a bag of squirrels, I tried out Cinnamon by manually installing the latest version. It’s actually pretty nice, except for one major annoyance: every time I get a skype message, it un-minimises the relevant chat window and brings it over the top of my current foreground window/process. This is really annoying. There is a bug entry in Cinnamon’s bug database, but it closes with this frankly disturbing comment by someone called Clement Lefebvre (clefebvre):

I don’t like the idea of this being configurable.. mainly because this isn’t a question of preference but a question of fixing a bug. The focus should always be taken when the user launches a new app and never be taken on his behalf when he doesn’t trigger the creation of new content.

For Skype, I fixed the bug with this https://github.com/linuxmint/Cinnamon/commit/c5bbcad1cc4cc8183f2556deef867d0fae5f0109

Well, right, it’s a bug and should be fixed properly, but, check out the fix:

if (!window || window.has_focus() || window.is_skip_taskbar() || window.get_wm_class() == “Skype”)

What the … it actually checks for Skype and handles it specially. That’s got to be wrong. Clement then goes on to say:

For other apps which face a similar issue, I’d like people to insert the following code at line 27 in /usr/share/cinnamon/js/ui/windowAttentionHandler.js:

global.log_error(“Focus stolen by : ” + window.get_wm_class());

Then restart cinnamon, and when the focus is stolen from you, click on the ^ applet, troubleshoot, looking glass, click on the error tab, and you should see the wm class name of the app which stole your focus. Give me that wm class name, and we’ll add it to Cinnamon.

Oh for fuck’s sake… this is not how you do these things. Fix the bug in Cinnamon, make it respond to notifications properly for all apps; don’t just implement a workaround on an app-by-app basis. Am I missing something here? Surely other window managers aren’t implementing this workaround (because they don’t need to, because they handle the notification correctly in the first place)? This doesn’t exactly inspire confidence in the quality of Cinnamon.

PHP, thou art a wart on the face of the web

I’ve long despised most dynamically typed languages, for reasons which I’ve yet to properly enunciate, but I’ve begun to loathe one above all the others: yes, I’m referring to the abomination that is PHP. I’ve never got around to properly writing about why I abhor it so much but I was recently given a link to an article on another blog, which contains (amongst a much larger and very well argued discussion) this little gem (reproduced with permission):

I can’t even say what’s wrong with PHP, because— okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it’s one of those weird tri-headed things. Okay, well, that’s not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don’t have those serrated surfaces; it’s flat and smooth. That’s less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there’s no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what’s the problem with these tools? They’re all I’ve ever used and they work fine!” And the carpenters show you the houses they’ve built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That’s what’s wrong with PHP.

I’ve not really ever had to write much PHP but I’ve sat looking over the shoulder of someone who was in the process of coding some small PHP scripts, and I was horrified. The problems with the ‘==’ operator, and the overloading of the function return types, were particularly unsettling. I’ve written previously about my difficulty in just compiling PHP so that it worked as I wanted – which just adds to the feeling that PHP was simply slapped together from a load of unrelated pieces, and it was only by chance that a turing-complete language emerged. One of these days I’ll get around to explaining just why dynamic languages are bad, but in the meantime get some entertainment (and perhaps education) by reading Eevee’s post.

2012-05-05 addendum: Add “insecure” to my list of gripes.

Cairo 1.12.0 – buggy as buggery?

Cairo 1.12.0 has been released with a swag-full of new features (such as XCB backend) and improvements. Apparently. For me, it causes random text corruption, noticeable most prominently in Firefox (when built with –enable-system-cairo of course) but also in a few other GTK-based apps. I’ve tried compiling with both GCC 4.7.0, 4.6.3 and LLVM 3.0 without any resolution. Cairo 1.10.2 doesn’t produce corruption, 1.12.0 does. For the moment I recommend holding off on the upgrade.

The situation would be a lot better if Cairo had a halfway decent test suite. It does have a test suite, but when I run it (on 1.12.0 or 1.10.2) I get a huge number of failures which are completely inexplicable to me and apparently to LFS authors also. Quotes from that page:

As the test suite is currently unreliable, it is best to simply skip it at this time.

and

Note that the tests take a long time to run and many of them fail for unknown reasons.

… which pretty much sums up my experience as well. What a joke.

Update: seems I’m not the only one who’s seen this. Not clear at this stage whether it’s really a bug in cairo, in the radeon driver, or the EXA extension.

Cinnamon. WTF is it?

Check out the Cinnamon web site.

It’s not so bad, as web sites go, in terms of appearance. Nice width, good text/background contrast. They’re probably using WordPress in the backend. That transition between screenshots (if that’s what they are) effect is really nice, I like it.

Notice they had a release recently – up to version 1.4 now, it seems, and 1.3.1 was … well, I don’t know how long ago, because the news items aren’t dated, which does seem like a serious omission, but at least I can click the links to get the full story with date: 1.4 was released 14/3/2012 and 1.3.1 was 20/2/2012, less than a month earlier. Ok, I can live with that. Hmm, fast release schedule! Usually a sign of a project in its infancy: enthusiastic developers, and plenty of low-hanging fruit in terms of bugs to fix and features to add. Which makes the next observation even more puzzling.

No – mention – anywhere – of – what – the – heck – it – is.

How exactly do you manage to put together a website with flashy screenshots and monthly news items and still neglect this most basic, this most essential of details? I can only assume that the webmaster is some form of greater ape. Possibly an orangutang. Don’t get me wrong, I like orangutangs, they have an odd graceful sobriety about them, and you can train them to wash clothes and (apparently) bang up a quick website, but they just don’t have the intelligence of a real human.

So please, Cinnamon, ditch the ape, get a human, and put an “about” page on your crappy website.

(For anyone who’s just itching to know, LWN has an entry here which explains what Cinnamon is).

Updated C compiler benchmarks, Feb 2012

With several new compiler releases it’s time to update my compiler benchmark results (last round of results here, description of the benchmark programs here). Note that the tests were on a different machine this time and the number of iterations was tweaked so numerical results aren’t comparable.

Without further ado:

So, what’s interesting in this set of results? Generally, note that LLVM and GCC now substantially compete for dominance. Notably, GCC canes LLVM in bm4, where it appears that GCC generates very good “memmove” code, whereas LLVM generates much more concise but apparently also much slower code; also in bm8 (trivial loop removal – though neither compiler actually performs this optimization, GCC apparently generates faster code). On the other hand LLVM beats GCC quite handily in bm6 (a common subexpression elimination problem).

GCC 4.6.2 improves quite a bit over 4.5.2 in bm5 (essentially a common subexpression refactoring test). However, it’s slightly worse in bm3 and for some reason there is a huge drop in performance for the bm7 test (stack placement of returned structure).

The Firm suite, surprisingly, mostly loses ground with 1.20.0 and remains uncompetitive.

Edit 25/03/12: I’ve noticed a flaw in bm6, which when corrected causes GCC to perform much worse – about 0.5 seconds rather than the 0.288 reported above.