POSIX timer APIs are borked

I’m currently working on Dasynq, an event loop library in C++ (not yet ready for use by external projects, though the functionality it currently exposes does work correctly as far as I know). It has come to the point where I want to add timer functionality, and this has been frustratingly tricky, mostly due to horribly designed APIs.

There are a few basic requirements to set out before I start:

  • There are essentially two types of timer – relative and absolute. I either want the timer to expire some given interval from now, or I want it to expire at some specific (“wall clock”) time. In the latter case, if the system time is changed the timeout should be suitably adjusted. (Example: if I set an alarm for 04:00, and the system time is changed by the user from 03:25 to 04:15, the alarm should expire immediately).
  • I want to be able to be sure that I can use a timer, at some point in the future. That is, I need to be able to allocate timers in advance (without necessarily arming them immediately) or at least to be able to re-set an existing timer to a specified timeout. I should be able to avoid the situation where I need a timer, which I knew I would need in advance, but am unable to create one due to resource limits / exhaustion.
  • I need a reasonable level of resolution. Timers should be usable for everything from running weekly tasks to animation timing.

With the above points in mind, let’s take a look at what POSIX provides.

POSIX timer APIs

First, the most basic timer-like call provided by POSIX is the alarm(…) function. It has only one-second granularity, which rules it out immediately.

Then, there’s setitimer(…). This isn’t a particularly nice interface, and it delivers timer expiry via a signal. There is only one timer (well, one for each of several different kinds of clock), which means that to manage multiple timers we essentially need to multiplex the single timer; by itself this isn’t such a huge problem, but other limitations of the API make it fundamentally difficult, and the API is pretty broken to begin with in several ways.

The first problem with setitimer is that the interval timers deliver timeout events via signals, which is awkward, especially since setitimer itself is not async-signal-safe, meaning you can’t call it from within the signal handler to set the next desired timeout; if you want to multiplex multiple timeouts over the interval timer interface, you’re forced to turn asynchronous event notifications into synchronous events (which of course is what libraries like Dasynq are all about, so by itself this isn’t a huge problem).

The next problem with setitimer is that it only allows setting a relative timeout. If I want a timer notification at an absolute time, I need to get the current clock time (clock_gettime), calculate the time remaining until that point, and then set the timer. Not allowing an absolute timeout means that setting an alarm for a wall-clock time is pretty much impossible – since if the system time is changed by the user, the timer’s timeout interval won’t be adjusted. However, there’s a more subtle issue here: time might elapse between the calculation and setting the timer – the process could be preempted just after calculating the interval, and in unusual cases might not be scheduled again for a significant period of time. By the time it finally arms the timer, the interval is significantly incorrect. The only way to work around this (that I can think of, other than pretending that the problem doesn’t exist) is to check the clock time immediately after setting the timer, to make sure that it’s within a certain window of tolerance of the original measurement – and if not, to re-calculate the interval and reset the timer.
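To illustrate, here’s a rough sketch of that dance (error handling omitted, the target time assumed to be in the future, and an arbitrary one-second tolerance chosen just for the example):

    #include <sys/time.h>
    #include <time.h>

    // Arm ITIMER_REAL to expire at (approximately) an absolute time, by
    // converting to a relative interval and re-checking the clock afterwards.
    void arm_timer_at(const timespec &target)
    {
        while (true) {
            timespec now;
            clock_gettime(CLOCK_REALTIME, &now);

            itimerval itv = {};
            itv.it_value.tv_sec = target.tv_sec - now.tv_sec;
            itv.it_value.tv_usec = (target.tv_nsec - now.tv_nsec) / 1000;
            if (itv.it_value.tv_usec < 0) {
                itv.it_value.tv_sec -= 1;
                itv.it_value.tv_usec += 1000000;
            }
            setitimer(ITIMER_REAL, &itv, nullptr);

            // Safety check: if we were delayed too long between reading the
            // clock and arming the timer, recalculate and re-arm.
            timespec after;
            clock_gettime(CLOCK_REALTIME, &after);
            if (after.tv_sec - now.tv_sec < 1) break;
        }
    }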

The safety check just described requires a minimum of two clock_gettime calls (which generally means two calls into the kernel) just for setting a timer. If multiple timers are being managed over the top of a single interval timer, that’s going to mean that two clock_gettime calls are required on each timer expiry.

Generalised POSIX timer interface

The real-time POSIX extensions also define timer_create, which appears to solve some of the problems above:

  • It allows the creation of multiple independent timers
  • It allows for specifying absolute (as well as relative) timeouts, and when using the realtime clock the timeout interval will be adjusted appropriately (for absolute timeouts) if the system time is altered

However, notification is still either via a signal or via a thread (SIGEV_THREAD); the latter is problematic for implementations, because there is usually no way to detect notification failure if a thread cannot be created due to resource limits, and because it requires userspace support; on Linux you need to link with -lrt (and thus also pull in the pthreads library) to use timer_create etc, even if you don’t use SIGEV_THREAD. On OpenBSD the situation is worse – the realtime extensions are generally not supported, and timer_create et al are not available at all.

Even using timers with SIGEV_SIGNAL notification is less than ideal. Using such timers in different threads requires cooperation between all threads: either to choose different signal numbers for notification, or to have a common signal handler that somehow dispatches notifications to the correct thread.
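For what it’s worth, a minimal sketch of the timer_create approach looks something like this (error handling omitted; SIGRTMIN is just an example choice of signal, and on Linux this needs -lrt):

    #include <signal.h>
    #include <time.h>

    // Create a timer on the realtime clock and arm it with an absolute
    // (wall-clock) expiry; the expiry is adjusted if the system time changes.
    timer_t make_wallclock_timer(const timespec &expiry)
    {
        sigevent sev = {};
        sev.sigev_notify = SIGEV_SIGNAL;
        sev.sigev_signo = SIGRTMIN;       // notification delivered via this signal

        timer_t timer;
        timer_create(CLOCK_REALTIME, &sev, &timer);

        itimerspec its = {};
        its.it_value = expiry;            // one-shot; it_interval left at zero
        timer_settime(timer, TIMER_ABSTIME, &its, nullptr);

        return timer;
    }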

Non-POSIX solutions

On Linux, the timerfd interface (timerfd_create et al) provides an apparently sane solution to the whole messy problem – it supports multiple timers, supports absolute and relative timeouts, and delivers events via file handle notifications (so it can be used with select/poll/epoll etc). A large number of timers could feasibly be multiplexed over a single timerfd, which is good from a resource management perspective.
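A sketch of the timerfd equivalent (again, error handling omitted):

    #include <sys/timerfd.h>
    #include <time.h>

    // Create a timerfd armed with an absolute wall-clock expiry. The returned
    // descriptor becomes readable when the timer expires, so it can simply be
    // added to an epoll/poll/select set.
    int make_timerfd(const timespec &expiry)
    {
        int fd = timerfd_create(CLOCK_REALTIME, TFD_NONBLOCK | TFD_CLOEXEC);

        itimerspec its = {};
        its.it_value = expiry;
        timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, nullptr);

        return fd;
    }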

On OpenBSD (and various other BSDs) there is the option of using kqueue timers. This supports multiple timers, and neatly solves the problem of notification; however it only allows relative timeouts, and the timeout/interval cannot be changed, meaning that multiple timers cannot be multiplexed over a single kqueue timer. Even worse, it is not possible to pre-allocate timers; once created, they begin countdown immediately. This makes it impossible to discover resource allocation failure until the point that the timer is actually needed.
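For comparison, adding a kqueue timer looks roughly like the sketch below (assuming an existing kqueue descriptor kq). The countdown starts as soon as kevent() returns, and the call itself can fail – with no way to have reserved the resource in advance:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <stdint.h>

    // Add a one-shot relative timer to an existing kqueue.
    void add_timer(int kq, uintptr_t ident, int milliseconds)
    {
        struct kevent ev;
        EV_SET(&ev, ident, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, milliseconds, nullptr);
        kevent(kq, &ev, 1, nullptr, 0, nullptr);
    }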

In Conclusion

The POSIX timer APIs are awkward and clunky. The setitimer functions support only limited use cases, while the generalised interface would be difficult to use in a library (since it requires either signal handling or multi-threading). On Linux, the timerfd interface is an ideal substitute. On other systems the general timer interface can be used, with some caveats and trade-offs, but it is not always available; on systems where it is not, and where there is no system-specific replacement, the only option for wall-clock timers is to assume that the system clock does not change (other than by the usual tick) while the system is running.

Why do we keep building rotten foundations?

APIs are like bones: sometimes you have to break them to mend them, and it fucking hurts.

It’s not a perfect analogy, but it reflects an observation of (superficially) pragmatic behavioural tendencies, even if not a necessary truth. A broken implementation is bad, but it can be repaired. A broken API can be repaired too, but anything built on it then also needs to be fixed. And often when this occurs we like to throw the baby out with the bathwater, start afresh, make a new clean design where we apply the lessons learned in past attempts and produce something that, this time, will be perfect. Except, of course, we don’t always remember all the lessons, and it never is.

For instance, look at GTK. There was a recent blog post detailing plans for GTK development (and a further followup), which starts by discussing:

… the desire to create a modern toolkit with new features vs. the need to keep a stable API.

… which, in my opinion, should not be a conflict, but let’s work our way there. Some choice snippets:

… and created a hesitation to expose “too much” API for fear of having to live with it “forever”.

(A legitimate concern, and one that gets to the heart of the matter, but let’s stick to the piece for now).

We want to improve this, and we have a plan.

And of course I read this and I am worried.

“Don’t worry Mr B, I have a cunning plan to solve the problem”

 We are going to increase the speed at which we do releases of new major versions of Gtk (ie: Gtk 4, Gtk 5, Gtk 6…). We want to target a new major release every two years.

Urgh.

The new release of Gtk is going to be fully parallel-installable with the old one. Gtk 4 and Gtk 3 will install alongside each other in exactly the same way as Gtk 2 and Gtk 3 — separate library name, separate pkg-config name, separate header directory. You will be able to have a system that has development headers and libraries installed for each of Gtk 2, 3, 4 and 5, if you want to do that.

No! DO NOT WANT! It’s bad enough having Gtk 2 and Gtk 3 (thankfully Gtk 1 is long gone), and you want me to have to litter my system with even more Gtk-sized turds… Please, no.

Oh well, I guess it can’t get any wors-

Meanwhile, Gtk 4.0 will not be the final stable API of what we would call “Gtk 4”. Each 6 months, the new release (Gtk 4.2, Gtk 4.4, Gtk 4.6) will break API and ABI vs. the release that came before it.

Oh for fuck’s sake, really? So your “major new release every two years” is, in effect, actually a major new pain-in-the-proverbial every 6 months fuck fuck fuck.

We will, of course, bump the soname with each new incompatible release — you will be able to run Gtk 4.0 apps alongside Gtk 4.2 and 4.4 apps, but you won’t be able to build them on the same system

Right, so every 6 months there will effectively be a new version of GTK with its own set of libraries and its own themes (getting to that, bear with me), and our systems will become a myriad of GTK versions, because all the new apps developed during this time aren’t going to have any clue about which version they should target, and trying to keep up with the latest API is going to be a futile effort because it’ll pretty much keep changing out from under your feet. Shit, there are still apps making the transition from Gtk 2 to Gtk 3 and they’ve literally had years. (I could at this point tell you how much I prefer Gtk 2 apps anyway, but there’s probably enough material there for a whole other blog post).

Before each new “dot 0” release, the last minor release on the previous major version will be designated as this “API stable” release. For Gtk 4, for example, we will aim for this to be 4.6 (and so on for future major releases).

I could mention that this seems like exactly the opposite of how major versions should work but, oops I just did. Just to illustrate:

4.0: development, 4.1: development, 4.2: development, …, 4.6: stable, 5.0: development, …

I mean, if 4.6 is the culmination of a series of changes leading to the next major release, then why isn’t it the next major release? Since when does “X.0” not designate a stable release?

“Gtk 4.0” is the first raw version of what will eventually grow into “Gtk 4”, sometime around Gtk 4.6

Oh lordy just listen to yourself for one second.

But really, forget about the ridiculous version numbering scheme; the real problem is the rapid-fire development with lack of API/ABI stability, producing a plethora of incompatible versions. Why is this even necessary? Could I suggest that, rather than pumping out new and exciting APIs/features at a faster pace, what GTK really needs to do is sit down and flesh out a decent API that it won’t have to break every 6 months, for crying out loud. Because people are going to try to build software on top of your crappy toolkit, and you should accept some responsibility for the API design rather than kicking them in the nether regions twice a year. And if you really don’t want people to try to keep up with the latest and greatest, and expect them to keep developing on GTK 3 despite the fact that you’re up to version 4.4 (or whatever), then at least use a sensible versioning scheme that reflects the notion of development and instability, rather than expecting everyone to learn that the GTK devs use a different scheme from the rest of the software world, just because they felt like it.

this gives many application authors and desktop environments something that they have been asking for for a long time: a version of Gtk that has the features of Gtk 3, but the stability of Gtk 2.

You could give them that right now if you would just stop screwing around with the API. An API is stable if you don’t mess with it. And if you’re having to mess with your API on a continual basis, you’re doing it wrong. Like, the whole software development thing, all wrong.

By all means, declare parts of your API as unstable, document them as such and adjust them during a period of development, and then declare them stable (and never touch them again!). But the notion that you think it’s OK to potentially break the whole thing (or more accurately, any part of the whole thing) at any moment is disturbing. (Maybe that’s not what’s really meant; I certainly hope so, but it’s not clear). This idea that you’ll bang away in a frenzy with your software hammers for a few years and what comes out at the end will be “stable” is just hokey. It’s the wrong approach. You’re off the runway and into the harbour.

I really wonder, would it have been so hard to have GTK 3 add to the GTK 2 API rather than actually break it? I mean, I know there’s the whole CSS thing (which still feels like a sick joke, frankly, especially because of the half-arsed implementation) and HiDPI support, but could these not have been implemented on top of the existing API? Could not the GDK API have been implemented on top of Cairo (if it wasn’t already), and retained for backwards compatibility? Etc etc? Yes, it’d be more work – much more work, perhaps – but that work only needs to be done once, rather than in each separate application that builds on the toolkit. And if it had been thought through properly at the start, it could have given us a new toolkit with a lot less pain.

The Wikipedia entry for GTK claims:

The most common criticism towards GTK+ is a lack of backwards-compatibility in major updates, most notably in the API[21] and theming.[22]

(I promise I did not edit that in myself. And I mentioned theming earlier, so I should elaborate: not only are GTK 2 themes not compatible with GTK 3, but GTK 3 themes aren’t either… it seems GTK 3.18 themes don’t generally work well with GTK 3.20, for instance).

That it’s so generally acknowledged that the API stability sucks is really telling.

GTK is a foundation library. And it’s rotten, and they know it’s rotten. And they keep tearing it down and replacing it – with another rotten foundation. And we all suffer because of it.


Note: I started out writing this post intending to discuss API breakage in general; unfortunately GTK is such an easy target that the whole post became about the one toolkit.

Boost.Asio and resource deallocation

So:

Boost.Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach.

Ok. Also (“Threads and Boost.Asio“):

io_service provides a stronger guarantee that it is safe to use a single object concurrently

Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked.

Oh… so it’s thread-safe, and can dispatch events on multiple threads? That’s great! Let’s write a web server. Let’s see, we can set up a basic_stream_socket to handle an incoming connection:

    boost::asio::io_service io;

    // start some threads which call io.run() (implementation not shown)
    setup_service_threads();

    // suppose we've accepted a connection and have the native handle in fd
    auto bss = new boost::asio::ip::tcp::socket(io, ..., fd);

Then we get a request and we set up an asynchronous write to the client:

    WriteHandler handler = ...; // (not an actual type, example only)
    bss->async_write_some(boost::asio::buffer(data, size), handler);

Ok; now the handler gets notified when the write completes (or fails) and can proceed to write the next chunk as appropriate.

Oh, but then we detect overload (or get shutdown, or …) and we need to drop the connection:

    bss->close();

According to documentation for close:

This function is used to close the socket. Any asynchronous send, receive or connect operations will be cancelled immediately, and will complete with the boost::asio::error::operation_aborted error.

Great, now we can delete the socket:

    delete bss;

Aaannnnnd segfault (in another thread).

What happened? Well, it turned out that the write handler had already been called in another thread, just before we called close() on the socket. That handler is now running with the expectation that the socket still exists, and when it tries to access the now-deleted socket (to write the next chunk of data for instance) everything goes brown.

Even if close() were to wait for the handler to finish executing before it returns, that would mean that our current thread is now blocked on an (unbounded) operation in another thread. What we really want, of course, is a way to close the socket and have an asynchronous callback that tells me when there are no more pending asynchronous operations on it, i.e. that it is safe to delete any associated data (including the socket object itself).

Of course, the above is a somewhat oversimplified example, but it demonstrates the essence of the problem: an event source has associated data; at some point, you don’t need the event source any more and you want to free the data, but you need to be sure there are no pending events before you do. Boost.Asio doesn’t make this as easy as it needs to be.

To be clear: this is an API design issue, not really a bug. I realise that the problem isn’t technically in the Boost code, but the burden of managing this issue without support from the library is significant, and furthermore is a fundamental problem in multi-threaded event loop handling.  (One workaround is to maintain shared_ptr references to the relevant data; the write handler itself needs to maintain one reference in order to prevent the data being swept out from underneath it, as it were. But this would most definitely be a kludge).
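For illustration, that kludge might look something like the following (continuing the earlier fragments; data and size are placeholders as before). Each pending handler holds a shared_ptr to the socket, so “deleting” the socket becomes dropping our own reference:

    auto bss = std::make_shared<boost::asio::ip::tcp::socket>(io);
    // ... accept / open the socket as before ...

    bss->async_write_some(boost::asio::buffer(data, size),
            [bss](const boost::system::error_code &ec, std::size_t n) {
        // the captured copy of bss keeps the socket object alive, even if
        // the "owner" has already dropped its reference
        if (!ec) {
            // ... write the next chunk ...
        }
    });

    // later, to drop the connection:
    bss->close();
    bss.reset();  // the socket is destroyed only once the last reference
                  // (held by any pending handlers) is released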

5 Gripes with C++

First of all, let me say that I actually like C++ as a programming language. This makes me a rarity among my associates, but in terms of a systems programming language it is, in my opinion, currently strides ahead of any existing alternative (especially C). But that’s enough of that; this post isn’t about how great C++ is; it’s about a few of the things that I don’t like about C++. Here they are, in order of most to least annoying:

1. The stupid “empty base class optimization”

This is that thing where, if you have some class A that is empty (contains no data members and no virtual functions, thus not requiring a vtable and not, theoretically, requiring any space at all), then you discover that it’s not really empty because if you include it as a member in some other class, it will take up space.

class A { };
class B { };
class C { A a; B b; };

Now, sizeof(A)? Yeah, not 0. It will come out as 1. Same with sizeof(B) (which should not be surprising). And sizeof(C) is 2, which again is not surprising. How about if we change the definition of C though:

class C : A { B b; };

Now we get sizeof(C) = 1. You see, it turns out that objects of the same type are required to have distinct addresses – specifically: “Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses” – and sizeof() of any class type must be at least 1. But because A and B are different types, and thanks to a special exception in the C++ language spec (C++11 1.8 para 5) stating that “Base class subobjects may have zero size”, it is now possible to locate the A (base class) subobject and the B (member) object at the same location, and the overall size of the derived class is reduced.

In a language with such fantastic meta-programming capabilities, where empty classes often serve as a way of containing a set of type traits for use in a template, this is significant. (A cheap example: C++ container types are templates with an element type and an allocator type; the container contains a member that is of the allocator type. Often, the allocator is an empty object, since it has no state – it might, for instance, just allocate memory using malloc()/free().)
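A minimal illustration of the difference (with names invented for the example):

#include <iostream>

struct my_alloc { };                  // a stateless "allocator"

struct container_member {             // allocator stored as a member
    my_alloc alloc;
    void * data;
};

struct container_ebo : my_alloc {     // allocator stored as an empty base
    void * data;
};

int main()
{
    // typically prints 16 and 8 on a 64-bit platform:
    std::cout << sizeof(container_member) << " " << sizeof(container_ebo) << std::endl;
}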

So, ok, there is a trick to optimize size of objects by using inheritance, as shown above. In standard library implementations, this trick tends to be used heavily, because it can have a significant impact; implementations of std::pair and std::tuple, for instance, will generally use it to collapse empty members to zero size. That seems like a good idea, so why am I calling it stupid?

Because it shouldn’t be necessary.

The problem is that applying it disfigures the structure of your types. You end up inheriting from some type just because you want to make use of the empty base-class optimization, and your code becomes a right mess where accessing what should have been a member is instead done by casting the “this” pointer. To make matters worse, you have to be careful when you use it that the potentially “empty” class really is empty, since if it has virtual methods you run the risk of accidentally overriding them.

There might be some good reasons to ensure that objects of the same type are always allocated at different addresses, but those reasons often don’t apply to the sort of classes that tend to be empty. It would be so easy, so very easy, to have some attribute (either on the type, or on members, or even both) saying that “this (object/type) does not need a unique address”, but for years we’ve instead had to perform acrobatics with our code to make use of what should be a simple and straightforward optimization.

2. Broken encapsulation model

So “private inheritance is for is-implemented-in-terms-of” and “public inheritance is for is-a relationships” are claims you may have heard at some point or other. I have no beef with how public inheritance works, but private inheritance is another kettle of fish.

Essentially, private inheritance of some class X says: “I will be implemented via X. I will not be seen as an X by outside observers; however, I may pass myself off as an X when I deem it necessary to do so”. This is, I suppose, good for things like listener interfaces, where you want to receive events from another source (and so you need to inherit the event-listener base class) but you don’t want to expose the listener methods elsewhere. You still need to override some of the base class methods (otherwise, you could’ve used composition instead of inheritance: that is, have a member of type X, rather than privately inheriting from X).

Right, so what’s the problem? The problem is that it is still possible to override virtual private methods, including methods which are private by virtue of private inheritance by a class further up the hierarchy. If you have a class A, and a class B that privately inherits A, and then a class C that inherits B (publicly or privately), C shouldn’t know or care about B’s relationship to A, right? But it so happens that if you accidentally name a method (with an appropriate signature) the same as a method from A, you will now override that method and suitably screw up everything. That’s the problem: private inheritance is not private enough. Although, to be honest, I could envisage other changes to the language that could do away with the need for private inheritance altogether, which brings me to my next point.
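A minimal example of the problem (with made-up names):

#include <iostream>

class A {
    public:
    void greet() { say_hello(); }
    private:
    virtual void say_hello() { std::cout << "hello" << std::endl; }
};

class B : private A {       // B is-implemented-in-terms-of A
    public:
    using A::greet;
};

class C : public B {
    public:
    // C knows nothing about A, yet this overrides A::say_hello and
    // changes the behaviour of greet():
    void say_hello() { std::cout << "oops" << std::endl; }
};

int main()
{
    C c;
    c.greet();              // prints "oops", not "hello"
}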

3. Container object from member subobject is non-standard

Suppose I have an object of type A with a member, b, of type B. Further suppose that I have a pointer to the member b; maybe even it is a “this” pointer, because I am implementing a method in the B class. Now, if I know my B object is a singular member of an A container object, I should be able to convert a pointer-to-B to a pointer-to-A which points at the container object easily enough, right? Something like:

char * c = reinterpret_cast<char *>(b_ptr);
A * a_ptr = reinterpret_cast<A *>(c - offsetof(A, b));

Easy, right? Now… hmm… I know C++’s private inheritance actually breaks encapsulation principles (see above), but could I use this little trick to overcome that problem? Let’s say I want A to “privately inherit” from some class C. Instead of using actual private inheritance in A, I use inheritance (public or private, doesn’t matter) in my member class B, and I make “B b;” a private member of A. This truly hides the relationship between A and C, since there’s no way I could subclass A and accidentally override one of C’s methods. If the overridden method (which is now in B) needs to access any of A’s data or methods, that’s fine, I can use the method above to do so; it’s a little ugly, but it works… right?

Well, yeah, it does work; it’s just that it’s not standard. “offsetof” is only required to work for standard-layout types (which among other restrictions don’t contain any virtual methods, or any members that do). This amazingly-useful-in-the-real-world technique isn’t actually required to work by the language (in fact it explicitly classifies it as “undefined”).
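To make this concrete, here’s a sketch of the pattern just described (the compiler will typically warn about the offsetof usage, which is exactly the gripe):

#include <cstddef>

class C {
    public:
    virtual void on_event() = 0;
};

class A;

class B : public C {
    void on_event() override;   // defined below, once A is complete
};

class A {
    friend class B;
    B b;                        // the sole B subobject within an A
    void handle_event() { /* ... */ }
};

void B::on_event()
{
    // recover a pointer to the containing A object:
    char * c = reinterpret_cast<char *>(this);
    A * a = reinterpret_cast<A *>(c - offsetof(A, b));
    a->handle_event();
}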

The standards-compliant alternative of having an explicit pointer member in the sub-object which points to the containing object works but has a runtime cost. So, you’re faced with a choice: leaky encapsulation via private inheritance, or runtime penalty due to unnecessary extra pointer storage.

What I’d really like to see is a straightforward syntax which directly supported this technique, instead of having to jump through reinterpret_cast/offsetof-hoops to use it (only to be then warned by the compiler that your code is non-compliant). It would be easy enough to do this in such a way that it delivered the expected performance gain in real-world compilers while still behaving correctly in theoretical compilers which store objects via hashtables or something equally daft.

4. No proper mixins

What C++ programmers call “the mixin pattern” is inheritance-of-template-parameter, a technique that is occasionally useful to augment a class via another “mixin” class (usually designed for the purpose). So for example if I have:

template <typename T> class A : public T { /* ... */ };

… then I can “mix in” any class that I like, causing the resulting template instantiation to include its methods. The main problem with this approach is that the mixed-in class is unable to call any methods from the class it is mixed into; it is, after all, not a true mix-in – it’s just plain old public inheritance, and that’s a one-way street. The most direct way to work around this is to declare virtual methods in the mixin class which will then be overridden in the target class, but this has a performance overhead and also has the unfortunate effect, potentially, of allowing these methods to be accidentally overridden in subclasses of A<T>.
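For example, the virtual-method workaround looks something like this (a sketch, with invented names):

// a mixin that needs to call back into the class it is mixed into
class counter_mixin {
    public:
    void increment()
    {
        count++;
        on_count_changed(count);   // call "up" into the target class
    }
    private:
    int count = 0;
    // the workaround: a virtual hook for the target class to override
    virtual void on_count_changed(int) { }
};

template <typename T>
class A : public T {
    // overrides the mixin's hook - but note it could also be (accidentally)
    // overridden again in subclasses of A<T>
    void on_count_changed(int n) override { /* react to n */ }
};

// usage: A<counter_mixin> a; a.increment();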

So, it would be sorta nice if there were real mixins – where I could just declare mixin classes specially, and then pull them into another class via some declaration (or even just overload the inheritance syntax). Obviously this would probably require the whole source of the mixin to be included in a header, but that’s already the case with templates anyway. The mixin classes would somehow need to declare members that they expect the mixed-to (or other mixed-in) class to provide.

5. There should be more flexibility in dealing with inherited members

We’re now scraping the bottom of the barrel a little, as the four points above are the main gripes I have with C++; but 5 is a nice round number.

Basically my complaint here is that names are fixed in the base class and can’t be changed in the derived class. If I have a class A with virtual method m and I publicly derive from A in another class B, then in B the method is also called m, and if I want to override it I have to use the same name, m, throughout the entire class hierarchy from that point. If I’m desperate enough I could implement a new method f which just delegates to m, and I could even make m final at the same time so that everyone’s forced to override f instead from that point, but of course there’s a runtime overhead.
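That workaround looks something like this (sketch only):

class A {
    public:
    virtual void m() { /* original behaviour */ }
};

class B : public A {
    public:
    virtual void f() { /* new default behaviour */ }
    // from B onwards, everyone must override f instead of m - at the cost
    // of an extra virtual dispatch:
    void m() final { f(); }
};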

Why can’t I just rename methods? Why can’t I say, “from this point in the hierarchy on, method m will now be called f”? (Or more accurately: method f overrides method m).

It seems like a small thing, but occasionally I’ve wanted something like this. There are other related issues: I can shadow a base class non-virtual method, why can’t I shadow a final method? How am I supposed to deal with multiple base classes declaring same-name same-signature methods that I need to override separately in a derived class (especially considering I need all the help I can get if I’m forced to use multiple inheritance, right?) Why can’t I remove a base-class method from visibility (causing it to be shadowed rather than overridden in further derived classes)? And of course, why can a class override a base class private method at all? (eh-hmm broken encapsulation model).

Conclusion

That about rounds it out. 5 things about C++ that I would like to see improved. Just throwing it out there… who knows, maybe someone on the committee will pay attention… pretty please?

Understanding Git in 5 minutes

Git, it seems, is known for being confusing and difficult to learn. If you are transitioning from a “traditional” versioning system such as CVS or Subversion, here are the things you need to know:

  • A “working copy” in Subversion is a copy of the various files in a subversion repository, together with metadata linking it back to the repository. When using Git, your working copy (sometimes referred to as “working directory”, apparently, in Git parlance) is actually hosted inside a local copy (clone) of the remote repository. To create this clone you use the “git clone” command. So, generally, you would use “git clone” where you would have used “svn checkout”.
  • A repository has a collection of branches, some of which may be remote-tracking branches (exact copies of the upstream repository), and the rest of which are local branches (which generally have an associated upstream and remote-tracking branch).
  • In Git you make changes by committing them to the local branch. You can later push these commits upstream, which (if successful) also updates the associated remote tracking branch in your repository.
  • But actually, commit is a two-stage operation in Git. First you stage the files you want to commit (“git add” command), then you perform the commit (“git commit”).
  • You can fetch any new changes from the remote repository into a remote tracking branch, and you can merge these changes into your local branch; You can combine both operations by doing a pull (“git pull”), which is the normal way of doing it.
  • A Git “checkout” replaces your working copy with a copy of a branch. So, to switch branches, you use the “git checkout” command. You cannot (or at least don’t normally) check out remote-tracking branches directly; you instead check out the associated local branch.
  • So actually a Git repository contains these different things:
    • A collection of branches, both local and remote-tracking;
    • A working copy
    • An “index” or staging area (holding the changes staged for the next commit), and a “stash” (changes temporarily set aside)
    • Some configuration
    • (a few other things not mentioned here).
  • Except for the working copy itself, everything in a Git repository is stored in the “.git” directory within your working copy.
  • A “bare” Git repository doesn’t have a working copy (or stash). The data that would normally be inside “.git” is instead contained directly inside the repository root folder. A repository hosted on a server is often a “bare” repository; when you clone it, your clone will not be bare, so that you can perform checkouts / make commits etc.
  • Because you make commits to a local branch, a commit operation does not contact the origin server. This means that other commits may have been made in the origin repository in the meantime; if so, a merge (or rebase) will be needed before the local branch commits can be pushed to the remote repository.
  • Git versions represent a snapshot of the repository state. They are identified by an SHA-1 checksum (of both file contents and some metadata, including version creation date and author). A Git version has a preceding version, and may have multiple preceding versions if it is the result of a merge.
  • To avoid complicating the commit history, you can rebase commits made to your local branch, which means that the changes they make are re-applied after the changes from remote commits. This re-writes the history of your commits, effectively making it appear that your commits were performed after the remote changes, instead of interleaved with them. (You can’t rebase commits after you have pushed them).

C++ and pass-by-reference-to-copy

I’m sure many are familiar with the terms pass-by-reference and pass-by-value. In pass-by-reference a reference to the original value is passed into a function, which potentially allows the function to modify the value. In pass-by-value the function instead receives a copy of the original value. C++ has pass-by-value semantics by default (except, arguably, for arrays) but function parameters can be explicitly marked as being pass-by-reference, with the ‘&’ modifier.

Today, I learned that C++ will in some circumstances pass by reference to a (temporary) copy.

Interjection: I said “copy”, but actually the temporary object will have a different type. Technically, it is a temporary initialised using the original value, not a copy of the original value.

Consider the following program:

#include <iostream>

void foo(void * const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    int * argcp = &argc;
    std::cout << "main, &argcp = " << &argcp << std::endl;
    foo(argcp);
    foo(argcp);
    return 0;
}

What should the output be? Naively, I expected it to print the same pointer value three times. Instead, it prints this:

main, &argcp = 0x7ffc247c9af8
foo, &p = 0x7ffc247c9b00
foo, &p = 0x7ffc247c9b08

Why? It turns out that what we end up passing is a reference to a temporary, because the pointer types aren’t compatible. That is, a “void * &” cannot be a reference to an “int *” variable (essentially for the same reason that, for example, a “float &” cannot be a reference to a “double” value). Because the parameter is tagged as const, it is safe to instead pass a reference to a temporary initialised with the value of the argument – the value can’t be changed through the reference, so there won’t be a problem with such changes being lost due to them only affecting the temporary.

I can see some cases where this might cause an issue, though, and I was a bit surprised to find that C++ will do this “conversion” automatically. Perhaps it allows for various conveniences that wouldn’t otherwise be possible; for instance, it means that I can choose to change any function parameter type to a const reference and all existing calls will still be valid.

The same thing happens with a “Base * const &” and “Derived *” in place of “void * const &” / “int *”, and for any types which offer conversion, eg:

#include <iostream>

class A {};

class B
{
    public:
    operator A()
    {
        return A();
    }
};

void foo(A const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    B b;
    std::cout << "main, &b = " << &b << std::endl;
    foo(b);
    foo(b);
    return 0;
}

Note this last example is not passing pointers, but (references to) objects themselves.

Takeaway thoughts:

  • Storing the address of a parameter received by const reference is probably unwise; it may refer to a temporary.
  • Similarly, storing a reference to the received parameter indirectly could cause problems.
  • In general, you cannot assume that the object referred to by a const-reference parameter is the same object that was passed as an argument, and it may cease to exist once the function returns.

OpenGL spec for glDrawRangeElementsBaseVertex is rubbish

The title says it all: the spec for the glDrawRangeElementsBaseVertex function is rubbish.

glDrawRangeElementsBaseVertex is a restricted form of glDrawElementsBaseVertex.

Ok, but:

mode, start, end, count and basevertex match the corresponding arguments to glDrawElementsBaseVertex, with the additional constraint that all values in the array indices must lie between start and end, inclusive, prior to adding basevertex.

glDrawElementsBaseVertex doesn’t have a start or end argument. Perhaps the above should say “mode, count, type, indices and basevertex”, since type and indices seem to have the same meaning for both functions?

Index values lying outside the range [start, end] are treated in the same way as glDrawElementsBaseVertex

But… you just said that all the index values must be inside that range. Perhaps substitute “outside” with “inside” to make this sentence make sense?

Does no-one proof-read this stuff? bug submitted.

Update: so it turns out that ‘in the same way as glDrawElementsBaseVertex’ is supposed to mean ‘in an implementation-defined manner consistent with how similarly out-of-range indices are treated by glDrawElementsBaseVertex’. I feel like the wording could be much clearer but I’m not going to argue this one. The parameter specifications are clearly incorrect and this should be fixed in a revision.