Bus1 is the new Kdbus

For some time there have been one or two developers essentially trying to move part of D-Bus into the kernel, mainly (as far as I understand) for efficiency reasons. This has so far culminated in the “Bus1” patch series – read the announcement from here (more discussion follows):

Bus1 is a local IPC system, which provides a decentralized infrastructure to share objects between local peers. The main building blocks are nodes and handles. Nodes represent objects of a local peer, while handles represent descriptors that point to a node. Nodes can be created and destroyed by any peer, and they will always remain owned by their respective creator. Handles on the other hand, are used to refer to nodes and can be passed around with messages as auxiliary data. Whenever a handle is transferred, the receiver will get its own handle allocated, pointing to the same node as the original handle.

Any peer can send messages directed at one of their handles. This will transfer the message to the owner of the node the handle points to. If a peer does not posess a handle to a given node, it will not be able to send a message to that node. That is, handles provide exclusive access management. Anyone that somehow acquired a handle to a node is privileged to further send this handle to other peers. As such, access management is transitive. Once a peer acquired a handle, it cannot be revoked again. However, a node owner can, at anytime, destroy a node. This will effectively unbind all existing handles to that node on any peer, notifying each one of the destruction.

Unlike nodes and handles, peers cannot be addressed directly. In fact, peers are completely disconnected entities. A peer is merely an anchor of a set of nodes and handles, including an incoming message queue for any of those. Whether multiple nodes are all part of the same peer, or part of different peers does not affect the remote view of those. Peers solely exist as management entity and command dispatcher to local processes.

The set of actors on a system is completely decentralized. There is no global component involved that provides a central registry or discovery mechanism. Furthermore, communication between peers only involves those peers, and does not affect any other peer in any way. No global communication lock is taken. However, any communication is still globally ordered, including unicasts, multicasts, and notifications.

Ok, and maybe I’m missing something, but: replace “nodes” with “unix domain sockets” and replace “handles” with “file descriptors” and, er, haven’t we already got that? (Ok, maybe not quite exactly – but is it perhaps good enough?)


Sockets represent objects of a local peer, while descriptors represent descriptors that point to a socket. Sockets can be created and destroyed by any peer, and they will always remain owned by their respective creator.


Descriptors on the other hand, are used to refer to socket connections and can be passed around with messages as auxiliary data. Whenever a descriptor is transferred, the receiver will get its own descriptor allocated, pointing to the same socket connection as the original handle.

It’s already possible to pass file descriptors to another process via a socket. Technically passing a file descriptor connected to a socket gives the other peer the same connection to the socket, which is probably not conceptually identical to passing handles, which (if I understand correctly) is more like having another connection to the same socket. But could it be so hard to devise a standard protocol for requesting a file descriptor with a secondary connection, specifically so that it can be passed to another process?

Any peer can send messages directed at one of their file descriptors. This will transfer the message to the owner of the socket the descriptor points to. If a peer does not posess a descriptor to a given socket, it will not be able to send a message to that socket. That is, descriptors provide exclusive access management. Anyone that somehow acquired a descriptor to a socket is privileged to further send this descriptor to other peers. As such, access management is transitive. Once a peer acquired a descriptor, it cannot be revoked again. However, a socket owner can, at anytime, close all connections to that socket. This will effectively unbind all existing descriptors to that socket on any peer, notifying each one of the destruction

right? (except that individual connections to a socket can be “revoked” i.e. closed, which is surely an improvement if anything).

Unlike sockets and file descriptors, peers cannot be addressed directly. In fact, peers are completely disconnected entities. A peer is merely an anchor of a set of sockets and file descriptors, including an incoming message queue for any of those. Whether multiple sockets are all part of the same peer, or part of different peers does not affect the remote view of those. Peers solely exist as management entity and command dispatcher to local processes.

I suspect the only difference is that each “peer” has a single receive queue for all its nodes, rather than one per connection.

The set of actors on a system is completely decentralized. There is no global component involved that provides a central registry or discovery mechanism. Furthermore, communication between peers only involves those peers, and does not affect any other peer in any way. No global communication lock is taken. However, any communication is still globally ordered, including unicasts, multicasts, and notifications.

I think this is meant to read as “no, it’s not the D-Bus daemon functionality being subsumed in the kernel”.

But as per all above, is Bus1 really necessary at all? Is multicasting to multiple clients so common that we need a whole new IPC mechanism to make it more efficient? Does global ordering of messages to different services ever actually matter? I’m not really convinced.


Cgroups v2: resource management done even worse the second time around


It’s with some bemusement that I watch various Linux kernel developers flounder in their attempts to produce decent, process-hierarchy based, resource control. The first real attempt at this was Cgroups (for “control groups”; also known as “cgroups” due to the emerging trend of refusing to capitalise proper nouns, see also [Ss]ystemd). The original Cgroups interface is known as Cgroups-v1, and is described in the kernel documentation. The main principles are pretty straightforward: you mount a special “cgroup” file system somewhere, and create directories in it to represent control groups whose resource usage you want to limit; each directory created will automatically contain a bunch of files used to control the group. You can create nested hierarchy such that there are groups within other groups, and the nested groups share the resources of their parent group (and may be further limited). You move a process into a group by writing its PID into one of the group’s control files. A group therefore potentially contains both processes and subgroups.

The two obvious resources you might want to limit are memory and CPU time, and each of these has a “controller”, but there are potentially others (such as I/O bandwidth), and some Cgroup controllers don’t really manage resource utilisation as such (eg the “freezer” controller/subsystem). The Cgroups v1 interface allowed creating multiple hierarchies with different controllers attached to them (the value of this is dubious, but the possibility is there).

Importantly, processes inherit their cgroup membership from their parent process, and cannot move themselves out of (or into) a cgroup unless they have appropriate privileges, which means that a process cannot escape its any limitations that have been imposed on it by forking. Compare this with the use of setrlimit, where a process’s use of memory (for example) can be limited using an RLIMIT_AS (address space) limitation, but the process can fork and its children can consume additional memory without drawing from the resources of the original process. With Cgroups on the other hand, a process and all its children draw resources from the containing group.

Cgroups are all very nice, in principle, but the interface was perceived to have some problems, leading to an effort to produce an improved interface – Cgroups v2 – and this is where the fun begins.

Cgroups v2

The second version of the Cgroups interface is also described in kernel documentation; the most interesting part from the perspective of the issues that will be discussed here are in an appendix, titled “R. Issues with v1 and Rationales for v2”. Let’s take a look at these issues:

1. Multiple hierarchies

The argument is that having multiple hierarchies is messy from an interface viewpoint and it also limits and/or complicates the implementation. Cgroups v2 has a “unified hierarchy” where you can enable or disable controllers at the group level (disabling a controller effectively flattens the hierarchy at that point, from the viewpoint of that particular controller only; so there are still multiple hierarchies in a sense, but they are limited to sharing the same overall structure). I think this makes sense.

2. Thread granularity

The v1 interface assigned individual threads to control groups; in v2, processes are assigned. This makes sense because the hierarchy is really a process hierarchy; processes are children of other processes, not of threads. Many resources are arbitrated at the process level (eg threads all share the same memory), and those that aren’t can be dealt with in other ways (it’s already possible to assign different processing priorities to different threads, for example).

There’s another point raised during the discussion of process-vs-thread granularity however:

cgroups were delegated to individual applications so that they can create and manage their own sub-hierarchies and control resource distributions along them. This effectively raised cgroup to the status of a syscall-like API exposed to lay programs.

I’m having a hard time understanding why this is a bad thing. Shouldn’t a process be able to limit the resource allocation to its children? Wouldn’t that be a good thing?

cgroup controllers implemented a number of knobs which would never be accepted as public APIs because they were just adding control knobs to system-management pseudo filesystem. cgroup ended up with interface knobs which were not properly abstracted or refined and directly revealed kernel internal details. These knobs got exposed to individual applications through the ill-defined delegation mechanism
effectively abusing cgroup as a shortcut to implementing public APIs
without going through the required scrutiny.

I think the argument here is essentially that some controllers exposed things they shouldn’t have. That may well be true, but is hardly an argument for a complete re-design of the whole interface.

3. Processes and child groups with common parent

Cgroup v1 allowed processes and sub-groups to coexist within a parent group; specifically (and this is where the bollocks kicks in):

cgroup v1 allowed threads to be in any cgroups which created an interesting problem where threads belonging to a parent cgroup and its children cgroups competed for resources. This was nasty as two different types of entities competed and there was no obvious way to settle it. Different controllers did different things.

Cgroups v2 no longer allows a group with controllers enabled (except for the root group) to contain both processes and subgroups, but this is an awkward limitation that achieves nothing (other than perhaps a slight simplification of implementation at the cost of usability). It’s completely untrue that there is “no obvious way to settle it”. Consider the resource distribution models outlined in the same document:

  1. Weights – clearly processes have the default weight (specified as 100). Child groups have an assigned weight (as they already do).
  2. Limits – processes are unlimited (except by limits inherited through ancestor groups). Child groups have the specified limit.
  3. Protections – processes have no protection (except what is inherited). Child groups have the specified resource protection.
  4. Allocations – processes have no allocation (except what is inherited). Child groups have the specified resource protection.

In all cases above, it is completely clear how to settle competition for resources between processes and child groups.

(I would suggest that a nice additional feature would be the ability to specify child groups as having the combined weight of their children  – an option that is not currently available, but which could potentially be useful if you don’t want to control some particular resource at the level of that child).

The io controller implicitly created a hidden leaf node for each cgroup to host the threads [… which] made the interface messy and significantly complicated the implementation.

Well, that sucks. The IO controller should have been fixed as per above (and if you want to control the resources of the immediate process children as a group, then create a group for them, duh). There’s no need for the creation of hidden leaf nodes; the processes are leaf nodes. The fact that one or two controllers did something stupid / had implementation flaws is not a good argument for prohibiting groups from containing both process nodes and child groups. What’s more:

The root cgroup is exempt from this restriction. Root contains processes and anonymous resource consumption which can’t be associated with any other cgroups and requires special treatment from most controllers. How resource consumption in the root cgroup is governed is up to each controller.

So each controller is required to support contention between processes and groups anyway. That pretty much knocks the “it simplifies the implementation” argument on the head.

This clearly is a problem which needs to be addressed from cgroup core in a uniform way.

No; it’s a problem that needs to be addressed in the controllers in a uniform way. This restriction just makes the interface clunky.

As a basic example, let’s say I have some process that wants to run a child process with resource constraints. The first step then is to create a cgroup (call it B) which is a child of my own cgroup (A). I then enable some controller for B (note that I can’t enable the controller on A, since it contains processes). I then have to create another cgroup (C) inside B, so that I can move my child process into it.

Yes, that’s right: I had to create two cgroups just to control the resource allocation for one process. And this wasn’t even necessary with the old v1 interface. Things actually got worse.

Another example: let say I want to run a child process, but make sure it and any children it spawns get the combined timeslice that would have been given to the one process (in other words, I want it to have a regular fair share of processor time, but I don’t want it soaking up more cycles by forking a bunch of children). I can’t actually do this at all with the v2 interface: the closest I can come is [ed: put the child in a nested group as described above and then] put some hard limit on the group’s processor allocation, but that won’t reflect the dynamically changing amount of time that could be allocated to the parent process. (I could put the parent process in a group as well, but then it doesn’t get equal distribution with it siblings). The v1 interface at least made this potentially possible (though the necessary controller was not implemented, it seems).

Other points to note

Cgroups have a mechanism to notify userspace when a group becomes empty; this has vastly improved between v1 and v2, thankfully (for v1 you had to specify a binary to be launched as a notification, urgh).

Systemd uses the properties of Cgroups to realise a form of better session management: whereas even unprivileged processes can “daemonise” to escape their process hierarchy, they cannot so easily escape a control group. Unfortunately, however, there is no way to reliably kill off a group. The approach used by Systemd is to iterate through the process ids in the group and send a signal to each one; however, this is racy – the process may have died in the meantime, and even worse, the process ID may have been recycled and given to a new process (that’s right – Systemd potentially kills off the wrong process). It would be really nice if it were possible to send a signal to an entire group atomically.


Cgroups v2 gives us a unified hierarchy, improved “empty group” notification, and an annoying interface quirk which makes it in many respects more complicated than the v1 interface. Two steps forward, one backwards, I guess. On the plus side, I think the issue with the v2 interface should be fixable in a 100% backwards-compatible way. Whether that will actually happen or not, time will tell.

POSIX timer APIs are borked

I’m currently working on Dasynq, an event loop library in C++ (which is not yet in a state of being ready for use by external projects, though the functionality it currently exposes does work correctly as far as I know). It has come to the point where I want to add timer functionality, and this has been frustratingly tricky, mostly due to horribly designed APIs.

There are a few basic requirements to set out before I start:

  • There are essentially two types of timer – relative and absolute. I either want the timer to expire some given interval from now, or I want it to expire at some specific (“wall clock”) time. In the latter case, if the system time is changed the timeout should be suitably adjusted. (Example: if I set an alarm for 04:00, and the system change is changed by the user from 03:25 to 04:15, the alarm should expire immediately).
  • I want to be able to be sure that I can use a timer, at some point in the future. That is, I need to be able to allocate timers in advance (without necessarily arming them immediately) or at least to be able to re-set an existing timer to a specified timeout. I should be able to avoid the situation where I need a timer, which I knew I would need in advance, but am unable to create one due to resource limits / exhaustion.
  • I need a reasonable level of resolution. Timers should be usable for everything from running weekly tasks to animation timing.

With the above points in mind, let’s take a look at what POSIX provides.

POSIX timer APIs

First, the most basic timer-like call provided by POSIX is the alarm(…) function. It has second granularity, which rules it out immediately.

Then, there’s setitimer(…). This isn’t a particularly nice interface and delivers timer expiry via a signal. There is only one timer (well, there is one timer for each of several of different kinds of clock), which means that to allow for multiple timers to be managed we need to essentially multiplex the single timer; by itself this isn’t such a huge problem, but other limitations of the API make it fundamentally difficult, and the API is pretty broken to begin with in several ways.

The first problem with setitimer is that the interval timers deliver timeout events via signals, which is awkward, especially since setitimer itself is not async-signal-safe, meaning you can’t call it from within the signal handler to set the next desired timeout; if you want to multiplex multiple timeouts over the interval timer interface, you’re forced to turn asynchronous event notifications into synchronous events (which of course is what libraries like Dasynq are all about, so by itself this isn’t a huge problem).

The next problem with setitimer is that it only allows setting a relative timeout. If I want to get a timer notification at an absolute time, I need to get the current clock time (clock_gettime), calculate the time remaining until that time, and then set the timer. Not allowing an absolute timeout means that setting an alarm for a wall-clock time is pretty much impossible – since if the system time is changed by the user, the timer’s timeout interval won’t be adjusted. However, there’s a more subtle issue here: time might elapse between the calculation and setting the timer – the process could be preempted just after calculating the interval, and in unusual cases might not be scheduled again for a significant period of time. By the time it finally arms the timer, the interval is significantly incorrect. The only work to work around this (that I can think of, other than pretending that the problem doesn’t exist) is to check the clock time immediately after setting the interval, to make sure that it’s within a certain window of tolerance of the original measurement – and if not, to re-calculate the interval and reset the timer.

The safety check just described requires a minimum of two clock_gettime calls (which generally means two calls into the kernel) just for setting a timer. If multiple timers are being managed over the top of a single interval timer, that’s going to mean that two clock_gettime calls are required on each timer expiry.

Generalised POSIX timer interface

The real-time POSIX extensions also define timer_create, which appears to solve some of the problems above:

  • It allows the creation of multiple independent timers
  • It allows for specifying absolute (as well as relative) timeouts, and when using the realtime clock the timeout interval will be adjusted appropriately (for absolute timeouts) if the system time is altered

However, notification is still either via a signal, or via a thread (SIGEV_THREAD); the latter is problematic for implementations, because there is usually no way to detect notification failure if a thread cannot be created due to resource limits, and because it requires userspace support; on Linux you need to link with -lrt (and thus also pull in the pthreads library) to use timer_create etc, even if you don’t use SIGEV_THREAD. On OpenBSD the situation is worse – the realtime extensions are generally not supported, and create_timer et al are not available at all.

Even using timers with SIGEV_SIGNAL notifications is less than ideal. Using such timers in different threads requires cooperation between all threads, to either choose different signal numbers for notification or otherwise to have a common signal handler somehow suitably dispatch notifications to the correct thread.

Non-POSIX solutions

On Linux, timerfds (timerfd_create et al) provide an apparently sane solution to the whole messy problem – it supports multiple timers, supports absolute and relative timeouts, and delivers events by file handle notifications (can be used with select/poll/epoll etc). A large number of timers could be feasibly multiplexed over a single timerfd, which is good from a resource management perspective.

On OpenBSD (and various other BSDs) there is the option of using kqueue timers. This supports multiple timers, and neatly solves the problem of notification; however it only allows relative timeouts, and the timeout/interval cannot be changed, meaning that multiple timers cannot be multiplexed over a single kqueue timer. Even worse, it is not possible to pre-allocate timers; once created, they begin countdown immediately. This makes it impossible to discover resource allocation failure until the point that the timer is actually needed.

In Conclusion

The POSIX timer APIs are awkward and clunky. The setitimer functions support only limited use cases.  On the other hand, the generalised interface would be difficult to use in a library (since it either requires signal handling or multi-threading). On Linux, the timerfd interface is an ideal substitute. On other systems the general timer interface can be used, with some caveats and trade-offs, but it is not always available; on systems where it is not, and there is no system-specific replacement, the only option for wall-clock timers is to assume that the system clock does not change (other than by the usual tick) while the system is running.

Why do we keep building rotten foundations?

APIs are like bones: sometimes you have to break them to mend them, and it fucking hurts.

It’s not a perfect analogy, but it reflects an observation of (superficially) pragmatic behavioural tendencies, even if not a necessary truth. A broken implementation is bad, but it can be repaired. A broken API can be repaired too, but anything built on it then also needs to be fixed. And often when this occurs we like to throw the baby away with the bath water, start afresh, make a new clean design where we apply the lessons learned in the past attempts and produce something that, this time, will be perfect. Except, of course, we don’t always remember all the lessons, and it never is.

For instance, look at GTK. There was a recent blog post detailing plans for GTK development (and a further followup), which starts by discussing:

… the desire to create a modern toolkit with new features vs. the need to keep a stable API.

… which in my opinion, should not be a conflict, but let’s work our way there. Some choice snippets:

… and created a hesitation to expose “too much” API for fear of having to live with it “forever”.

(A legitimate concern, and one that gets to the heart of the matter, but let’s stick to the piece for now).

We want to improve this, and we have a plan.

And of course I read this and I am worried.

“Don’t worry Mr B, I have a cunning plan to solve the problem”

 We are going to increase the speed at which we do releases of new major versions of Gtk (ie: Gtk 4, Gtk 5, Gtk 6…). We want to target a new major release every two years.


The new release of Gtk is going to be fully parallel-installable with the old one. Gtk 4 and Gtk 3 will install alongside each other in exactly the same way as Gtk 2 and Gtk 3 — separate library name, separate pkg-config name, separate header directory. You will be able to have a system that has development headers and libraries installed for each of Gtk 2, 3, 4 and 5, if you want to do that.

No! DO NOT WANT! It’s bad enough having Gtk 2 and Gtk 3 (thankfully Gtk 1 is long gone), and you want me to have to litter my system with even more Gtk-sized turds… Please, no.

Oh well, I guess it can’t get any wors-

Meanwhile, Gtk 4.0 will not be the final stable API of what we would call “Gtk 4”. Each 6 months, the new release (Gtk 4.2, Gtk 4.4, Gtk 4.6) will break API and ABI vs. the release that came before it.

Oh for fucks sake, really? So your “major new release every two years” is, in effect, actually a major new pain-in-the-proverbial every 6 months fuck fuck fuck.

We will, of course, bump the soname with each new incompatible release — you will be able to run Gtk 4.0 apps alongside Gtk 4.2 and 4.4 apps, but you won’t be able to build them on the same system

Right, so every 6 months there will effectively be a new version of GTK with its own set of libraries and its own themes (getting to that, bear with me) and our systems will become a myriad of GTK because all the new apps developed during this time aren’t going to have any clue about which version they should target and trying to keep up the with latest API is going to be a futile effort because it’ll pretty much keep changing out from under your feet. Shit, there are still apps making the transition from Gtk 2 to Gtk 3 and they’ve literally had years. (I could at this point tell you how I much prefer Gtk 2 apps anyway, but there’s probably enough material there for a whole other blog post).

Before each new “dot 0” release, the last minor release on the previous major version will be designated as this “API stable” release. For Gtk 4, for example, we will aim for this to be 4.6 (and so on for future major releases).

I could mention that this seems like exactly the opposite of how major versions should work but, oops I just did. Just to illustrate:

4.0: development, 4.1: development, 4.2: development, …, 4.6: stable, 5.0: development, …

I mean, if 4.6 is the culmination of a series of changes leading to the next major release, then why isn’t it the next major release? Since when does “X.0” not designate a stable release?

“Gtk 4.0” is the first raw version of what will eventually grow into “Gtk 4”, sometime around Gtk 4.6

Oh lordy just listen to yourself for one second.

But really, forget about the ridiculous version numbering scheme; the real problem is the rapid-fire development with lack of API/ABI stability producing a plethora of incompatible versions. Why is this even necessary? Could I suggest that, rather than pumping out new and exciting APIs/features at a faster pace, what GTK really needs to do is sit down and flesh out a decent API that it won’t have to break every 6 months, for crying out loud. Because people are going to try to build software on top of your crappy toolkit, and you should accept some responsibility for the API design rather than kicking them in the nether regions twice a year. And if you really don’t want people to try and keep up with the latest and greatest and expect them to keep developing on GTK 3 despite that fact that you’re up to version 4.4 (or whatever) then at least use a sensible versioning scheme that reflects the notion of development and instability without having to learn that GTK devs used a different scheme than the rest of the software world, just because they felt like it.

this gives many application authors and desktop environments something that they have been asking for for a long time: a version of Gtk that has the features of Gtk 3, but the stability of Gtk 2.

You could give them that right now if you would just stop screwing around with the API. An API is stable if you don’t mess with it. And if you’re having to mess with your API on a continual basis, you’re doing it wrong. Like, the whole software development thing, all wrong.

By all means, declare parts of your API as unstable, document them as such and adjust them during a period of development, and then declare them stable (and never touch them again!). But the notion that you think it’s OK to potentially break the whole thing (or more accurately, any part of the whole thing) at any moment is disturbing. (Maybe that’s not what’s really meant; I certainly hope so, but it’s not clear). This idea that you’ll bang away in a frenzy with your software hammers for a few years and what comes out at the end will be “stable” is just hokey. It’s the wrong approach. You’re off the runway and into the harbour.

I really wonder, would it have been so hard to have GTK 3 add to the GTK 2 API rather than actually break it? I mean I know there’s the whole CSS thing (which still feels like a sick joke, frankly, especially because of the half-arsed implementation) and HiDPI support, but could these not have been implemented on top of the existing API? Could not the GDK API have been implemented on top of Cairo (if it wasn’t already), and retained for backwards compatibility? Etc etc? Yes, it’d be more work – much more work, perhaps – but that work only needs to get done once, rather than in each separate application that builds on the toolkit. And if it’d had been thought through properly at the start, it could have given us a new toolkit with a lot less pain.

The Wikipedia entry for GTK claims:

The most common criticism towards GTK+ is a lack of backwards-compatibility in major updates, most notably in the API[21] and theming.[22]

(I promise I did not edit that in myself. And I mentioned theming earlier, so I should elaborate: not only are GTK 2 themes not compatible with GTK 3, but GTK 3 themes aren’t either… it seems GTK 3.18 themes don’t generally work well with GTK 3.20, for instance).

That it’s so generally acknowledged that the API stability sucks is really telling.

GTK is a foundation library. And it’s rotten, and they know its rotten. And they keep tearing it down and replacing it – with another rotten foundation. And we all suffer because of it.

Note: I started out writing this post intending to discuss API breakage in general; unfortunately GTK is such an easy target that the whole post became about the one toolkit.

Boost.Asio and resource deallocation


Boost.Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach.

Ok. Also (“Threads and Boost.Asio“):

io_service provide a stronger guarantee that it is safe to use a single object concurrently

Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked.

Oh… so it’s thread-safe, and can dispatch events on multiple threads? That’s great! Let’s write a web server. Let’s see, we can set up a basic_stream_socket to handle an incoming connection:

    boost::asio::io_service io;

    // start some threads which call io.run() (implementation not shown)

    // supposed we've accepted a connection and have native handle in fd
    auto bss = new boost::asio::ip::tcp::socket(io, ..., fd);

Then we get a request and we set up an asynchronous write to the client:

    WriteHandler handler = ...; // (not an actual type, example only)
    bss->async_write_some(boost::asio::buffer(data, size), handler);

Ok; now the handler gets notified when the write completes (or fails) and can proceed to write the next chunk as appropriate.

Oh, but then we detect overload (or get shutdown, or …) and we need to drop the connection:


According to documentation for close:

This function is used to close the socket. Any asynchronous send, receive or connect operations will be cancelled immediately, and will complete with the boost::asio::error::operation_aborted error.

Great, now we can delete the socket:

    delete bss;

Aaannnnnd segfault (in another thread).

What happened? Well, it turned out that the write handler had already been called in another thread, just before we called close() on the socket. That handler is now running with the expectation that the socket still exists, and when it tries to access the now-deleted socket (to write the next chunk of data for instance) everything goes brown.

Even if close() were to wait for the handler to finish executing before it returns, that would mean that our current thread is now blocked on an (unbounded) operation in another thread. What we really want, of course, is a way to close the socket and have an asynchronous callback that tells me when there are no more pending asynchronous operations on it, i.e. that it is safe to delete any associated data (including the socket object itself).

Of the course the above is a somewhat oversimplified example, but it demonstrates the essence of the problem; an event source has associated data; at some point, you don’t need the event source any more, and you want to free the data, but you need to be sure there are no pending events before you do. Boost.Asio doesn’t make this as easy as it needs to be.

To be clear: this is an API design issue, not really a bug. I realise that the problem isn’t technically in the Boost code, but the burden of managing this issue without support from the library is significant, and furthermore is a fundamental problem in multi-threaded event loop handling.  (One workaround is to maintain shared_ptr references to the relevant data; the write handler itself needs to maintain one reference in order to prevent the data being swept out from underneath it, as it were. But this would most definitely be a kludge).

D-Bus is completely unnecesary

So for various reasons I’ve got to thinking about D-Bus, the “system bus” that’s found its way into most (all?) Linux distributions and possibly some other OSes as well. I’m more and more coming to believe that D-Bus is too heavyweight a solution for the problems it was intended to solve – mostly because it requires a daemon, and because we don’t really the bus aspect (at least, not in the form in which it’s been implemented).

How did I come to this conclusion? Well, consider first that, in essence, D-Bus provides only the following functionality:

  • Protocol (and associated library) for a simple RPC mechanism (you can call a method in a “remote” process, and receive a reply).
  • Basic security for this RPC. Basically the policy specification allows different between different users and groups, and whether the user is “at the console”.
  • Map service names to the processes that provide the service (for purposes of connecting to said service).
  • “Bus activation”, i.e. starting a process which provides a service when the given service is requested but is not currently being provided.

That’s really all there is to it. The fact that these are the only functions of D-Bus is, in my view, a good thing; it fits with the mantra of “do one thing, and do it well”. However, the question is if D-Bus is actually needed at all. Consider the “traditional approach” where services each have their own unix-family socket and listen on that for client requests. How does that compare to the feature set of D-Bus? Well:

  • Protocol is application specific (D-Bus wins)
  • Security configuration is also application specific (D-Bus wins)
  • Service is determined by socket path (about as good as D-Bus, since service names and socket paths are both pretty arbitrary)
  • No activation – server needs to be running before client connects (D-Bus wins)

So, in general, D-Bus wins, right? But wait – there are a few things we haven’t considered.

Like libraries.

Yes, protocols for communication with server programs via a socket are generally application specific, but on the other hand, there is usually a library that implements that protocol, which takes some of that pain away. Of course these libraries are language-specific, so there is a duplication of effort required in adapting library interfaces to other programming languages, but there are tools like SWIG which can assist here. Also, there’s the possibility that services could offer a standard protocol (similar to the D-Bus protocol, even) for communication over their respective sockets. It turns out there are options available which give some basic functionality; I’m sure that it wouldn’t be too hard to come up with (or find [1] [2]) something which offer basic introspection as well. Heck, you could even use the D-Bus protocol, just without the intermediary server!

How about security? Well, because sockets themselves can be assigned unix permissions, we actually get some of the D-Bus security for free. For any given service, I can easily restrict access to a single user or a single group, and if I wanted to I could use ACLs to give more fine-grained permissions. On the other hand, central configuration of security policy can be nice. But that could be implemented by a library, too. If only there was some pre-existing library specifically for this purpose… ok, so PAM isn’t really meant for fine-grained security either, since it breaks things down to the service level and not any further. But it wouldn’t be hard to come up with something better – something that allows fine-grained policy, with significantly greater flexibility than D-Bus offers.

As mentioned above, there’s pretty much no difference between choosing a service name and choosing a socket path, so I don’t need to say much about this. However, it would certainly be possible to have a more standard socket path location – /var/run/service/name or something similar.

That just leaves activation as the only issue. D-Bus can start a process to provide a requested service. But it happens that this can be done quite easily with plain unix sockets, too; it’s called “socket activation” in the SystemD world, and it’s supported by various other service managers as well (including my own still-in-alpha-infancy Dinit). You could even potentially have service handover (processes queuing to own the socket), just like D-Bus has, though I don’t know if any existing service managers can do this (and on the other hand, I’m not even convinced that it’s particularly valuable functionality).

So: replace the protocol and security with libraries (potentially even a single library for both), and use your service management/supervision system for activation. Why do we actually need the D-Bus daemon?

I have a vague idea that some library might spring into existence which implements an abstraction over the D-Bus API but using the ideas presented here (which, let’s face it, are astounding only in their simplicity) instead of communicating with a single central daemon. That is, connecting to a D-Bus service would just connect to the appropriate (some name-mangling required) socket within /var/run/service, and then speak the D-Bus protocol (or another similar protocol) to it. Likewise, requesting a service would instead attempt to listen on the corresponding socket – for unprivileged processes this might require assistance from a SUID helper program or perhaps the requisite functionality could be included in the service manager. Unfortunately I’ve got too many other projects on my plate right now, but maybe one day…


Maven in 5 minutes

Despite having been a professional Java programmer for some time, I’ve never had to use Maven until now, and to be honest I’ve avoided it. Just recently however I’ve found myself need to use it to build a pre-existing project, and so I’ve had to had to get acquainted with it to a small extent.

So what is Maven, exactly?

For some reason people who write the web pages and other documentation for build systems seem to have trouble articulating a meaningful description of what their project is, what it does and does not do, and what sets it apart from other projects. The official Maven website is no exception: the “what is Maven?” page manages to explain everything except what Maven actually is. So here’s my attempt to explain Maven, or at least my understanding of it, succintly.

  • Maven is a build tool, much like Ant. And like Ant, Maven builds are configured using an XML file. Whereas for Ant it is build.xml, for Maven it is pom.xml.
  • Maven is built around the idea that a project depends on several libraries, which fundamentally exist as jar files somewhere. To build a project, the dependencies have to be download. Maven automates this, by fetching dependencies from a Maven repository, and caching them locally.
  • A lot of Maven functionality is provided by plugins, which are also jar files and which can also be downloaded from a repository.
  • So, yeah, Maven loves downloading shit. Maven builds can fail if run offline, because the dependencies/plugins can’t be downloaded.
  • The pom.xml file basically contains some basic project information (title etc), a list of dependencies, and a build section with a list of plugins.
  • Dependencies have a scope – they might be required at test, compile time, or run time, for example.
  • Even basic functionality is provided by plugins. To compile Java code, you need the maven-compiler-plugin.
  • To Maven, “deploy” means “upload to a repository”. Notionally, when you build a new version of something with Maven, you can then upload your new version to a Maven repository.
  • Where Ant has build targets, Maven has “lifecycle phases”. These include compile, test, package and deploy (which are all part of the build “lifecycle”, as opposed to the clean lifecycle). Running a phase runs all prior phases in the same lifecycle first. (Maven plugins can also provide “goals” which will run during a phase, and/or which can be run explicitly in a phase via the “executions” section for the plugin in the pom.xml file).
  • It seems that the Maven creators are fans of “convention over configuration”. So, for instance, your Java source normally goes into a particular directory (src/main/java) and this doesn’t have to specified in the pom file.

Initial Thoughts on Maven

I can see that using Maven means you have to expend minimal effort on the build system if you have a straight-forward project structure and build process. Of course, the main issue with the “convention over configuration” approach is that it can be inflexible, actually not allowing (as opposed to just not requiring) configuration of some aspects. In the words of Maven documentation, “If you decide to use Maven, and have an unusual build structure that you cannot reorganise, you may have to forgo some features or the use of Maven altogether“. I like to call this “convention and fuck you (and your real-world requirements)”, but at least it’s pretty easy to tell if Maven is a good fit.

However, I’m not sure why a whole new build system was really needed. Ant seems, in many respects, much more powerful than Maven is (and I’m not saying I actually think Ant is good). The main feature that Maven brings to the table is the automatic dependency downloading (and sharing, though disk space is cheap enough now that I don’t know if this is really such a big deal) – probably something that could have been implemented in Ant or as a simple tool meant for use with Ant.

So, meh. It knows what it can do and it does it well, which is more than can be said for many pieces of software. It’s just not clear that its existence is really warranted.

In conclusion

There you have it – my take on Maven and a simple explanation of how it works that you can grasp without reading through a ton of documentation. Take it with a grain of salt as I’m no expert. I wish someone else had written this before me, but hopefully this will be useful to any other poor soul who has to use Maven and wants to waste as little time as possible coming to grips with the basics.