Understanding the C/C++ memory model

I know that a lot of people struggle with understanding the memory model introduced in C11/C++11. In a recent conversation I was alerted to the existence of this blog post, which while correct (by my understanding) is in my opinion aimed at those who already have a good understanding of some of the underlying concepts. I’m going to try to set out my understanding of the model, which is hopefully correct, in more straightforward terms and with a more “ground up” explanation.

Edit: some redditors complained that the description of cache operation (such as it was) in the original version of this post contained technical inaccuracies, if interpreted as a description of current actual mainstream architecture. I believe those inaccuracies were not important for the understanding of the C/C++ memory model, however, they have now been rectified. Still: If you really want to understand anything about how processor architecture works, this is not the article for you. Rather, this is about a software model that is designed to abstract away much of that detail, and is deliberately light on hardware-level specifics.

Introduction: Program order versus memory order

You have your C/C++ program and it stores values (sets a variable or stores a value through a pointer) and it loads values (reads a variable or reads a value through a pointer). The order that your program specifies these loads and stores is the “program order”. However, there are a number of things that can cause the order in which the stores and loads actually occur to your system’s main memory to be different to that of the program order. It’s worth noting that the effect of these can never be visible to a single-threaded program, because the compiler must make the sure that the code behaves as if the loads/stores are in the order that the program specifies. Some examples:

  • the compiler will often cache values in registers rather than storing them to memory immediately;
  • the compiler may re-order reads and writes to memory when it can determine that such re-ordering won’t affect program behaviour (so long as the program obeys the language rules);
  • the processor may perform out-of-order execution in some cases;
  • even when the generated code stores values to memory rather than a register, the store might go into a processor cache and stay there for some time before it goes through to main memory;
  • similarly, loads may read values which are already in the cache, which means they don’t need to be read from main memory (at all);
  • finally, when a cache writes lines back out to main memory it may do so in an order that is different to the order in which the values entered the cache.

The good news is, you don’t really need to understand any of the above points: all you need to understand is that loads and stores can be re-ordered before they go to memory, but this won’t affect the actual behaviour of a single-threaded program.

Now for the important part: in a multi-threaded program, some of these re-orderings – together with some additional issues associated with the methods used by the hardware for maintaining cache coherency – also can cause different threads to see loads and stores made by other threads as occurring in a different order. It’s probably worth noting that a program in which this actually happened could well be exhibiting undefined behaviour – what should happen is that the program uses the memory consistency primitives to make sure that it sees a “correct” ordering. We’re coming to that now.

Atomic operations

If we forget about standard synchronisation tools such as mutexes, then there are two essential mechanisms to making ordering consistent between threads: atomic operations, and memory barriers.

Atomic operations are operations which complete indivisibly. One classic example is the “atomic increment”. A “classic” (non-atomic) increment can be broken into three separate parts: it loads a value from memory, it increments the value, and then it stores the value back to the same location in memory. In multi-threaded programs, the problem with this that the value might be changed by another thread in between the load and store, and that results in storing the wrong value; the intermediate store value is lost. In particular consider two threads that are both trying to increment the value starting at 0:

  1. Thread A loads the value 0
  2. Thread B loads the value 0
  3. Thread A increments the value, calculating a result of 1
  4. Thread B also calculates a result of 1
  5. Thread A stores the value 1
  6. Thread B stores the value 1

The problem is that the value has gone from 0 to 1, despite there being two increments – one of the increments got lost. (Again, if this actually happened in a C/C++ program, that program would be exhibiting undefined behaviour). An atomic increment happens as one step, atomically (indivisibly). So, to avoid the problem above, you could use an atomic integer type and an atomic increment operation.

Another classic atomic operation is “compare and exchange”. This can be used to implement a spin lock (a kind of mutex) – you treat a value of 0 as “unlocked” and a value of 1 as locked. Then, to lock the mutex, a thread can atomically compare the current value with 0 and, if it is equal to 0, set it to 1. The operation returns the original value – if 0, the lock was successfully, if 1 then another thread has the lock and so it must be retried until the other thread releases the lock (typically this would be done in a loop, “spinning” on the lock until it becomes available).

Before we go any further, let’s note the following important points about atomic operations:

  • Atomic operations on a particular object (variable) always have a total ordering which is consistent with program ordering. This means that every atomic operation from every thread which operates on the same atomic variable happens either before or after other every other operation, and if a thread does two operations one after the other then they will appear in the same sequence in the total ordering.
  • If an atomic operation is performed in one thread and another thread then performs another atomic operation on the same variable, the second thread will definitely see the effect of the operation performed by the first (assuming there are no other threads stepping in to perform other another operation in the meantime).

What atomic operations do not necessarily do is enforce ordering on any loads and stores that occur around them (I say necessarily because, as you may be aware, you can specifying memory ordering constraints when you perform an atomic operation*. We’re getting to that).

Consider the case of the spin lock that we discussed above: the lock is probably designed to protect some data structure. So you might have a sequence that looks like this:

  1. Thread A acquires lock
  2. Thread A stores values to data structure
  3. Thread A releases lock
  4. Thread B acquires lock
  5. Thread B reads values from data structure
  6. Thread C releases lock

The problem is that the atomic operations used to implement the lock by themselves don’t impose ordering on steps 2 and 5, so thread B might not see the values written by thread A, or even worse, might see some of the values but not all of them (this is a data race, and is not actually allowed to happen in a correct program; a correct program would use the appropriate memory ordering controls to prevent it from happening).

To prevent re-ordering of non-atomic loads and stores, we use barriers (aka fences).


A barrier prevents loads and stores from being re-ordered past the barrier, in one or in both directions. Specifically: an acquire barrier operation prevents loads/stores which are after the barrier from being re-ordered to appear before the barrier, and a release barrier prevents loads/stores which are before the barrier from being moved after the barrier (there are other types of barrier operation, but I won’t cover them here right now).

Through the use of appropriate barriers, we can almost fix our spinlock example:

  1. Thread A acquires lock
  2. Thread A issues an acquire barrier
  3. Thread A stores values to the data structure
  4. Thread A issues a release barrier
  5. Thread A then releases the lock
  6. Thread B acquires the lock
  7. Thread B issues an acquire barrier
  8. Thread B reads values from the data structure
  9. Thread B issues a release barrier
  10. Thread B then releases the lock

There is still one minor issue, which is that the operations which actually acquire and release the lock could be moved past their corresponding barriers (e.g. step 5 could be re-ordered in front of step 4). There are two ways that C++11 resolves this:

First, you can combine atomic operations which acquire or release some lock with a barrier, so that the operation and the barrier are essentially indivisible (this is performed by passing the appropriate consistency constraint constant as a parameter to the atomic operation);

Second, a standalone barrier (std::atomic_thread_fence) actually acts as a stronger barrier than what is stated above. Specifically, a standalone release barrier actually also prevents subsequent stores – including atomic stores – from being moved in front of the barrier, and a standalone acquire barrier also prevents earlier loads from being moved behind the barrier.

Conclusion and key take-aways

Some key points that I wanted to cover which some more advanced tutorials tended to gloss over:

  • Atomic operations and memory ordering constraints are really two separate things;
  • Memory barriers are useless without some kind of synchronisation (which atomic operations provide). Memory barriers do not themselves provide any synchronisation. Atomic operations themselves do not provide a memory barrier;
  • Atomic operations can have a memory ordering constraint specified as a parameter. This effectively creates a barrier of the specified type, and forces the atomic operation to stick on the appropriate side of that barrier.

There are of course a lot of things I didn’t cover here, but there’s plenty of other material out there which covers the concepts in more detail. I hope the explanations here, which I’ve tried to reduce to the bare essentials, are helpful to some.

* Edit: it should be noted that the default ordering constraint on atomic operations is actually memory_order_seq_cst which implies both acquire and release semantics (depending on the operation) and provides additional guarantees beyond. So, while an atomic operation doesn’t itself imply a memory barrier, an atomic operation with the default memory order constraint does so. It may be preferable to use acquire/release semantics explicitly however since the default constraint can affect performance.


A quick update on Dinit

I’ve been very busy lately, though have managed to spend quite a bit of time coding on Dinit, and of course I released Dasynq which forms the “backbone” of Dinit, in a sense, by providing a robust event loop library. I don’t want to write a major article right now and in truth probably don’t have the content to do so, so just a quick point-by-point update:

  • As mentioned, Dasynq 1.0 was released, and there have since been (ahem) 4 minor bugfix releases.
  • The basic service management functionality of Dinit is largely complete; it supports the dependency types I determined were needed; it handles process supervision pretty well (most recently, I implemented start and stop timeouts, which are configurable per-service). Running services under different uid’s is not yet supported but should be trivial to add.
  • However, the major thing still missing is the ability to modify services on-the-fly, or even unload/reload service definitions for services that aren’t running. That’s a priority, but it will not be trivial.
  • On the other hand, system boot and shutdown/restart are handled pretty well. Dinit has been the init system of my desktop PC for many months now.
  • One significant milestone was reached: I got my first pull request for Dinit. It was small, but it showed that at least there is someone out there who is following progress, and it came as a pleasant surprise.
  • There are plenty of other rough edges. There’s no way to specify initial environment, either per-service or globally – that shouldn’t be hard to do but I’ve been putting it off (it’d make a good task for a new contributor, hint hint…). I’d like to separate PID 1 (the actual init) from the service manager, or at least make it a supported option to do so. Cgroups, namespaces and jails aren’t supported yet. There is only a “poor man’s” version of socket activation. And so on.
  • Even with all that, we’re along way from the full functionality of Systemd. That might be a good thing, though. The plan has pretty much always been to delegate parts of that functionality to other packages. The goal is to provide, together with other packages, a replacement that’s capable of running a desktop with all of the important functionality available.
  • The test suit has improved a lot, and I put a lot of effort into mocking system interfaces for the purpose of testing. That’s starting to pay off, and the number of tests is rapidly increasing. Of course the down side is that writing tests takes time away from adding functionality, but in the end it’s a certainly a win to have a comprehensive test suite.
  • I spent some time recently looking a bit more closely at both Nosh and S6-RC, two service managers which can function as or cooperate with an init system. Both are pretty decent, and both are in a more complete state than Dinit, although Dinit is catching up reasonably fast and I believe Dinit offers at least some functionality that these lack. One idea I might need to borrow from these is the concept of chaining processes together (so a logging process can be run separately to the service process, but the file descriptors that tie them together can be maintained by Dinit so that you can potentially re-start either process with minimal risk of losing log messages).

That’s about all I’ve got to say for the moment. Hopefully I can find some time to craft a longer blog post next time, and with some more interesting news to share. Thanks for reading and questions/comments welcome as always!

Introducing Dasynq

For someone looking at the rate of commits being pushed to Dinit, it might appear that development has halted. The good news is that this isn’t really the case; instead of working directly on Dinit, I’ve been working on a sub-project that came out of Dinit’s development. Allow me to introduce: Dasynq, the C++ event-loop library for robust clients!

The Background Story

Dinit, as an init system / service manager, needs to be able to respond to several different types of external event:

  • It needs to know when child processes have terminated, so that it can log and restart or continue to shut down any dependencies as appropriate
  • It needs to respond to signals which control its operation
  • It needs to receive and respond to requests coming over a socket connection, to allow service control
  • It needs to monitor timeouts so that a process which is taking too long to start or stop can be dealt with appropriately.

These requirements aren’t specific to service managers and in fact many programs, particularly network servers, need to be able to deal with a similar set of events. Typically an event-loop library is used to manage this; such a library allows monitoring a range of event types, and specifying callbacks to run when the events are detected. Most event-loop libraries use modern OS facilities such as kqueue or epoll as a back-end event delivery mechanism; in order to be able to offer some more advanced functionality such as event priorities, an event-loop library typically inserts received events in a queue rather than delivering them to the application immediately as they are detected.

When I started writing Dinit, my initial prototype used Libev, an event-loop library which is cross-platform, efficient and well-documented. It was good enough to get started with, but for an init system it had one glaring deficiency: insufficient support for error handling. In fact, the usual response of libev to encountering an error is to abort() the entire process, and there is no way to make the relevant functions return an error code instead. I began to look for a replacement. There were other event libraries, such as the venerable Libevent and the more recent Libuv, which improved error handling to the point that they could actually return error codes: but I wanted something better. Specifically, I wanted to know that certain operations could not fail, not just that I could meaningfully detect their failure.

Consider the case of a timer. If we have a service running as a process and receive a stop command for the service (perhaps as part of a system shutdown), we can send the process a signal – such as SIGTERM – requesting it to stop. But, we want to give it a reasonable time limit to respond to this signal, in case it has hung; so, we start a timer, and on expiry of the timer we can send SIGKILL in order to finish off the hung process. The issue is that, when using these existing event-loop libraries, the action of starting a timer can fail (for instance, due to resource limitations); this would leave us in the awkward position of not being able to time the process shutdown, and unless we take drastic action such as sending SIGKILL immediately, it potentially hangs the whole shutdown process.

Another example: event loops allow us to monitor the status of child processes, so we can detect when they terminate. However, in other event-loops, adding a watcher for a child process is a function that can fail. Again, this would leave us in an awkward position; we could terminate the child immediately, but it would be much better if we could have the ability to add a child watcher with no failure mode, or at least prevent forking the child if we could detect the current inability to add a watch for it.

The Birth of Dasynq

So, I set about writing Dasynq to address these issues. With Dasynq, you can pre-allocate timers and child process watchers, so that arming a timer or adding a child watch is an operation that simply cannot fail. Enabling and disabling I/O watchers, similarly, cannot fail.

At the same time, I addressed what I saw as some shortcomings in some of the other event-loop libraries (note that some of these apply to some libraries; they do not all apply to all libraries):

  • They did not allow setting timers against the system clock (the clock that potentially jumps when it is corrected by the user). This arguably shouldn’t be a common concern in this age of NTP-by-default configurations, but I still consider it a shortcoming
  • They use bad time representations; Libev for instance uses floating-point values to represent absolute time, which I consider an inherently bad idea. (edit: to be fair, though, a ‘double’ as used by Libev is fine for hundreds of years unless you need better than microsecond precision).
  • They had limited, or no, support for prioritising certain events over others.
  • They had limited support for multi-threaded applications.

Some of these were not a concern for Dinit, but I saw them as general shortcomings which could and should be addressed. And so I created Dasynq, and I’m now using it in Dinit. However, it’s fully documented, and should be usable in a range of other projects, too! As usual, feedback is welcome.

(Edit: I didn’t include boost::asio in any of the discussion above, mainly because it lacks a lot of the functionality that is present in the other event loops – such as POSIX signals, and child process watches – but also because I have concerns about the API it presents; of course it also retains the failure modes that formed my original motivation for creating Dasynq).

For more information about Dasynq, check the website or the Github repository.

Let’s Talk about Service Dependencies

(aka: Escape from System D, part IV).

First: anyone who’s been keeping tabs will have noticed that there hasn’t been a lot of progress on Dinit recently; this has been due to multiple factors, one being the hard disk drive in my laptop dying and this impeding my ability to work on the train to and from work, which is when I usually found time to work on Dinit. However, I’ve by no means abandoned the project, will hopefully have a replacement laptop soon, and expect the commits to resume in due course (there have been a small number made recently, in fact).

In this post I wanted to discuss service dependencies and pros and cons of managing them in slightly different ways. In an earlier post I touch on the basics of service management with dependencies:

if one service needs another, then starting the first should also start the other, and stopping the second should also require the first to stop.

It’s clear that there are two reasons that a service could be running:

  1. It has been explicitly started, or
  2. It has been started because another service which depends on it has been started.

This is all very well, but in the 2nd case, there’s an open question about what to do when the dependency service stops. There are two choices in this regard:

  1. A started service remains running when its dependencies stop, even if the service has not itself been explicitly started, or
  2. A started service automatically stops when its dependencies stop (unless it has itself been explicitly started).

Which is the better option? The first option is probably simpler to implement (it doesn’t require tracking whether a service was explicitly started, for instance); the second option, though, has the nice properties that (a) it doesn’t keep unneeded services running and (b) explicitly starting and then stopping a service will return the system to the original state (in terms of which services are running). Also, if you want to emulate the concept of run levels (which essentially describe a set of services to run exclusively), you can do so easily enough; switching run level is equivalent to explicitly starting the appropriate run level service and stopping the current one.

(Systemd makes a distinction between service units, which describe a process to run, and target units, which group services. However, I’m not sure there’s a real need for this distinction; services can depend on other services anyway, so the main difference is that one has an individual associated process and the other doesn’t. Indeed Systemd’s systemctl isolate command can accept a service unit, although it expects a target unit by default. Dinit on the other hand makes no real distinction between services and targets at this higher level.)

There are some complications, though, which necessarily add complexity to the service model described above. Mainly, we want some flexibility in how dependency termination is handled. The initial “boot” service, for instance, probably shouldn’t stop (and release all its dependencies as a result) if a single dependency (let’s say the sshd server, for example) terminates unexpectedly; similarly, we wouldn’t necessarily want boot to be considered failed if any of a number of certain dependency services failed to start. On the other hand, for other service/dependency combinations, we might want exactly that: if the dependency fails then the dependent also fails, and if the dependency stops then the dependent also stops.

Other problems we need to solve:

  • It may be convenient to have persistent services that remain started after they are started (due to a dependent starting, even when the dependent stops. For instance, if we have a service which mounts the filesystem read/write (from read-only) it’s probably convenient to leave it “running” after it starts, since undoing this is complicated and may be error-prone.
  • Boot failure needs a contingency; it should be possible to configure what happens if some service essential for boot fails (whether it be to start a single-user shell, reboot, power off, or simply stop with an error message).

With all the above in mind, I’ve narrowed down the necessary dependency types as follows:

  • regular – the dependency must start before the dependent starts, and if the dependency stops then the dependent stops.
  • soft – the dependency starts (in parallel) with the dependent, but if it fails or stops this does not affect the dependent. It’s not precisely clear that this dependency type is necessary in its own right, but it forms the basis for the following two dependency types.
  • waits-for – as for soft, but the dependent waits until the dependency starts (or fails) before it starts itself.
  • “milestone” – The dependency must start before the dependent starts, but once the dependent has started, the dependency link becomes soft. This is different from “waits-for” in that if the dependency fails, the dependent will not start.

This is what I’m currently implementing (up until now, only “regular” and “waits-for” dependencies have been supported by Dinit).

For the boot failure case, Dinit currently starts the service named “single” (i.e. the single-user service); however, some flexibility / configurability might be added at a later date.

For next time

There are a lot of things that I want write about and implement, and though finding the time has been increasingly difficult lately I’m hoping things will calm down a little over the next few months.

One thing I really need to do is look again, properly, at some of the other supervision/init systems out there. There are two motivations for this: one, determining whether Dinit is really necessary in its own right –  that is, can any of the existing systems do everything that I’m hoping Dinit will be able to, and would it make sense to collaborate with / contribute to one of them? In particular s6 and Nosh are two suites which seem like they are well-designed and capable. (Note that I don’t envisage stopping work on Dinit altogether, and don’t feel like availability of another quality init system is going to be a bad thing).

There’s still a lot more work that needs to be done with Dinit, too. Presently it’s not possible to modify loaded service definitions (including changing dependencies) which is certainly a must-have-for-1.0 feature, but that’s really just the tip of the iceberg. At some point I’d like to create a formal list of what is needed to truly supplant Systemd in the common Linux software ecosystem. Completing the basic Dinit functionality remains a priority for now, however.

Thanks for reading and, as always, constructive comments are welcome.

Safety and Daemons

(aka. Escape from System D, part III).

So Dinit (github) is a service manager and supervisor which can function as an init process. As I’ve previously discussed, an init needs to be exceptionally stable: if it crashes, the whole system will come down with it. A service manager which manages system services, though, also needs to be stable, even if it’s not also running as an init: it’s likely that a service manager failure will cause parts of the system to stop working correctly.

But what do we mean by stable, in this case? Well, obviously, part of what we mean is that it shouldn’t crash, and part of that means we want no bugs. But that’s a narrow interpretation and not a useful one; we don’t really want bugs in any software. A big part of being stable – the kind of stable we want in an init or service manager – is being robust in the face of resource scarcity. One resource we are concerned about is file descriptors, and one of the most obvious is memory. In C, malloc can fail: it returns a null pointer if it cannot allocate a chunk of the requested size – and this possibility is ignored only at some peril. (One class of security vulnerability occurs when a program can be manipulated into attempting allocation of a chunk so large that the allocation will certainly fail, and the program fails to check whether the allocation was successful).

Consider now the xmalloc function, implementations of which abound. One can be found in the GNU project’s libiberty library, for example. xmalloc behaves just like malloc except that it aborts the program when the allocation fails, rather than returning a null pointer. This is “safe” in the sense that it prevents program misbehaviour and potential exploits, although is sometimes less than desirable from an end-user perspective. In a service manager, it would almost certainly be problematic. In an init, it would be disastrous. (Note that in Dinit, it is planned to separate the init process from the service manager process. Currently, however, they are combined).

So, in Dinit, if a memory allocation fails, we want to be able to handle it. But also, importantly, we want to avoid (as much as possible) making critical allocations during normal operation – that is, if we could not proceed when an allocation failed, it would be better if avoided the need for allocation altogether.

How Dinit plays safe

In general Dinit tries to avoid dynamic memory allocation when it’s not essential; I’ll discuss some details shortly. However, there’s another memory-related resource which can be limited: the stack. Any sort of unbounded recursion potentially exhausts the stack space, and this form of exhaustion is much harder to detect and deal with than regular heap space exhaustion. The simplest way to deal with this is to avoid unbounded recursion, which Dinit mostly does (there is still one case that I know of remaining – during loading of service descriptions – but I hope to eliminate it in due course).

Consider the process of starting a service. If the service has dependencies, those must be started too, and the dependencies of those dependencies must be started, and so on. This would be expressed very naturally via recursion, something like:

void service::start() {
    for (auto dep : dependencies) {
    do_start(); // actually start this service

(Note this is very simplified code). However, we don’t want recursion (at least, we don’t want recursion which uses our limited stack). So instead, we could use a queue allocated on the heap:

void service::start() {
    // (throws std::bad_alloc on out-of-memory).
    // start with a queue containing this service,
    // and an empty (heap-allocating) stack:
    std::queue<service *> start_queue = { this };
    std::stack<service *> start_stack;

    // for each dependency, add to the queue. Build the stack:
    while (! start_queue.empty()) {
        for (auto dep : start_queue.front()->dependencies) {

    // start each service in reverse dependency order:
    while (! start_stack.empty()) {

This is considerably more complicated code, but it doesn’t implicitly use our limited stack, and it allows us to catch memory space exhaustion (via the std::bad_alloc exception, which is thrown from the queue and stack allocators as appropriate). It’s an improvement (if not in readability), but we’ve really just traded the use of one limited resource for another.

(Also, we need to be careful that we don’t forget to catch the exception somewhere and handle it appropriately! An uncaught exception in C++ will also terminate the program – so we essentially get xmalloc behaviour by default – and because of this, exceptions are arguably a weakness here; however, they can improve code readability and conciseness compared to continually checking for error status returns, especially in conjunction with the RAII paradigm. We just need to be vigilant in checking that we always do catch them!).

Edit: incidentally, if you’re thinking that memory allocation failure during service start is a sure sign that we won’t be able to launch the service process anyway, you’re probably right. However, consider service stop. It follows basically the same procedure as start, but in reverse, and not being able to stop services in a low-memory environment would clearly be bad.

We can improve further on the above: note that while the service dependency graph is not necessarily a tree, we only need to start each dependency once (the above code doesn’t take this into account, potentially issuing do_start() to the same service multiple times if it is a dependency of multiple other services). Given that a service only need appear in start_queue and start_stack once, we can actually manage those data structures as linked lists where the node is internal to the service (i.e. the node doesn’t need to be allocated separately).

For example, service might be defined as something like:

class service {
    std::string name;
    std::list<service *> dependencies;
    // (other details)
    bool is_in_start_queue = false;
    bool is_in_start_stack = false;
    service * next_in_start_queue = nullptr;
    service * next_in_start_stack = nullptr;
    void start();
    void do_start();

Now, although it requires extra code (again) because we can’t use the standard library’s queue or stack, we can manage the two data structures without performing any allocations. This means we can rewrite our example start() in such a way that it cannot fail (though of course in reality starting a service requires various additional steps – such as actually starting a process – for which we can’t absolutely guarantee success; however, we’ve certainly reduced the potential failure cases).

In fact, in Dinit a service can be part of several different lists (technically, order-preserving sets). I wrote some template classes to avoid duplicating code to deal with the different lists, which you can find in the source repository. Using these templates, we can rewrite the example service class and the start() method, as follows:

class service {
    std::string name;
    std::list<service *> dependencies;
    // (other details)
    lld_node<service> start_queue_node;
    lls_node<service> start_stack_node;
    void start();
    void do_start();

    static auto &get_startq_node(service *s) {
        return s->start_queue_node;
    static auto &get_starts_node(service *s) {
        return s->start_stack_node;

void service::start() {
    // start with a queue containing this service,
    // and an empty (heap-allocating) stack:
    dlist<service, service::get_startq_node> start_queue;
    slist<service, service::get_starts_node> start_stack;

    // for each dependency, add to the queue. Build the stack:
    while (! start_queue.is_empty()) {
        auto front = start_queue.pop_front();
        for (auto dep : front->dependencies) {
            if (! start_queue.is_queued(dep))
        if (! start_stack.is_queued(dep)) {

    // start each service in reverse dependency order:
    while (! start_stack.is_empty()) {

(Note that the templates take two arguments: one is the element type in the list, which is service in this case, and the other is a function to extract the list node from the element. The call to this function will normally be inlined by the compiler, so you end up paying no abstraction penalty).

This is a tiny bit more code, but it’s not too bad, and compared to the previous effort it performs no allocations and avoids issuing do_start() to any service more than once. The actual code in Dinit is somewhat more complicated, but works roughly as outlined here. (Note, I snuck some C++14 into the code above; Dinit itself remains C++11 compatible at this stage).

There’s more to resource safety than memory and stack usage; I may discuss a little bit more in the future. I hope this post has provided some interesting perspective, however. As usual, comments are welcome.


Since last post, I’ve added a “stop timeout” for services – this allows setting a maximum time for a service to stop. If it takes longer than the allowed time, the service process is issued a SIGKILL which (unless something really whack is going on) should cause it to terminate immediately. I’ve set the default to 10 seconds, which seems reasonable, but it can be configured (and disabled) via the service description file.

(I’m not sure if I really want this to be enabled by default, or whether 10 seconds is really enough as a default value – so this decision may be revisited. Opinions welcome).

Other than that, it’s been bugfixes, cleaning up TODO’s in the code, and minor robustness improvements. I’m aiming for complete service management functionality soon (and in fact Dinit already works well in this capacity, but is missing one or two features that I consider important).

Escape from System D (2)

Episode II: Init versus the service management daemon

I was pleased that my announcement of another in-development init/service manager met with a mostly positive response. I plan to keep making semi-regular posts where I post both general discussion around the issues of service management and progress updates on my own effort, dubbed Dinit.

In this post I will give a little background on init systems and service management generally. I expect a lot of readers will not learn much, since it is already well understood, but it is worth laying out some background for reference in future posts/discussion.

What is “init”?

The init process, traditionally started from /sbin/init on the filesystem, is the first userspace process to launch on the system. As such it is the only process with no parent process. Most (if not all) operating systems give it a process ID of 1, making it easy to identify. There are two special things about the init process:

  1. First, it automatically becomes the new parent of otherwise orphaned processes. In particular processes which “daemonise” themselves by double-forking and letting the intermediate parent die get re-parented to the init process.
  2. If the init process terminates, for any reason, the kernel panics (so the whole system crashes).

The second point is in fact not necessarily true – it just so happens that, at least on Linux, if the init process dies then the system dies with it. I am not sure how the various *BSD systems react, but in general, it is not expected that the init process will terminate. This means that it is very, very important that the init process does not crash. However, the first point above has some implications as well, which we’ll get to shortly.

Notionally, the init system has two jobs: to reap its child processes when they have terminated (this is accomplished using the wait system call or one of its variants; reaping a terminated process ensures that its resources are freed and that it is no longer listed in the process table of the system) and also to start up the system, which it can potentially do just by running another process. An init may also be involved in the system shutdown process as well, though strictly speaking that’s not necessary.

You might be interested in Rich Felker’s example of a minimal init system, which is part of one of his blog posts (where he also discusses Systemd). It’s less than a screenful of text – small enough that it can be “obviously bug free” – a nice attribute to have for an init, for reasons outlined above.

So what is a “service manager”?

A service manager provides, at the most basic level, a means for stopping and starting individual services. Services quite typically run as a process – consider for example the ssh server daemon, sshd – but sometimes exist in some other form; having the network connection(s) up and operational, for example, could be enacted by means of a service. Typical modern systems have a service manager which is either started from the init process or incorporated in it (Systemd is an example of an init process which incorporates service management functionality, but there are various others which do the same).

Aside from just an interface to starting and stopping services, service managers may provide:

  • process supervision – which normally amounts to the ability to restart a service process if it terminates unexpectedly (in general, this is a mitigation measure against software faults)
  • service dependency management – if one service needs another, then starting the first should also start the other, and stopping the second should also require the first to stop.
  • a logging mechanism for dealing with output from service processes (in general, though, this can be delegated largely to a secondary process).

Since a service manager is naturally somewhat more complex than a standalone init system, it should be obvious that incorporating the two in one process has some inherent risks. If an init system terminates unexpectedly, the whole system will generally crash; not only is this inconvenient for the user, but it also makes analysing the bug that caused the crash more difficult.

Why combine them, then?

The obvious question: if it’s better to keep init as simple as possible, why does it get combined with service management? One reason is so that double-forking processes, which have re-parented to the init process, can be supervised; normal POSIX functions only allow receiving status notifications for direct child processes. (Various *BSDs support watching arbitrary process status via the kqueue system calls, but the interface has flaws – that I will perhaps discuss another time – and anyway, any mechanism to watch a non-immediate-child process by process ID, without co-ordination with the parent process, is prone to a race condition: at least in theory, a process with a given ID can die, and be reaped, and the process ID can be recycled, in between some other process discovering the process ID and setting up a watch for it or even worse sending a termination signal in an attempt to shut down a service).

Now we could just about argue that no service should double-fork, and this is eliminates any need for the service manager to run as the init process (PID 1). However, we can’t actually prevent processes from double-forking; on the other hand, there is a mechanism – at least on Linux – called cgroups, which allows for tracking process origin even through double-fork. Importantly, this can be used to track processes belonging to particular user sessions. One operation that we might naturally want to perform to a cgroup is to terminate it – or rather, terminate all processes in the cgroup – and this, once again, is racy unless we can co-ordinate with the parent process(es) of all processes in the cgroup (and by “coordinate” I mean that we want to prevent the parent process from reaping child processes which have terminated, to avoid the race where a process ID is recycled and the wrong process is then terminated, as described above),

(Some other systems might have functionality similar to cgroups – I have FreeBSD jails in mind, though I need to do some research to understand exactly how jails work and their limitations, and in particular if they also suffer the termination race problem described above).

So, for supervising double-forked processes, and for controlling user sessions, having control of the PID 1 (init) process is important for a service manager. However, there’s a hint in what hasn’t been said: while we may need co-operation between the init process and the service manager, it’s not absolutely necessary that they are the same process. One of the ideas I’d like to investigate with Dinit is whether we can keep a very simple init process and a separate, more complex, service manager / supervisor.

Dinit progress

For the most part reactions to my announcement of Dinit were positive. One comment on Reddit wondered how I was going to be able to achieve a “solid-as-a-rock stable” system using a non-memory-safe language (C++) and without having any tests. Of course, this wasn’t quite correct; I have always had tests for Dinit, but they were not automated. One thing that I’ve done since my initial announcement is implement a small number of automated tests (that you can run using “make check”). I plan to write many more tests, but this feels like a good start. I’ll discuss the reasons for using C++ at some point, but it needs to borne in mind that while C++ is not memory-safe it is still perfectly possible to write stable software in such a language; it just takes a little more effort!

I’ve also done a little refactoring, solved one or two minor bugs, and improved the man pages. My TODO list is slowly getting smaller and I think Dinit is approaching the stage where it can be considered a high-quality service manager, though it is a way off from being a full replacement for Systemd.

Please feel free to comment below and/or check out the source code on the Github repository.

Escape from System D

Episode I: It’s obvious, init?

How is that for a title, right? I know, I know, they want me to call it “systemd” not “System D” or “Systemd” or anything else resembling a legitimate proper noun, but that doesn’t work quite so well for a sci-fi-esque sounding movie title as does the above.

Anyway, to cut straight to the chase: I’m writing an init system. I’m not happy with Systemd’s increasing feature creep, bugs and occasional developer attitude issues, and I know I’m not the only one; however, I do want an init system / service supervision and management system that is more capable than the ancient Sys V init, and which in theory – together with a number of other pieces of software – could effectively provide a fully functional replacement for Systemd, without taking its all-or-nothing approach, and without needlessly sacrificing backwards compatibility with pre-existing tools and workflows, while being simpler both conceptually and in implementation.

Yes, there are probably already other options. I have at least briefly looked at a number of them. (see here, though there are probably many that are missing from that list). In general I am not perfectly happy with any of them, which is why I’ve decided to write yet another. There may be a bit of NIH syndrome leading to this decision; that’s ok, I can live with that; partly we write software because there’s nothing else that will do the job, and party we write it just for the heck of it. This project is always going to be a large part for the latter.

So, what are my main goals? Let’s see:

  • This will be both an init system and a service manager / process supervisor, in this sense similar to Systemd. It both boots the system, runs services (and allows them to be controlled), and shuts the system down. It will be able to automatically restart services that fail, when it is sensible to do so.
  • The dependency model is simple, but effective. You should be able to express ordering between services that specifies one service requires another to have started first.
  • Simplicity in general is an explicit goal, as is ease of configuration and use. (Sometimes these conflict).
  • It will be cross-platform. At least, it should run on most POSIX systems, not just Linux.
  • It will be both efficient and maintainable.
  • It will be stable. Solid-as-a-rock stable.

That should probably do for now. I know that these could be considered lofty goals; I’ll discuss more on each point in a number of follow-up posts, and where applicable I’ll discuss differences to Systemd and other init systems and service managers, anything that will make this particular software special, and any ways in which things might be improved generally.

What I do need to say before I finish up, however, is that this software is real: it has a name – “Dinit” – and in fact it already has a good body of source code and documentation, as can be seen on the Github page. What’s more, I’m already using it to boot my own system (although I’m not going to recommend at this stage that anyone else should try to use it for that just yet).

And, oh yeah, it’s written in C++. I know a number of people are going to snub their noses at the project just because of that; I don’t care. As far as I’m concerned the realistic alternatives at this point in time are probably C and Rust; C++ beats C hands down and Rust just isn’t, in my eyes, quite ready yet, though it may be one day (maybe even soon). I’ll discuss the precise reasons for choosing C++ (beyond “I happen to like it”, which is certainly one reason) in a future post, but ultimately arguing about programming languages isn’t something I really want to get into.

In the next episode of Escape From System D, I will discuss the role of init and service management systems and why they are often combined.