To anyone watching the rate of commits being pushed to Dinit, it might appear that development has stalled. The good news is that this isn’t really the case: instead of working directly on Dinit, I’ve been working on a sub-project that grew out of Dinit’s development. Allow me to introduce Dasynq, the C++ event-loop library for robust clients!
The Background Story
Dinit, as an init system / service manager, needs to be able to respond to several different types of external event:
- It needs to know when child processes have terminated, so that it can log the event and then restart the process or continue shutting down any dependencies, as appropriate.
- It needs to respond to signals that control its operation.
- It needs to receive and respond to requests arriving over a socket connection, to allow control of services.
- It needs to monitor timeouts, so that a process which is taking too long to start or stop can be dealt with appropriately.
These requirements aren’t specific to service managers; many programs, particularly network servers, need to deal with a similar set of events. Typically an event-loop library is used to manage this: such a library allows monitoring a range of event types and specifying callbacks to run when the events are detected. Most event-loop libraries use modern OS facilities such as kqueue or epoll as the back-end event delivery mechanism. To offer more advanced functionality, such as event priorities, an event-loop library typically inserts received events into a queue rather than delivering them to the application as soon as they are detected.
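As a rough illustration of the queueing step described above, here is a minimal C++ sketch of a priority-ordered event queue. It is hypothetical: the names `event_queue`, `enqueue` and `dispatch` are invented for this example and are not Dasynq’s API. The back-end (epoll, kqueue and so on) is assumed to call `enqueue` for each detected event, and the application later drains the queue, most urgent first, with events of equal priority delivered in arrival order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch; not the API of Dasynq or of any other library.
struct pending_event {
    int priority;                    // lower value = more urgent
    std::uint64_t seq;               // arrival order, used to break ties (FIFO)
    std::function<void()> callback;  // application handler for this event
};

// std::priority_queue pops its "greatest" element, so this comparator makes
// the most urgent event (lowest priority value, then earliest seq) greatest.
struct less_urgent {
    bool operator()(const pending_event &a, const pending_event &b) const {
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.seq > b.seq;
    }
};

class event_queue {
    std::priority_queue<pending_event, std::vector<pending_event>, less_urgent> q;
    std::uint64_t next_seq = 0;

public:
    // Called when the back-end reports an event: queue it, don't run it yet.
    void enqueue(int priority, std::function<void()> cb) {
        assert(cb && "callback must be callable");
        q.push(pending_event{priority, next_seq++, std::move(cb)});
    }

    // Deliver queued events to the application, most urgent first.
    // Returns the number of callbacks that were run.
    std::size_t dispatch() {
        std::size_t n = 0;
        while (!q.empty()) {
            pending_event ev = q.top();
            q.pop();
            ev.callback();
            ++n;
        }
        return n;
    }
};
```

With this ordering, an event queued at priority 0 runs before events queued earlier at priority 5, while two priority-5 events keep their arrival order.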
When I started writing Dinit, my initial prototype used Libev, an event-loop library which is cross-platform, efficient and well-documented. It was good enough to get started with, but for an init system it had one glaring deficiency: insufficient support for error handling. In fact, Libev’s usual response to an error is to call abort() and terminate the entire process, and there is no way to make the relevant functions return an error code instead. I began to look for a replacement. There were other event libraries, such as the venerable Libevent and the more recent Libuv, whose error handling was at least good enough to return error codes. But I wanted something better: I wanted to know that certain operations could not fail at all, not just that I could meaningfully detect their failure.
Consider the case of a timer. If a service is running as a process and we receive a stop command for it (perhaps as part of a system shutdown), we can send the process a signal, such as SIGTERM, requesting it to stop. But we want to give it a reasonable time limit to respond, in case it has hung; so we start a timer, and when the timer expires we send SIGKILL to finish off the hung process. The issue is that, with these existing event-loop libraries, starting a timer is an operation that can fail (for instance, due to resource limitations). That leaves us in the awkward position of being unable to time the process shutdown, and unless we take drastic action such as sending SIGKILL immediately, the whole shutdown may hang.
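The stop-then-kill policy described above can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not Dinit’s code: `stop_with_timeout` is a hypothetical helper that polls `waitpid` with `WNOHANG` instead of arming an event-loop timer. That sidesteps the timer failure mode, but it ties up the caller for the whole grace period, which is exactly what an event loop is meant to avoid.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: ask `pid` (a child of this process) to stop with
// SIGTERM; if it has not exited within `grace`, finish it off with SIGKILL.
// Returns the wait status of the reaped child, or -1 on error.
int stop_with_timeout(pid_t pid, std::chrono::milliseconds grace) {
    int status = 0;
    if (kill(pid, SIGTERM) == -1) return -1;  // no such process, or not permitted

    const auto deadline = std::chrono::steady_clock::now() + grace;
    while (std::chrono::steady_clock::now() < deadline) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return status;               // exited within the grace period
        if (r == -1 && errno != EINTR) return -1;  // e.g. not our child
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    // Deadline passed: assume the process is hung. (If it exited just now,
    // it is a zombie and SIGKILL has no effect; waitpid still reaps it.)
    kill(pid, SIGKILL);
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) return -1;
    }
    return status;
}
```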
Another example: event loops allow us to monitor the status of child processes, so that we can detect when they terminate. However, in other event-loop libraries, adding a watcher for a child process is itself an operation that can fail. Again, this leaves us in an awkward position. We could terminate the child immediately, but it would be much better to be able to add a child watcher in a way that cannot fail, or at least to detect that a watcher cannot currently be added before forking the child.
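One way to get the second behaviour is to reserve watch capacity before calling `fork()`, so that the only point of failure comes before the child exists. In this sketch, `child_watch_table` and `spawn_watched` are hypothetical names, and a small fixed-size table stands in for whatever storage a real event loop would use; reaping is done with a blocking `waitpid` purely to keep the example short.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical fixed-capacity table of child watches (not a real library API).
// Slot values: 0 = free, -1 = reserved, > 0 = pid being watched.
class child_watch_table {
    std::vector<pid_t> slots;

public:
    explicit child_watch_table(std::size_t n) : slots(n, 0) {}

    // The only fallible step: returns a slot index, or -1 if the table is full.
    int reserve() noexcept {
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i] == 0) { slots[i] = -1; return static_cast<int>(i); }
        return -1;
    }

    // Cannot fail: the slot was reserved beforehand.
    void watch(int slot, pid_t pid) noexcept { slots[slot] = pid; }
    void release(int slot) noexcept { slots[slot] = 0; }

    // Blocks until the watched child exits, then frees its slot.
    bool reap(int slot, int &status) {
        pid_t pid = slots[slot];
        if (pid <= 0) return false;
        while (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR) return false;
        }
        slots[slot] = 0;
        return true;
    }
};

// Returns the watch slot of the new child, or -1 if no slot was available
// (in which case fork() is never called) or if fork() itself failed.
int spawn_watched(child_watch_table &table, void (*child_main)()) {
    int slot = table.reserve();
    if (slot == -1) return -1;          // refuse to create an unwatchable child
    pid_t pid = fork();
    if (pid == -1) { table.release(slot); return -1; }
    if (pid == 0) { child_main(); _exit(0); }
    table.watch(slot, pid);             // infallible: storage already exists
    return slot;
}
```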
The Birth of Dasynq
So, I set about writing Dasynq to address these issues. With Dasynq, you can pre-allocate timers and child process watchers, so that arming a timer or adding a child watch is an operation that simply cannot fail. Enabling and disabling I/O watchers, similarly, cannot fail.
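The general idea of pre-allocation can be illustrated with a small hypothetical timer queue; this is not Dasynq’s interface. Registering a timer reserves storage and may throw `std::bad_alloc`, while arming a registered timer only writes into space that already exists, so it is declared `noexcept` and has no error path. For brevity, armed timers are kept in an unsorted vector; a real implementation would more likely use a heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of "allocate early, arm infallibly".
class timer_queue {
    struct entry { std::uint64_t expiry; int id; };
    std::vector<entry> armed;    // capacity >= number of registered timers
    std::vector<bool> is_armed;  // one flag per registered timer

public:
    // May throw std::bad_alloc; this is the only step that can fail.
    int register_timer() {
        armed.reserve(is_armed.size() + 1);
        is_armed.push_back(false);
        return static_cast<int>(is_armed.size() - 1);
    }

    // Cannot fail: there is at most one entry per registered timer and space
    // for all of them was reserved, so push_back never reallocates.
    void arm(int id, std::uint64_t expiry) noexcept {
        assert(id >= 0 && static_cast<std::size_t>(id) < is_armed.size());
        if (is_armed[id]) {
            for (entry &e : armed)
                if (e.id == id) { e.expiry = expiry; return; }  // re-arm in place
        }
        is_armed[id] = true;
        armed.push_back(entry{expiry, id});
    }

    // Removes and returns the id of one timer with expiry <= now, or -1.
    int pop_expired(std::uint64_t now) noexcept {
        for (std::size_t i = 0; i < armed.size(); ++i) {
            if (armed[i].expiry <= now) {
                int id = armed[i].id;
                armed[i] = armed.back();
                armed.pop_back();
                is_armed[id] = false;
                return id;
            }
        }
        return -1;
    }

    std::size_t capacity() const noexcept { return armed.capacity(); }
};
```

The cost of this design is that memory for every registered timer is held even while it is idle; in exchange, the code path that runs in the middle of a shutdown has nothing that can go wrong.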
At the same time, I addressed what I saw as some shortcomings in other event-loop libraries (not every point below applies to every library):
- They did not allow setting timers against the system clock (the clock that can jump when it is corrected, for instance by the user). This arguably shouldn’t be a common concern in this age of NTP-by-default configurations, but I still consider it a shortcoming.
- They used poor time representations; Libev, for instance, uses floating-point values to represent absolute time, which I consider an inherently bad idea. (Edit: to be fair, though, a ‘double’ as used by Libev is fine for hundreds of years unless you need better than microsecond precision.)
- They had limited, or no, support for prioritising certain events over others.
- They had limited support for multi-threaded applications.
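To put numbers on the floating-point point: near a present-day Unix timestamp of about 1.7 × 10^9 seconds, adjacent `double` values are 2^-22 s (about 238 ns) apart, so sub-microsecond precision holds but nanoseconds are lost. The sketch below checks this and contrasts it with an integer seconds-plus-nanoseconds pair in the style of POSIX `struct timespec`; the helper names and the example timestamp are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Distance from t to the next larger representable double.
double double_resolution_at(double t) {
    return std::nextafter(t, std::numeric_limits<double>::infinity()) - t;
}

// True if adding one nanosecond to a double timestamp changes its value.
bool double_keeps_one_ns(double t) {
    return t + 1e-9 != t;
}

// Hypothetical integer representation: whole seconds plus nanoseconds.
struct sec_nsec {
    std::int64_t sec;
    std::int32_t nsec;  // kept in [0, 1'000'000'000)
};

// Adds ns (assumed to be in [0, 1'000'000'000)) exactly, with carry.
sec_nsec add_ns(sec_nsec t, std::int32_t ns) {
    assert(ns >= 0 && ns < 1'000'000'000);
    t.nsec += ns;  // at most 1'999'999'998, which fits in int32_t
    if (t.nsec >= 1'000'000'000) {
        t.nsec -= 1'000'000'000;
        ++t.sec;
    }
    return t;
}
```

The spacing doubles each time the timestamp crosses a power of two, so a double’s precision silently depends on the magnitude of the time value, whereas the integer pair keeps nanosecond resolution at any date it can represent.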
Some of these were not a concern for Dinit, but I saw them as general shortcomings which could and should be addressed. So I created Dasynq, and I’m now using it in Dinit. It is also fully documented, and should be usable in a range of other projects too! As usual, feedback is welcome.
(Edit: I didn’t include boost::asio in the discussion above, mainly because it lacks much of the functionality present in the other event loops, such as POSIX signals and child process watches, but also because I have concerns about the API it presents. Of course, it also retains the failure modes that formed my original motivation for creating Dasynq.)