Let’s Talk about Service Dependencies

(aka: Escape from System D, part IV).

First: anyone who’s been keeping tabs will have noticed that there hasn’t been a lot of progress on Dinit recently. This is due to several factors, one being that the hard disk drive in my laptop died, which has kept me from working on the train to and from work (usually when I find time to work on Dinit). However, I’ve by no means abandoned the project: I will hopefully have a replacement laptop soon, and I expect commits to resume in due course (a small number have been made recently, in fact).

In this post I wanted to discuss service dependencies and the pros and cons of managing them in slightly different ways. In an earlier post I touched on the basics of service management with dependencies:

if one service needs another, then starting the first should also start the other, and stopping the second should also require the first to stop.

It’s clear that there are two reasons that a service could be running:

  1. It has been explicitly started, or
  2. It has been started because another service which depends on it has been started.

This is all very well, but in the second case there’s an open question about what to do when the dependent service stops. There are two choices in this regard:

  1. A started service remains running after its dependents stop, even if the service has not itself been explicitly started, or
  2. A started service automatically stops once all of its dependents have stopped (unless it has itself been explicitly started).

Which is the better option? The first option is probably simpler to implement (it doesn’t require tracking whether a service was explicitly started, for instance); the second option, though, has the nice properties that (a) it doesn’t keep unneeded services running and (b) explicitly starting and then stopping a service will return the system to the original state (in terms of which services are running). Also, if you want to emulate the concept of run levels (which essentially describe a set of services to run exclusively), you can do so easily enough; switching run level is equivalent to explicitly starting the appropriate run level service and stopping the current one.
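To make the second option concrete, here is a minimal sketch in Python. It is a hypothetical toy model, not Dinit’s actual code: the Service and ServiceSet names, the explicit flag and the required_by counter are all invented for illustration, as are the service names used in the example. Each service records whether it was started explicitly and how many running dependents need it, and it stops automatically once neither is true.

```python
from dataclasses import dataclass, field


# Hypothetical toy model for illustration only; not Dinit's implementation.
@dataclass
class Service:
    name: str
    depends_on: list = field(default_factory=list)
    explicit: bool = False   # started explicitly (e.g. by the administrator)
    required_by: int = 0     # number of running dependents that need this service
    running: bool = False


class ServiceSet:
    """Option 2: a service that was not started explicitly stops
    automatically once no running dependent needs it."""

    def __init__(self, *services):
        self.services = {s.name: s for s in services}

    def start(self, name):
        svc = self.services[name]
        svc.explicit = True
        self._activate(svc)

    def stop(self, name):
        svc = self.services[name]
        svc.explicit = False
        # Stopping a service first forces its running dependents to stop.
        for other in self.services.values():
            if other.running and name in other.depends_on:
                self.stop(other.name)
        self._maybe_deactivate(svc)

    def running(self):
        return {name for name, s in self.services.items() if s.running}

    def _activate(self, svc):
        if svc.running:
            return
        for dep in svc.depends_on:  # dependencies are brought up first
            dep_svc = self.services[dep]
            dep_svc.required_by += 1
            self._activate(dep_svc)
        svc.running = True

    def _maybe_deactivate(self, svc):
        if not svc.running or svc.explicit or svc.required_by > 0:
            return
        svc.running = False
        for dep in svc.depends_on:  # release dependencies; unneeded ones stop too
            dep_svc = self.services[dep]
            dep_svc.required_by -= 1
            self._maybe_deactivate(dep_svc)


ss = ServiceSet(
    Service("network"),
    Service("mount-rw"),
    Service("sshd", depends_on=["network"]),
    Service("multi-user", depends_on=["mount-rw", "sshd"]),
    Service("single", depends_on=["mount-rw"]),
)

# Explicitly starting and then stopping a service restores the original state.
ss.start("sshd")
print(sorted(ss.running()))  # ['network', 'sshd']
ss.stop("sshd")
print(sorted(ss.running()))  # []

# A run-level style switch: start the new "level", then stop the old one.
ss.start("multi-user")
ss.start("single")
ss.stop("multi-user")
print(sorted(ss.running()))  # ['mount-rw', 'single']
```

Note that stop() also follows the basic rule quoted earlier: stopping a service first stops every running service that depends on it. In the run-level switch, "mount-rw" keeps running only because "single" still needs it, while "sshd" and "network" are released and stop.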

(Systemd makes a distinction between service units, which describe a process to run, and target units, which group services. However, I’m not sure there’s a real need for this distinction; services can depend on other services anyway, so the main difference is that one has an individual associated process and the other doesn’t. Indeed, Systemd’s systemctl isolate command can accept a service unit, although it expects a target unit by default. Dinit, on the other hand, makes no real distinction between services and targets at this higher level.)

There are some complications, though, which necessarily add complexity to the service model described above. Mainly, we want some flexibility in how dependency termination is handled. The initial “boot” service, for instance, probably shouldn’t stop (and release all its dependencies as a result) if a single dependency, say the sshd server, terminates unexpectedly; similarly, we wouldn’t necessarily want boot to be considered failed just because a non-essential dependency failed to start. On the other hand, for other service/dependency combinations we might want exactly that: if the dependency fails then the dependent also fails, and if the dependency stops then the dependent also stops.

Other problems we need to solve:

  • It may be convenient to have persistent services which, once started (due to a dependent starting), remain started even when the dependent stops. For instance, if we have a service which remounts the filesystem read/write (from read-only), it’s probably convenient to leave it “running” after it starts, since undoing this is complicated and may be error-prone.
  • Boot failure needs a contingency; it should be possible to configure what happens if some service essential for boot fails (whether it be to start a single-user shell, reboot, power off, or simply stop with an error message).

With all the above in mind, I’ve narrowed down the necessary dependency types as follows:

  • regular – the dependency must start before the dependent starts, and if the dependency stops then the dependent stops.
  • soft – the dependency starts in parallel with the dependent, but if it fails or stops this does not affect the dependent. It’s not entirely clear that this dependency type is necessary in its own right, but it forms the basis for the following two dependency types.
  • waits-for – as for soft, but the dependent waits until the dependency starts (or fails) before it starts itself.
  • “milestone” – the dependency must start before the dependent starts, but once the dependent has started, the dependency link becomes soft. This differs from “waits-for” in that if the dependency fails, the dependent will not start.
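The four types boil down to two decision rules: whether a starting dependent may proceed (or must wait, or fails), and whether a running dependent must stop when a dependency stops. The following hypothetical Python sketch encodes those rules as described in the list; the DepType and DepState names and the start_decision and stops_with_dependency functions are invented for this illustration and are not taken from Dinit.

```python
from enum import Enum, auto


# Hypothetical names for illustration only; not Dinit's implementation.
class DepType(Enum):
    REGULAR = auto()    # must start first; if it stops, the dependent stops
    SOFT = auto()       # started in parallel; its outcome never affects the dependent
    WAITS_FOR = auto()  # like SOFT, but the dependent waits for it to start or fail
    MILESTONE = auto()  # must start first; becomes SOFT once the dependent has started


class DepState(Enum):
    STARTING = auto()
    STARTED = auto()
    FAILED = auto()


def start_decision(deps):
    """Decide what a starting dependent should do, given (DepType, DepState) pairs.

    Returns "fail", "wait" or "start".
    """
    # A failed regular or milestone dependency prevents the dependent from starting.
    if any(t in (DepType.REGULAR, DepType.MILESTONE) and s is DepState.FAILED
           for t, s in deps):
        return "fail"
    # Any non-soft dependency that is still starting makes the dependent wait.
    if any(t is not DepType.SOFT and s is DepState.STARTING for t, s in deps):
        return "wait"
    return "start"


def stops_with_dependency(dep_type):
    """Once the dependent is running, does the dependency stopping stop it too?"""
    # Only a regular link still binds the two; a milestone link has become soft.
    return dep_type is DepType.REGULAR


# "boot" waits for sshd, but is not failed just because sshd failed.
print(start_decision([(DepType.WAITS_FOR, DepState.FAILED)]))   # start
print(start_decision([(DepType.MILESTONE, DepState.FAILED)]))   # fail
print(start_decision([(DepType.REGULAR, DepState.STARTING),
                      (DepType.SOFT, DepState.STARTING)]))      # wait
print(stops_with_dependency(DepType.MILESTONE))                 # False
```

In this framing, a milestone dependency behaves like a regular one in start_decision and like a soft one in stops_with_dependency, which is exactly the “link becomes soft once the dependent has started” behaviour.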

This is what I’m currently implementing (up until now, only “regular” and “waits-for” dependencies have been supported by Dinit).

For the boot failure case, Dinit currently starts the service named “single” (i.e. the single-user service); however, more flexibility and configurability may be added later.

For next time

There are a lot of things that I want to write about and implement, and though finding the time has been increasingly difficult lately, I’m hoping things will calm down a little over the next few months.

One thing I really need to do is look again, properly, at some of the other supervision/init systems out there. There are two motivations for this: first, determining whether Dinit is really necessary in its own right (that is, whether any of the existing systems can already do everything that I’m hoping Dinit will be able to); and second, deciding whether it would make sense to collaborate with or contribute to one of them. In particular, s6 and Nosh are two suites which seem to be well-designed and capable. (Note that I don’t envisage stopping work on Dinit altogether, and don’t think the availability of another quality init system would be a bad thing.)

There’s still a lot more work that needs to be done with Dinit, too. Presently it’s not possible to modify loaded service definitions (including changing dependencies), which is certainly a must-have feature for 1.0, but that’s really just the tip of the iceberg. At some point I’d like to create a formal list of what is needed to truly supplant Systemd in the common Linux software ecosystem. Completing the basic Dinit functionality remains a priority for now, however.

Thanks for reading and, as always, constructive comments are welcome.


4 thoughts on “Let’s Talk about Service Dependencies”

  1. The question of whether anyone is still interested in a service can sometimes be a tricky one, as can the question of what to do if a service gets tied up by some task that cannot be readily identified or killed. In some contexts, a useful approach can be to either have a service include a means by which it can notify clients “Hey, I’m going away if you don’t speak up”, and/or have a way of shutting down a service that releases all non-fungible resources, but leaves behind enough of a stub that any existing references will continue to be valid references to a dead service [so any attempt to use them will deterministically fail]. While yanking a service out from underneath a client is often icky, it can sometimes yield semantically-correct behavior much more cleanly than would otherwise be possible. For example, writing client code to use blocking I/O is often easier than using non-blocking code, but it may be hard to cleanly shut down a client while it’s waiting for an I/O operation that will never complete. Yanking the I/O resource out from under the client may be a simple and easy way to let it continue execution so it can clean up after itself nicely.

    • I had meant to reply to this much earlier. I think the idea of having services notify clients that they are potentially going away is interesting for some scenarios, but probably doesn’t fit with the model envisioned for a complete service manager, where the manager knows with certainty whether a service is still needed by its dependents.

      In regard to killing dependencies before (or rather at the same time as) dependents, this is potentially possible, but would need to be configured (because it is potentially very much not what is wanted). To look at your example, a client using blocking I/O is still generally going to respond to a signal (because that will interrupt the I/O call and return an error code). It is a pretty badly-behaved client that can’t be killed except by killing the service that it’s using; however, that’s not to say such clients don’t exist, so the option to have the dependency sent the termination signal at the same time as the client could indeed be useful.

      • If applications use a pattern of starting a service when not running, and otherwise simply using a running service, a service manager may have no way of knowing whether a service might have dependents the manager doesn’t know about. A better pattern may be to have every client notify the server of its dependence upon the service and then let the manager decide whether to run the service, and when it should be shut down. If one wants the ability to start a service with no dependents, a better approach than the ones you listed may be to have the explicit start-service request create a dummy client which is dependent upon the service. As long as that client exists, the service will stay alive. If the client goes away while other clients are using the service, the service will be shut down once the last dependency goes away. The dummy client thus provides a convenient and consistent way to indicate that there is no longer a need to keep the service alive other than existing dependencies.

        Having a service manager know of all dependencies can be a clean approach if it can be reliably informed when a client no longer needs a service. It may be problematic, however, if e.g. a process might abnormally terminate (e.g. core dump) without notifying the service manager that it no longer needs any services it had been using. A protocol based upon a “Does anyone still need this service?” query would be able to recover in such situations, though it would run the risk that a service might get yanked from a process that gets sufficiently waylaid. On the flip side, breaking a dependency when a process gets sufficiently waylaid may sometimes be better than letting it persist forever. For example, if a process is suspended, having it keep its services might be good if it ever gets resumed. If, however, a process which is suspended is never going to be resumed, keeping services alive for it may serve no purpose.

        • Essentially my concern is with static dependencies; a service needs another at all times, or does not. I feel the job of the service manager is to tell a service either to start, or to stop. In the latter case, if the service has active clients, it can potentially delay shutdown until the clients go away, but that is outside the concern of the service manager itself.

          > Having a service manager know of all dependencies can be a clean approach if it can be reliably informed when a client no longer needs a service.

          The model I’m talking about is simple: if a client is running, and the service description of the client says that it depends on some other service, then the client needs that service (bear in mind that a service is not necessarily a running process). Dynamic dependencies such as what you describe here are altogether another concern, and represent a different model. Using a service may indeed need a process to be started, but from a service manager point of view in the model I’m presenting, if the service is stopped then no clients can exist.
