Bus1 is the new Kdbus

For some time there have been one or two developers essentially trying to move part of D-Bus into the kernel, mainly (as far as I understand) for efficiency reasons. This has so far culminated in the “Bus1” patch series – read the announcement from here (more discussion follows):

Bus1 is a local IPC system, which provides a decentralized infrastructure to share objects between local peers. The main building blocks are nodes and handles. Nodes represent objects of a local peer, while handles represent descriptors that point to a node. Nodes can be created and destroyed by any peer, and they will always remain owned by their respective creator. Handles on the other hand, are used to refer to nodes and can be passed around with messages as auxiliary data. Whenever a handle is transferred, the receiver will get its own handle allocated, pointing to the same node as the original handle.

Any peer can send messages directed at one of their handles. This will transfer the message to the owner of the node the handle points to. If a peer does not possess a handle to a given node, it will not be able to send a message to that node. That is, handles provide exclusive access management. Anyone that somehow acquired a handle to a node is privileged to further send this handle to other peers. As such, access management is transitive. Once a peer acquired a handle, it cannot be revoked again. However, a node owner can, at any time, destroy a node. This will effectively unbind all existing handles to that node on any peer, notifying each one of the destruction.

Unlike nodes and handles, peers cannot be addressed directly. In fact, peers are completely disconnected entities. A peer is merely an anchor of a set of nodes and handles, including an incoming message queue for any of those. Whether multiple nodes are all part of the same peer, or part of different peers does not affect the remote view of those. Peers solely exist as management entity and command dispatcher to local processes.

The set of actors on a system is completely decentralized. There is no global component involved that provides a central registry or discovery mechanism. Furthermore, communication between peers only involves those peers, and does not affect any other peer in any way. No global communication lock is taken. However, any communication is still globally ordered, including unicasts, multicasts, and notifications.

Ok, and maybe I’m missing something, but: replace “nodes” with “unix domain sockets” and replace “handles” with “file descriptors” and, er, haven’t we already got that? (Ok, maybe not quite exactly – but is it perhaps good enough?)

Eg:

Sockets represent objects of a local peer, while descriptors represent descriptors that point to a socket. Sockets can be created and destroyed by any peer, and they will always remain owned by their respective creator.

right?

Descriptors on the other hand, are used to refer to socket connections and can be passed around with messages as auxiliary data. Whenever a descriptor is transferred, the receiver will get its own descriptor allocated, pointing to the same socket connection as the original descriptor.

It’s already possible to pass file descriptors to another process via a socket. Technically passing a file descriptor connected to a socket gives the other peer the same connection to the socket, which is probably not conceptually identical to passing handles, which (if I understand correctly) is more like having another connection to the same socket. But could it be so hard to devise a standard protocol for requesting a file descriptor with a secondary connection, specifically so that it can be passed to another process?
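For reference, descriptor passing over a Unix domain socket can be sketched as follows. This is a minimal illustration rather than anything taken from Bus1: it assumes Python 3.9 or later (for `socket.send_fds` and `socket.recv_fds`), and the function name `pass_descriptor_demo` and the use of a pipe as the "object" are my own choices.

```python
import os
import socket


def pass_descriptor_demo():
    """Pass the write end of a pipe across a Unix socket pair (illustrative only)."""
    # Two connected Unix domain sockets standing in for two peers.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # A pipe standing in for the object whose descriptor is handed over.
    read_end, write_end = os.pipe()
    received = None
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data next to a 1-byte payload.
        socket.send_fds(left, [b"x"], [write_end])
        _, fds, _, _ = socket.recv_fds(right, 1, 1)
        received = fds[0]
        # The receiver's descriptor is newly allocated but refers to the same pipe.
        os.write(received, b"hello")
        data = os.read(read_end, 5)
        return received != write_end, data
    finally:
        if received is not None:
            os.close(received)
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()
```

Both descriptors here refer to the same open file description, which is the "same connection" behaviour described above rather than a fresh secondary connection.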

Any peer can send messages directed at one of their file descriptors. This will transfer the message to the owner of the socket the descriptor points to. If a peer does not possess a descriptor to a given socket, it will not be able to send a message to that socket. That is, descriptors provide exclusive access management. Anyone that somehow acquired a descriptor to a socket is privileged to further send this descriptor to other peers. As such, access management is transitive. Once a peer acquired a descriptor, it cannot be revoked again. However, a socket owner can, at any time, close all connections to that socket. This will effectively unbind all existing descriptors to that socket on any peer, notifying each one of the destruction.

right? (except that individual connections to a socket can be “revoked” i.e. closed, which is surely an improvement if anything).

Unlike sockets and file descriptors, peers cannot be addressed directly. In fact, peers are completely disconnected entities. A peer is merely an anchor of a set of sockets and file descriptors, including an incoming message queue for any of those. Whether multiple sockets are all part of the same peer, or part of different peers does not affect the remote view of those. Peers solely exist as management entity and command dispatcher to local processes.

I suspect the only difference is that each “peer” has a single receive queue for all its nodes, rather than one per connection.

The set of actors on a system is completely decentralized. There is no global component involved that provides a central registry or discovery mechanism. Furthermore, communication between peers only involves those peers, and does not affect any other peer in any way. No global communication lock is taken. However, any communication is still globally ordered, including unicasts, multicasts, and notifications.

I think this is meant to read as “no, it’s not the D-Bus daemon functionality being subsumed in the kernel”.

But as per all above, is Bus1 really necessary at all? Is multicasting to multiple clients so common that we need a whole new IPC mechanism to make it more efficient? Does global ordering of messages to different services ever actually matter? I’m not really convinced.

11 thoughts on “Bus1 is the new Kdbus”

  1. So multicasting to multiple clients is very common. That’s one of the things that DBUS does a lot of: power events, screen locking, etc. A lot of that is sent from some system daemon to multiple things interested in the state of the system. The way bus1 works is similar (but not the same) to how Android’s Binder works. Bus1 isn’t capable on its own of being a DBUS replacement; it’ll still require a userspace manager for it, but could allow for things like Binder and other proprietary message passing systems to be re-implemented against a standard kernel feature that’s available on all platforms.

    One of the big advantages over unix domain sockets is that it avoids the need for copying the message to every process that’s receiving the message like you need for either IP or a Unix socket. This is key to being able to efficiently handle workloads with multicast messaging, or for sending low latency high bandwidth data (such as sound or video) between processes.

    1. Ok – but most of these examples are trivial; you don’t need a dedicated multicasting mechanism to inform multiple clients that the screen is locking, for example. You just send the same message individually to what is likely a handful of clients, and you only do so infrequently.

      One of the big advantages over unix domain sockets is that it avoids the need for copying the message to every process that’s receiving the message like you need for either IP or a Unix socket

      For small messages, the copying doesn’t matter, and for larger messages (such as your sound/video example) you can use vmsplice or pass a descriptor to a memory-mapped region obtained using memfd_create.

      1. The only important distinction between sending the same message multiple times to several peers as unicast and sending it once as a multicast is ordering. If you use repeated unicast you would break causality, i.e., if receiving one message causes a peer to send a second message, it is important that no other peer who receives both messages receives the second (the effect) before the first (the cause).
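The causality point can be shown with a toy model. This is a pure-Python simulation with made-up names (`repeated_unicast`, `atomic_multicast`, peers "B" and "C"); it models queues as deques and says nothing about how Bus1 actually implements ordering.

```python
from collections import deque


def repeated_unicast():
    """A notifies B and C one at a time; B reacts in between."""
    inbox = {"B": deque(), "C": deque()}
    inbox["B"].append("A:event")        # A sends B its copy first
    if inbox["B"].popleft() == "A:event":
        inbox["C"].append("B:call")     # B reacts before A has reached C
    inbox["C"].append("A:event")        # A's copy for C arrives late
    return list(inbox["C"])


def atomic_multicast():
    """A's notification is enqueued on every subscriber as one step."""
    inbox = {"B": deque(), "C": deque()}
    for peer in ("B", "C"):
        inbox[peer].append("A:event")   # a single point in the global order
    if inbox["B"].popleft() == "A:event":
        inbox["C"].append("B:call")     # the effect can only follow the cause
    return list(inbox["C"])


print(repeated_unicast())   # ['B:call', 'A:event']
print(atomic_multicast())   # ['A:event', 'B:call']
```

In the first case C sees the effect before the cause; in the second it cannot.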

  2. bus1 is not about performance, and is not about moving parts of dbus into the kernel.

    Your comparison with UDS is mostly correct, and people have used UDS to create capability-based IPC systems before (a comparison we also make in the various talks we gave about bus1). The problem really is message ordering: by having distinct message queues you lose the order between messages.

    > Does global ordering of messages to different services ever actually matter?

    Yes.

    > I’m not really convinced.

    Ok.

    1. > > Does global ordering of messages to different services ever actually matter?
      >
      > Yes.

      What I’m looking for is a concrete example of a case where this is important (and which can’t easily be solved by other means). Maybe there is such a case and maybe it’s not such a niche that it wouldn’t justify actually creating a new IPC mechanism, but I can’t think of one and have been unable to find such an example in the documentation / emails about Bus1. It’s all very well to claim that global ordering matters (especially with a bold “yes it matters” response to what was a reasoned exposition); could you give a meaningful example?

      (And, if it’s not about performance, couldn’t D-Bus solve this problem anyway?)

      1. Imagine an event happens in daemon A and a message is sent to its subscribers B and C notifying them of the event. As a result of receiving the notification, an event is triggered in daemon B causing it to call a method on C. What we give you is a guarantee that C receives the notification before the method call.
        Imagine daemon X calling two methods, one after the other on two different objects in daemon Y. We guarantee that the method calls are received in Y in the order they were called in X.
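The second scenario can be reproduced with plain Unix domain sockets. The following is a small sketch, assuming one socket pair per object in Y; the names `receive_order_demo`, `x_obj1`, `y_obj1` and so on are hypothetical.

```python
import select
import socket


def receive_order_demo():
    """X calls two objects in Y over separate socket pairs; Y polls both."""
    x_obj1, y_obj1 = socket.socketpair()
    x_obj2, y_obj2 = socket.socketpair()
    try:
        # X calls object 2 first, then object 1.
        x_obj2.send(b"first")
        x_obj1.send(b"second")
        # Y finds both sockets readable; nothing records which was written first.
        readable, _, _ = select.select([y_obj1, y_obj2], [], [], 1.0)
        # Y services them in its own fixed order, so the send order is lost.
        return [sock.recv(16) for sock in (y_obj1, y_obj2) if sock in readable]
    finally:
        for sock in (x_obj1, y_obj1, x_obj2, y_obj2):
            sock.close()


print(receive_order_demo())   # [b'second', b'first']
```

With separate queues, Y could only recover the order if X put sequence numbers in the messages or the two objects shared one socket.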

        If these are properties you want, then use bus1, if not then maybe UDS will do (though there are other issues that you may or may not care about). Note that of course there are always ways to achieve similar properties higher up in the stack, but there will be tradeoffs. See for instance Mojo’s associated interfaces [0].

        > especially with a bold “yes it matters” response to what was a reasoned exposition

        Hey, you write a cocky blog, you get cocky responses, that’s only fair 😉

        [0]:

      2. > Imagine an event happens in daemon A …

        Alright, you’ve explained the global ordering thing again, but I’m not having trouble understanding the concept – what I meant by concrete example was a situation where global ordering actually mattered. So your daemon A/B/C example doesn’t help; yes, without global ordering, daemon A could send a message to B and C and B could in response send a message to C and C could receive the message from B before the message from A; what’s a situation where this matters? What sort of messages are we talking about here? The other poster above talked about power events / screen locking but these don’t have causality/ordering issues (at least that I can see).

        And maybe the answer is, “if that doesn’t matter to your particular use case, you don’t need Bus1” – and that’s fair enough, but I’m still curious about what particular real problem Bus1 was created to address. I had assumed that the plan was for D-Bus to use this new interface (I mean, Bus1 was created on the heels of Kdbus, and Kdbus by name implies some association to D-Bus) and couldn’t understand which D-Bus clients needed global ordering and weren’t already getting it from D-Bus. Having quickly skimmed the document you linked above, the main trade-off with implementing such a message system in userspace seems to be performance – hence my assumption that the motivation for Bus1 was performance based; after all, can’t you get the same functionality (even global message ordering, if you really need it) without a new kernel interface, if you don’t care about performance?

        > Hey, you write a cocky blog, you get cocky responses, that’s only fair

        Touché 🙂

      3. A is a daemon managing your low-level networking stack (monitoring and creating links, addresses and routes, but not implementing any policy), B is monitoring A and decides what network settings to apply to any given link that appears (what static IP addresses to configure, which links should have DHCP run on them, etc). C is a DHCP client that monitors A for links coming and going and B calls a method on C to start DHCP on a given link. Without the bus respecting causality, B must either proxy the information gathered from A in its request to C, or C must sync with A before processing the request from B. Of course that could be done, but it comes at a cost, it is unintuitive, fragile and usually entails redundant work and code to handle hard-to-hit edge cases.
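The "C must sync with A" work-around might look roughly like the sketch below. It is a toy in-process model, not real daemon code: the classes `LinkDaemon` and `DhcpClient` and all their methods are invented for illustration, and the "sync" is just a direct method call standing in for a round trip to A.

```python
class LinkDaemon:
    """Stands in for A: the authoritative list of links."""

    def __init__(self):
        self.links = set()

    def add_link(self, name):
        self.links.add(name)

    def snapshot(self):
        return set(self.links)


class DhcpClient:
    """Stands in for C: learns about links from A, takes requests from B."""

    def __init__(self, link_daemon):
        self.link_daemon = link_daemon
        self.known_links = set()
        self.running = set()

    def on_link_added(self, name):
        # Notification from A; without ordering it may arrive after B's request.
        self.known_links.add(name)

    def start_dhcp(self, name):
        # Request from B. If A's notification has not been seen yet, resync first.
        if name not in self.known_links:
            self.known_links = self.link_daemon.snapshot()
        if name not in self.known_links:
            raise LookupError(name)
        self.running.add(name)
```

The extra resync branch is exactly the kind of rarely exercised edge-case code that is easy to forget.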

        The point with Mojo was that they got around a similar problem by allowing interfaces to be explicitly “associated” with each other. The trade-off there is that they can no longer use one UDS socket per object, but have to share the UDS socket to regain ordering. Which in turn means the objects are tied together explicitly, and can no longer be passed around independently of one another. If the transport provided the ordering guarantees this would all go away. Keep in mind that these work-arounds are there to deal with potentially hard-to-hit race conditions, so the problem is not merely that the work-around is ugly, but that people may forget to employ it without easily hitting the problem.

        Long story short, what we really want to provide is a system that is as intuitive as possible, and causality is a big part of that in my opinion. These problems have to be solved somewhere in the stack, and we think solving them at the transport layer is the right place.

        Lastly, I should moderate one thing: of course performance matters. If one solution fundamentally does twice or three times the amount of work compared to another, it is not going to fly. However, I want to get away from this notion that performance is our main target. We want a general-purpose capability-based IPC solution for Linux with the features that entails, and a subsidiary goal is that it should not suck (i.e., it should be performant, scalable, secure, …).

      4. Ok, I’m starting to see it. Would it be fair to say that Bus1 isn’t trying to solve a problem that couldn’t be solved another way, it’s rather just trying to provide a better, more flexible IPC mechanism than what is already available? (So that, for instance, you don’t need a daemon running over unix-domain sockets to act as bus arbiter/communications nexus, with all the drudgery which that entails – though this in itself might not be the primary motivation). Thus, me arguing that you don’t need Bus1 is un-disprovable, but also missing the point – that it’s not about any actual need, it’s about improvement (including simplification and ease of use for various purposes).

        I can get behind that, I think. I still maintain that this is moving (strictly, copying) a function that is currently provided by D-Bus into the kernel – since unless I’m mistaken D-Bus does or at least could be made to provide global ordering, and certainly it provides the other aspects of bus functionality – but I see what you mean now when you say it’s not about performance. (I’m aware too of the allocation management that Bus1 performs; I’m not sure if this would be possible to implement in D-Bus, though I suspect it might be).
