C++ and pass-by-reference-to-copy

I’m sure many are familiar with the terms pass-by-reference and pass-by-value. In pass-by-reference a reference to the original value is passed into a function, which potentially allows the function to modify the value. In pass-by-value the function instead receives a copy of the original value. C++ has pass-by-value semantics by default (except, arguably, for arrays) but function parameters can be explicitly marked as being pass-by-reference, with the ‘&’ modifier.

Today, I learned that C++ will in some circumstances pass by reference to a (temporary) copy.

Interjection: I said “copy”, but actually the temporary object will have a different type. Technically, it is a temporary initialised using the original value, not a copy of the original value.

Consider the following program:

#include <iostream>

void foo(void * const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    int * argcp = &argc;
    std::cout << "main, &argcp = " << &argcp << std::endl;
    foo(argcp);
    foo(argcp);
    return 0;
}

What should the output be? Naively, I expected it to print the same pointer value three times. Instead, it prints this:

main, &argcp = 0x7ffc247c9af8
foo, &p = 0x7ffc247c9b00
foo, &p = 0x7ffc247c9b08

Why? It turns out that what we end up passing is a reference to a temporary, because the pointer types aren’t compatible. That is, a “void * &” cannot be a reference to an “int *” variable (essentially for the same reason that, for example, a “float &” cannot be a reference to a “double” value). Because the parameter is marked const, it is safe to instead pass a reference to a temporary initialised with the value of the argument – the value can’t be changed via the reference, so there won’t be a problem with such changes being lost due to them only affecting the temporary.

I can see some cases where this might cause an issue, though, and I was a bit surprised to find that C++ performs this “conversion” automatically. Perhaps it allows for various conveniences that wouldn’t otherwise be possible; for instance, it means that I can change any pass-by-value function parameter into a pass-by-const-reference parameter and all existing calls will remain valid.

The same thing happens with a “Base * const &” and “Derived *” in place of “void * const &” / “int *”, and for any types which offer conversion, eg:

#include <iostream>

class A {};

class B
{
    public:
    operator A()
    {
        return A();
    }
};

void foo(A const &p)
{
    std::cout << "foo, &p = " << &p << std::endl;
}

int main (int argc, char **argv)
{
    B b;
    std::cout << "main, &b = " << &b << std::endl;
    foo(b);
    foo(b);
    return 0;
}

Note that this last example is not passing pointers, but (references to) objects themselves.

Takeaway thoughts:

  • Storing the address of a parameter received by const reference is probably unwise; it may refer to a temporary (a sketch of this pitfall follows this list).
  • Similarly, storing a reference to the received parameter indirectly could cause problems.
  • In general, you cannot assume that the pointer object referred to by a const-reference parameter is the one actually passed as an argument, and it may not exist once the function returns.
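
To make the first of those points concrete, here’s a minimal sketch (the names are mine, invented for illustration) of how stashing the address of such a parameter goes wrong:

#include <iostream>

const void *stashed;

void foo(void * const &p)
{
    stashed = &p;  // if p is bound to a temporary, this address dies with the call
}

int main()
{
    int i = 0;
    int *ip = &i;
    foo(ip);  // p binds to a temporary "void *" copy of ip, not to ip itself
    // 'stashed' now holds the address of an expired temporary; dereferencing
    // it (or comparing it against &ip) gives meaningless results.
    std::cout << "stashed = " << stashed << ", &ip = " << (void *) &ip << std::endl;
    return 0;
}

The temporary to which p was bound is gone by the time foo returns, so stashed is left dangling.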

OpenGL spec for glDrawRangeElementsBaseVertex is rubbish

The title says it all: the spec for the glDrawRangeElementsBaseVertex function is rubbish.

glDrawRangeElementsBaseVertex is a restricted form of glDrawElementsBaseVertex.

Ok, but:

mode, start, end, count and basevertex match the corresponding arguments to glDrawElementsBaseVertex, with the additional constraint that all values in the array indices must lie between start and end, inclusive, prior to adding basevertex.

glDrawElementsBaseVertex doesn’t have a start or end argument. Perhaps the above should say “mode, count, type, indices and basevertex”, since type and indices seem to have the same meaning for both functions?
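
For reference, the two prototypes (paraphrasing the C headers) make the mismatch plain:

void glDrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type,
        const void *indices, GLint basevertex);
void glDrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end,
        GLsizei count, GLenum type, const void *indices, GLint basevertex);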

Index values lying outside the range [start, end] are treated in the same way as glDrawElementsBaseVertex

But… you just said that all the index values must be inside that range. Perhaps substitute “outside” with “inside” to make this sentence make sense?

Does no-one proof-read this stuff? Bug submitted.

Update: so it turns out that ‘in the same way as glDrawElementsBaseVertex’ is supposed to mean ‘in an implementation-defined manner consistent with how similarly out-of-range indices are treated by glDrawElementsBaseVertex’. I feel like the wording could be much clearer but I’m not going to argue this one. The parameter specifications are clearly incorrect and this should be fixed in a revision.

When technical argument gets heated

I’m going to stray away from usual topics and write about a recent experience with a discussion I had on an issue tracker for the OSMC open-source project (it’s a media center / Linux distribution, based around Kodi). This wasn’t a pleasant experience, essentially because of one toxic developer, and I find myself needing to write down some things about it that still trouble me, there being no other suitable medium for doing so. I don’t feel that I was in the wrong; however, I’ll try to focus on what I could have handled better (especially in terms of winning the technical argument) rather than on what others might have said or done that I found offensive. This is a somewhat personal blog entry, but I suppose these situations happen to most of us at some point or other.

The original discussion in full is still available here, though further comments have been blocked, but I’ll try to re-cap in what I hope is an even-handed manner.

1. The Issue

A user had noticed that OSMC was enabling the SSH service in its default configuration, with a default username and password, and without prompting or informing the user in any way. I would like to think at this point that any readers are already horrified – I certainly was – but this is clearly not a universal reaction: the issue was summarily closed without being addressed. One of the developers gave some reasons for this: essentially, SSH is useful as a recovery option in case of failure or misconfiguration that prevents other access, but the user could not be expected to remember a password; furthermore, OSMC is designed to be used behind a NAT (by which he meant firewall, but that’s nit-picking). Finally, OSMC can be configured without a keyboard attached (using just a remote control), which makes choosing a password difficult.

These reasons weren’t good enough for the original reporter nor for several others who chimed in (before I did), and there was some ongoing discussion (with the issue held in the CLOSED state). Points of contention included:

  • Whether being on a private network is a good enough security measure
  • To what level user convenience (such as not having to remember a password, and being able to use SSH when necessary without having to enable it first) trumps security, and vice versa. (Bear in mind that forgetting the SSH password potentially requires the user to re-image their device, though I personally don’t think this adds much weight, relatively speaking).
  • Whether restricting incoming SSH connections to end points on the local network would suffice as a mitigation.
  • Whether disabling SSH by default would be a reasonable change.

Some arguments were made for making no changes, that I felt were markedly dubious, including:

If we disable SSH, then where should the line be drawn? There is a libmicrohttpd service on Port 80 running with a JSON-RPC interface. Unfortunately, there is a Kodi 0-day vulnerability which allows a system to be compromised with this.

(I mean, your excuse for not securing one part of your system is that there’s a known vulnerability in another part? What?) I eventually stepped in, and I wanted to hit hard – really drive the point home:

You leave a service like SSH running with a default password and I pretty much guarantee you just created a botnet. Well done. News flash: your users won’t do what you want or expect them to do. A proportion of the devices will get exposed to the wider world, through accident or negligence – yes, even a proportion of that alleged 99% that sit behind a router; you can’t prevent this, but you can mitigate the damage, and it would be irresponsible not to do so.

Now, the first sentence was (I thought) obvious hyperbole designed to stress a particular point. This brings me to my first mistake:

#1 Avoid Hyperbole and Exaggeration

Although in this case I thought it was obvious that I wasn’t saying there would be an actual botnet at any particular point in time if this issue was not addressed, the problem was that people later argued against this point as if I had meant it literally (I was essentially challenged to produce evidence of the existence of a botnet). And though this seems ridiculous, it gave people a way to argue in which it looked (superficially) like they were making a valid point, which undermined my own.

I continued with some brief conjecture about how the issue could be resolved in terms of implementation details. I think this was also a mistake, but it’s more of a technical nature. I shouldn’t have gotten caught up in implementation details, and instead just stuck to the main point: ssh with default password enabled by default is very bad. (You can always argue the less important details later, once the important battles are won).

#2 Don’t bow out early

After rebutting another user’s comment (which was apparently later deleted), I concluded with:

(I’m done with the discussion. I hope you guys see the light eventually. Good luck).

I should not have said this. I wasn’t done. I got a reply, and felt compelled to respond.

#3 Don’t make throwaway comments

I continued arguing against another user (who I later discovered was actually an OSMC developer), one “KODeKarnage”. KODeKarnage had said:

There is no such thing as 100% secure, at some point you have to be ok with good enough, and blithely ignoring the costs of “mitigation” doesn’t make them disappear.

I responded with:

What, the “cost” of users needing to actually remember their password? In my view cost/benefit analysis weighs heavily in favor of not having a default password. I’m surprised that anyone would disagree with this, but there it is, I guess.

There’s probably a couple of things to be said about this. Firstly, I was too flippant about the costs KODeKarnage was referring to. Now, as then, I feel those costs are insignificant compared to the problem of having a trivially compromisable system on your network, but being so flippant allowed KODeKarnage to claim that I hadn’t really considered their point, which indeed became an ongoing theme (along with, shortly afterwards, insults to my intelligence and insinuations of arrogance).

Secondly, that last sentence. I was surprised that people disagreed, but saying so didn’t help matters; indeed, apparently, it was construed as an insult, as KODeKarnage revealed much further down the line. The problem with this sentence, even if one doesn’t consider it inflammatory (and I don’t), is that it is unnecessary to the discussion at hand. Any time you say something unnecessary, you’re giving those who disagree potential fuel with which to detract (or at least distract) from your valid arguments.

A more immediate response from KODeKarnage was:

Relying on the password alone isnt good enough. Relying on your network security IS good enough.

And the cost isn’t people REMEMBERING their password. The cost is people FORGETTING their password.

Both of those facts are obvious to anyone who has actually considered the issue. Maybe come back when you have given this more than a few seconds thought.

I rebutted the first two points and ended by quoting the final sentence, responding with:

I don’t see the point in arguing with you at this stage; you’re being arrogant and rude.

Which brings us to point #4.

#4 If you’re not going to argue with someone, don’t argue with them.

I shouldn’t have bothered rebutting the first two points. They were insignificant compared to the personal insult which had been issued. Replying to them detracted from the real issue here, which was that this person was being a prat. I could have either ignored them, or dismissed them without trying to counter them – normally not a valid tactic in a technical debate, but at this point, the debate had gone beyond the technical.

I left the conversation, for some time, after being insulted. For a while, nothing happened. Some time later, I noticed that KODeKarnage had made a commit that looked as if he was going to implement the ability to change the SSH password, as well as disable the service, during the initial setup. A little later, the password change functionality (which had only reached the stage of being a stub) was removed, though the ability to disable SSH remained in the code base.

More time passed, during which nothing much else happened.

Then, the discussion awoke as it was joined by another user, who agreed that having a default username/password was bad and didn’t think that the option to disable SSH, as had been implemented, really was a satisfactory solution. There was a little more to-and-fro, and at this point I got used as an excuse to forgo further discussion:

However, outlandish statements such as

pretty much guarantee you just created a botnet

make it increasingly challenging for us to take further comments on this issue seriously.

Yep, that’s my “botnet” quote – refer to mistake #1. Now the price for that mistake was being paid. I feel the statement quoted above (originally by one “samnazarko”) is ridiculous, and indeed samnazarko was called out for taking my statement far too literally by the other user (not by me) – but arguably the damage was done. Essentially, the opposition had been given a gift in the form of a “clearly ridiculous statement”. It didn’t matter to them that it was fundamentally incorrect to take this statement literally, because by doing so they were essentially able to hold up a hand and say, “no, stop, you’re obviously wrong, conversation is over”.

However, the argument continued. There was a lot of fluff about IPv6 and “NAT vs firewall” that doesn’t really matter too much. Eventually the original reporter of the issue (vext01) waded back in and revealed that a private discussion between himself and samnazarko had resulted in the promise that it would become possible to disable SSH. Indeed, this had been delivered, but as vext01 noted (and as had been noted by the other user who was relatively new to the discussion, joepie91), the option was enabled by default and the help text was insufficient to dissuade users from enabling it on no more than a whim. Although this had been pointed out earlier, samnazarko now relented:

I am sure we can make that change

Battle won, right? I took a moment to stick one, ever so gently, to this KODeKarnage who had been obnoxious earlier:

I’m glad that you’re addressing this, especially after developer comments such as those above (“Relying on your network security IS good enough”, “Maybe come back when you have given this more than a few seconds thought”). It’s a shame that so much resistance was put up initially. I hope you can keep OSMC secure, going forward, and also that your community can become less toxic towards users who try to constructively discuss any issues. The irony that the same developer who told me that my opinion was both wrong and worthless has apparently themselves given more than “a few seconds thought” to the issue now (and implemented SSH disable option) has not escaped me.

I know that some people will consider this mistake #5, though I honestly don’t feel it was. The technical battle had, fortunately, been won by this point. Was it cheeky? Yes. Provocative? I suppose, though in all honesty I never expected KODeKarnage to bite on this.

They did so, however:

You said you were surprised anyone would disagree with you, but then spent absolutely no time trying to appreciate why that might be the case.

(“What, the “cost” of users needing to actually remember their password?”)

You completely ignored the massive costs of having users effectively locked out of their device and unable to implement fixes. In fact, you dismissed ALL costs as a mere “smidgen”.

You either did not spend any time thinking about those costs, or you did spend time but couldn’t think of them, or you could think of them but placed zero value in them. THAT is arrogant, and it betrays your feelings toward the people who would pay those costs.

Your fantasies about bot-nets notwithstanding (and are you going to pay up on that “pretty much guarantee”?) you didn’t give this more than a few seconds thought. You embraced your priors and waved away the effects on other people. Your cost-benefit analysis amounted to little more than extolling the benefits and kicking the costs off the scale.

And notice that not once have I ever dismissed the user being able to change the password as a entirely bad thing. But there are costs to forcing them to do it, and those should be acknowledged and weighed against the benefits.

From the very start you treated the costs as non-existent, and insulted the intelligence of people who didn’t value them the same.

What ensued was a to-and-fro which got heated, though I did avoid throwing any insults of my own until towards the end, at which stage I felt that it was clear that no amount of reasonable discussion was going to get anywhere.

Which takes me to the final valuable lesson:

#5 Don’t keep arguing when you’ve already won.

It just frustrates you and takes the pleasure of the victory away.

I had the right, I believe, to feel insulted at being called arrogant. I was also outraged at the fact that I was being accused of “insulting the intelligence of people who didn’t value them the same” – when that was clearly what had been done to me, not the other way around. I wanted to respond, and I believe that defending yourself against an allegation like the above is reasonable. But when I responded, I tried to rebut each (or at least several) of the points in turn, which just led to a shit-storm of messages to and fro which were almost certainly of no interest to anyone but the two of us. I even broke rule #2 (don’t bow out early) again.

What I should have done was to dismiss KODeKarnage’s statements above – all of them – with one or two simple sentences, along the lines of “No, you’re misrepresenting my position, but I refuse to argue with you further”, and then deny the allegation categorically, in a nice, straightforward message that couldn’t easily lead to further argument. I regret not doing so, now. It would have saved me some time and some stress.

I also fired off a Twitter message to express my frustration, not thinking that it might get beyond my immediate circle of followers – unfortunately it was seen by samnazarko, who used it as a reason to lock the discussion (I would have thought that various of KODeKarnage’s insulting statements would have been enough to warrant that on their own, but I suspect that developers within a project have a certain tolerance for each other, and it’s possible that they know each other in person).

In any case, what’s done is done. I said all that I really needed to say in the discussion, and we got the right result, but I should have reined myself in to a much greater extent. I want to try harder to do that the next time I’m drawn into such a discussion.

Final shots

I’ll just leave these two great quotes from KODeKarnage here:

Both of those facts are obvious to anyone who has actually considered the issue. Maybe come back when you have given this more than a few seconds thought.

and then:

Either it was obvious or it took more than a few seconds thought. You can’t have it both ways.

Indeed, you can’t.

D-Bus is completely unnecessary

So for various reasons I’ve got to thinking about D-Bus, the “system bus” that’s found its way into most (all?) Linux distributions and possibly some other OSes as well. I’m more and more coming to believe that D-Bus is too heavyweight a solution for the problems it was intended to solve – mostly because it requires a daemon, and because we don’t really need the bus aspect (at least, not in the form in which it’s been implemented).

How did I come to this conclusion? Well, consider first that, in essence, D-Bus provides only the following functionality:

  • Protocol (and associated library) for a simple RPC mechanism (you can call a method in a “remote” process, and receive a reply).
  • Basic security for this RPC. Basically the policy specification allows distinguishing between different users and groups, and whether the user is “at the console”.
  • Map service names to the processes that provide the service (for purposes of connecting to said service).
  • “Bus activation”, i.e. starting a process which provides a service when the given service is requested but is not currently being provided.

That’s really all there is to it. The fact that these are the only functions of D-Bus is, in my view, a good thing; it fits with the mantra of “do one thing, and do it well”. However, the question is whether D-Bus is actually needed at all. Consider the “traditional approach”, where services each have their own unix-family socket and listen on it for client requests. How does that compare to the feature set of D-Bus? Well:

  • Protocol is application specific (D-Bus wins)
  • Security configuration is also application specific (D-Bus wins)
  • Service is determined by socket path (about as good as D-Bus, since service names and socket paths are both pretty arbitrary)
  • No activation – server needs to be running before client connects (D-Bus wins)

So, in general, D-Bus wins, right? But wait – there are a few things we haven’t considered.

Like libraries.

Yes, protocols for communication with server programs via a socket are generally application specific, but on the other hand, there is usually a library that implements that protocol, which takes some of that pain away. Of course these libraries are language-specific, so there is a duplication of effort required in adapting library interfaces to other programming languages, but there are tools like SWIG which can assist here. Also, there’s the possibility that services could offer a standard protocol (similar to the D-Bus protocol, even) for communication over their respective sockets. It turns out there are options available which give some basic functionality; I’m sure that it wouldn’t be too hard to come up with (or find [1] [2]) something which offers basic introspection as well. Heck, you could even use the D-Bus protocol, just without the intermediary server!

How about security? Well, because sockets themselves can be assigned unix permissions, we actually get some of the D-Bus security for free. For any given service, I can easily restrict access to a single user or a single group, and if I wanted to I could use ACLs to give more fine-grained permissions. On the other hand, central configuration of security policy can be nice. But that could be implemented by a library, too. If only there was some pre-existing library specifically for this purpose… ok, so PAM isn’t really meant for fine-grained security either, since it breaks things down to the service level and not any further. But it wouldn’t be hard to come up with something better – something that allows fine-grained policy, with significantly greater flexibility than D-Bus offers.
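
As a sketch of what that looks like in practice (the socket path is hypothetical, and error handling is omitted for brevity):

#include <cstring>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/stat.h>

int main()
{
    // Hypothetical service socket path:
    const char *path = "/var/run/service/mydaemon";

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr = {};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);                    // remove any stale socket
    bind(fd, (sockaddr *) &addr, sizeof(addr));
    chmod(path, 0660);               // owner and group may connect; no-one else
    // chown(path, 0, service_gid); // optionally restrict to a dedicated group
    listen(fd, 5);
    // ... accept() clients and speak the service protocol ...
    return 0;
}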

As mentioned above, there’s pretty much no difference between choosing a service name and choosing a socket path, so I don’t need to say much about this. However, it would certainly be possible to have a more standard socket path location – /var/run/service/name or something similar.

That just leaves activation as the only issue. D-Bus can start a process to provide a requested service. But it happens that this can be done quite easily with plain unix sockets, too; it’s called “socket activation” in the SystemD world, and it’s supported by various other service managers as well (including my own still-in-alpha-infancy Dinit). You could even potentially have service handover (processes queuing to own the socket), just like D-Bus has, though I don’t know if any existing service managers can do this (and on the other hand, I’m not even convinced that it’s particularly valuable functionality).
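
A minimal sketch of the receiving end, assuming the service manager follows the systemd-style convention of passing the listening socket as file descriptor 3 with LISTEN_PID/LISTEN_FDS set (error handling again elided):

#include <cstdlib>
#include <unistd.h>
#include <sys/socket.h>

// Returns the listening socket passed by the service manager, or -1 if we
// weren't socket-activated.
static int get_activated_socket()
{
    const char *pid_s = std::getenv("LISTEN_PID");
    const char *fds_s = std::getenv("LISTEN_FDS");
    if (pid_s == nullptr || fds_s == nullptr) return -1;
    if (std::atol(pid_s) != (long) getpid()) return -1;  // meant for another process
    if (std::atoi(fds_s) < 1) return -1;
    return 3;  // first passed fd, by convention
}

int main()
{
    int listen_fd = get_activated_socket();
    if (listen_fd == -1) {
        // Not activated: create, bind and listen on the socket ourselves (omitted).
        return 1;
    }
    int client = accept(listen_fd, nullptr, nullptr);
    // ... speak the service protocol over 'client' ...
    close(client);
    return 0;
}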

So: replace the protocol and security with libraries (potentially even a single library for both), and use your service management/supervision system for activation. Why do we actually need the D-Bus daemon?

I have a vague idea that some library might spring into existence which implements an abstraction over the D-Bus API but using the ideas presented here (which, let’s face it, are astounding only in their simplicity) instead of communicating with a single central daemon. That is, connecting to a D-Bus service would just connect to the appropriate (some name-mangling required) socket within /var/run/service, and then speak the D-Bus protocol (or another similar protocol) to it. Likewise, requesting a service would instead attempt to listen on the corresponding socket – for unprivileged processes this might require assistance from a SUID helper program or perhaps the requisite functionality could be included in the service manager. Unfortunately I’ve got too many other projects on my plate right now, but maybe one day…


Maven in 5 minutes

Despite having been a professional Java programmer for some time, I’ve never had to use Maven until now, and to be honest I’ve avoided it. Just recently however I’ve found myself needing to use it to build a pre-existing project, and so I’ve had to get acquainted with it to a small extent.

So what is Maven, exactly?

For some reason people who write the web pages and other documentation for build systems seem to have trouble articulating a meaningful description of what their project is, what it does and does not do, and what sets it apart from other projects. The official Maven website is no exception: the “what is Maven?” page manages to explain everything except what Maven actually is. So here’s my attempt to explain Maven, or at least my understanding of it, succinctly.

  • Maven is a build tool, much like Ant. And like Ant, Maven builds are configured using an XML file. Whereas for Ant it is build.xml, for Maven it is pom.xml.
  • Maven is built around the idea that a project depends on several libraries, which fundamentally exist as jar files somewhere. To build a project, the dependencies have to be downloaded. Maven automates this by fetching dependencies from a Maven repository, and caching them locally.
  • A lot of Maven functionality is provided by plugins, which are also jar files and which can also be downloaded from a repository.
  • So, yeah, Maven loves downloading shit. Maven builds can fail if run offline, because the dependencies/plugins can’t be downloaded.
  • The pom.xml file basically contains some basic project information (name, version, etc.), a list of dependencies, and a build section with a list of plugins (a minimal example appears after this list).
  • Dependencies have a scope – they might be required at test, compile time, or run time, for example.
  • Even basic functionality is provided by plugins. To compile Java code, you need the maven-compiler-plugin.
  • To Maven, “deploy” means “upload to a repository”. Notionally, when you build a new version of something with Maven, you can then upload your new version to a Maven repository.
  • Where Ant has build targets, Maven has “lifecycle phases”. These include compile, test, package and deploy (which are all part of the build “lifecycle”, as opposed to the clean lifecycle). Running a phase runs all prior phases in the same lifecycle first. (Maven plugins can also provide “goals” which will run during a phase, and/or which can be run explicitly in a phase via the “executions” section for the plugin in the pom.xml file).
  • It seems that the Maven creators are fans of “convention over configuration”. So, for instance, your Java source normally goes into a particular directory (src/main/java) and this doesn’t have to be specified in the pom file.
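
To give a flavour of all this, here is a minimal sketch of a pom.xml (the group and artifact ids are made up, and a real project will doubtless need more):

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- basic project information -->
    <groupId>com.example</groupId>
    <artifactId>example-app</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- a dependency with "test" scope: needed only when running tests -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

With this in place, running “mvn package” executes the compile, test and package phases in order, downloading the compiler plugin and JUnit from the central repository first (naturally).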

Initial Thoughts on Maven

I can see that using Maven means you have to expend minimal effort on the build system if you have a straight-forward project structure and build process. Of course, the main issue with the “convention over configuration” approach is that it can be inflexible, actually not allowing (as opposed to just not requiring) configuration of some aspects. In the words of the Maven documentation, “If you decide to use Maven, and have an unusual build structure that you cannot reorganise, you may have to forgo some features or the use of Maven altogether”. I like to call this “convention and fuck you (and your real-world requirements)”, but at least it’s pretty easy to tell if Maven is a good fit.

However, I’m not sure why a whole new build system was really needed. Ant seems, in many respects, much more powerful than Maven is (and I’m not saying I actually think Ant is good). The main feature that Maven brings to the table is the automatic dependency downloading (and sharing, though disk space is cheap enough now that I don’t know if this is really such a big deal) – probably something that could have been implemented in Ant or as a simple tool meant for use with Ant.

So, meh. It knows what it can do and it does it well, which is more than can be said for many pieces of software. It’s just not clear that its existence is really warranted.

In conclusion

There you have it – my take on Maven and a simple explanation of how it works that you can grasp without reading through a ton of documentation. Take it with a grain of salt as I’m no expert. I wish someone else had written this before me, but hopefully this will be useful to any other poor soul who has to use Maven and wants to waste as little time as possible coming to grips with the basics.

Why is there no decent, simple, structured text format?

So I want a structured text format usable for configuration files and data interchange. My key requirements can be boiled down to:

  • Syntax allows concise expression (this alone rules out XML)
  • Simple to parse (this also rules out XML)
  • Suitable for human “consumption” (reading, editing). To some degree, this rules out XML.

As you can see, XML is definitely out. But oddly enough, I’m struggling to find anything I’m really happy with.

Oh yeah, I know, I know, JSON, right? I have these main problems with JSON:

1. Excessive quotation marks required

So for a simple set of key-values I need something like:

{
    "key1" : "overquoted",
    "key2" : "the perils of javascript"
}

I mean, this isn’t crazy bad, but why are the quotes even necessary? Wouldn’t it be nice if I could instead write:

{
    key1 : overquoted,
    key2 : "the perils of javascript"
}

Given that, at the point key1 and key2 appear, an alphabetical character may not otherwise legitimately be present, what would be the harm in allowing this? (sure, if you want spaces or punctuation in your key, then you should need to quote it, but not otherwise). Similarly for values. It would be nice if those unnecessary quotes weren’t actually required.

2. No comments

This one really irks me. For a configuration file, comments are pretty much mandatory. Douglas Crockford gives an explanation for why there are no comments in JSON (why he removed comments from the spec, in fact), and it sucks. Basically: people weren’t using comments the way I wanted them to, so I removed comments. Yeah, I think we just hit ludicrous speed. There are so many things wrong with this argument I barely know where to begin. At the very outset, anyone using comments as parsing directives was going to need a custom parser anyway – what they were dealing with wasn’t plain JSON. So actually changing JSON does not affect those people; they will continue to use their custom parsers. In fact all you do by removing comments is make the standard less useful generally. The follow-up:

Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

… is equally ridiculous. I mean, sure, I could strip comments before handing the file to the parser, but then my original data isn’t actually JSON, is it? And so interoperability is destroyed anyway, because I can no longer use any standard-JSON tools on my configuration files.

3. Unclear semantics

The current RFC for JSON has, in my opinion, a lot of guff about implementations that just doesn’t belong in a specification. Problematically, this discussion could be seen to legitimise limitations of implementations. Take for example:

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable.

What does that mean, exactly? That it’s allowed that objects with non-unique names cause software to behave “unpredictably”? This is meant to be a specification imposing requirements on implementations, but it’s hard to glean from this text precisely what the requirements are, in particular because (just above that):

The names within an object SHOULD be unique.

That’s SHOULD, not MUST (see RFC 2119); which implies that non-unique names are in fact permissible and must be handled by an implementation. I wonder how many JSON libraries will properly represent such an object… not many, I’d guess.

So How About YAML?

YAML does solve the problems that I identified with JSON above. It doesn’t require superfluous quotation marks, it’s clear about map semantics, and it allows comments. On the other hand, its specification is quite large. Part of the complexity comes from the concept of tags which are a way of identifying types. While the YAML core specification (“failsafe schema”) deals only with maps, sequences and strings, it allows for explicitly tagging any value as a particular type identified by a “tag”. The schema notionally defines how tags are mapped to actual types and allows for specifying rules for determining the type of otherwise untagged ‘plain scalar’ (roughly: unquoted string) values. So for instance the JSON schema – which makes YAML a superset of JSON – maps sequences of digits to an integer type rather than the string type. The fact that different schemas yield different semantics, and that arbitrary types (which a given implementation may not know how to handle) can be assigned to values, in my opinion reduces YAML’s value as an interchange format.

(For instance, a tool which merges two YAML maps needs to know whether 123 and “123” are the same or not. If using the failsafe schema, they are strings and are the same; if using the JSON schema, one is a number and they are not the same).

In fact, the whole notion of schemas leads to the question of whether it is really up to the text format to decide what type plain nodes really are. Certainly, maps and sequences have a distinct type and are usually unambiguous – even YAML doesn’t allow a schema to re-define those – and are enough to represent any data structure (in fact, just sequences would be enough for this). I also think it’s worthwhile having a standard quoting mechanism for strings, and this is necessary to be able to disambiguate scalar values from structures in some cases. But beyond that, to me it seems best just to let the application determine how to interpret each scalar string (it can potentially use regular expressions for this, as YAML schemas do), while for purposes of document structure scalars are always just strings. This is essentially what the YAML failsafe schema does (and it even allows disambiguating quoted strings from unquoted strings, since the latter will be tagged with the ‘?’ unknown type).
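
To illustrate, here’s the over-quoted JSON example from earlier as YAML, under the failsafe schema:

# comments are allowed, thankfully
key1: overquoted
key2: "the perils of javascript"   # quoted or unquoted, both are just strings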

It’s worth noting that YAML can handle recursive structures – sequences or maps that contain themselves as members either directly or indirectly. This isn’t useful for my needs but it could be for some applications. On the other hand, I imagine that it greatly complicates implementation of parsers, and could be used for attacks on poorly coded applications (it could be used to create unbounded recursion leading to stack overflow).
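
For illustration, a self-containing map needs nothing more than an anchor and an alias:

parent: &a
  child: *a

Any consumer that naively walks this structure will recurse forever.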

Or TOML?

TOML is a relative newcomer on the scene of simple structured text formats. It has a fixed set of supported types rather than allowing schemas as YAML does, and generally aims to be a simpler format; on the other hand it is much closer to YAML in syntax than JSON and so is much easier to read and edit by hand.

Among the supported types are the standard map / sequence / string, but also integer, float, boolean and date-time. This seems fine, but again I’m uncertain that having more than just the basic “string” scalar type is really necessary. On the other hand, having these types properly standardised is unlikely to cause any harm.

I think the one downside to TOML is the ungainly syntax for sequences of maps – it requires double-square brackets with the name of the sequence repeated for each element:

[[sequencename]]
key1 = "value1"
key2 = "value2"

[[sequencename]]
key1 = "value1"
key2 = "value2"

Nested maps are also a bit verbose, requiring the parent map name to be given as a prefix to the child map name:

[parent]
[parent.child]
key1 = "value1"  # this key and all following keys are in the child map

The top level node of a TOML structure, if I understand correctly, must always be a map, since you specify key-value pairs. This is probably not a huge concern for my purposes but is certainly a limitation of the format. Once you’ve opened a map (“table” in TOML parlance) there’s also no way to close it, it seems, other than by opening another table.

I think the occasional ugliness of the syntax, together with the immaturity of the format, are deal breakers.

And so the winner…

… Is probably YAML, at this stage, with the failsafe schema, although the potential for recursive structures makes me a little uneasy and it’d be nicer if I didn’t have to explicitly choose a schema. It’s also a shame that the spec is so wordy and complex, but the syntax itself is nice enough I think and seems like a better fit than either JSON or TOML.

D-Bus, ConsoleKit, PolicyKit: turds upon turds upon turds

Recently I’ve tasked myself with trying to modernise my system a bit (see my previous post entitled “On Going 64-bit” for some background). Part of this is to upgrade certain base components of the system to allow better interoperability with desktop environments and programs utilising recent desktop standards. For various reasons I want to avoid installing SystemD (some background here), but that means I need to find other ways to provide various services otherwise implemented by it.

Replacements for SystemD

For my system init I’ve previously been using “SimpleInit” which used to be part of the Util-linux package (but which is no longer included in that package). More recently I’ve been toying with writing my own replacement which provides proper service management (starting and stopping services, and handling service dependencies), but that’s probably the topic of a future post; suffice to say that there are plenty of alternatives to SystemD if you need an init.

The first major piece of (non-init) functionality normally provided by SystemD is that of maintaining the /dev hierarchy (creating device nodes and various symbolic links, assigning appropriate ownership and attributes, and providing a service for other programs to be able to listen for device hotplug events). Before SystemD subsumed it, this task was performed by a daemon called Udev. There is a fork called Eudev maintained by some Gentoo folk which seems up to the task (and which is maintained), so I’m using that.

Part of what SystemD provides is seat and session management (via a service referred to as logind). In short, a seat is a set of input/output hardware usable for direct user interaction (mouse, keyboard, screen); a normal desktop computer has one seat (but you could probably plug in an extra mouse, keyboard and graphics card and call that a second seat). So “seat management” is essentially just about assigning hardware devices to a seat ID. If I log in to (create a session on) some seat then the devices belonging to that seat should be made accessible to me. Ideally you would be able to start the X server and tell it what seat to use and it would then automatically find the right keyboard, mouse and display adapter (in practice this is not possible, yet; you need to create separate configuration files for each seat).

The seat/session management aspect of SystemD actually subsumes the old ConsoleKit, which is no longer maintained. Fortunately a successor called ConsoleKit2 has been forked. ConsoleKit2 seems to me to be some way from finished, but it at least seems to run ok. ConsoleKit2 exposes, like SystemD’s logind, various functionality to manage sessions on seats, and also system shutdown/restart functions; all are exposed via D-Bus.

Finally, ConsoleKit2 (and SystemD, presumably) requires PolicyKit (apparently now called just Polkit) in order to enforce a policy about who can actually use the shutdown functionality. PolicyKit provides a D-Bus API that allows a client to query, essentially, whether a certain user can perform a certain action; the policy controlling the response is configurable. You can allow actions to be performed by certain users, or by users belonging to certain groups, or by users who are logged in to a local seat, for instance (this latter requires ConsoleKit[2] functionality – so ConsoleKit and PolicyKit are highly coupled, though you can build ConsoleKit2 without PolicyKit and just not provide the shutdown scripts).

Ok, so let’s examine each of these components.

D-Bus

D-Bus is a socket-based service that essentially provides Remote Procedure Call (RPC) functionality. You ask for a service by name (actually, you provide a connection id or name and an object name), D-Bus launches it if it’s not running already and provides a handle, and you then send signals to / call methods on the service instance (by name, potentially with parameters that need to be marshalled to send them over the wire). It is usual to run a system bus instance and a per-session bus instance; typically, configuration allows recognized service names to be owned only by particular users (such as root).
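
As a concrete illustration, assuming the standard D-Bus command-line tools are installed, calling a method on the system bus looks like this (here asking the bus itself to list the service names currently owned):

dbus-send --system --print-reply --dest=org.freedesktop.DBus \
    /org/freedesktop/DBus org.freedesktop.DBus.ListNames

The destination name, object path and method name are all visible in the call; D-Bus handles the routing and the marshalling of arguments and the reply.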

I don’t feel like there’s anything wrong with the concept of D-Bus, but I do have some issues with the implementation:

  • The system instance exposes a unix socket (/var/run/dbus/system_bus_socket) with rwxrwxrwx permissions (all users can connect). To me, this seems like a local Denial Of Service attack waiting to happen.
  • That D-Bus handles method calls to the service seems like one step too far. Connecting to a service should just give you a stream and the protocol you use from that point should be up to the service. Allowing introspection and the ability to call methods via a script is nice, I suppose, but not necessary; it imposes a protocol with certain limitations.

Still, it could’ve been worse. At least D-Bus doesn’t use XML for the transport protocol (even though it does for its configuration files, urgh).

ConsoleKit[2] / logind

So, this is where shit starts to get messy. Basically the API provided is way too heavy for the few tasks that should really be needed. As far as use cases that I can see:

  • Screen savers might want to be notified if their containing session goes inactive, so they don’t waste processor cycles.
  • I guess the ability to request a system shutdown/restart, and to inhibit such, is useful (but why put it under the same service name as the session tracking stuff, for Pete’s sake…)
  • Basic session tracking; we might want to know who is logged on and to what seat, to implement an equivalent of the “who” command for instance.
  • I guess the ability to kill off an entire session could be useful, as could waiting for all processes in a session to finish.
  • Whatever basic functionality is necessary to make a determination of whether a user has a local session. PolicyKit would use this; see below.
  • Fast user switching functionality: see below.

This is pretty much what’s provided by SystemD (see here) or ConsoleKit2 (here). Yes, the APIs are essentially the same but for some reason the SystemD folk changed the D-Bus service name. Breaking stuff to look tough, I guess.

So-called “fast user switching” (running two or more sessions on one seat and switching between them) requires the ability to list sessions on a seat, and then the ability to activate one of these sessions, but I’m not sure that the latter functionality should be part of ConsoleKit anyway. Consider the case of Linux virtual terminals; these share a seat, and switching to another session is a matter of pressing ALT+F1, for instance (but can also be done via an ioctl call); a user-switch applet in X (at this stage) works by just running different X sessions on different virtual terminals and switching to the appropriate virtual terminal programmatically. However, a better-designed window system might allow running multiple sessions on the same virtual terminal (consider even the case of running multiple embedded X servers such as Xephyr within a single parent X display, where switching sessions just means hiding one Xephyr window and making another visible). In this case you have a kind of virtual seat on top of a session (which is itself on top of a real seat). ConsoleKit could not be expected to know how to bring such a session to the foreground (although it might allow registering session handlers so that the session initiator can provide such a function, and in fact this seems like the perfect solution).

Note that neither ConsoleKit[2] nor SystemD allow for handling the above situation. The stupid API design means that the session manager needs to know how to switch the active session on a seat.

In terms of implementation, most of the complexity comes from the apparent desire to prevent processes from “escaping” their session (by detaching from the controlling terminal and running as a daemon, or in general by refusing to die when their parent process does). This is necessary for a reliable implementation of one of the features listed above – the ability to kill off an entire session. SystemD tries to use cgroups to group processes in a session, but it’s not clear to me that this is foolproof (since I don’t think it’s possible to atomically kill off all processes in a cgroup, although the freezer and possibly cpuset groups could be used to do that in a slightly roundabout way; the documentation implies that SystemD doesn’t do this, as it requires CONFIG_CGROUPS but “it is OK to disable all controllers”). ConsoleKit2 can also use cgroups to manage sessions, but this depends on cgmanager, and in any case ConsoleKit2 does not provide for terminating sessions (a la logind’s TerminateSession method).

So anyway, the API kind of sucks but could probably be worked into something a bit better easily enough. Also, ConsoleKit2 needs some work before it provides all the session management functionality of SystemD. Which brings us to the final piece of the puzzle.

PolicyKit (Polkit)

I wasn’t even going to bother with Pol[icy]Kit, but ConsoleKit2 gives you a nasty (if ultimately rather meaningless) warning at build time if you don’t have it, and anyway it would be nice if the “shutdown” functionality of my desktop actually worked. The principle behind Polkit is pretty straightforward: user tries to invoke a command on a service (whether by D-Bus or some other means); the service asks Polkit if the user (apparently identified by any of session, process, and/or user id) is allowed to invoke that command; Polkit checks its configuration and replies yay or nay. Sometimes Polkit wants authentication (a password) first, and if so it uses a D-Bus object that has been registered to act as an Authentication Agent for the session (presumably, it checks to make sure that the process registering the agent for a session actually belongs to that session, otherwise there’s a security hole).

Polkit can give special privilege to local processes, that is, processes in a session on a local seat. This might be used for instance to allow local users to shut down the system, or mount USB devices. In general, then, Polkit makes a decision based on:

  • The user id of the process invoking the request (in case it is not provided by the service)
  • Whether that process is part of a local session
  • In some cases, whether the authentication via the provided agent succeeds.

For all this to work, Polkit needs:

  • to be able to determine the session ID of the process invoking the request (so that it can check if the session is local)
  • to be able to check if a session is local
  • to be able to determine the user ID of the process invoking the request (if it was not provided by the service)

These requisite functions are all provided by ConsoleKit/Logind. I have one issue with the concept, which is that I think the determination of being “local” should be made solely based on the user id and not the process. That is, if a user has a local session, then any process belonging to that user in any session (including a non-local session) should be treated as local. Putting it another way, users and not processes should have privilege. This would simplify the requirements a little; the list of three items above becomes a list of just two. However, this is really a minor quibble, I guess.

No, my biggest complaint about Polkit is that you have to write the policies in fucking Javascript. Yes, Polkit actually has a javascript interpreter as a dependency. Considering how the factors above are most likely the only ones relevant in the policy this seems like a case of excessive bloat. Also, it’s Javascript, a language which doesn’t even properly support integers. (Now would be a good time to go watch the Wat talk, if you haven’t already, I guess).
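
Just to illustrate what we’re dealing with, a rules file looks something like the following (consider this a sketch: the action id is my guess at ConsoleKit2’s, and I haven’t tested it):

/* e.g. /etc/polkit-1/rules.d/10-local-shutdown.rules */
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.consolekit.system.stop" &&
            subject.local && subject.active) {
        return polkit.Result.YES;
    }
});

All that machinery, just to express “local, active users may shut down”.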

Do I really have to install this piece of filth just so I can reboot my desktop using a GUI?

Sigh.