Clever hacks are just not.

Just recently I wrote about the clever “dual ABI” hack found in GCC 5’s implementation of the standard C++ library. I quipped at the end of that post:

I suspect the right way to handle changing ABI is to do it the way it’s always been done – by bumping the soname.

Having recently done some work on a C++ project, I’ve discovered an issue present in the dual ABI implementation. What seemed like trivial code to handle an exception from attempting to open a non-existent file just wasn’t working. I whittled it down to the following test case:

#include <fstream>
#include <iostream>
#include <typeinfo>

int main(int argc, char **argv)
{
    using namespace std;
    using std::ios;
    
    ifstream a_file;
    a_file.exceptions(ios::badbit | ios::failbit);
    
    try {
        a_file.open("a-non-existent-file", ios::in);
    }
    catch (ios_base::failure &exc) {
        cout << "Caught exception on attempt to open non-existing file." << endl;
    }
    catch (exception &exc) {
        cout << "Caught standard exception: " << typeid(exc).name() << endl;
    }

    return 0;
}

Surprisingly, this fails to work properly (prints the wrong message) when compiled with a straight “g++ -o simple simple.cc”. It works correctly when compiled instead with “g++ -D_GLIBCXX_USE_CXX11_ABI=0 -o simple simple.cc”. I would have filed a bug, but it seems the problem is known.
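
To make that concrete, here is the reproduction spelled out (a sketch; the text after “Caught standard exception:” is whatever typeid gives for the mismatched exception type, so I haven’t reproduced it here):

g++ -o simple simple.cc && ./simple
# prints the "Caught standard exception: ..." message; the ios_base::failure handler is not matched

g++ -D_GLIBCXX_USE_CXX11_ABI=0 -o simple simple.cc && ./simple
# prints "Caught exception on attempt to open non-existing file." as expected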

This is a pretty major bug: any C++ program that handles I/O errors by catching the appropriate exceptions just isn’t going to work. Nor does it look easy to fix; having to separate out implementations of ios_base and derived classes for the two ABIs would significantly reduce the benefit of even having the dual ABI. The problem with “clever” hacks like the dual ABI libstdc++ is that they are (1) often not so clever and are (2) always most definitely hacks. The proposed solution? Heap more hacks on top so the first hack works:

One option to make both forms work would be to hack the EH runtime to make handlers for std::ios_base::failure able to catch std::ios_base::failure[abi:cxx11] objects, and vice versa, creating an object of the other type on the fly.

Urgh. Just urgh.

What really annoys me most about this nonsense is that the one sane option I have – stripping out the ‘abi_tag’ nonsense from the source, re-compiling libstdc++ and calling it libstdc++.so.7 – would probably cause me problems further down the track, because third-party binaries are going to want the new ABI with the old soname (and even if I didn’t care about those, I’d still be one version number up from the official libstdc++ from that point on, which feels like it could cause problems).

On Going 64-bit

Some History

My desktop system almost exclusively runs software that I compiled from source. It’s been that way for a long time, since I got frustrated with the lacklustre pace of Debian-stable software updates and with the package management system in general (this was 1998 or so). I began compiling packages from source rather than updating them via the Debian repository, and eventually decided to purge the remaining parts of Debian from my system, replacing Debian packages with compiled-from-source counterparts.

What can I say; I was young and optimistic. Surprisingly enough, I managed to get a working system, though compiling many packages was fraught with unwarranted difficulty and required various hacks to makefiles and buildscripts. Linux From Scratch didn’t even exist yet, so I was mostly on my own when I ran into problems, but I persevered and it paid off. I had a nice, fast, working system and I had a good understanding of how it all fit together. Later, when Fedora and Ubuntu appeared on the scene (and Debian had got its act together somewhat) I felt no desire to switch to these fancy new distributions, because doing so would mean losing the strong connection with the system that I had pieced together by hand.

Sure, I faced some problems. Upgrading packages was difficult, and often required upgrading several of the dependencies. Uninstalling packages was a nightmarish procedure of identifying which files belonged to the package and deleting them one by one. I learned that I could use “make DESTDIR=… install” to output many packages into their own root, and eventually I wrote a shell script that would “install” packages by symbolically linking them into the root file system from a package-specific root, and could uninstall them by removing those links – so I had a kind of rudimentary package management (I cursed those packages which didn’t support DESTDIR or an equivalent; I often ended up needing to edit the makefile by hand). I usually compiled without many of the optional dependencies and to a large extent I avoided the “dependency hell” that had plagued me while using Debian. I found Linux From Scratch at some point and used its guides as a starting point when I wanted to upgrade or install a new package. I kept notes of build options and any specific hacks or patches that I had used. Larger packages (the Mozilla suite and OpenOffice come to mind) were the most problematic, often incorporating non-standard build systems and having undocumented dependencies; to fix an obscure build problem I often had to tinker and then repeat the (sometimes hours-long) build process multiple times until I had managed to work around the issue (anyone who’s read this blog is now probably starting to understand why I have such a loathing for bad build documentation!).

Despite all the problems and the work involved, I maintained this system to the present day.

Modernising

When I started building my system, the processor was a 32-bit Pentium III. When an upgrade gave me a processor that was 64-bit capable, it didn’t seem like the effort of switching to the new architecture was justifiable, so I stuck with the 32-bit software system. Recently I decided that it was time to change, so I began the task of re-compiling the system for the 64-bit architecture. This required building GCC as a cross-compiler, that is, a compiler that runs on one architecture but targets another.

Building a cross-compiler was not as easy as I had hoped it would be. GCC requires a toolchain (linker, assembler etc.) that supports the target architecture. A GNU Binutils installation targeting a 32-bit architecture cannot produce 64-bit binaries, so the first step (and probably the easiest) was to build a cross-target Binutils. This was really about as simple as:

./configure --prefix=/opt/x86_64 --host=i686-pc-linux-gnu --target=x86_64-pc-linux-gnu
make
make install

However, building GCC as a cross-compiler is nowhere near as trivial. The issue is that GCC includes both a compiler and a runtime-support library. Building the runtime-support library requires linking against an appropriate system C library, and of course I didn’t have one of those, and I couldn’t build one because I didn’t have a suitable cross-compiler. It turns out, however, that you can build just enough of GCC to be able to build just enough of Glibc that you can then build a bit more of GCC, then the rest of Glibc, and finally the rest of GCC. This process isn’t formally documented (which is a shame) but there’s a good rundown of it in a guide by Jeff Preshing, without which I’m not sure I would have succeeded. (Some additional notes on cross-compiling GCC are included at the end of this post).

Now I had a working cross-compiler targeting the x86_64 platform. When built as a cross-compiler, GCC and Binutils name their executables with an architecture prefix, so for instance “gcc” becomes “x86_64-pc-linux-gnu-gcc” and “ld” becomes “x86_64-pc-linux-gnu-ld” (you actually get these names when building a non-cross compiler, too, but in that case the non-prefixed names are installed as well). To build a 64-bit kernel, you simply supply some variables to the kernel make process:

make ARCH=x86_64 CROSS_COMPILE="/opt/x86_64/bin/x86_64-pc-linux-gnu-" menuconfig
make ARCH=x86_64 CROSS_COMPILE="/opt/x86_64/bin/x86_64-pc-linux-gnu-" bzImage
# and so on

The “CROSS_COMPILE” variable is the prefix applied to each compiler/Binutils tool when producing object code for the target. So, for example, “gcc” is prefixed to become “/opt/x86_64/bin/x86_64-pc-linux-gnu-gcc”.

I built the kernel, and booted to it. It worked! … except that it refused to run “init”, because I hadn’t enabled “IA-32 emulation” (the ability to run 32-bit executables on a 64-bit kernel). That was easily resolved, however.

I then set about building some more 64-bit packages: building Binutils as a 64-bit native package, re-building Glibc without the special --prefix, and building GCC as a 64-bit native compiler (which required first cross-compiling its prerequisites, including MPFR, GMP and MPC). All seems to be going OK so far.

Multi-lib / Multi-arch

One issue with having 32-bit and 64-bit libraries in the same system is that you get name clashes. If I have a 32-bit /usr/lib/libgmp.so.10.1.3, where do I put the 64-bit version? It seems that the prevalent solution is to put 64-bit libraries in /usr/lib64 (or /lib64) and leave only 32-bit libraries in plain /usr/lib and /lib. The Glibc dynamic linker hard-codes these paths when compiled for x86_64, it seems. So, when I build 64-bit packages I’ll use --libdir=/usr/lib64, I guess, but I don’t like this much; the division seems pretty unnatural, mainly in that it favours the 32-bit architecture and is somewhat arbitrary. Debian and Ubuntu both appear to have similar reservations and are working on a “Multiarch spec”, but for now I’ll go with the lib/lib64 division as it’s going to be less immediate hassle.
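
In practice that means a 64-bit package build looks something like this (a sketch; the --prefix and --libdir values just follow the convention described above):

# 64-bit libraries go to /usr/lib64; everything else goes to the usual locations
./configure --prefix=/usr --libdir=/usr/lib64
make
make install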

I also still have the issue that I can’t control which packages pkg-config will detect. I guess I can use the PKG_CONFIG_PATH environment variable to add the 64-bit paths in when doing a 64-bit build, but I can’t prevent it from picking up 32-bit packages in the case where the 64-bit package isn’t installed, which I imagine could lead to some pretty funky build errors.
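
For now the best I can come up with is something like the following (a sketch, assuming the 64-bit .pc files end up under /usr/lib64/pkgconfig); it puts the 64-bit directory at the front of the search path, but as noted it won’t stop pkg-config falling back to a 32-bit .pc file when the 64-bit package isn’t installed:

# prefer the 64-bit pkg-config metadata during a 64-bit build
PKG_CONFIG_PATH=/usr/lib64/pkgconfig ./configure --prefix=/usr --libdir=/usr/lib64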

That’s about where I’m at. It wasn’t particularly easy, but it is done and it seems to work.

Notes on building GCC as a Cross Compiler

First, read the guide by Jeff Preshing.

I have the following additional notes:

  • If it’s not clear from Jeff’s guide, --prefix for your GCC cross-compiler build should be /xyz (or whatever you like) and --prefix for Glibc should be /xyz/$TARGET; in my case I used /usr/x86-64 and /usr/x86-64/x86_64-pc-linux-gnu. The GCC cross-compiler will expect to find libraries and include files in the latter (under …/lib and …/include).
  • I wanted multilib support, whereas Jeff’s guide builds GCC without multilib. For this you need the 32-bit Glibc available to the cross-compiler, in $GLIBC_PREFIX/lib/32 (being /usr/x86-64/x86_64-pc-linux-gnu/lib/32 in my case). I symlinked various libraries, but you might get away with just symlinking your entire /usr/lib as $GLIBC_PREFIX/lib/32. You also need $GLIBC_PREFIX/include/gnu/stubs-32.h (which you can link from /usr/include/gnu/stubs-32.h).
  • During the Glibc build I got an error about an unresolved symbol, on something like __stack_chk_guard (foolishly I did not record the exact error). I got around this by configuring Glibc with ‘libc_cv_ssp=no’.
  • If you install 64-bit libraries into the standard /usr/lib and want to link against them when building with your cross-compiler, you’ll need to add -L/usr/lib to your linker flags (e.g. LDFLAGS=-L/usr/lib during configure). A short sketch of these steps is given after these notes.
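
Putting those notes together, the extra steps looked roughly like this (a sketch; the paths follow the prefixes mentioned above, and the elided configure options are as per Jeff’s guide):

# 32-bit libraries and headers, so the multilib cross-compiler can find them
ln -s /usr/lib /usr/x86-64/x86_64-pc-linux-gnu/lib/32
ln -s /usr/include/gnu/stubs-32.h /usr/x86-64/x86_64-pc-linux-gnu/include/gnu/stubs-32.h

# Glibc configure, with the __stack_chk_guard workaround appended
./configure ... libc_cv_ssp=no

# a package build that needs to link against 64-bit libraries installed in /usr/lib
LDFLAGS=-L/usr/lib ./configure ...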

Tale of Two ABIs

With GCC 5.2 just having been released, I figured it was time to upgrade my system to the latest-and-greatest in GNU compiler technology. So far the new compiler seems fine, but there’s a subtle issue that has been introduced to do with the “dual ABI” in the standard C++ library implementation, libstdc++. A Red Hat developer (“rhjason”) writes about it here. If I try to boil it down to the essence:

  • The C++11 standard imposes some new requirements (complexity guarantees, and rules which effectively prohibit copy-on-write strings) that require re-engineered implementations of certain standard classes, among them std::string and std::list. This in turn requires an ABI change (perhaps the sizes of the structures have changed, or inline functions exist which now need to be implemented differently; given that std::list is a template, there’s really no getting around this).

    “Complexity requirements” refers to the algorithmic complexity (think ‘big-O’ notation). In particular, std::list::size() must now be O(1) – that is, it must take a constant amount of time regardless of the number of elements in the list (GCC bug). Previously the size() method worked by counting the elements in the list, which is O(n); there’s an explanation (in the form of an argument against the change) here. (I’m not sure that I agree with this argument).

  • Changing the ABI normally means changing the soname, the name of the dynamic library that you link against. In this case it would have been a bump from libstdc++.so.6 to libstdc++.so.7.

    This does, admittedly, come with problems. It’s possible to have some program (A) link against a library (B) and also against the C++ library (libstdc++.so.7). However, it may be the case that library B was linked against the older version of the C++ library (libstdc++.so.6). When you execute A, then, it will dynamically link against two versions of the library. Because these libraries have (at least some) common symbol names, one will override the other. That means, for instance, that library B may call some function but get the libstdc++.so.7 implementation, which has the wrong ABI from B’s point of view. There’s also the possibility of passing standard library objects (such as strings) between A and B, for which the ABI differs. Either case might lead to data structure corruption, incorrect behaviour and possibly crashes.

    (Note that this problem is not limited to the C++ library. Any library which changes its ABI and bumps its soname potentially causes similar issues).

  • To combat these problems, the GCC/libstdc++ folk decided to put the old and new ABI into a single dynamic library (libstdc++.so.6).

    In many cases, nested inline namespaces (explanation) are used to avoid symbol clashes between the old and new ABI; however this is not possible in every case (because you can’t have a namespace inside a class, for instance). So, the compiler now understands ‘__attribute (( abi_tag(“cxx11”) ))’, which can be applied to an inline namespace or a declaration.

    (I do not understand why the tag should ever actually need to be applied to a namespace, and indeed it’s not really done in libstdc++. It seems that you do not need to supply the abi argument if you do – so you can have just ‘__attribute ((abi_tag))’ – but neither is it at all clear to me what the purpose of this would be. With a few experiments I see that it has some effect on warnings when -Wabi-tag is used, and might affect the abi tag that functions/variables inside the namespace “inherit” if they use an abi-tagged type).

  • You can select the ABI to compile against using the _GLIBCXX_USE_CXX11_ABI preprocessor macro (set it to 1 for the new ABI, or 0 for the old ABI). If you don’t define the macro yourself, the library headers define it to 1. (A short example of switching between the ABIs follows this list).
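
As a concrete illustration, here is roughly how you can see the two ABIs side by side (a sketch; ‘foo.cc’ stands for any source file that passes a std::string across a function boundary):

# build the same code against the new and the old ABI respectively
g++ -D_GLIBCXX_USE_CXX11_ABI=1 -c foo.cc -o foo-new.o
g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c foo.cc -o foo-old.o

# the new-ABI object refers to std::__cxx11:: symbols (and [abi:cxx11]-tagged functions);
# the old-ABI object does not
nm -C foo-new.o | grep cxx11
nm -C foo-old.o | grep cxx11   # should print nothing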

So, this allows two ABIs to exist in a single dynamic library. As an added bonus you may get link-time errors if you try to link to a library using the other ABI. However, there is at least one significant issue with this strategy; Allan McRae (an Arch Linux guy) writes about it here:

This discovered an issue when building software using the new C++ ABI with clang, which builds against the new ABI (as instructed in the GCC header file), but does not know about the abi_tag attribute. This results in problems such as (for example) any function in a library with a std::string return type will be mangled with a [abi:cxx11] ABI tag. Clang does not handle these ABI tags, so will not add the tag to the mangled name and then linking will fail.

In other words, the GCC people have basically decided that other compilers don’t exist. This is, I have to say, pretty shitty. The LLVM guys are now forced to choose between supporting this non-standard abi_tag attribute or supporting only the non-C++11 ABI when using GNU libstdc++. Although I hope they go with the former (because it’ll make life easier for me personally) one could understand if they decided not to. I suspect the right way to handle changing ABI is to do it the way it’s always been done – by bumping the soname.

Compiler Bugs Worst Bugs

I think that compiler bugs – the kind where they produce the wrong code, i.e. an incorrect compilation – are perhaps the worst kind of bug, because they can be very difficult to identify and they can cause subtle issues (including security issues) in other code that should work correctly. That’s why this GCC bug bothers me a lot. Not only is it a “wrong code” bug, but it is easy to reproduce without using any language extensions or unusual compiler options.

Just -O2 is needed when compiling the following program:

#include <assert.h>

unsigned int global;

unsigned int two()
{
    return 2 * global;
}

unsigned int six()
{
    return 3 * two();
}

unsigned int f()
{
    // with global == 1, this is 2 * 2 + 6 * 5, i.e. 34
    return two() * 2 + six() * 5;
}

void g(const unsigned int from_f)
{
    // the same expression computed again; the two results should always match
    const unsigned int thirty_four = two() * 2 + six() * 5;
    assert(from_f == thirty_four);
}

int main()
{
    global = 1;
    const unsigned int f_result = f();
    g(f_result);
}
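
For reference, this is all it takes to trigger it (the file name is mine; on an affected GCC the assertion fires, per the bug report):

gcc -O2 -o pr-test pr-test.c
./pr-test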

It’s easy to reproduce, and it’s obviously wrong. Somehow the compiler is managing to mess up the calculation of (2 * 2 + 6 * 5). And yet, it’s classified as Priority 2. And furthermore, GCC 4.9.3 was just recently released, with this bug still present. It makes me start to wonder about the quality of GCC. I’m waiting for this to be fixed before I move to the 4.9 series (5.x series is way too new for my liking, though I might skip over 4.9 if 5.2 is released in the near future).

I’d like to run my compiler benchmarking tests on LLVM 3.6.1 and GCC, but I’ll hold off for a bit. I have done some quick testing with LLVM though and I have to say it is exceeding expectations.

Mesa and strict aliasing

I recently started poking at the Mesa source code. Presently Mesa builds with “-fno-strict-aliasing” by default, and removing that option produces a non-working binary. I started looking into this, and have just recently submitted (the second version of) a patch to address some of the aliasing problems – enough that I could build a working binary with strict aliasing enabled:

http://lists.freedesktop.org/archives/mesa-dev/2015-June/087278.html

I don’t know whether this will be taken on board; at the time of writing, no-one has formally reviewed it or agreed to push it upstream. The performance improvement from dropping -fno-strict-aliasing is, admittedly, a bit underwhelming (less than I had hoped for, anyway); however, I personally feel that code which requires strict aliasing to be turned off is broken regardless.

Not everyone thinks that way, though. It’s clear that Ian Romanick, a prominent Mesa developer, did not (originally) even understand the issue:

NAK.  The datastructure is correct as-is.  It has been in common use since at least 1985.  See the references in the header file.

(The data structure is indeed correct; the implementation is broken unless strict aliasing is disabled). It’s not uncommon for C developers to misunderstand the aliasing rules, which is a shame, because the rules aren’t really that complicated. I liked this quote from Dave Airlie, another Mesa developer:

I personally think we should get past the, aliasing is hard, lets go shopping,

Or in other words: let’s stop using -fno-strict-aliasing as a crutch and just fix the problems. I’m glad that I’m not the only one with this opinion. A few other developers, however, clearly feel that it is too difficult, and that -fno-strict-aliasing is the answer. It’ll be interesting to see how this plays out.
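
For what it’s worth, GCC can point out at least some of the problem spots itself; something like the following is a reasonable starting point when hunting them down (a sketch: the warning flags are standard GCC options, but exactly how they get passed into the Mesa build is an assumption on my part):

# -Wstrict-aliasing only makes sense with -fstrict-aliasing active (which -O2 implies)
make CFLAGS="-O2 -fstrict-aliasing -Wstrict-aliasing=2"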

The Systemd debacle

I’m late to write this, but perhaps better late than never (and truth be told, I’ve been neglecting this blog, largely because I prefer to be writing software than complaining about it, though I recently seem to have precious little time for either). If you’re reading this then you most likely already know about Systemd, the init-system-replacement-cum-kitchen-sink brainchild of Lennart Poettering and others (yes, they want me to call it “systemd”, but I’m averse, for some reason, to proper nouns beginning with lower-case letters; something to do with having had a moderately good education, I guess). Since its inception Systemd has gone on to become the primary, if not the only, choice of init system on a number of Linux distributions, and has more-or-less become a dependency of the Gnome desktop environment. You’ll also already be aware that not everyone is happy with this state of affairs.

Amongst other examples:

On the other hand, Systemd has its advocates:

  • Lennart Poettering, in his initial announcement of Systemd and how good its boot times are.
  • Lennart Poettering, trying to debunk some “myths” regarding Systemd (and apparently failing to recognize the mild irony of “debunking” both Myths #2 and #3, and of the fact that Lennart himself is largely responsible for Myth #2 due to the announcement linked in the previous point)
  • This blog post entitled “Why systemd?” by, err, Lennart Poettering.
  • This guy (from Arch Linux?), who I assume is not actually Lennart Poettering, though I can’t tell for sure.
  • LWN editor Jonathan Corbet basically saying that Systemd isn’t really that bad. Specifically: … The systemd wars will wind down as users realize that their systems still work and that Linux as a whole has not been taken over by some sort of alien menace.
  • Various Linux distributions that are now using Systemd as their primary init system.

There’s a fair amount of hyperbole on both sides, so who’s really right? And is the question a technical one or is it purely political?

On Lennart

Ol’ Lennart has received his fair share of criticism over the years. Here’s one which made me laugh, from one “HackerCracker” commenting on a Kuro5hin article:

That said, it seems there’s a putsch on to make Linux into Windows, complete with an inscrutable binary log, called SystemD. And $DEITY help you if you go criticizing it for its many faults, you will be pilloried as a Luddite, a moron, an emotional weenie without an ounce of brains as it is being written by a very, very, VERY intelligent man by the name of Lennart Poettering. The very same Lennart Poettering that brought the horror that is PulseAudio. And Avahi. And a host of other code abortions masquerading as THE NEXT BEST THING EVAR! to hit Linux. But the smiling fascist cheerleaders for this new SystemD paradise fail to see the incredible irony in their belligerently stupid marketing campaign. I guess it makes sense in a way, after all, it was Vladimir Ilyich Lenin who said ‘We shall win by slogans’ (paraphrasing).

(A mildly amusing aside: Googling for “+pulseaudio +horror” returns over 80,000 results).

Comparing Lennart to Lenin is ridiculous, but it’s fair to say that Poettering’s reputation precedes him. PulseAudio was widely criticized when it first arrived on the scene, for being both overly complex and, well, buggy as all fuck. I still don’t use PulseAudio on my system, mainly because I’ve never seen the need (although some software is starting to depend on it a bit, and to be fair I suspect the vast majority of bugs have, by this stage, been ironed out), and because the one time I looked at installing it, it became a dependency nightmare (I’m really flummoxed as to why it requires ConsoleKit [or, of course, Systemd] as a hard dependency, but that’s fodder for a future discussion, perhaps).

Lennart doesn’t help himself, I think, by being a bit of an arse. Not a huge arse, just a bit of an arse, but even being a bit of an arse is going to piss people off. He made a huge amount of noise, at one point, about receiving death threats apparently related to Systemd, claiming that “the Open Source community is full of assholes” [sic] and railing against Linus Torvalds and those associated with him. A ZDNet article covers the story pretty well, but I think this choice quote from Bruce Byfield really sums it up:

… the complaints coming from Poettering amount to a new definition of chutzpah. Poettering, you may remember, is fond of sweeping critiques of huge bases of code, and of releasing half-finished replacements like PulseAudio, systemd, and Avahi that are radical departures from what they replace. He is a person as much known for expecting other people to tidy up after him as for his innovations. For many people, this high-handed behavior makes Poettering an example of the same abusive behavior that he denounces — and his critique more than slightly problematic.

Reading Poettering’s “Why systemd?” article is also revealing of character. The article essentially consists of a long list of Systemd features, with crosses marked for other init systems which don’t have those same features, which masquerades as an unbiased comparison. The only pretence of humility is at the end, where there’s a red mark against Systemd for lack of maturity. The whole thing reads a bit like a propaganda pamphlet, and the concluding remark – I believe that momentum is clearly with systemd. We invite you to join our community and be part of that momentum – helps to cement this perception. It’s a little bit creepy, really.

On the other hand, Poettering is by no means stupid. The initial announcement and discussion of Systemd is well worth a read, as it highlights many of the fundamental ideas behind the software, and some of it is in fact quite clever. Which brings us to the technical side of things.

Technical side

I don’t believe there’s necessarily anything wrong with Systemd on a technical level (but hold that thought, because the key word here is necessarily – I’ll elaborate on that a bit later). There are certainly some real problems that it seeks to address. The old “runlevel” system from System V init never made any real sense, and there were always issues managing service dependencies with such a system (if you want to run A and B, and B requires A to be started before it is itself started, how do you arrange that?). Although I don’t personally have much experience with Upstart, Poettering’s initial Systemd announcement gives a reasonable critique (if it is indeed correct). So there was space, I think, for an init system which provided proper service management; i.e. one which allows starting and stopping individual services, and which will automatically start their dependencies / stop their dependents as required.

On the other hand, Systemd does a lot more than just provide service management. Some of these things are, I personally think, unarguably worthwhile or at least “not harmful”. Allowing parallel startup of services falls into this category, as does socket-based activation, where the service manager opens a socket on behalf of some service and only starts the service when a client connects to it (for one thing, this means the service isn’t running and consuming resources until it’s actually needed; for another, it simplifies the handling of service dependencies, so that in many cases they don’t need to be explicitly configured).

There are other things Systemd can do that I consider might be going a little too far. Its use of autofs to allow filesystems to “appear” mounted before the filesystem checks and so forth have actually been run, for example, to me seems like excessive parallelization. On the other hand, it doesn’t hurt and I suppose that you can always just ignore the feature if you do not want to use it.

The use of cgroups to prevent processes from escaping their parent’s supervision (by daemonizing themselves) is probably a good idea, though this problem is, I suppose, as old as Unix itself, and one of the main reasons that it has had no solution up to this point is that there hasn’t, in general, been a pressing need for one. The only real benefit I see is that it can be used to prevent users from leaving processes running on a machine after they’ve logged out, which is of course only an issue on true multi-user machines. For such a limited scenario, I have a lot of trouble understanding why Systemd has a hard dependency on cgroups.

In fact, most of the Linux features required by Systemd as listed in Poettering’s “Debunking myths” document (Myth #15: SystemD could be ported to other kernels if the maintainers just wanted to and Myth #16: systemd is not portable for no reason) are obscure enough that it remains unclear why these features are actually required. The “debunk” therefore completely fails – the myths remain undebunked (if that is actually a word). (Also, are there really two separate myths? These seem to be identical to me).

As well as the use of obscure Linux kernel features, Systemd requires DBus, and this is obviously unnecessary. So one valid critique of Systemd is that it has unnecessary dependencies. However, this is not necessarily a strong argument against using Systemd.

[Edit 3/1/2016: The most important point that I missed when I first wrote this article is that Systemd crams a lot of stuff (including their D-Bus implementation) into the PID 1 process, a process which brings the whole system down if it crashes. This is a valid concern, and probably is the most legitimate technical concern raised against Systemd to this point].

The Human side

In fact, I had a lot of trouble actually putting my finger on what it was about Systemd that really bothered me so much. I generally disregard political arguments about software because I feel that technical merit should be the main focus. So, regardless of how much I might dislike Lennart Poettering’s manner, his habit of using hyperbole, and his apparent tendency of failing to provide rational and logical arguments, I’d normally be inclined to say that we should just swallow the bile, install Systemd on our systems and get on with it. Why does that seem so hard to do in this case?

I’ve read many complaints about Systemd; some of them are listed above, although they generally fail to provide a compelling technical argument against it. If the reasons for wanting to avoid Systemd aren’t technical, can they still be valid? I’ve struggled with this question for some time. Here are a few pieces which helped me to finally clear it up in my mind:

“Has modern Linux lost its way?” (John Goerzen) – choice quote:

This is, in my mind, orthogonal to the systemd question. I used to be able to say Linux was clean, logical, well put-together, and organized. I can’t really say this anymore. Users and groups are not really determinitive for permissions, now that we have things like polkit running around. (Yes, by the way, I am a member of plugdev.) Error messages are unhelpful (WHY was I not authorized?) and logs are nowhere to be found. Traditionally, one could twiddle who could mount devices via /etc/fstab lines and perhaps some sudo rules. Granted, you had to know where to look, but when you did, it was simple; only two pieces to fit together. I’ve even spent time figuring out where to look and STILL have no idea what to do.

[Edit: ok, I realise that the above quote states that the issues are orthogonal to “the systemd question”, in the eyes of the author, and it’s true that the issues raised are not specifically about Systemd; nonetheless the general concepts very much also explain my concerns with Systemd; I don’t think that they actually are completely orthogonal.]

This resonates a bit with me. I’m worried that using Systemd could render me unable to solve issues with my own system without first having to delve deep into the internals of Systemd – which is something I’d rather not have to do, especially because the damn thing seems so hard to pin down, with new releases happening on a frequent basis. I don’t feel that my init system should demand that of me. I want stability – both system stability, and stability in the sense that I want some assurance that I understand how the system works, and that I don’t have to follow every commit on this arrogant developer’s pet project just to keep that understanding valid. So much seems to have been crammed into Systemd in such a short space of time – it’s replacing init, udev, syslogd, inetd, cron, ConsoleKit, login/getty, network configuration, and recently even an EFI boot manager.

Fear of Change?

It could be argued that this argument against the adoption of Systemd is driven by fear of change. Fear of change generally carries a negative connotation, but of course it’s not an uncommon reaction, and it’s grounded in our past experiences. I’m not against change, and I think that Systemd is clearly a step forward in certain directions where I have, for some time, thought that some improvement would be nice. The problem is not the change itself, but that there is too much change, too fast. Here’s a quote from Theodore Ts’o:

A realization that I recently came to while discussing the whole systemd controversy with some friends at the Collab Summit is that a lot of the fear and uncertainty over systemd may not be so much about systemd, but the fear and loathing over radical changes that have been coming down the pike over the past few years, many of which have been not well documented, and worse, had some truly catastrophic design flaws that were extremely hard to fix.

And yes, that’s definitely part of it. This isn’t the first time that some upstart has told us they had the solution to all our problems. Remember HAL? Remember Devfs? These were kind of a mess, but at some point or another the distributions were all using them. They were eventually superseded by Udev, which was at least fairly easy to understand, but now Udev has been eaten by Systemd. And I mean, sure, you don’t have to use Systemd; as Poettering is so fond of pointing out, he’s not forcing anybody to do anything. The problem is that the distributions are jumping on the bandwagon, and as a result we’re going to see hard dependencies emerge in future versions of software that we want to use.

In conclusion

The vitriol against Systemd is probably not warranted; the fault doesn’t lie with Systemd itself. But the dislike that people have for having lots of changes rammed down their proverbial throats is reasonable. Things can be improved, but it doesn’t all have to happen at once, and the choice shouldn’t be between accepting it all at once or getting left behind. That distributions are adopting Systemd in its entirety is disconcerting. That some Debian folks were so strongly against this is, in fact, reasonable. But it’s not because there’s anything in particular wrong with Systemd; rather, it’s because we don’t really know what might be wrong with it. And we don’t know how much effort we’re going to have to put into contorting our systems to work around Systemd, effort which will later be made redundant when the next great whizz-bang system component comes along.

I normally like Google, but…

I recently had to fill out the “Report a delivery problem between your domain and Gmail” form.

This is a server that I use personally. It is not an open relay and sends email only from me. I have checked the outgoing mail logs and there is no way, not a chance, that there has been spam from my domain to any gmail accounts.

Google, I don’t understand why you’ve blocked my server from sending email to gmail accounts. Perhaps the IP address was used by a spammer in the past, but that must have been years ago. I don’t understand, either, why you make this form (https://support.google.com/mail/contact/msgdelivery) so difficult to find and fill out; why, for instance, you ask for ‘results from your tests’ and then limit the field length so that it is impossible to provide those results.

I have now done everything within my power to ensure that my server cannot be seen as a spammer. I have set up reverse DNS records so that the IP address (***.***.***.***) correctly resolves to the hostname (******.***). I have added an SPF record for the site. In fact I have completely complied, always, with your “Best practices for forwarding emails to gmail” (https://support.google.com/mail/answer/175365?hl=en), and more recently with your “Bulk Senders Guidelines” (https://support.google.com/mail/answer/81126?hl=en) despite the fact that I am clearly not a Bulk Sender.

Please, at least, remove my server from your blacklist and allow the limited number of your users that I wish to contact to receive my emails.

Please also fix your form. And for pity’s sake please fix your 550 response so that it guides server admins to the form rather than requiring them to trawl the internet in search of it. I’d like to suggest, furthermore, that it’s not reasonable to blacklist a server and return SMTP 550 responses, without allowing the server administrator some means of discovering why their server is blacklisted.

Thankyou.

After submitting the form, this text is displayed:
Thank you for your report. We will investigate this issue and take the necessary steps to resolve it. We will contact you if we need more details; however, you will not receive a response or email acknowledgment of your submission.

… so, I can’t even expect a response? Lift your game, Google. Lift your game.

Edit 17/July/2015: After having avoided the problem for some time by (a) occasionally using a different mail server and (b) not emailing Gmail addresses, I finally figured out the problem – after looking at the Postfix logs and noticing that the IP address for the Gmail relay was an IPv6 address, I re-configured Postfix to contact Gmail servers only via IPv4. Hey presto, it worked! It seems that I had reverse DNS for IPv4 but not IPv6, and the lack of reverse DNS is enough to make the Gmail relays refuse to accept mail.
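
In concrete terms, the diagnosis and the workaround were along these lines (a sketch; the IPv6 address is just a placeholder, and inet_protocols is the standard Postfix setting for this, though it turns off IPv6 for Postfix entirely, which is a blunter instrument than avoiding it only for the Gmail relays):

# check whether the sending IPv6 address has a PTR record (mine did not)
dig +short -x 2001:db8::1

# tell Postfix to use IPv4 only, then reload it
postconf -e 'inet_protocols = ipv4'
postfix reload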

I wondered about this reverse-DNS requirement. I had SPF set up, and surely that makes the reverse-DNS check unnecessary? Of course there is always the following scenario:

  • I gain access to a mail server through which I can route mail (maybe it’s an open relay, or maybe I get access via some other means);
  • I set up a domain name, and specify (via an SPF record) that my relay is used to send email for that domain
  • I spam away.

While this is certainly possible, it also seems to be easy to deal with, because it requires the spammer to purchase a domain and, given that emails can be verified as “originating” from that domain due to SPF, the domain can just be blacklisted. In any case I would think that the 550 response from the Gmail relay should include information on exactly why the message was refused, which would have saved me a lot of trouble.