systemd discourse sucks

systemd sucks. That’s been discussed many times, and plenty of other people have already said everything I could say about it.

What I want to talk about is why the discourse around it sucks. I’m going to address the points that I see systemd defenders repeatedly raise, and explain why they are incorrect.

sshd patch

A malicious maintainer backdoored xz to pwn sshd instances.

The defenders point out that systemd is kind of a sideshow here. The exploit chain, in practice, works because xz gets loaded into sshd, via libsystemd which is pulled in to… tell systemd that sshd has started. “The attacker would have just found another vulnerability”. And maybe that’s true (we know of several instances where it is). But it doesn’t change anything if it is. Just because other things are similarly vulnerable, or similarly overly complex, or similarly frail, does not excuse anything.

The people who wrote this sshd patch didn’t need to pull in libsystemd for this (and shouldn’t have), that much is true. But systemd encourages this by doing things like:

Documenting the interface in nothing but a paragraph
Providing an overly complex interface with no guidance on when applications should implement various parts
Providing no example or pseudocode of how to use the interface
Providing no guidance (except on random HN comments?) on implementing the interface
Providing an implementation for it within libsystemd

Now, this whole notification is entirely unnecessary. It may be necessary for poorly written daemons, which do silly things like fork off before making sure the config is valid. I don’t believe sshd falls into this category. Historically, the “I’ve started up successfully” signal for a well-behaved daemon was when it double-forked and backgrounded itself. Even for poorly behaved daemons, there are better ways to solve this.

So yeah. Most of the fault lies on the maintainer who backdoored xz; some fault lies on the SSH patchers; some fault lies on systemd. That can all be true at the same time (and is).

systemd does suck

This is just another example - of which there are many - of the problems systemd creates. And every time such an example comes up, systemd’s defenders are quick to point out that it could’ve happened without systemd.

But every time, it did happen with systemd. systemd is not a sideshow. This argument is saying “sure, using $program_a introduces more vulnerabilities, but so does using $program_b!”

There are no advantages

Sometimes the program you’re using introduces value which surpasses its complexity (and attack surface). Sometimes the program you’re using does not. systemd does not.

I want to go over some of the commonly proposed “advantages” of systemd and discuss why they’re incorrect. Some of these are real advantages that systemd provides (but so do other things); some of them are disadvantages; and some are just false. But any time you discuss systemd, several of these will be presented by defenders as the reason why systemd adds value.

I use OpenRC on my desktop (which hosts many servers itself), so mostly I’ll compare each point to the way OpenRC does it, but I’ll mention other systems when appropriate.

I use systemd on my servers. This isn’t because systemd provides any value itself, but rather because of the number of applications I run which simply don’t ship an init-script anymore, or otherwise have dependencies on systemd. That’s the only way systemd provides any value: it lets you be part of the mainstream.

Long-start daemons

“My daemon takes forever to start! It needs to tell the service manager when it’s done!”

Well, first off, I would prefer to fix the start time, not the service manager. Then, as I mentioned above, a well-behaved service will not fork off until it is ready to accept connections. Understanding that isn’t always possible, there are many ways to handle notification which are simpler than systemd’s.

systemd’s notification involves checking the prefix on an environment variable ($NOTIFY_SOCKET), opening a socket on it, and then sending a “state string” to it. This is already a decent burden on a programmer who may not know how to send a datagram over a Unix socket (or, as we saw with the leftpad incident, simple string manipulation like checking and stripping off a prefix may already be beyond them).
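For reference, the steps just described look roughly like this in shell. This is a sketch, not systemd's reference implementation: the function names are hypothetical, only the two prefixes discussed here are handled, and socat is an assumed dependency because plain sh cannot send a datagram to a Unix socket on its own.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical and socat is an assumed
# dependency; plain sh has no way to send a datagram to a Unix socket.

# Classify $NOTIFY_SOCKET by its prefix.
notify_socket_kind() {
	case ${NOTIFY_SOCKET:-} in
		'')  echo unset ;;        # no service manager asked for notification
		/*)  echo path ;;         # socket that lives on the filesystem
		@*)  echo abstract ;;     # Linux abstract-namespace socket
		*)   echo unsupported ;;  # anything else is not handled in this sketch
	esac
}

# Send the "READY=1" state string as a single datagram.
notify_ready() {
	case $(notify_socket_kind) in
		unset)    return 0 ;;
		path)     printf 'READY=1' | socat - "UNIX-SENDTO:$NOTIFY_SOCKET" ;;
		abstract) printf 'READY=1' | socat - "ABSTRACT-SENDTO:${NOTIFY_SOCKET#@}" ;;
		*)        return 1 ;;
	esac
}
```

Even in this reduced form there are three decisions to get right (unset, path, abstract) before a single byte is sent.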

Oh, and there’s plenty of prior art which already solved this, where the only requirement is writing to an already-open FD!
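To show how little that style asks of the daemon, here is a minimal sketch in shell. The names fake_daemon and wait_ready, the choice of FD 3, and the FIFO standing in for the supervisor's pipe are all assumptions made for the example, not any particular supervisor's interface.

```shell
#!/bin/sh
# Sketch only: fake_daemon, wait_ready and the use of FD 3 are hypothetical.

# Stand-in for a daemon: initialise, then announce readiness by writing a
# single newline to the already-open FD 3 and closing it.
fake_daemon() {
	sleep 1          # pretend to load config, bind sockets, etc.
	echo >&3         # "I'm ready"
	exec 3>&-        # nothing more to say on this FD
	sleep 1          # pretend to serve requests
}

# Stand-in for the supervisor: start the command with FD 3 attached to a FIFO
# and block until a line arrives (ready) or the FD closes without one (failed).
wait_ready() {
	fifo=$(mktemp -u) && mkfifo "$fifo" || return 1
	"$@" 3>"$fifo" &
	read -r line <"$fifo"
	status=$?
	rm -f "$fifo"
	return $status
}
```

Something like `wait_ready fake_daemon && echo "service is ready"` would then block only for the startup phase. The daemon's whole share of the protocol is one `echo` to a file descriptor it was handed.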

OpenRC handles this too. The specific implementation can differ across different services (since different services might have support for different ways of notifying), but to take OpenVPN as an example, the init script marks itself as “still starting” when it starts, and then a --up hook (line 84) fired from OpenVPN re-executes the init-script with IN_BACKGROUND=true (at which time the init-script exits success, indicating that OpenVPN is running; line 63).

Socket activation

inetd.

Service monitoring

Is better accomplished by… monitoring.

You, the administrator, know what particular endpoints or methods need to be monitored to consider the service alive. For something like sshd this might be “connect to port 22 and verify you see an SSH banner”; for a web server this might be “make sure the certificate is valid and not expiring soon, and make sure these URIs return 200”. There are plenty of monitoring solutions out there for this. Monitoring the process is not sufficient.
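As a rough sketch of the sshd check described above, something like the following would do. The function names are hypothetical, and it assumes a bash built with /dev/tcp support plus coreutils timeout; it also only looks at the first line the server sends.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; assumes bash built with
# /dev/tcp support and coreutils `timeout` are available.

# True if the line looks like an SSH identification string (RFC 4253).
is_ssh_banner() {
	case $1 in
		SSH-2.0-*|SSH-1.99-*) return 0 ;;
		*) return 1 ;;
	esac
}

# Connect to host:port (default 22), read the first line, and check it.
check_ssh() {
	banner=$(timeout 5 bash -c \
		'exec 3<>"/dev/tcp/$1/$2" && IFS= read -r line <&3 && printf "%s" "$line"' \
		_ "$1" "${2:-22}") || return 1
	is_ssh_banner "$banner"
}
```

A cron job or monitoring agent could then run `check_ssh localhost || echo "sshd is not answering"` and alert on the result, which tests what users actually experience rather than whether a PID exists.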

systemd does not effectively ensure that your service stays running.

For a poorly-behaved daemon, attempting to monitor the state of the process tells you nothing. It may have crashed just after sending the “I’m alive” notification, or maybe the “send notifications to watchdog” thread is still running while the “accept() new connections” thread is deadlocked. I know this, because I have written software and then discovered bugs like these in it, as well as discovering bugs like these in a lot of other people’s software.

For a well-behaved daemon, attempting to monitor the state of the process is superfluous; the process had already successfully started up when it forked off, and if it does crash it will crash hard and the process will end (allowing it to be restarted automatically): sshd is one of these.

PID/process monitoring

Is great for well-behaved processes, but absolutely nothing new to systemd.

Of course, it doesn’t work for poorly-behaved processes. It won’t notice deadlocks, infinite loops, DoS, or anything else. To work around this, systemd tried to add notification, but it doesn’t help:

Service notification

Works great, until the service crashes just after sending the “I’m alive” notification, or the “send notifications to watchdog” thread is still running while the “accept() new connections” thread is deadlocked.

Liveness monitoring

Is the only way, and systemd doesn’t even do it.

Service restarting

Can be done so many different ways in sysvinit.

Seriously, how are you actually using this as an argument? Init is just running executables blindly! You can do whatever you want in them!

Write some janky shell script to monitor and restart!
Use any of the many other options which are older than systemd.
Fix the daemon to not crash in the first place!
Convert it to an inetd service!
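For the record, the janky shell script from the first option can be about this short. This is a sketch under assumptions: supervise and RESTART_DELAY are names made up for the example, and the daemon is assumed to stay in the foreground and exit non-zero when it crashes.

```shell
#!/bin/sh
# Sketch only: `supervise` and RESTART_DELAY are hypothetical names.

# Run a foreground command and restart it whenever it exits non-zero.
# Gives up after $1 failed runs; a clean exit (status 0) ends the loop.
supervise() {
	max_failures=$1
	shift
	failures=0
	until "$@"; do
		failures=$((failures + 1))
		if [ "$failures" -ge "$max_failures" ]; then
			echo "giving up on '$1' after $failures failures" >&2
			return 1
		fi
		sleep "${RESTART_DELAY:-1}"   # brief pause before restarting
	done
}
```

Usage would be along the lines of `supervise 5 /usr/bin/foobar`.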

Supports more things

No, it doesn’t. It’s true that it does more things itself and is therefore more complex, but this is a disadvantage.

Your init script can do whatever you want. Better yet, it usually does it using a language that you already know (shell script), instead of a domain-specific configuration file format which places limitations on what you can do.

You want to give something a private /tmp? You can use unshare etc. (or this functionality could have been introduced as its own dedicated command).
You want to strip capabilities from a process? You can use something like capsh.
You want to chroot? Use chroot.
You want bind mounts? Use mount --bind.
You want to add your own commands for a unit? You can’t even do that in systemd!

Documentation

Ugh. Okay, systemd’s documentation is better than most distros had for their previous init-system, but that doesn’t mean it is good. Even finding out the meaning of some option in a config file can mean a trip through several different man pages (instead of a trip to the single man page for the involved command being used in a shell script…).

Dependency management

Oh boy, this is a big one. This is probably the single biggest piece of FUD I see about older init-systems. People say that it had no dependency management. Which is simply false! And obviously so! Without some kind of dependency management there’s no chance your systems would have ever booted.

The reality is that just about every init-system had dependency management built-in before systemd appeared. Usually, this was in the form of special comments in the init-script. I particularly like the way OpenRC does it, though. It’s quite akin to the types of dependencies in systemd, which just helps to show how not-novel systemd always has been.

In older systems, the dependencies were often manually established (by having separate runlevels, or by manually adjusting the names of some important /etc/rc.d symlinks). That’s still a dependency management system, even if not a very good one. (And again, this was gone and replaced with automatic dependency/ordering in every major distro before systemd was even created, much less adopted.)
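For illustration, this is roughly what dependency declarations look like in an OpenRC init-script. It is a sketch for the hypothetical foobar daemon; the service names net, dns, logger and firewall are assumptions about what a given system provides.

```shell
#!/sbin/openrc-run
# Sketch only: the service names below are assumptions about what a given
# system provides.

depend() {
	need net              # hard dependency: foobar cannot run without it
	use dns logger        # soft dependency: used if present in the runlevel
	after firewall        # ordering only, no requirement
}
```

Hard requirements, soft requirements and pure ordering cover roughly the same ground as systemd's Requires=, Wants=, After= and Before=.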

Unit files are simpler

No, they’re not.

Here’s an OpenRC-style init-script to start up some well-behaved daemon, foobar:

#!/sbin/openrc-run
command=/usr/bin/foobar
pidfile=/var/run/foobar.pid
name="FooBar Daemon"
description="FooBar is a daemon that drinks"

Here’s the sole necessary modification for some arbitrary poorly-behaved daemon:

start() {
    # Run whatever commands you need to start the daemon
}

Here’s the equivalent in systemd:

[Unit]
Description=Foo

[Service]
ExecStart=/usr/sbin/foo-daemon

[Install]
WantedBy=multi-user.target

Here’s the documentation you need to understand how a systemd service is executed: systemd.service, systemd.unit, systemd.exec, plus maybe systemd.socket.

Here’s the documentation you need to understand how an OpenRC service is executed: a basic knowledge of the existence of shells and environment variables; and one document.

Just as a bonus, at the bottom is my quick custom init-script to decrypt my hard drives; in systemd I would probably have to split it into several scripts (or add argument processing, which OpenRC does for me) and then write a unit file on top of it.

Oh, and in OpenRC if I ever have any questions what’s happening, all I have to do is throw a set -x or strace in the script. Good luck debugging systemd.

Unit files are broken less often

That depends entirely on what software you’re using. I can certainly say for my usage that I’ve had to edit far more systemd unit files to fix issues (usually resulting in the service failing when you restart/reload it), compared to init-scripts.

I think this is actually one of the things that is truly better with systemd, though. It has led to much more secure/defense-in-depth locking down of services than before. I don’t actually mind having to occasionally unlock something for it (maybe it needs to share /tmp with another service and PrivateTmp is default).

I just wish that had been accomplished with better documentation and tooling, like a new command privatetmp for the shell scripts to call documented in man privatetmp, instead of a new configuration language with a PrivateTmp setting which is partly documented in man systemd.service and partly in man systemd.unit and partly in man systemd.exec (did I miss any more places?).

But it’s modular

What’s more modular than a series of arbitrary executables (often shell scripts which are easy to modify) that the administrator can completely customize in any way they see fit? That’s all older init-systems are. Better yet, they’re maintained by multiple different people (enforcing a separation of concerns) and largely ancient code which is simple enough to be nearly bug-free, and any gotchas are (or were) well-understood and widely documented.

systemd itself may well be modular, I don’t know, I don’t care. What’s not modular is the way programs and distros have adopted it. And the defenders will say that’s not systemd’s fault, but it is. systemd, both by sheer scale and by clear intent, encourages developers to rely on it. Like everything that Poettering and Red Hat have put out in the past 15 years or so, it works its way into as many programs and distros as possible, with vague or nonexistent benefits, seemingly for the sole purpose of creating new bugs to enjoy troubleshooting.

Once every mainstream distro depended on systemd, it was only natural that many popular applications would cease supporting anything but systemd, requiring that you use various features and interfaces (which are in practice proprietary, despite being theoretically free), allowing systemd to continue growing. How many applications today support systemd socket activation? By comparison, how many support inetd socket activation (an interface which has multiple competing implementations, which is simple to create or switch to an alternate implementation…)? systemd is actively harmful to the “open choice” nature of open source.

systemd has taken over timezones (I’ve hit bugs here), DNS resolution (I’ve hit bugs here), your bootloader, your udev, and your hostname. And for no reason! None of that has anything to do with managing the services on your system! None of that has any reason to be part of the same project, or to be part of a service manager! It has added persistent identifiers to your machine, which should be the choice of you, the administrator; or at least the distro. It has added untold hours of work for every single administrator learning new names for configuration settings and new (lengthy, confusing, complex, buggy) commands to manage your system when the existing commands worked just fine and were more extensible and composable.

But sure, it’s modular. Whatever. 🤮

Linux is bloated, too!

Someone tried to tell me that Linux is 100MB of kernel, and ten times as much in modules. Therefore it’s as bad as (or worse than) systemd.

$ find /boot -type f -exec du -h {} +
15M	/boot/EFI/Boot/bootx64.efi
15M	/boot/vmlinuz-6.6.13-gentoo
4.4M	/boot/System.map-6.6.13-gentoo
146K	/boot/config-6.6.13-gentoo

$ find /lib/modules/$(uname -r)/ -name '*.ko' -exec du -h {} + | sort -h
8.0K	/lib/modules/6.6.13-gentoo/video/nvidia-peermem.ko
16K	/lib/modules/6.6.13-gentoo/video/nvidia-drm.ko
20K	/lib/modules/6.6.13-gentoo/misc/vboxnetadp.ko
48K	/lib/modules/6.6.13-gentoo/misc/vboxnetflt.ko
700K	/lib/modules/6.6.13-gentoo/misc/vboxdrv.ko
1.7M	/lib/modules/6.6.13-gentoo/video/nvidia-modeset.ko
2.5M	/lib/modules/6.6.13-gentoo/video/nvidia-uvm.ko
59M	/lib/modules/6.6.13-gentoo/video/nvidia.ko

Unlike systemd, the Linux kernel actually is modular. Also unlike systemd, the Linux kernel actually provides value.

And once again, even if this were true, it would be an argument against Linux, not an argument for systemd.

Back to discourse

Any time systemd comes up and you point out these problems, there’s a predictable series of responses:

But it has advantages! (No it doesn’t, you can do anything you want in sysvinit.)
But it’s simpler! (No it’s not, it’s far more complex.)
Okay, but surely it’s simpler to configure! (Occasionally, for the very simplest situations… I guess?)
Okay fine but systemd isn’t even relevant, it would’ve been broken some other way too. (Would you jump off a bridge just because other kids are, Timmy?)
But it has advantages! (… repeat ad nauseam)

The worst part is that none of this is new. I’m not being original here; these are all things I’ve personally experienced, but they’ve also been widely reported. If you are parroting any of these arguments, you are at best willfully ignorant. When you begin repeating the same arguments that have already been debunked in that very thread, you are being intentionally malicious. And when you dismiss it because you don’t understand the relevance, you are attempting to suppress anyone whose views you don’t agree with. Throw in some gaslighting, alternative facts, and some “no, that never happens with systemd” (even after being presented with multiple instances of it happening) and it’s definitely a really fun time.

If you do this, you suck too.

Init-script

As promised, here’s my init-script to decrypt my drives. Note that this functionality would require a separate script with systemd. Note that the lock and unlock functionality could not be provided in the same command as start and stop with systemd. Note that the process can take some time to complete, and OpenRC has no problem delaying dependent services from starting until it is finished (even though I do have services starting in parallel).

I look forward to suggestions as to how I could de-simplify this with systemd.

#!/sbin/openrc-run

# Create encryption:
#   gpg --decrypt crypt_key.luks.gpg | cryptsetup luksFormat --key-size 512 /dev/$DEVICE -
# Open encryption:
#   gpg --decrypt crypt_key.luks.gpg | cryptsetup --key-file - luksOpen /dev/$DEVICE $DEVICE

extra_started_commands="lock unlock"

depend() {
	use modules
	before checkfs fsck
	after dev-settle
}

start() {
	setleds -D +num
	mount /tmp

	_start
}

_start() {
	export GNUPGHOME="/tmp/root-gnupg/"

	set -o pipefail
	rc=0

	pipes=""
	for dev in $DEVICES; do
		pipes="$pipes /tmp/key_pipe_$dev"
	done

	trap "rm -rf $pipes $GNUPGHOME" EXIT # Ensure pipes are removed even if we error out
	rm -rf $pipes "$GNUPGHOME" # Remove pipes in case they already exist
	umask 0077 # Make sure only root can read them

	cp -a /root/.gnupg "$GNUPGHOME"
	mkfifo $pipes
	pids=""
	for dev in $DEVICES; do
		(
			if [ "$1" = "--unlock" ]; then
				cryptsetup --key-file - luksResume $dev </tmp/key_pipe_$dev
			else
				cryptsetup --key-file - luksOpen /dev/$dev $dev </tmp/key_pipe_$dev
			fi
			# The subshell exits with eend's status so the parent can collect it
			eend $? "$dev failure running cryptsetup"
		) &
		pids="$pids $!"
	done
	ebegin "Decrypting keyfile"
	timeout 30 gpg --quiet --decrypt /root/crypt_key.luks.gpg | tee $pipes >/dev/null
	eend $? "failure decrypting keyfile" || rc=$?
	gpg-connect-agent "killagent" </dev/null >/dev/null

	# A background subshell cannot set rc in the parent, so wait on each job
	for pid in $pids; do
		wait $pid || rc=$?
	done

	return $rc
}

stop() {
	for dev in $DEVICES; do
		cryptsetup close $dev
	done
}

lock() {
	for dev in $DEVICES; do
		cryptsetup luksSuspend $dev
	done
}

unlock() {
	#killall -v gpg-agent
	_start --unlock
}