Commit 7416634f authored by Christian Brauner

Merge patch series "fs: add immutable rootfs"

Christian Brauner <brauner@kernel.org> says:

Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.

Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add an immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.
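
As a rough illustration (not part of this series), userspace could check
whether the option was passed at boot by scanning the contents of
/proc/cmdline. The helper name cmdline_has_param() is made up for this
sketch, it assumes "nullfs_rootfs" is given as a bare word, and it ignores
the quoting rules that the kernel's own parser handles:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `param` appears as a whole
 * whitespace-separated word in `cmdline` (e.g. the text read from
 * /proc/cmdline). "param=value" forms and quoting are not handled.
 */
bool cmdline_has_param(const char *cmdline, const char *param)
{
	const char *p = cmdline;
	size_t plen;

	assert(cmdline != NULL && param != NULL);
	plen = strlen(param);

	while (*p) {
		const char *start;

		/* Skip whitespace in front of the next word. */
		while (*p && isspace((unsigned char)*p))
			p++;
		start = p;

		/* Advance to the end of the word. */
		while (*p && !isspace((unsigned char)*p))
			p++;

		if (plen > 0 && (size_t)(p - start) == plen &&
		    strncmp(start, param, plen) == 0)
			return true;
	}
	return false;
}
```

Finding the word only shows that the option was requested; it does not
prove that the running kernel supports nullfs.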

The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:

  chdir(rootfs);
  pivot_root(".", ".");
  umount2(".", MNT_DETACH);

and be done with it. (Of course, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)
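
For illustration only, the three calls above could be wrapped in a small
userspace helper along these lines. The function names are invented for
this sketch, the raw syscall is used on the assumption that the C library
offers no pivot_root() wrapper, and the caller is assumed to be privileged
and to have already mounted the new root:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw pivot_root system call. */
static int do_pivot_root(const char *new_root, const char *put_old)
{
	return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/*
 * Hypothetical helper: make the mount at `new_root` the root mount and
 * lazily detach the old root that pivot_root() leaves stacked at the
 * same location. Returns 0 on success, -1 with errno set on failure.
 */
int switch_to_new_root(const char *new_root)
{
	assert(new_root != NULL);

	if (chdir(new_root) < 0)
		return -1;

	/* Swap roots; the old root ends up stacked on top of ".". */
	if (do_pivot_root(".", ".") < 0)
		return -1;

	/* Detach the old root (the initramfs) without waiting for users. */
	if (umount2(".", MNT_DETACH) < 0)
		return -1;

	return 0;
}
```

If the chdir() fails, nothing else is attempted, so a bad path leaves the
mount tree untouched.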

Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.

In the future this will also allow us to create completely empty mount
namespaces without the risk of leaking anything.

systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.
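
The pivot-first, MS_MOVE-second ordering can be sketched as follows. This
is not systemd's actual code; the function name is hypothetical, and the
fallback branch simply mirrors the traditional
"mount --move . /; chroot ." sequence:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: try pivot_root() first and fall back to moving
 * the new root mount over "/" when that fails.
 * Returns 0 on success, -1 with errno set on failure.
 */
int switch_root_with_fallback(const char *new_root)
{
	assert(new_root != NULL);

	if (chdir(new_root) < 0)
		return -1;

	/* Preferred path: pivot, then detach the old root. */
	if (syscall(SYS_pivot_root, ".", ".") == 0)
		return umount2(".", MNT_DETACH);

	/* Fallback: overmount "/" with the new root and chroot into it. */
	if (mount(".", "/", NULL, MS_MOVE, NULL) < 0)
		return -1;
	if (chroot(".") < 0)
		return -1;

	return chdir("/");
}
```

In the fallback case the initramfs contents stay in memory unless userspace
deletes them beforehand, which is the manual cleanup this series makes
unnecessary.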

This goes back to various discussions in previous years and an LPC 2024
presentation about this very topic.

* patches from https://patch.msgid.link/20260112-work-immutable-rootfs-v2-0-88dd1c34a204@kernel.org:
  docs: mention nullfs
  fs: add immutable rootfs
  fs: add init_pivot_root()
  fs: ensure that internal tmpfs mount gets mount id zero

Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-0-88dd1c34a204@kernel.org


Signed-off-by: Christian Brauner <brauner@kernel.org>
parents 8f0b4cce 649cb20b
+23 −9
@@ -76,10 +76,15 @@ What is rootfs?
 ---------------

 Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
-always present in 2.6 systems.  You can't unmount rootfs for approximately the
-same reason you can't kill the init process; rather than having special code
-to check for and handle an empty list, it's smaller and simpler for the kernel
-to just make sure certain lists can't become empty.
+always present in 2.6 systems.  Traditionally, you can't unmount rootfs for
+approximately the same reason you can't kill the init process; rather than
+having special code to check for and handle an empty list, it's smaller and
+simpler for the kernel to just make sure certain lists can't become empty.
+
+However, if the kernel is booted with "nullfs_rootfs", an immutable empty
+filesystem called nullfs is used as the true root, with the mutable rootfs
+(tmpfs/ramfs) mounted on top of it.  This allows pivot_root() and unmounting
+of the initramfs to work normally.

 Most systems just mount another filesystem over rootfs and ignore it.  The
 amount of space an empty instance of ramfs takes up is tiny.
@@ -121,17 +126,26 @@ All this differs from the old initrd in several ways:
     program.  See the switch_root utility, below.)

   - When switching another root device, initrd would pivot_root and then
-    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
-    rootfs, nor unmount it.  Instead delete everything out of rootfs to
-    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
-    with the new root (cd /newmount; mount --move . /; chroot .), attach
-    stdin/stdout/stderr to the new /dev/console, and exec the new init.
+    umount the ramdisk.  Traditionally, initramfs is rootfs: you can neither
+    pivot_root rootfs, nor unmount it.  Instead delete everything out of
+    rootfs to free up the space (find -xdev / -exec rm '{}' ';'), overmount
+    rootfs with the new root (cd /newmount; mount --move . /; chroot .),
+    attach stdin/stdout/stderr to the new /dev/console, and exec the new init.

     Since this is a remarkably persnickety process (and involves deleting
     commands before you can run them), the klibc package introduced a helper
     program (utils/run_init.c) to do all this for you.  Most other packages
     (such as busybox) have named this command "switch_root".
+
+    However, if the kernel is booted with "nullfs_rootfs", pivot_root() works
+    normally from the initramfs.  Userspace can simply do::
+
+      chdir(new_root);
+      pivot_root(".", ".");
+      umount2(".", MNT_DETACH);
+
+    This is the preferred method when nullfs_rootfs is enabled.

 Populating initramfs:
 ---------------------

+1 −1
@@ -16,7 +16,7 @@ obj-y := open.o read_write.o file_table.o super.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
 		fs_dirent.o fs_context.o fs_parser.o fsopen.o init.o \
 		kernel_read_file.o mnt_idmapping.o remap_range.o pidfs.o \
-		file_attr.o
+		file_attr.o nullfs.o

 obj-$(CONFIG_BUFFER_HEAD)	+= buffer.o mpage.o
 obj-$(CONFIG_PROC_FS)		+= proc_namespace.o
+17 −0
@@ -13,6 +13,23 @@
 #include <linux/security.h>
 #include "internal.h"

+int __init init_pivot_root(const char *new_root, const char *put_old)
+{
+	struct path new_path __free(path_put) = {};
+	struct path old_path __free(path_put) = {};
+	int ret;
+
+	ret = kern_path(new_root, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new_path);
+	if (ret)
+		return ret;
+
+	ret = kern_path(put_old, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &old_path);
+	if (ret)
+		return ret;
+
+	return path_pivot_root(&new_path, &old_path);
+}
+
 int __init init_mount(const char *dev_name, const char *dir_name,
 		const char *type_page, unsigned long flags, void *data_page)
 {
+1 −0
@@ -90,6 +90,7 @@ extern bool may_mount(void);
 int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
 int path_umount(const struct path *path, int flags);
+int path_pivot_root(struct path *new, struct path *old);

 int show_path(struct seq_file *m, struct dentry *root);

+1 −0
@@ -5,6 +5,7 @@
 #include <linux/ns_common.h>
 #include <linux/fs_pin.h>

+extern struct file_system_type nullfs_fs_type;
 extern struct list_head notify_list;

 struct mnt_namespace {