Commit c84bb79f authored by Linus Torvalds

Merge tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs nullfs update from Christian Brauner:
 "Add a completely catatonic minimal pseudo filesystem called "nullfs"
  and make pivot_root() work in the initramfs.

  Currently pivot_root() does not work on the real rootfs because it
  cannot be unmounted. Userspace has to recursively delete initramfs
  contents manually before continuing boot, using the fragile
  switch_root sequence (overmount + chroot).

  Add nullfs, a minimal immutable filesystem that serves as the true
  root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is
  mounted on top of it. This allows userspace to simply:

      chdir(new_root);
      pivot_root(".", ".");
      umount2(".", MNT_DETACH);

  without the traditional switch_root workarounds. systemd already
  handles this correctly. It tries pivot_root() first and falls back
  to MS_MOVE only when that fails.

  This also means rootfs mounts in unprivileged namespaces no longer
  need MNT_LOCKED, since the immutable nullfs guarantees nothing can be
  revealed by unmounting the covering mount.

  nullfs is a single-instance filesystem (get_tree_single()) marked
  SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root
  directory. This means sooner or later it can be used to overmount
  other directories to hide their contents without any additional
  protection needed.

  We enable it unconditionally. If we see any real regression we'll
  hide it behind a boot option.

  nullfs has extensions beyond this in the future. It will serve as a
  concept to support the creation of completely empty mount namespaces -
  which is work coming up in the next cycle"

* tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: use nullfs unconditionally as the real rootfs
  docs: mention nullfs
  fs: add immutable rootfs
  fs: add init_pivot_root()
  fs: ensure that internal tmpfs mount gets mount id zero
parents 7e01a69f 313c47f4
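
For illustration, the userspace sequence quoted in the pull message, together with the MS_MOVE fallback it attributes to systemd, could be sketched roughly as below. This is a minimal sketch, not code from this series: switch_to_new_root() is a hypothetical helper name, the fallback simply mirrors the traditional "mount --move . /; chroot ." sequence, and cleanup of the old initramfs contents is deliberately left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of this
 * series): make new_root the root of the calling process.
 * Returns 0 on success, -1 with errno set on failure.
 */
int switch_to_new_root(const char *new_root)
{
	assert(new_root != NULL);

	if (chdir(new_root) < 0)
		return -1;

	/* Preferred path. glibc has no pivot_root() wrapper, so use syscall(). */
	if (syscall(SYS_pivot_root, ".", ".") == 0) {
		/* The old root is now stacked on top of "."; lazily detach it. */
		return umount2(".", MNT_DETACH);
	}

	/*
	 * Fallback: move the new root over "/" and chroot into it.
	 * Unlike a full switch_root, this sketch does not delete the old
	 * initramfs contents first.
	 */
	if (mount(".", "/", NULL, MS_MOVE, NULL) < 0)
		return -1;
	return chroot(".");
}
```

The helper needs CAP_SYS_ADMIN in the owning user namespace for pivot_root() and the mount calls; a caller would typically chdir("/") afterwards and exec the new init.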
+12 −14
@@ -76,10 +76,10 @@ What is rootfs?
---------------

Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
-always present in 2.6 systems.  You can't unmount rootfs for approximately the
-same reason you can't kill the init process; rather than having special code
-to check for and handle an empty list, it's smaller and simpler for the kernel
-to just make sure certain lists can't become empty.
+always present in Linux systems.  The kernel uses an immutable empty filesystem
+called nullfs as the true root of the VFS hierarchy, with the mutable rootfs
+(tmpfs/ramfs) mounted on top of it.  This allows pivot_root() and unmounting
+of the initramfs to work normally.

Most systems just mount another filesystem over rootfs and ignore it.  The
amount of space an empty instance of ramfs takes up is tiny.
@@ -121,16 +121,14 @@ All this differs from the old initrd in several ways:
    program.  See the switch_root utility, below.)

  - When switching another root device, initrd would pivot_root and then
-    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
-    rootfs, nor unmount it.  Instead delete everything out of rootfs to
-    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
-    with the new root (cd /newmount; mount --move . /; chroot .), attach
-    stdin/stdout/stderr to the new /dev/console, and exec the new init.
-
-    Since this is a remarkably persnickety process (and involves deleting
-    commands before you can run them), the klibc package introduced a helper
-    program (utils/run_init.c) to do all this for you.  Most other packages
-    (such as busybox) have named this command "switch_root".
+    umount the ramdisk.  With nullfs as the true root, pivot_root() works
+    normally from the initramfs.  Userspace can simply do::
+
+      chdir(new_root);
+      pivot_root(".", ".");
+      umount2(".", MNT_DETACH);
+
+    This is the preferred method for switching root filesystems.

Populating initramfs:
---------------------
+1 −1
@@ -16,7 +16,7 @@ obj-y := open.o read_write.o file_table.o super.o \
		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
		fs_dirent.o fs_context.o fs_parser.o fsopen.o init.o \
		kernel_read_file.o mnt_idmapping.o remap_range.o pidfs.o \
-		file_attr.o fserror.o
+		file_attr.o fserror.o nullfs.o

obj-$(CONFIG_BUFFER_HEAD)	+= buffer.o mpage.o
obj-$(CONFIG_PROC_FS)		+= proc_namespace.o
+17 −0
@@ -13,6 +13,23 @@
#include <linux/security.h>
#include "internal.h"

+int __init init_pivot_root(const char *new_root, const char *put_old)
+{
+	struct path new_path __free(path_put) = {};
+	struct path old_path __free(path_put) = {};
+	int ret;
+
+	ret = kern_path(new_root, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new_path);
+	if (ret)
+		return ret;
+
+	ret = kern_path(put_old, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &old_path);
+	if (ret)
+		return ret;
+
+	return path_pivot_root(&new_path, &old_path);
+}
+
int __init init_mount(const char *dev_name, const char *dir_name,
		const char *type_page, unsigned long flags, void *data_page)
{
+1 −0
@@ -90,6 +90,7 @@ extern bool may_mount(void);
int path_mount(const char *dev_name, const struct path *path,
		const char *type_page, unsigned long flags, void *data_page);
int path_umount(const struct path *path, int flags);
+int path_pivot_root(struct path *new, struct path *old);

int show_path(struct seq_file *m, struct dentry *root);

+1 −0
@@ -5,6 +5,7 @@
#include <linux/ns_common.h>
#include <linux/fs_pin.h>

+extern struct file_system_type nullfs_fs_type;
extern struct list_head notify_list;

struct mnt_namespace {