Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:

 - statmount: accept fd as a parameter

   Extend struct mnt_id_req with a file descriptor field and a new
   STATMOUNT_BY_FD flag. When set, statmount() returns mount information
   for the mount the fd resides on — including detached mounts
   (unmounted via umount2(MNT_DETACH)).

   For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID
   mask bits are cleared since neither is meaningful. The capability
   check is skipped for STATMOUNT_BY_FD since holding an fd already
   implies prior access to the mount and equivalent information is
   available through fstatfs() and /proc/pid/mountinfo without
   privilege. Includes comprehensive selftests covering both attached
   and detached mount cases.

 - fs: Remove internal old mount API code (1 patch)

   Now that every in-tree filesystem has been converted to the new
   mount API, remove all the legacy shim code in fs_context.c that
   handled unconverted filesystems. This deletes ~280 lines including
   legacy_init_fs_context(), the legacy_fs_context struct, and
   associated wrappers. The mount(2) syscall path for userspace remains
   untouched. Documentation references to the legacy callbacks are
   cleaned up.

 - mount: add OPEN_TREE_NAMESPACE to open_tree()

   Container runtimes currently use CLONE_NEWNS to copy the caller's
   entire mount namespace — only to then pivot_root() and recursively
   unmount everything they just copied. With large mount tables and
   thousands of parallel container launches this creates significant
   contention on the namespace semaphore.

   OPEN_TREE_NAMESPACE copies only the specified mount tree (like
   OPEN_TREE_CLONE) but returns a mount namespace fd instead of a
   detached mount fd. The new namespace contains the copied tree mounted
   on top of a clone of the real rootfs.

   This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a
   single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER)
   followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by
   the new user namespace. Mount namespace file mounts are excluded from
   the copy to prevent cycles. Includes ~1000 lines of selftests.
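
A minimal userspace sketch of the OPEN_TREE_NAMESPACE flow described above, not code from this series. The helper names are hypothetical. It assumes OPEN_TREE_NAMESPACE is provided by updated uapi headers (the value is not stated here, so the sketch compiles to an ENOSYS stub when the flag is missing), that OPEN_TREE_CLOEXEC may be combined with it, and that the returned mount namespace fd can be passed to setns() like any other mount namespace fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD, O_CLOEXEC */
#include <sched.h>        /* setns(), CLONE_NEWNS */
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_open_tree */
#include <unistd.h>       /* syscall(), close() */
#include <linux/mount.h>  /* OPEN_TREE_*; OPEN_TREE_NAMESPACE needs new headers */

/*
 * Hypothetical helper: ask the kernel to copy the mount tree at `path`
 * into a new mount namespace. Returns a namespace fd, or -1 with errno set.
 */
static int open_tree_as_namespace(const char *path)
{
	assert(path != NULL);
#if defined(OPEN_TREE_NAMESPACE) && defined(SYS_open_tree)
	/* Assumption: OPEN_TREE_CLOEXEC is accepted together with the new flag. */
	return (int)syscall(SYS_open_tree, AT_FDCWD, path,
			    OPEN_TREE_NAMESPACE | OPEN_TREE_CLOEXEC);
#else
	/* Headers predate the flag: report "not supported" instead of guessing. */
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: create the namespace and move the calling thread
 * into it, standing in for unshare(CLONE_NEWNS) followed by pivot_root().
 */
static int enter_tree_namespace(const char *path)
{
	int nsfd = open_tree_as_namespace(path);
	int ret, saved_errno;

	if (nsfd < 0)
		return -1;

	ret = setns(nsfd, CLONE_NEWNS);
	saved_errno = errno;   /* close() must not hide a setns() failure */
	close(nsfd);
	errno = saved_errno;
	return ret;
}
```

A container runtime would call something like enter_tree_namespace("/path/to/rootfs") in the child before exec. Keeping the fd instead would let another process join the same namespace later.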

* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/open_tree: add OPEN_TREE_NAMESPACE tests
  mount: add OPEN_TREE_NAMESPACE
  fs: Remove internal old mount API code
  selftests: statmount: tests for STATMOUNT_BY_FD
  statmount: accept fd as a parameter
  statmount: permission check should return EPERM
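
For the statmount change, the point that matters to callers is that STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID may be missing from the returned mask when the fd refers to a detached mount. The sketch below models only that mask handling. The helper names are hypothetical, the two constants are guarded copies of the values in include/uapi/linux/mount.h, and the syscall itself is left out because the new struct mnt_id_req field name is not given above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guarded copies of the uapi values, in case <linux/mount.h> is not included. */
#ifndef STATMOUNT_MNT_POINT
#define STATMOUNT_MNT_POINT 0x00000010U
#endif
#ifndef STATMOUNT_MNT_NS_ID
#define STATMOUNT_MNT_NS_ID 0x00000040U
#endif

/*
 * Hypothetical model of the described behaviour: the mask a caller should
 * expect back, given what it requested and whether the mount is detached.
 */
static uint64_t expected_statmount_mask(uint64_t requested, bool detached)
{
	if (detached)
		requested &= ~(uint64_t)(STATMOUNT_MNT_POINT | STATMOUNT_MNT_NS_ID);
	return requested;
}

/* Hypothetical helper: read the mount point string only if it was filled in. */
static bool statmount_has_mnt_point(uint64_t returned_mask)
{
	return (returned_mask & STATMOUNT_MNT_POINT) != 0;
}

/* Hypothetical helper: read mnt_ns_id only if the kernel reported it. */
static bool statmount_has_mnt_ns_id(uint64_t returned_mask)
{
	return (returned_mask & STATMOUNT_MNT_NS_ID) != 0;
}
```

Checking the returned mask before touching each field keeps the same caller working for both attached and detached mounts.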
Linus Torvalds
2026-02-09 14:43:47 -08:00
20 changed files with 1669 additions and 365 deletions


@@ -180,7 +180,6 @@ prototypes::
int (*freeze_fs) (struct super_block *);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
-int (*remount_fs) (struct super_block *, int *, char *);
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct dentry *);
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
@@ -204,7 +203,6 @@ sync_fs: read
freeze_fs: write
unfreeze_fs: write
statfs: maybe(read) (see below)
-remount_fs: write
umount_begin: no
show_options: no (namespace_sem)
quota_read: no (see below)
@@ -229,8 +227,6 @@ file_system_type
prototypes::
-struct dentry *(*mount) (struct file_system_type *, int,
-const char *, void *);
void (*kill_sb) (struct super_block *);
locking rules:
@@ -238,13 +234,9 @@ locking rules:
======= =========
ops may block
======= =========
-mount yes
kill_sb yes
======= =========
-->mount() returns ERR_PTR or the root dentry; its superblock should be locked
-on return.
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
unlocks and drops the reference.


@@ -299,8 +299,6 @@ manage the filesystem context. They are as follows:
On success it should return 0. In the case of an error, it should return
a negative error code.
-.. Note:: reconfigure is intended as a replacement for remount_fs.
Filesystem context Security
===========================


@@ -448,11 +448,8 @@ a file off.
**mandatory**
-->get_sb() is gone. Switch to use of ->mount(). Typically it's just
-a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
-the function type. If you were doing it manually, just switch from setting
-->mnt_root to some pointer to returning that pointer. On errors return
-ERR_PTR(...).
+->get_sb() and ->mount() are gone. Switch to using the new mount API. See
+Documentation/filesystems/mount_api.rst for more details.
---


@@ -94,11 +94,9 @@ functions:
The passed struct file_system_type describes your filesystem. When a
request is made to mount a filesystem onto a directory in your
-namespace, the VFS will call the appropriate mount() method for the
-specific filesystem. New vfsmount referring to the tree returned by
-->mount() will be attached to the mountpoint, so that when pathname
-resolution reaches the mountpoint it will jump into the root of that
-vfsmount.
+namespace, the VFS will call the appropriate get_tree() method for the
+specific filesystem. See Documentation/filesystems/mount_api.rst
+for more details.
You can see all filesystems that are registered to the kernel in the
file /proc/filesystems.
@@ -117,8 +115,6 @@ members are defined:
int fs_flags;
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
-struct dentry *(*mount) (struct file_system_type *, int,
-const char *, void *);
void (*kill_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
@@ -151,10 +147,6 @@ members are defined:
'struct fs_parameter_spec'.
More info in Documentation/filesystems/mount_api.rst.
-``mount``
-the method to call when a new instance of this filesystem should
-be mounted
``kill_sb``
the method to call when an instance of this filesystem should be
shut down
@@ -173,45 +165,6 @@ members are defined:
s_lock_key, s_umount_key, s_vfs_rename_key, s_writers_key,
i_lock_key, i_mutex_key, invalidate_lock_key, i_mutex_dir_key: lockdep-specific
-The mount() method has the following arguments:
-``struct file_system_type *fs_type``
-describes the filesystem, partly initialized by the specific
-filesystem code
-``int flags``
-mount flags
-``const char *dev_name``
-the device name we are mounting.
-``void *data``
-arbitrary mount options, usually comes as an ASCII string (see
-"Mount Options" section)
-The mount() method must return the root dentry of the tree requested by
-caller. An active reference to its superblock must be grabbed and the
-superblock must be locked. On failure it should return ERR_PTR(error).
-The arguments match those of mount(2) and their interpretation depends
-on filesystem type. E.g. for block filesystems, dev_name is interpreted
-as block device name, that device is opened and if it contains a
-suitable filesystem image the method creates and initializes struct
-super_block accordingly, returning its root dentry to caller.
-->mount() may choose to return a subtree of existing filesystem - it
-doesn't have to create a new one. The main result from the caller's
-point of view is a reference to dentry at the root of (sub)tree to be
-attached; creation of new superblock is a common side effect.
-The most interesting member of the superblock structure that the mount()
-method fills in is the "s_op" field. This is a pointer to a "struct
-super_operations" which describes the next level of the filesystem
-implementation.
-For more information on mounting (and the new mount API), see
-Documentation/filesystems/mount_api.rst.
The Superblock Object
=====================
@@ -244,7 +197,6 @@ filesystem. The following members are defined:
enum freeze_wholder who);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
-int (*remount_fs) (struct super_block *, int *, char *);
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct dentry *);
@@ -351,10 +303,6 @@ or bottom half).
``statfs``
called when the VFS needs to get filesystem statistics.
-``remount_fs``
-called when the filesystem is remounted. This is called with
-the kernel lock held
``umount_begin``
called when the VFS is unmounting a filesystem.