123 Commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
136114e0ab Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Kohei Enju
4b22ec1685 efivarfs: fix error propagation in efivar_entry_get()
efivar_entry_get() always returns success even if the underlying
__efivar_entry_get() fails, masking errors.

This may result in uninitialized heap memory being copied to userspace
in the efivarfs_file_read() path.

Fix it by returning the error from __efivar_entry_get().

Fixes: 2d82e6227e ("efi: vars: Move efivar caching layer into efivarfs")
Cc: <stable@vger.kernel.org> # v6.1+
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-01-19 08:40:36 +01:00
Linus Torvalds
7cd122b552 Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Al Viro
e11e247608 convert efivarfs
Initially filesystem is populated with d_alloc_name() + d_add().
That becomes d_alloc_name() + d_make_persistent() + dput().
Dynamic creation is switched to d_make_persistent();
removal - to simple_unlink() (no point open-coding it in
efivarfs_unlink(), better call it there)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Christian Brauner
a3f8f86627 power: always freeze efivarfs
The efivarfs filesystems must always be frozen and thawed to resync
variable state. Make it so.

Link: https://patch.msgid.link/20251105-vorbild-zutreffen-fe00d1dd98db@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 10:12:39 +01:00
Linus Torvalds
b7ce6fa90f Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Mateusz Guzik
f99b391778 fs: rename generic_delete_inode() and generic_drop_inode()
generic_delete_inode() is rather misleading for what the routine is
doing. inode_just_drop() should be much clearer.

The new naming is inconsistent with generic_drop_inode(), so rename that
one as well with inode_ as the suffix.

No functional changes.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 16:09:42 +02:00
Li Nan
a6358f8cf6 efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
Observed on kernel 6.6 (present on master as well):

  BUG: KASAN: slab-out-of-bounds in memcmp+0x98/0xd0
  Call trace:
   kasan_check_range+0xe8/0x190
   __asan_loadN+0x1c/0x28
   memcmp+0x98/0xd0
   efivarfs_d_compare+0x68/0xd8
   __d_lookup_rcu_op_compare+0x178/0x218
   __d_lookup_rcu+0x1f8/0x228
   d_alloc_parallel+0x150/0x648
   lookup_open.isra.0+0x5f0/0x8d0
   open_last_lookups+0x264/0x828
   path_openat+0x130/0x3f8
   do_filp_open+0x114/0x248
   do_sys_openat2+0x340/0x3c0
   __arm64_sys_openat+0x120/0x1a0

If dentry->d_name.len < EFI_VARIABLE_GUID_LEN , 'guid' can become
negative, leadings to oob. The issue can be triggered by parallel
lookups using invalid filename:

  T1			T2
  lookup_open
   ->lookup
    simple_lookup
     d_add
     // invalid dentry is added to hash list

			lookup_open
			 d_alloc_parallel
			  __d_lookup_rcu
			   __d_lookup_rcu_op_compare
			    hlist_bl_for_each_entry_rcu
			    // invalid dentry can be retrieved
			     ->d_compare
			      efivarfs_d_compare
			      // oob

Fix it by checking 'guid' before cmp.

Fixes: da27a24383 ("efivarfs: guid part of filenames are case-insensitive")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-08-28 08:39:49 +02:00
Linus Torvalds
57fcb7d930 Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening and calling
  FS_IOC_FSSETXATTR on each inode. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with empty project ID. Those inodes then are not shown in the quota
  accounting but still exist in the directory. This is not critical but
  in the case when special files are created in the directory with
  already existing project quota, these new inodes inherit extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes don't have a
  possibility to become clear or change the attributes. This, in turn,
  prevents userspace from re-creating quota project on these existing
  files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Linus Torvalds
11fe69fbd5 Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dentry d_flags updates from Al Viro:
 "The current exclusion rules for dentry->d_flags stores are rather
  unpleasant. The basic rules are simple:

   - stores to dentry->d_flags are OK under dentry->d_lock

   - stores to dentry->d_flags are OK in the dentry constructor, before
     becomes potentially visible to other threads

  Unfortunately, there's a couple of exceptions to that, and that's
  where the headache comes from.

  The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
  dentry and adjusts the flags that correspond to presence of individual
  methods. It's very easy to misuse; existing uses _are_ safe, but proof
  of correctness is brittle.

  Use in __d_alloc() is safe (we are within a constructor), but we might
  as well precalculate the initial value of 'd_flags' when we set the
  default ->d_op for given superblock and set 'd_flags' directly instead
  of messing with that helper.

  The reasons why other uses are safe are bloody convoluted; I'm not
  going to reproduce it here. See [1] for gory details, if you care. The
  critical part is using d_set_d_op() only just prior to
  d_splice_alias(), which makes a combination of d_splice_alias() with
  setting ->d_op, etc a natural replacement primitive.

  Better yet, if we go that way, it's easy to take setting ->d_op and
  modifying 'd_flags' under ->d_lock, which eliminates the headache as
  far as 'd_flags' exclusion rules are concerned. Other exceptions are
  minor and easy to deal with.

  What this series does:

   - d_set_d_op() is no longer available; instead a new primitive
     (d_splice_alias_ops()) is provided, equivalent to combination of
     d_set_d_op() and d_splice_alias().

   - new field of struct super_block - 's_d_flags'. This sets the
     default value of 'd_flags' to be used when allocating dentries on
     this filesystem.

   - new primitive for setting 's_d_op': set_default_d_op(). This
     replaces stores to 's_d_op' at mount time.

     All in-tree filesystems converted; out-of-tree ones will get caught
     by the compiler ('s_d_op' is renamed, so stores to it will be
     caught). 's_d_flags' is set by the same primitive to match the
     's_d_op'.

   - a lot of filesystems had sb->s_d_op->d_delete equal to
     always_delete_dentry; that is equivalent to setting
     DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
     set that bit in 's_d_flags' and drop 'd_delete()' from
     dentry_operations.

     In quite a few cases that results in empty dentry_operations, which
     means that we can get rid of those.

   - kill simple_dentry_operations - not needed anymore

   - massage d_alloc_parallel() to get rid of the other exception wrt
     'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
     allocate the new dentry; no need to delay that until we commit to
     using the sucker.

  As the result, 'd_flags' stores are all either under ->d_lock or done
  before the dentry becomes visible in any shared data structures"

Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]

* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  configfs: use DCACHE_DONTCACHE
  debugfs: use DCACHE_DONTCACHE
  efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
  9p: don't bother with always_delete_dentry
  ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
  kill simple_dentry_operations
  devpts, sunrpc, hostfs: don't bother with ->d_op
  shmem: no dentry retention past the refcount reaching zero
  d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
  make d_set_d_op() static
  simple_lookup(): just set DCACHE_DONTCACHE
  tracefs: Add d_delete to remove negative dentries
  set_default_d_op(): calculate the matching value for ->d_flags
  correct the set of flags forbidden at d_set_d_op() time
  split d_flags calculation out of d_set_d_op()
  new helper: set_default_d_op()
  fuse: no need for special dentry_operations for root dentry
  switch procfs from d_set_d_op() to d_splice_alias_ops()
  new helper: d_splice_alias_ops()
  procfs: kill ->proc_dops
  ...
2025-07-28 09:17:57 -07:00
Breno Leitao
64e135f1ea efivarfs: Fix memory leak of efivarfs_fs_info in fs_context error paths
When processing mount options, efivarfs allocates efivarfs_fs_info (sfi)
early in fs_context initialization. However, sfi is associated with the
superblock and typically freed when the superblock is destroyed. If the
fs_context is released (final put) before fill_super is called—such as
on error paths or during reconfiguration—the sfi structure would leak,
as ownership never transfers to the superblock.

Implement the .free callback in efivarfs_context_ops to ensure any
allocated sfi is properly freed if the fs_context is torn down before
fill_super, preventing this memory leak.

Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fixes: 5329aa5101 ("efivarfs: Add uid/gid mount options")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-07-18 12:00:20 +02:00
Christian Brauner
ca115d7e75 tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Al Viro
38109b6554 efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11 13:41:05 -04:00
Al Viro
05fb0e6664 new helper: set_default_d_op()
... to be used instead of manually assigning to ->s_d_op.
All in-tree filesystem converted (and field itself is renamed,
so any out-of-tree ones in need of conversion will be caught
by compiler).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10 22:21:16 -04:00
Linus Torvalds
8dd53535f1 Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs freezing updates from Christian Brauner:
 "This contains various filesystem freezing related work for this cycle:

   - Allow the power subsystem to support filesystem freeze for suspend
     and hibernate.

     Now all the pieces are in place to actually allow the power
     subsystem to freeze/thaw filesystems during suspend/resume.
     Filesystems are only frozen and thawed if the power subsystem does
     actually own the freeze.

     If the filesystem is already frozen by the time we've frozen all
     userspace processes we don't care to freeze it again. That's
     userspace's job once the process resumes. We only actually freeze
     filesystems if we absolutely have to and we ignore other failures
     to freeze.

     We could bubble up errors and fail suspend/resume if the error
     isn't EBUSY (aka it's already frozen) but I don't think that this
     is worth it. Filesystem freezing during suspend/resume is
     best-effort. If the user has 500 ext4 filesystems mounted and 4
     fail to freeze for whatever reason then we simply skip them.

     What we have now is already a big improvement and let's see how we
     fare with it before making our lives even harder (and uglier) than
     we have to.

   - Allow efivars to support freeze and thaw

     Allow efivarfs to partake to resync variable state during system
     hibernation and suspend. Add freeze/thaw support.

     This is a pretty straightforward implementation. We simply add
     regular freeze/thaw support for both userspace and the kernel.
     efivars is the first pseudofilesystem that adds support for
     filesystem freezing and thawing.

     The simplicity comes from the fact that we simply always resync
     variable state after efivarfs has been frozen. It doesn't matter
     whether that's because of suspend, userspace initiated freeze or
     hibernation. Efivars is simple enough that it doesn't matter that
     we walk all dentries. There are no directories and there aren't
     insane amounts of entries and both freeze/thaw are already
     heavy-handed operations. If userspace initiated a freeze/thaw cycle
     they would need CAP_SYS_ADMIN in the initial user namespace (as
     that's where efivarfs is mounted) so it can't be triggered by
     random userspace. IOW, we really really don't care"

* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  f2fs: fix freezing filesystem during resize
  kernfs: add warning about implementing freeze/thaw
  efivarfs: support freeze/thaw
  power: freeze filesystems during suspend/resume
  libfs: export find_next_child()
  super: add filesystem freezing helpers for suspend and hibernate
  gfs2: pass through holder from the VFS for freeze/thaw
  super: use common iterator (Part 2)
  super: use a common iterator (Part 1)
  super: skip dying superblocks early
  super: simplify user_get_super()
  super: remove pointless s_root checks
  fs: allow all writers to be frozen
  locking/percpu-rwsem: add freezable alternative to down_read
2025-05-26 09:33:44 -07:00
Christian Brauner
0e4f948395 efivarfs: support freeze/thaw
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.

This is a pretty straightforward implementation. We simply add regular
freeze/thaw support for both userspace and the kernel. This works
without any big issues and congrats afaict efivars is the first
pseudofilesystem that adds support for filesystem freezing and thawing.

The simplicity comes from the fact that we simply always resync variable
state after efivarfs has been frozen. It doesn't matter whether that's
because of suspend, userspace initiated freeze or hibernation. Efivars
is simple enough that it doesn't matter that we walk all dentries. There
are no directories and there aren't insane amounts of entries and both
freeze/thaw are already heavy-handed operations. We really really don't
need to care.

Link: https://lore.kernel.org/r/20250331-work-freeze-v1-2-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09 12:41:23 +02:00
NeilBrown
06c567403a Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
try_lookup_noperm() and d_hash_and_lookup() are nearly identical.  The
former does some validation of the name where the latter doesn't.
Outside of the VFS that validation is likely valuable, and having only
one exported function for this task is certainly a good idea.

So make d_hash_and_lookup() local to VFS files and change all other
callers to try_lookup_noperm().  Note that the arguments are swapped.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08 11:24:41 +02:00
Linus Torvalds
1fa753c7b5 Merge tag 'efi-next-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:

 - Decouple mixed mode startup code from the traditional x86
   decompressor

 - Revert zero-length file hack in efivarfs

 - Prevent EFI zboot from using the CopyMem/SetMem boot services after
   ExitBootServices()

 - Update EFI zboot to use the ZLIB/ZSTD library interfaces directly

* tag 'efi-next-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/libstub: Avoid legacy decompressor zlib/zstd wrappers
  efi/libstub: Avoid CopyMem/SetMem EFI services after ExitBootServices
  efi: efibc: change kmalloc(size * count, ...) to kmalloc_array()
  efivarfs: Revert "allow creation of zero length files"
  x86/efi/mixed: Move mixed mode startup code into libstub
  x86/efi/mixed: Simplify and document thunking logic
  x86/efi/mixed: Remove dependency on legacy startup_32 code
  x86/efi/mixed: Set up 1:1 mapping of lower 4GiB in the stub
  x86/efi/mixed: Factor out and clean up long mode entry
  x86/efi/mixed: Check CPU compatibility without relying on verify_cpu()
  x86/efistub: Merge PE and handover entrypoints
2025-03-29 11:36:19 -07:00
James Bottomley
11092db5b5 efivarfs: fix NULL dereference on resume
LSMs often inspect the path.mnt of files in the security hooks, and this
causes a NULL deref in efivarfs_pm_notify() because the path is
constructed with a NULL path.mnt.

Fix by obtaining from vfs_kern_mount() instead, and being very careful
to ensure that deactivate_super() (potentially triggered by a racing
userspace umount) is not called directly from the notifier, because it
would deadlock when efivarfs_kill_sb() tried to unregister the notifier
chain.

[ Al notes:
	Umm...  That's probably safe, but not as a long-term solution -
	it's too intimately dependent upon fs/super.c internals. The
	reasons why you can't run into ->s_umount deadlock here are
	non-trivial... ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/r/e54e6a2f-1178-4980-b771-4d9bafc2aa47@tnxip.de
Link: https://lore.kernel.org/r/3e998bf87638a442cbc6864cdcd3d8d9e08ce3e3.camel@HansenPartnership.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-18 08:46:08 +01:00
Ard Biesheuvel
dec1277875 efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume
syzbot warns about a potential deadlock, but this is a false positive
resulting from a missing lockdep annotation: iterate_dir() locks the
parent whereas the inode_lock() it warns about locks the child, which is
guaranteed to be a different lock.

So use inode_lock_nested() instead with the appropriate lock class.

Reported-by: syzbot+019072ad24ab1d948228@syzkaller.appspotmail.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-17 18:19:02 +01:00
Ard Biesheuvel
b6b227e36b efivarfs: Revert "allow creation of zero length files"
As agreed with the fwupd/LVFS maintainer, this reverts commit
fc20737d8b again for the v6.15 cycle,
leaving them sufficient time to roll out a fix for the issue that the
reverted commit works around.

Link: https://lore.kernel.org/all/63837c36eceaf8cf2af7933dccca54ff4dd9f30d.camel@HansenPartnership.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-07 12:31:04 +01:00
James Bottomley
fc20737d8b efivarfs: allow creation of zero length files
Temporarily allow the creation of zero length files in efivarfs so the
'fwupd' user space firmware update tool can continue to operate. This
hack should be reverted as soon as the fwupd mechanisms for updating
firmware have been fixed.

fwupd has been coded to open a firmware file, close it, remove the
immutable bit and write to it.  Since commit 908af31f48 ("efivarfs:
fix error on write to new variable leaving remnants") this behaviour
results in the first close removing the file which causes the second
write to fail.  To allow fwupd to keep working code up an indicator of
size 1 if a write fails and only remove the file on that condition (so
create at zero size is allowed).

Tested-by: Richard Hughes <richard@hughsie.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[ardb: replace LVFS with fwupd, as suggested by Richard]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-27 11:27:49 +01:00
Ard Biesheuvel
cb6ae457bc efivarfs: Defer PM notifier registration until .fill_super
syzbot reports an issue that turns out to be caused by the fact that the
efivarfs PM notifier may be invoked before the efivarfs_fs_info::sb
field is populated, resulting in a NULL deference.

So defer the registration until efivarfs_fill_super() is invoked.

Reported-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com
Tested-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-23 17:30:05 +01:00
James Bottomley
b5d1e6ee76 efivarfs: add variable resync after hibernation
Hibernation allows other OSs to boot and thus the variable state might
be altered by the time the hibernation image is resumed.  Resync the
variable state by looping over all the dentries and update the size
(in case of alteration) delete any which no-longer exist.  Finally,
loop over all efi variables creating any which don't have
corresponding dentries.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[ardb: - apply error pointer fixup from Dan Carpenter
       - rebase onto latest version of James's efivarfs rework]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-22 13:28:17 +01:00
James Bottomley
0e2f98da20 efivarfs: abstract initial variable creation routine
Reuse later for variable creation after hibernation

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-22 13:28:10 +01:00
Ard Biesheuvel
e7b4b1f61d Merge branch 'efivarfs' into next 2025-01-19 17:50:58 +01:00
James Bottomley
908af31f48 efivarfs: fix error on write to new variable leaving remnants
Make variable cleanup go through the fops release mechanism and use
zero inode size as the indicator to delete the file.  Since all EFI
variables must have an initial u32 attribute, zero size occurs either
because the update deleted the variable or because an unsuccessful
write after create caused the size never to be set in the first place.
In the case of multiple racing opens and closes, the open is counted
to ensure that the zero size check is done on the last close.

Even though this fixes the bug that a create either not followed by a
write or followed by a write that errored would leave a remnant file
for the variable, the file will appear momentarily globally visible
until the last close of the fd deletes it.  This is safe because the
normal filesystem operations will mediate any races; however, it is
still possible for a directory listing at that instant between create
and close contain a zero size variable that doesn't exist in the EFI
table.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:50:26 +01:00
James Bottomley
a58e954464 efivarfs: remove unused efivarfs_list
Remove all function helpers and mentions of the efivarfs_list now that
all consumers of the list have been removed and entry management goes
exclusively through the inode.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:50:26 +01:00
James Bottomley
fddca52766 efivarfs: move variable lifetime management into the inodes
Make the inodes the default management vehicle for struct
efivar_entry, so they are now all freed automatically if the file is
removed and on unmount in kill_litter_super().  Remove the now
superfluous iterator to free the entries after kill_litter_super().

Also fixes a bug where some entry freeing was missing causing efivarfs
to leak memory.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:50:26 +01:00
James Bottomley
8b4bc207f9 efivarfs: prevent setting of zero size on the inodes in the cache
Current efivarfs uses simple_setattr which allows the setting of any
size in the inode cache.  This is wrong because a zero size file is
used to indicate an "uncommitted" variable, so by simple means of
truncating the file (as root) any variable may be turned to look like
it's uncommitted.  Fix by adding an efivarfs_setattr routine which
does not allow updating of the cached inode size (which now only comes
from the underlying variable).

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:49:10 +01:00
James Bottomley
7e365c7e2c efivarfs: make variable_is_present use dcache lookup
Instead of searching the variable entry list for a variable, use the
dcache lookup functions to find it instead.  Also add an efivarfs_
prefix to the function now it is no longer static.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:29:08 +01:00
James Bottomley
1aba87f92d efivarfs: add helper to convert from UC16 name and GUID to utf8 name
These will be used by a later patch to check for uniqueness on initial
EFI variable iteration.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:29:08 +01:00
James Bottomley
c57b6e1d8a efivarfs: remove unused efi_variable.Attributes and efivar_entry.kobj
These fields look to be remnants of older code: Attributes was likely
meant to stash the variable attributes, but doesn't because we always
read them from the variable store and kobj was likely left over from
an older iteration of code where we manually created the objects
instead of using a filesystem.

[ ardb: these fields were used by the sysfs based 'efivars' precursor to
	efivarfs, which was removed in commit 0f5b2c69a4 ("efi: vars:
	Remove deprecated 'efivars' sysfs interface") ]

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:25:14 +01:00
Linus Torvalds
7031a38ab7 Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Limit EFI zboot to GZIP and ZSTD before it comes in wider use

 - Fix inconsistent error when looking up a non-existent file in
   efivarfs with a name that does not adhere to the NAME-GUID format

 - Drop some unused code

* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/esrt: remove esre_attribute::store()
  efivarfs: Fix error on non-existent file
  efi/zboot: Limit compression options to GZIP and ZSTD
2024-12-15 15:33:41 -08:00
James Bottomley
2ab0837cb9 efivarfs: Fix error on non-existent file
When looking up a non-existent file, efivarfs returns -EINVAL if the
file does not conform to the NAME-GUID format and -ENOENT if it does.
This is caused by efivars_d_hash() returning -EINVAL if the name is not
formatted correctly.  This error is returned before simple_lookup()
returns a negative dentry, and is the error value that the user sees.

Fix by removing this check.  If the file does not exist, simple_lookup()
will return a negative dentry leading to -ENOENT and efivarfs_create()
already has a validity check before it creates an entry (and will
correctly return -EINVAL)

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org>
[ardb: make efivarfs_valid_name() static]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-12-09 10:00:04 +01:00
Peter Zijlstra
cdd30ebb1b module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit 33def8498f ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.

Scripted using

  git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
  do
    awk -i inplace '
      /^#define EXPORT_SYMBOL_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /^#define MODULE_IMPORT_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /MODULE_IMPORT_NS/ {
        $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
      }
      /EXPORT_SYMBOL_NS/ {
        if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
  	if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
  	    $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
  	    $0 !~ /^my/) {
  	  getline line;
  	  gsub(/[[:space:]]*\\$/, "");
  	  gsub(/[[:space:]]/, "", line);
  	  $0 = $0 " " line;
  	}

  	$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
  		    "\\1(\\2, \"\\3\")", "g");
        }
      }
      { print }' $file;
  done

Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-02 11:34:44 -08:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Eric Sandeen
dcffad38c7 efivarfs: Convert to new uid/gid option parsing helpers
Convert to new uid/gid option parsing helpers

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/ff7c415c-43d4-4a1e-a987-246c781f8850@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02 06:21:18 +02:00
Tim Schumacher
cda30c6542 efi: Clear up misconceptions about a maximum variable name size
The UEFI specification does not make any mention of a maximum variable
name size, so the headers and implementation shouldn't claim that one
exists either.

Comments referring to this limit have been removed or rewritten, as this
is an implementation detail local to the Linux kernel.

Where appropriate, the magic value of 1024 has been replaced with
EFI_VAR_NAME_LEN, as this is used for the efi_variable struct
definition. This in itself does not change any behavior, but should
serve as points of interest when making future changes in the same area.

A related build-time check has been added to ensure that the special
512 byte sized buffer will not overflow with a potentially decreased
EFI_VAR_NAME_LEN.

Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-04-13 10:33:02 +02:00
Tim Schumacher
89ea21d70d efivarfs: Remove unused internal struct members
The structure was moved to the efivarfs internals in commit 2d82e6227e
("efi: vars: Move efivar caching layer into efivarfs") after previously
being used as the data ABI for efivars until its removal in commit
0f5b2c69a4 ("efi: vars: Remove deprecated 'efivars' sysfs interface").

As efivarfs only uses the structure for the variable name caching layer,
the data-related members were never in use. Remove them to avoid
implying that efivarfs is bound by the same restrictions that efivars
once had. While at it, remove the packed attribute, since we no longer
have to guarantee a stable layout.

Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-04-13 10:33:02 +02:00
Ard Biesheuvel
2ce507f57b efivarfs: Drop 'duplicates' bool parameter on efivar_init()
The 'duplicates' bool argument is always true when efivar_init() is
called from its only caller so let's just drop it instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25 09:43:39 +01:00
Ard Biesheuvel
9ca01c7adf efivarfs: Drop redundant cleanup on fill_super() failure
Al points out that kill_sb() will be called if efivarfs_fill_super()
fails and so there is no point in cleaning up the efivar entry list.

Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25 09:43:38 +01:00
Tim Schumacher
f45812cc23 efivarfs: Request at most 512 bytes for variable names
Work around a quirk in a few old (2011-ish) UEFI implementations, where
a call to `GetNextVariableName` with a buffer size larger than 512 bytes
will always return EFI_INVALID_PARAMETER.

There is some lore around EFI variable names being up to 1024 bytes in
size, but this has no basis in the UEFI specification, and the upper
bounds are typically platform specific, and apply to the entire variable
(name plus payload).

Given that Linux does not permit creating files with names longer than
NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a
reasonable limit.

Cc: <stable@vger.kernel.org> # 6.1+
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25 09:42:24 +01:00
Masahisa Kojima
94f7f6182c efivarfs: automatically update super block flag
efivar operation is updated when the tee_stmm_efi module is probed.
tee_stmm_efi module supports SetVariable runtime service, but user needs
to manually remount the efivarfs as RW to enable the write access if the
previous efivar operation does not support SetVariable and efivarfs is
mounted as read-only.

This commit notifies the update of efivar operation to efivarfs
subsystem, then drops SB_RDONLY flag if the efivar operation supports
SetVariable.

Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
[ardb: use per-superblock instance of the notifier block]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:18 +01:00
Ard Biesheuvel
cdb46a8aef efivarfs: Move efivarfs list into superblock s_fs_info
syzbot reports issues with concurrent fsopen()/fsconfig() invocations on
efivarfs, which are the result of the fact that the efivarfs list (which
caches the names and GUIDs of existing EFI variables) is a global
structure. In normal use, these issues are unlikely to trigger, even in
the presence of multiple mounts of efivarfs, but the execution pattern
used by the syzkaller reproducer may result in multiple instances of the
superblock that share the global efivarfs list, and this causes list
corruption when the list is reinitialized by one user while another is
traversing it.

So let's move the list head into the superblock s_fs_info field, so that
it will never be shared between distinct instances of the superblock. In
the common case, there will still be a single instance of this list, but
in the artificial syzkaller case, no list corruption can occur any
longer.

Reported-by: syzbot+1902c359bfcaf39c46f2@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:18 +01:00
Ard Biesheuvel
547713d502 efivarfs: Free s_fs_info on unmount
Now that we allocate a s_fs_info struct on fs context creation, we
should ensure that we free it again when the superblock goes away.

Fixes: 5329aa5101 ("efivarfs: Add uid/gid mount options")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:17 +01:00
Ard Biesheuvel
d28076ddda efivarfs: Move efivar availability check into FS context init
Instead of checking whether or not EFI variables are available when
creating the superblock, check it one step earlier, when initializing
the FS context for the mount. This way, no FS context will be created at
all, and we can drop the second check at .kill_sb() time entirely.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:19:17 +01:00