Commit Graph

246 Commits

Author SHA1 Message Date
Christoph Hellwig
85c871a02b fs: add support for non-blocking timestamp updates
Currently file_update_time_flags unconditionally returns -EAGAIN if any
timestamp needs to be updated and IOCB_NOWAIT is passed.  This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps.

Pass IOCB_NOWAIT to ->update_time and return -EAGAIN if it could block.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-9-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 14:01:33 +01:00
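
The behaviour change described in 85c871a02b can be modelled in userspace. What follows is a minimal sketch, not kernel code: MODEL_NOWAIT, struct model_inode, old_update_time() and new_update_time() are hypothetical names made up for illustration, and only the -EAGAIN convention is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IOCB_NOWAIT; not the kernel's value. */
#define MODEL_NOWAIT 0x1

/* Hypothetical model of an inode whose timestamp may need updating. */
struct model_inode {
	bool needs_update;       /* a timestamp is stale */
	bool update_would_block; /* persisting the update may sleep */
	int updates_done;        /* number of updates applied so far */
};

/* Old behaviour: any pending update fails a non-blocking caller. */
int old_update_time(struct model_inode *inode, int flags)
{
	assert(inode != NULL);
	if (!inode->needs_update)
		return 0;
	if (flags & MODEL_NOWAIT)
		return -EAGAIN;
	inode->needs_update = false;
	inode->updates_done++;
	return 0;
}

/* New behaviour: the flag is passed down; fail only if it would block. */
int new_update_time(struct model_inode *inode, int flags)
{
	assert(inode != NULL);
	if (!inode->needs_update)
		return 0;
	if ((flags & MODEL_NOWAIT) && inode->update_would_block)
		return -EAGAIN;
	inode->needs_update = false;
	inode->updates_done++;
	return 0;
}
```

In this model a non-blocking caller only sees -EAGAIN when the update itself could sleep, which is the distinction the commit message draws.
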
Christoph Hellwig
761475268f fs: refactor ->update_time handling
Pass the type of update (atime vs c/mtime plus version) as an enum
instead of a set of flags that caused all kinds of confusion.
Because inode_update_timestamps now can't return a modified version
of those flags, return the I_DIRTY_* flags needed to persist the
update, which is what the main caller in generic_update_time wants
anyway, and which is suitable for the other callers that only want
to know if an update happened.

The whole update_time path keeps the flags argument. It is unused for
now, but will soon be used to support non-blocking updates. The
(slightly renamed) inode_update_time also gains the ability to return
a negative errno to support this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 14:01:32 +01:00
Linus Torvalds
d0deeb803c Merge tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs cred guard conversion from Christian Brauner:
 "This converts all of overlayfs to use credential guards, eliminating
  manual credential management throughout the filesystem.

  Credential guard conversion:

   - Convert all of overlayfs to use credential guards, replacing the
     manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
     guards.

     This makes credential handling visually explicit and eliminates a
     class of potential bugs from mismatched override/revert calls.

     (1) Basic credential guard (with_ovl_creds)
     (2) Creator credential guard (ovl_override_creator_creds):

         Introduced a specialized guard for file creation operations
         that handles the two-phase credential override (mounter
         credentials, then fs{g,u}id override). The new pattern is much
         clearer:

         with_ovl_creds(dentry->d_sb) {
                 scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) {
                         if (IS_ERR(cred))
                                 return PTR_ERR(cred);
                         /* creation operations */
                 }
         }

     (3) Copy-up credential guard (ovl_cu_creds):

         Introduced a specialized guard for copy-up operations,
         simplifying the previous struct ovl_cu_creds helper and
         associated functions.

         Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this
         pattern.

  Cleanups:

   - Remove ovl_revert_creds() after all callers converted to guards

   - Remove struct ovl_cu_creds and associated functions

   - Drop ovl_setup_cred_for_create() after conversion

   - Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(),
     ovl_rename() for cleaner credential guard scope

   - Introduce struct ovl_renamedata to simplify rename handling

   - Don't override credentials for ovl_check_whiteouts() (unnecessary)

   - Remove unneeded semicolon"

* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
  ovl: remove unneeded semicolon
  ovl: remove struct ovl_cu_creds and associated functions
  ovl: port ovl_copy_up_tmpfile() to cred guard
  ovl: mark *_cu_creds() as unused temporarily
  ovl: port ovl_copy_up_workdir() to cred guard
  ovl: add copy up credential guard
  ovl: drop ovl_setup_cred_for_create()
  ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
  ovl: mark ovl_setup_cred_for_create() as unused temporarily
  ovl: reflow ovl_create_or_link()
  ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
  ovl: add ovl_override_creator_creds cred guard
  ovl: remove ovl_revert_creds()
  ovl: port ovl_fill_super() to cred guard
  ovl: refactor ovl_fill_super()
  ovl: port ovl_lower_positive() to cred guard
  ovl: port ovl_lookup() to cred guard
  ovl: refactor ovl_lookup()
  ovl: port ovl_copyfile() to cred guard
  ovl: port ovl_rename() to cred guard
  ...
2025-12-01 16:31:21 -08:00
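
The scoped-guard idea in the pull message above can be illustrated in plain C with the GCC/Clang cleanup attribute. This is only a sketch under stated assumptions: model_current_cred, struct model_cred_guard, model_override(), model_revert(), with_model_creds() and model_do_op() are hypothetical names, and an int stands in for real credentials. It is not the overlayfs implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an int stands in for "the current credentials". */
int model_current_cred = 1000;

/* Saved state for one override; hypothetical, not struct cred. */
struct model_cred_guard {
	int saved; /* credentials to restore on scope exit */
	int done;  /* loop flag so the scoped body runs exactly once */
};

/* Install new credentials and remember the old ones. */
struct model_cred_guard model_override(int new_cred)
{
	struct model_cred_guard g = { .saved = model_current_cred, .done = 0 };

	model_current_cred = new_cred;
	return g;
}

/* Called automatically when the guard variable leaves scope. */
void model_revert(struct model_cred_guard *g)
{
	assert(g != NULL);
	model_current_cred = g->saved;
}

/*
 * Scoped guard built on the cleanup attribute: the body runs once with
 * new_cred installed, and model_revert() fires on every exit path,
 * including an early return.
 */
#define with_model_creds(new_cred)                                   \
	for (struct model_cred_guard _g                              \
		     __attribute__((cleanup(model_revert))) =        \
		     model_override(new_cred);                       \
	     !_g.done; _g.done = 1)

/* Returns the credentials seen inside the scope; may return early. */
int model_do_op(int mounter_cred, int fail_early)
{
	with_model_creds(mounter_cred) {
		if (fail_early)
			return -1; /* the guard still reverts here */
		return model_current_cred;
	}
	return 0; /* not reached; keeps compilers quiet */
}
```

Because the revert is tied to scope exit, there is no separate revert call to forget or mismatch, which is the class of bug the pull message says the conversion eliminates.
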
Linus Torvalds
9368f0f941 Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
 "Features:

   - Hide inode->i_state behind accessors. Open-coded accesses prevent
     asserting they are done correctly. One obvious aspect is locking,
     but significantly more can be checked. For example it can be
     detected when the code is clearing flags which are already missing,
     or is setting flags when it is illegal (e.g., I_FREEING when
     ->i_count > 0)

   - Provide accessors for ->i_state, convert all filesystems using
     coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
     overlayfs, nilfs2, xfs), and make plain ->i_state access fail to
     compile

   - Rework I_NEW handling to operate without fences, simplifying the
     code after the accessor infrastructure is in place

  Cleanups:

   - Move wait_on_inode() from writeback.h to fs.h

   - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
     for clarity

   - Cosmetic fixes to LRU handling

   - Push list presence check into inode_io_list_del()

   - Touch up predicts in __d_lookup_rcu()

   - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage

   - Assert on ->i_count in iput_final()

   - Assert ->i_lock held in __iget()

  Fixes:

   - Add missing fences to I_NEW handling"

* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  dcache: touch up predicts in __d_lookup_rcu()
  fs: push list presence check into inode_io_list_del()
  fs: cosmetic fixes to lru handling
  fs: rework I_NEW handling to operate without fences
  fs: make plain ->i_state access fail to compile
  xfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors
  gfs2: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  smb: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  btrfs: use the new ->i_state accessors
  Manual conversion to use ->i_state accessors of all places not covered by coccinelle
  Coccinelle-based conversion to use ->i_state accessors
  fs: provide accessors for ->i_state
  fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
  fs: move wait_on_inode() from writeback.h to fs.h
  fs: add missing fences to I_NEW handling
  ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
  ...
2025-12-01 09:02:34 -08:00
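
Why accessors make ->i_state checkable can be shown with a small userspace model. This is a sketch under assumptions: the MODEL_I_* values, struct model_inode_state and the model_state_*() functions are hypothetical, and the model returns -EINVAL where real kernel code would more likely warn or assert.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's I_* constants differ. */
#define MODEL_I_NEW     0x1u
#define MODEL_I_FREEING 0x2u

struct model_inode_state {
	unsigned int state; /* only touched via the accessors below */
	int count;          /* reference count, stands in for ->i_count */
};

/* Read the state word. */
unsigned int model_state_read(const struct model_inode_state *inode)
{
	assert(inode != NULL);
	return inode->state;
}

/* Set flags; refuse MODEL_I_FREEING while references remain. */
int model_state_set(struct model_inode_state *inode, unsigned int flags)
{
	assert(inode != NULL);
	if ((flags & MODEL_I_FREEING) && inode->count > 0)
		return -EINVAL;
	inode->state |= flags;
	return 0;
}

/* Clear flags; report an attempt to clear flags that are not set. */
int model_state_clear(struct model_inode_state *inode, unsigned int flags)
{
	assert(inode != NULL);
	if ((inode->state & flags) != flags)
		return -EINVAL;
	inode->state &= ~flags;
	return 0;
}
```

With every access funnelled through functions like these, the two checks named in the pull message (clearing flags that are already missing, setting a freeing flag while the count is positive) have a single place to live.
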
Christian Brauner
4975e683c2 ovl: port ovl_fileattr_get() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-21-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
af1d5d62f3 ovl: port ovl_fileattr_set() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-20-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
a3860a808f ovl: port ovl_fiemap() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-19-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
8e9698d6e4 ovl: port ovl_set_or_remove_acl() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-18-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
71ac28fbcd ovl: port do_ovl_get_acl() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-17-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
47eba7f7fd ovl: port ovl_get_link() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-16-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:22 +01:00
Christian Brauner
d81999b40b ovl: port ovl_permission() to cred guard
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-15-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:21 +01:00
Christian Brauner
81707ae827 ovl: port ovl_getattr() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-14-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:21 +01:00
Christian Brauner
7aedfa5a52 ovl: port ovl_setattr() to cred guard
Use the scoped ovl cred guard.

Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-13-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19 21:58:21 +01:00
Mateusz Guzik
ff175a4fc2 overlayfs: use the new ->i_state accessors
Change generated with coccinelle and fixed up by hand as appropriate.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20 20:22:27 +02:00
Andrey Albershteyn
4dd5b5ac08 Revert "fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP"
This reverts commit 474b155adf.

This patch caused a regression in ioctl_setflags(). Underlying filesystems
use EOPNOTSUPP to indicate that a flag is not supported. This error
also gets converted in ioctl_setflags(). Therefore, for unsupported
flags the error changed from EOPNOTSUPP to ENOIOCTLCMD.

Link: https://lore.kernel.org/linux-xfs/a622643f-1585-40b0-9441-cf7ece176e83@kernel.org/
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-10 13:44:03 +02:00
André Almeida
f9377faaea ovl: Add S_CASEFOLD as part of the inode flag to be copied
To keep ovl's inodes consistent with their real inodes, create a new
mask for inode file attributes that needs to be copied.  Add the
S_CASEFOLD flag as part of the flags that need to be copied along with
the other file attributes.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23 12:29:36 +02:00
Christian Brauner
ca115d7e75 tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Andrey Albershteyn
474b155adf fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
Future patches will add new syscalls which use these functions. As
this interface won't be used for ioctls only, EOPNOTSUPP is a more
appropriate return code.

This patch converts the return code from ENOIOCTLCMD to EOPNOTSUPP for
vfs_fileattr_get and vfs_fileattr_set. To preserve the old behavior, translate
EOPNOTSUPP back for current users - overlayfs, ecryptfs and fs/ioctl.c.

Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-4-c4e3bc35227b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02 14:29:10 +02:00
Linus Torvalds
e7675238b9 Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:

 - Fix a syzbot reported NULL pointer deref with bfs lower layers

 - Fix a copy up failure of large file from lower fuse fs

 - Followup cleanup of backing_file API from Miklos

 - Introduction and use of revert/override_creds_light() helpers, that
   were suggested by Christian as a mitigation to cache line bouncing
   and false sharing of fields in overlayfs creator_cred long lived
   struct cred copy.

 - Store up to two backing file references (upper and lower) in an
   ovl_file container instead of storing a single backing file in
   file->private_data.

   This is used to avoid the practice of opening a short lived backing
   file for the duration of some file operations and to avoid the
   specialized use of FDPUT_FPUT in such occasions, that was getting in
   the way of Al's fd_file() conversions.

* tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: Filter invalid inodes with missing lookup function
  ovl: convert ovl_real_fdget() callers to ovl_real_file()
  ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
  ovl: store upper real file in ovl_file struct
  ovl: allocate a container struct ovl_file for ovl private context
  ovl: do not open non-data lower file for fsync
  ovl: Optimize override/revert creds
  ovl: pass an explicit reference of creators creds to callers
  ovl: use wrapper ovl_revert_creds()
  fs/backing-file: Convert to revert/override_creds_light()
  cred: Add a light version of override/revert_creds()
  backing-file: clean up the API
  ovl: properly handle large files in ovl_security_fileattr
2024-11-22 20:55:42 -08:00
Stefan Berger
95f567f81e fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag
Commit 8a924db2d7 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function") introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.

There is only a single caller of the inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. That caller is the only
one that passes the AT_GETATTR_NOSEC flag.

Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:

- ecryptfs:  Simplify by always calling vfs_getattr_nosec in
             ecryptfs_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

- overlayfs: Simplify by always calling vfs_getattr_nosec in
             ovl_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.

The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.

Fixes: 8a924db2d7 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks <code@tyhicks.com>
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-13 11:46:29 -05:00
Vinicius Costa Gomes
fc5a1d2287 ovl: use wrapper ovl_revert_creds()
Introduce ovl_revert_creds() wrapper of revert_creds() to
match callers of ovl_override_creds().

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-11 10:45:04 +01:00
Oleksandr Tymoshenko
3b6b99ef15 ovl: properly handle large files in ovl_security_fileattr
dentry_open in ovl_security_fileattr fails for any file
larger than 2GB if open method of the underlying filesystem
calls generic_file_open (e.g. fusefs).

The issue can be reproduced using the following script:
(passthrough_ll is an example app from libfuse).

  $ D=/opt/test/mnt
  $ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs}
  $ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2
  $ passthrough_ll -o source=${D}/source ${D}/base
  $ mount -t overlay overlay \
      -olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \
      ${D}/ovlfs
  $ chmod 0777 ${D}/ovlfs/zero.bin

Running this script results in "Value too large for defined data type"
error message from chmod.

Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-11 10:44:36 +01:00
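
The failure mode in 3b6b99ef15 comes down to a size check at open time. The sketch below models that check in userspace; MODEL_MAX_NON_LFS, MODEL_O_LARGEFILE and model_generic_open() are hypothetical names, and the flag value is not the real O_LARGEFILE. It assumes the open path rejects files larger than 2^31 - 1 bytes unless the opener asked for large-file support.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest size an opener without large-file support can handle. */
#define MODEL_MAX_NON_LFS ((int64_t)INT32_MAX) /* 2^31 - 1 bytes */

/* Hypothetical stand-in for O_LARGEFILE; not the real flag value. */
#define MODEL_O_LARGEFILE 0x1

/* Model of an open-time check: reject big files unless the flag is set. */
int model_generic_open(int64_t file_size, int open_flags)
{
	assert(file_size >= 0);
	if (!(open_flags & MODEL_O_LARGEFILE) && file_size > MODEL_MAX_NON_LFS)
		return -EOVERFLOW;
	return 0;
}
```

The 2 GiB file created by the dd command in the script is one byte past that limit, and EOVERFLOW is the errno whose standard message is "Value too large for defined data type", matching the chmod error quoted above.
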
Thorsten Blum
e9229c18da ovl: remove duplicate included header
Remove duplicate included header file linux/posix_acl.h

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-10 13:22:46 +02:00
Stefan Berger
8a924db2d7 fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's getattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag AT_GETATTR_NOSEC and allow it to be
passed via the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.

In a recent code change introduced to IMA, vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current->fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.

Reported-by: <syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com>
Fixes: db1d1e8b98 ("IMA: use vfs_getattr_nosec to get the i_version")
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Tyler Hicks <code@tyhicks.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Suggested-by: Christian Brauner <brauner@kernel.org>
Co-developed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-18 14:54:07 +01:00
Amir Goldstein
420a62dde6 ovl: Move xattr support to new xattrs.c file
This moves the code from super.c and inode.c, and makes ovl_xattr_get/set()
static.

This is in preparation for doing more work on xattrs support.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31 00:12:59 +02:00
Amir Goldstein
162d064440 ovl: reorder ovl_want_write() after ovl_inode_lock()
Make the locking order of ovl_inode_lock() strictly between the two
vfs stacked layers, i.e.:
- ovl vfs locks: sb_writers, inode_lock, ...
- ovl_inode_lock
- upper vfs locks: sb_writers, inode_lock, ...

To that effect, move ovl_want_write() into the helpers ovl_nlink_start()
and ovl_copy_up_start which currently take the ovl_inode_lock() after
ovl_want_write().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31 00:12:57 +02:00
Jeff Layton
4ddbd0f1fe overlayfs: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18 14:08:25 +02:00
Linus Torvalds
63580f669d Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:

 - add verification feature needed by composefs (Alexander Larsson)

 - improve integration of overlayfs and fanotify (Amir Goldstein)

 - fortify some overlayfs code (Andrea Righi)

* tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: validate superblock in OVL_FS()
  ovl: make consistent use of OVL_FS()
  ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG
  ovl: auto generate uuid for new overlay filesystems
  ovl: store persistent uuid/fsid with uuid=on
  ovl: add support for unique fsid per instance
  ovl: support encoding non-decodable file handles
  ovl: Handle verity during copy-up
  ovl: Validate verity xattr when resolving lowerdata
  ovl: Add versioned header for overlay.metacopy xattr
  ovl: Add framework for verity support
2023-08-30 11:54:09 -07:00
Andrea Righi
f01d08899f ovl: make consistent use of OVL_FS()
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a
struct super_block.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:54 +03:00
Amir Goldstein
16aac5ad1f ovl: support encoding non-decodable file handles
When all layers support file handles, we support encoding non-decodable
file handles (a.k.a. fid) even with nfs_export=off.

When file handles do not need to be decoded, we do not need to copy up
redirected lower directories on encode, and we encode also non-indexed
upper with lower file handle, so fid will not change on copy up.

This enables reporting fanotify events with file handles on overlayfs
with default config/mount options.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:47 +03:00
Jeff Layton
913e99287b fs: drop the timespec64 argument from update_time
Now that all of the update_time operations are prepared for it, we can
drop the timespec64 argument from the update_time operation. Do that and
remove it from some associated functions like inode_update_time and
inode_needs_update_time.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 09:04:57 +02:00
Amir Goldstein
dcb399de1e ovl: pass ovl_fs to xino helpers
Internal ovl methods should use ovl_fs and not sb as much as
possible.

Use a constant_table to translate from enum xino mode to string
in preparation for new mount api option parsing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-19 14:02:00 +03:00
Amir Goldstein
4166564478 ovl: prepare for lazy lookup of lowerdata inode
Make the code handle the case of numlower > 1 and missing lowerdata
dentry gracefully.

Missing lowerdata dentry is an indication for lazy lookup of lowerdata
and in that case the lowerdata_redirect path is stored in ovl_inode.

Following commits will defer lookup and perform the lazy lookup on
access.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:14 +03:00
Amir Goldstein
2b21da9208 ovl: prepare to store lowerdata redirect for lazy lowerdata lookup
Prepare to allow ovl_lookup() to leave the last entry in a non-dir
lowerstack empty to signify lazy lowerdata lookup.

In this case, ovl_lookup() stores the redirect path from metacopy to
lowerdata in ovl_inode, which is going to be used later to perform the
lazy lowerdata lookup.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:14 +03:00
Amir Goldstein
ab1eb5ffb7 ovl: deduplicate lowerdata and lowerstack[]
The ovl_inode contains a copy of lowerdata in lowerstack[], so the
lowerdata inode member can be removed.

Use accessors ovl_lowerdata*() to get the lowerdata wherever the member
was accessed directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
ac900ed4f2 ovl: deduplicate lowerpath and lowerstack[]
The ovl_inode contains a copy of lowerpath in lowerstack[0], so the
lowerpath member can be removed.

Use accessor ovl_lowerpath() to get the lowerpath wherever the member
was accessed directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
0af950f57f ovl: move ovl_entry into ovl_inode
The lower stacks of all the ovl inode aliases should be identical
and there is redundant information in ovl_entry and ovl_inode.

Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS
per overlay dentry.

Following patches will deduplicate redundant ovl_inode fields.

Note that for pure upper and negative dentries, OVL_E(dentry) may be
NULL now, so it is important to use the ovl_numlower() accessor.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Zhihao Cheng
f4e19e595c ovl: fix null pointer dereference in ovl_get_acl_rcu()
Following process:
         P1                     P2
 path_openat
  link_path_walk
   may_lookup
    inode_permission(rcu)
     ovl_permission
      acl_permission_check
       check_acl
        get_cached_acl_rcu
	 ovl_get_inode_acl
	  realinode = ovl_inode_real(ovl_inode)
	                      drop_cache
		               __dentry_kill(ovl_dentry)
				iput(ovl_inode)
		                 ovl_destroy_inode(ovl_inode)
		                  dput(oi->__upperdentry)
		                   dentry_kill(upperdentry)
		                    dentry_unlink_inode
				     upperdentry->d_inode = NULL
	    ovl_inode_upper
	     upperdentry = ovl_i_dentry_upper(ovl_inode)
	     d_inode(upperdentry) // returns NULL
	  IS_POSIXACL(realinode) // NULL pointer dereference
, will trigger a null pointer dereference at realinode:
  [  205.472797] BUG: kernel NULL pointer dereference, address:
                 0000000000000028
  [  205.476701] CPU: 2 PID: 2713 Comm: ls Not tainted
                 6.3.0-12064-g2edfa098e750-dirty #1216
  [  205.478754] RIP: 0010:do_ovl_get_acl+0x5d/0x300
  [  205.489584] Call Trace:
  [  205.489812]  <TASK>
  [  205.490014]  ovl_get_inode_acl+0x26/0x30
  [  205.490466]  get_cached_acl_rcu+0x61/0xa0
  [  205.490908]  generic_permission+0x1bf/0x4e0
  [  205.491447]  ovl_permission+0x79/0x1b0
  [  205.491917]  inode_permission+0x15e/0x2c0
  [  205.492425]  link_path_walk+0x115/0x550
  [  205.493311]  path_lookupat.isra.0+0xb2/0x200
  [  205.493803]  filename_lookup+0xda/0x240
  [  205.495747]  vfs_fstatat+0x7b/0xb0

A reproducer is available at the [Link] below.

Use the helper ovl_i_path_realinode() to get realinode and then check
that it is not NULL.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217404
Fixes: 332f606b32 ("ovl: enable RCU'd ->get_acl()")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Christian Brauner <brauner@kernel.org>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:12 +03:00
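
The fix pattern in f4e19e595c (fetch the real inode once, then check it for NULL before use) can be sketched in userspace. Everything here is hypothetical: struct model_real_inode, struct model_dentry, model_realinode() and model_get_acl() are illustration-only names, and returning -ECHILD on a vanished inode is an assumption about how a caller would be asked to retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature types; not the kernel's struct inode/dentry. */
struct model_real_inode {
	int has_posix_acl;
};

struct model_dentry {
	struct model_real_inode *d_inode; /* may become NULL concurrently */
};

/* Read the real inode once so the check and the use see the same value. */
struct model_real_inode *model_realinode(const struct model_dentry *upper)
{
	assert(upper != NULL);
	return upper->d_inode;
}

/* Returns 1 or 0 for "has ACL", or -ECHILD if the inode is gone. */
int model_get_acl(const struct model_dentry *upper)
{
	struct model_real_inode *realinode = model_realinode(upper);

	if (realinode == NULL)
		return -ECHILD; /* assumed: ask the caller to retry */
	return realinode->has_posix_acl;
}
```

The point of the helper is that the NULL check and the later dereference operate on the same snapshot of the pointer, so a concurrent teardown cannot slip in between them.
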
Zhihao Cheng
1a73f5b8f0 ovl: fix null pointer dereference in ovl_permission()
Following process:
          P1                     P2
 path_lookupat
  link_path_walk
   inode_permission
    ovl_permission
      ovl_i_path_real(inode, &realpath)
        path->dentry = ovl_i_dentry_upper(inode)
                          drop_cache
			   __dentry_kill(ovl_dentry)
		            iput(ovl_inode)
		             ovl_destroy_inode(ovl_inode)
		              dput(oi->__upperdentry)
		               dentry_kill(upperdentry)
		                dentry_unlink_inode
				 upperdentry->d_inode = NULL
      realinode = d_inode(realpath.dentry) // return NULL
      inode_permission(realinode)
       inode->i_sb  // NULL pointer dereference
, will trigger a null pointer dereference at realinode:
  [  335.664979] BUG: kernel NULL pointer dereference,
                 address: 0000000000000002
  [  335.668032] CPU: 0 PID: 2592 Comm: ls Not tainted 6.3.0
  [  335.669956] RIP: 0010:inode_permission+0x33/0x2c0
  [  335.678939] Call Trace:
  [  335.679165]  <TASK>
  [  335.679371]  ovl_permission+0xde/0x320
  [  335.679723]  inode_permission+0x15e/0x2c0
  [  335.680090]  link_path_walk+0x115/0x550
  [  335.680771]  path_lookupat.isra.0+0xb2/0x200
  [  335.681170]  filename_lookup+0xda/0x240
  [  335.681922]  vfs_statx+0xa6/0x1f0
  [  335.682233]  vfs_fstatat+0x7b/0xb0

A reproducer is available at the [Link] below.

Use the helper ovl_i_path_realinode() to get realinode and then check
that it is not NULL.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217405
Fixes: 4b7791b2e9 ("ovl: handle idmappings in ovl_permission()")
Cc: <stable@vger.kernel.org> # v5.19
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Christian Brauner <brauner@kernel.org>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:12 +03:00
Christian Brauner
4d7ca40901 fs: port vfs{g,u}id helpers to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:30 +01:00
Christian Brauner
9452e93e6d fs: port privilege checking helpers to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
01beba7957 fs: port inode_owner_or_capable() to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
4609e1f18e fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
8782a9aea3 fs: port ->fileattr_set() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
13e83a4923 fs: port ->set_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
7743532277 fs: port ->get_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
b74d24f7a7 fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
c1632a0f11 fs: port ->setattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:02 +01:00
Christian Brauner
5b52aebef8 ovl: call posix_acl_release() after error checking
The current placement of posix_acl_release() in ovl_set_or_remove_acl()
means it can be called on an error pointer instead of actual acls.
Fix this by moving the posix_acl_release() call after the error handling.
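
The ordering issue can be illustrated with a small userspace sketch. This is not the overlayfs code: `err_ptr()`, `ptr_err()` and `is_err()` are simplified stand-ins for the kernel's error-pointer helpers, and `struct acl_sketch`, `acl_lookup()`, `acl_release()` and `acl_user()` are hypothetical names used only here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical refcounted object standing in for a posix acl. */
struct acl_sketch {
	int refcount;
};

static int live_acls;	/* number of objects currently allocated */

/* Returns a valid object, or an error pointer when the lookup fails. */
static struct acl_sketch *acl_lookup(int fail)
{
	struct acl_sketch *acl;

	if (fail)
		return err_ptr(-EIO);
	acl = malloc(sizeof(*acl));
	if (!acl)
		return err_ptr(-ENOMEM);
	acl->refcount = 1;
	live_acls++;
	return acl;
}

/* Accepts NULL or a valid object; an error pointer here is a bug. */
static void acl_release(struct acl_sketch *acl)
{
	assert(!is_err(acl));
	if (acl && --acl->refcount == 0) {
		free(acl);
		live_acls--;
	}
}

static int acl_user(int fail)
{
	struct acl_sketch *acl = acl_lookup(fail);

	/* Handle the error first ... */
	if (is_err(acl))
		return (int)ptr_err(acl);

	/* ... and release only once acl is known to be a real object. */
	acl_release(acl);
	return 0;
}
```

If the release call came before the `is_err()` check, the failure path would hand an error pointer to `acl_release()`, which would then dereference an invalid address.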

Fixes: 0e64185732 ("ovl: implement set acl method") # mainline only
Reported-by: syzbot+3f6ef1c4586bb6fd1f61@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-03 08:18:46 +01:00
Christian Brauner
31acceb975 ovl: use posix acl api
Now that posix acls have a proper api use it to copy them.

All filesystems that can serve as lower or upper layers for overlayfs
have gained support for the new posix acl api in previous patches.
So switch all internal overlayfs codepaths for copying posix acls to the
new posix acl api.

Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:31 +02:00