Commit 88d5baf6, authored by NeilBrown, committed by Christian Brauner

Change inode_operations.mkdir to return struct dentry *



Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in
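
From a caller's point of view, the three return cases can be handled as in the following minimal userspace sketch. It is not kernel code: ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented here as stand-ins for the kernel helpers of the same names, struct dentry is reduced to a placeholder, and resolve_mkdir_result() is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

/* Placeholder type; the real struct dentry is far richer. */
struct dentry { const char *name; };

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: given the dentry that was passed to ->mkdir()
 * and the value ->mkdir() returned, pick the dentry that now names the
 * new directory.  On error, return NULL and store the negative errno
 * in *err.
 */
static struct dentry *resolve_mkdir_result(struct dentry *given,
					   struct dentry *ret, int *err)
{
	assert(given != NULL);
	*err = 0;

	if (IS_ERR(ret)) {		/* ERR_PTR(): mkdir failed */
		*err = (int)PTR_ERR(ret);
		return NULL;
	}
	/* NULL: the given dentry was used; non-NULL: another was spliced in. */
	return ret ? ret : given;
}
```

Because ERR_PTR(0) is a NULL pointer, a filesystem that ends with "return ERR_PTR(err);" reports "created, original dentry used" whenever err is 0, which is why the mechanical conversion in this patch preserves existing behaviour.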

This patch only changes filesystems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some filesystems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.
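
The effect of the d_drop();d_splice_alias() sequence on the return value can be illustrated with a toy userspace model. Everything below is hypothetical: the toy_ types and functions only imitate the one property that matters here (a directory inode has at most one dentry) and are not the kernel implementations of d_drop() or d_splice_alias().

```c
#include <stddef.h>

struct toy_dentry;

/* Toy directory inode: at most one dentry may refer to it. */
struct toy_inode { struct toy_dentry *alias; };

/* Toy dentry: negative while inode is NULL; hashed means visible to lookup. */
struct toy_dentry { struct toy_inode *inode; int hashed; };

/* Imitates d_drop(): unhash the dentry so lookups no longer find it. */
static void toy_d_drop(struct toy_dentry *dentry)
{
	dentry->hashed = 0;
}

/*
 * Imitates the relevant part of d_splice_alias(): if the directory
 * inode already has a dentry, that dentry is used and returned;
 * otherwise the given dentry is attached and NULL is returned.
 */
static struct toy_dentry *toy_d_splice_alias(struct toy_inode *inode,
					     struct toy_dentry *dentry)
{
	if (inode->alias) {
		inode->alias->hashed = 1;
		return inode->alias;
	}
	dentry->inode = inode;
	dentry->hashed = 1;
	inode->alias = dentry;
	return NULL;
}

/* Tail of a hypothetical ->mkdir() once the new inode is known. */
static struct toy_dentry *toy_mkdir_finish(struct toy_inode *inode,
					   struct toy_dentry *dentry)
{
	toy_d_drop(dentry);
	return toy_d_splice_alias(inode, dentry);
}
```

In the common case the model returns NULL and the caller keeps using its own dentry. In the raced case it returns the pre-existing dentry, and the one originally passed in is left unhashed and negative.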

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
parent 71628584
+1 −1
@@ -66,7 +66,7 @@ prototypes::
	int (*link) (struct dentry *,struct inode *,struct dentry *);
	int (*unlink) (struct inode *,struct dentry *);
	int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
-	int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
+	struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
	int (*rmdir) (struct inode *,struct dentry *);
	int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
	int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
+19 −0
@@ -1178,3 +1178,22 @@ these conditions don't require explicit checks:

LOOKUP_EXCL now means "target must not exist".  It can be combined with
LOOKUP_CREATE or LOOKUP_RENAME_TARGET.
+
+---
+
+**mandatory**
+
+->mkdir() now returns a 'struct dentry *'.  If the created inode is
+found to already be in cache and have a dentry (often IS_ROOT()), it will
+need to be spliced into the given name in place of the given dentry.
+That dentry now needs to be returned.  If the original dentry is used,
+NULL should be returned.  Any error should be returned with
+ERR_PTR().
+
+In general, filesystems which use d_instantiate_new() to install the new
+inode can safely return NULL.  Filesystems which may not have an I_NEW inode
+should use d_drop();d_splice_alias() and return the result of the latter.
+
+If a positive dentry cannot be returned for some reason, in-kernel
+clients such as cachefiles, nfsd, smb/server may not perform ideally but
+will fail-safe.
+21 −2
@@ -495,7 +495,7 @@ As of kernel 2.6.22, the following members are defined:
		int (*link) (struct dentry *,struct inode *,struct dentry *);
		int (*unlink) (struct inode *,struct dentry *);
		int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
-		int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
+		struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
		int (*rmdir) (struct inode *,struct dentry *);
		int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
		int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
@@ -562,7 +562,26 @@ otherwise noted.
``mkdir``
	called by the mkdir(2) system call.  Only required if you want
	to support creating subdirectories.  You will probably need to
-	call d_instantiate() just as you would in the create() method
+	call d_instantiate_new() just as you would in the create() method.
+
+	If d_instantiate_new() is not used and if the fh_to_dentry()
+	export operation is provided, or if the storage might be
+	accessible by another path (e.g. with a network filesystem)
+	then more care may be needed.  Importantly d_instantiate()
+	should not be used with an inode that is no longer I_NEW if there
+	is any chance that the inode could already be attached to a dentry.
+	This is because of a hard rule in the VFS that a directory must
+	only ever have one dentry.
+
+	For example, if an NFS filesystem is mounted twice the new directory
+	could be visible on the other mount before it is on the original
+	mount, and a pair of name_to_handle_at(), open_by_handle_at()
+	calls could instantiate the directory inode with an IS_ROOT()
+	dentry before the first mkdir returns.
+
+	If there is any chance this could happen, then the new inode
+	should be d_drop()ed and attached with d_splice_alias().  The
+	returned dentry (if any) should be returned by ->mkdir().

``rmdir``
	called by the rmdir(2) system call.  Only required if you want
+3 −4
@@ -669,7 +669,7 @@ v9fs_vfs_create(struct mnt_idmap *idmap, struct inode *dir,
 *
 */

-static int v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
+static struct dentry *v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
				     struct dentry *dentry, umode_t mode)
{
	int err;
@@ -692,8 +692,7 @@ static int v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,

	if (fid)
		p9_fid_put(fid);
-
-	return err;
+	return ERR_PTR(err);
}

/**
+4 −4
@@ -350,7 +350,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 *
 */

-static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
+static struct dentry *v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
					  struct inode *dir, struct dentry *dentry,
					  umode_t omode)
{
@@ -417,7 +417,7 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
	p9_fid_put(fid);
	v9fs_put_acl(dacl, pacl);
	p9_fid_put(dfid);
-	return err;
+	return ERR_PTR(err);
}

static int