Commit 370a6de7 (unverified) authored by John Garry, committed by Christian Brauner

iomap: rework IOMAP atomic flags



The IOMAP_ATOMIC_SW flag is not really required. The idea behind it was
that the FS ->iomap_begin callback could check whether the flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can instead
choose which ->iomap_begin callback to use when it decides to do a
FS-based atomic write.
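A minimal userspace sketch of the point above, assuming a filesystem that keeps two separate ops tables. The `demo_*` types and functions are hypothetical stand-ins for `struct iomap_ops` and real filesystem callbacks, not kernel API. Because the filesystem picks the ops table before the write starts, the chosen ->iomap_begin callback already "knows" which kind of atomic write it serves, so no extra flag is needed to tell it.

```c
#include <stdbool.h>

/* Hypothetical result type: reports which path handled the write. */
enum demo_path { DEMO_PATH_HW = 1, DEMO_PATH_SW = 2 };

/* Hypothetical stand-in for struct iomap_ops (one callback only). */
struct demo_iomap_ops {
	enum demo_path (*iomap_begin)(void);
};

/* HW-offload path: the whole write would go out as one REQ_ATOMIC bio. */
static enum demo_path demo_hw_atomic_begin(void)
{
	return DEMO_PATH_HW;
}

/* FS-based path: the filesystem provides the untorn guarantee itself. */
static enum demo_path demo_sw_atomic_begin(void)
{
	return DEMO_PATH_SW;
}

static const struct demo_iomap_ops demo_hw_ops = {
	.iomap_begin = demo_hw_atomic_begin,
};
static const struct demo_iomap_ops demo_sw_ops = {
	.iomap_begin = demo_sw_atomic_begin,
};

/*
 * The filesystem selects the ops table up front; the selected callback
 * never has to test an IOMAP_ATOMIC_SW-style flag.
 */
static const struct demo_iomap_ops *demo_pick_ops(bool hw_possible)
{
	return hw_possible ? &demo_hw_ops : &demo_sw_ops;
}
```

In this model, `demo_pick_ops(false)->iomap_begin()` takes the FS-based path without any flag being consulted inside the callback.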

Furthermore, IOMAP_ATOMIC_HW was considered a misleading name, as the
block driver can use SW methods to emulate an atomic write. So change it
back to IOMAP_ATOMIC.

However, the ->iomap_begin callback still needs to tell the iomap core
that REQ_ATOMIC must be set, so add IOMAP_F_ATOMIC_BIO for that.
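The resulting division of labour can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: the `DEMO_*` bit values and `demo_*` functions are hypothetical stand-ins for IOCB_ATOMIC, IOMAP_ATOMIC, IOMAP_F_ATOMIC_BIO and REQ_ATOMIC, not their real kernel definitions. The iter flag only says "this write is atomic"; whether the bio gets REQ_ATOMIC is decided by the mapping flag the callback returns.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define DEMO_IOCB_ATOMIC        (1u << 0)  /* stands in for IOCB_ATOMIC */
#define DEMO_IOMAP_ATOMIC       (1u << 1)  /* stands in for IOMAP_ATOMIC */
#define DEMO_IOMAP_F_ATOMIC_BIO (1u << 2)  /* stands in for IOMAP_F_ATOMIC_BIO */
#define DEMO_REQ_ATOMIC         (1u << 3)  /* stands in for REQ_ATOMIC */

/* Step 1: the core translates the iocb flag into an iter flag. */
static unsigned demo_iter_flags(unsigned ki_flags)
{
	return (ki_flags & DEMO_IOCB_ATOMIC) ? DEMO_IOMAP_ATOMIC : 0;
}

/*
 * Step 2: the filesystem callback marks the mapping for a single
 * REQ_ATOMIC bio only on the HW-offload path; a FS-based callback
 * simply leaves the flag clear.
 */
static unsigned demo_iomap_begin(unsigned iter_flags, bool hw_offload)
{
	unsigned map_flags = 0;

	if ((iter_flags & DEMO_IOMAP_ATOMIC) && hw_offload)
		map_flags |= DEMO_IOMAP_F_ATOMIC_BIO;
	return map_flags;
}

/* Step 3: the core derives REQ_ATOMIC from the mapping flag alone. */
static unsigned demo_bio_opf(unsigned map_flags)
{
	return (map_flags & DEMO_IOMAP_F_ATOMIC_BIO) ? DEMO_REQ_ATOMIC : 0;
}
```

Keying step 3 off the mapping rather than the iter flags is what lets a FS-based atomic write pass through the same core code without REQ_ATOMIC being set.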

These changes were suggested by Christoph Hellwig and Dave Chinner.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com


Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
parent aacd436e
+19 −16
@@ -514,29 +514,32 @@ IOMAP_WRITE`` with any combination of the following enhancements:
    if the mapping is unwritten and the filesystem cannot handle zeroing
    the unaligned regions without exposing stale contents.
 
- * ``IOMAP_ATOMIC_HW``: This write is being issued with torn-write
-   protection based on HW-offload support.
-   Only a single bio can be created for the write, and the write must
-   not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
-   set.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+   protection.
+   Torn-write protection may be provided based on HW-offload or by a
+   software mechanism provided by the filesystem.
+
+   For HW-offload based support, only a single bio can be created for the
+   write, and the write must not be split into multiple I/O requests, i.e.
+   flag REQ_ATOMIC must be set.
    The file range to write must be aligned to satisfy the requirements
    of both the filesystem and the underlying block device's atomic
    commit capabilities.
    If filesystem metadata updates are required (e.g. unwritten extent
-   conversion or copy on write), all updates for the entire file range
+   conversion or copy-on-write), all updates for the entire file range
    must be committed atomically as well.
-   Only one space mapping is allowed per untorn write.
-   Untorn writes may be longer than a single file block. In all cases,
+   Untorn-writes may be longer than a single file block. In all cases,
    the mapping start disk block must have at least the same alignment as
    the write offset.
-
- * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
-   protection via a software mechanism provided by the filesystem.
-   All the disk block alignment and single bio restrictions which apply
-   to IOMAP_ATOMIC_HW do not apply here.
-   SW-based untorn writes would typically be used as a fallback when
-   HW-based untorn writes may not be issued, e.g. the range of the write
-   covers multiple extents, meaning that it is not possible to issue
+   The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an
+   untorn-write based on HW-offload.
+
+   For untorn-writes based on a software mechanism provided by the
+   filesystem, all the disk block alignment and single bio restrictions
+   which apply for HW-offload based untorn-writes do not apply.
+   The mechanism would typically be used as a fallback for when
+   HW-offload based untorn-writes may not be issued, e.g. the range of the
+   write covers multiple extents, meaning that it is not possible to issue
    a single bio.
    All filesystem metadata updates for the entire file range must be
    committed atomically as well.
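The HW-offload restrictions in the documentation hunk can be read as a simple predicate over one mapping. This is a rough sketch, not the kernel's check: `struct demo_mapping` and `demo_hw_untorn_ok()` are hypothetical, and the alignment rule is interpreted as "the disk address backing the write offset is aligned to at least the largest power of two dividing that offset". For offset 0, where that alignment is unbounded, the sketch assumes alignment to the write length instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapping: file range [offset, offset + length) at disk addr. */
struct demo_mapping {
	uint64_t offset;
	uint64_t length;
	uint64_t addr;
};

/* Largest power of two dividing x (x must be non-zero). */
static uint64_t demo_lowbit(uint64_t x)
{
	return x & (~x + 1);
}

/* Could [pos, pos + len) be issued as a single HW-offload untorn bio? */
static bool demo_hw_untorn_ok(const struct demo_mapping *m,
			      uint64_t pos, uint64_t len)
{
	uint64_t disk, need;

	if (len == 0)
		return false;
	/* One space mapping must cover the whole write (single bio). */
	if (pos < m->offset || pos + len > m->offset + m->length)
		return false;
	/* Disk address that backs the write offset. */
	disk = m->addr + (pos - m->offset);
	/* Assumption: at offset 0, require alignment to the write length. */
	need = pos ? demo_lowbit(pos) : len;
	return disk % need == 0;
}
```

A FS-based untorn write would skip both checks, which matches the hunk's statement that the alignment and single-bio restrictions do not apply to it.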
+5 −1
@@ -3290,6 +3290,10 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	if (map->m_flags & EXT4_MAP_NEW)
 		iomap->flags |= IOMAP_F_NEW;
 
+	/* HW-offload atomics are always used */
+	if (flags & IOMAP_ATOMIC)
+		iomap->flags |= IOMAP_F_ATOMIC_BIO;
+
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
 	else
@@ -3467,7 +3471,7 @@ static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
 		return false;
 
 	/* atomic writes are all-or-nothing */
-	if (flags & IOMAP_ATOMIC_HW)
+	if (flags & IOMAP_ATOMIC)
 		return false;
 
 	/* can only try again if we wrote nothing */
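A hypothetical model of the fallback decision in the ext4 hunk. It omits the earlier checks of the real ext4_want_directio_fallback(), which this diff does not show, and the final `written == 0` test is an assumption based on the "can only try again if we wrote nothing" comment. `DEMO_IOMAP_ATOMIC` is an illustrative value, not the kernel's. The point it shows is that an atomic direct write is never retried through buffered I/O, presumably because a buffered retry could not give the all-or-nothing guarantee.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative flag value, not the kernel's IOMAP_ATOMIC. */
#define DEMO_IOMAP_ATOMIC (1u << 0)

/* Should a failed direct write be retried as a buffered write? */
static bool demo_want_buffered_fallback(unsigned flags, ssize_t written)
{
	/* Atomic writes are all-or-nothing: never fall back. */
	if (flags & DEMO_IOMAP_ATOMIC)
		return false;
	/* Assumed: a retry only makes sense if nothing was written yet. */
	return written == 0;
}
```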
+3 −5
@@ -349,7 +349,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		bio_opf |= REQ_OP_WRITE;
 
-		if (iter->flags & IOMAP_ATOMIC_HW) {
+		if (iomap->flags & IOMAP_F_ATOMIC_BIO) {
 			/*
 			 * Ensure that the mapping covers the full write
 			 * length, otherwise it won't be submitted as a single
@@ -677,10 +677,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_OVERWRITE_ONLY;
 		}
 
-		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
-			iomi.flags |= IOMAP_ATOMIC_SW;
-		else if (iocb->ki_flags & IOCB_ATOMIC)
-			iomi.flags |= IOMAP_ATOMIC_HW;
+		if (iocb->ki_flags & IOCB_ATOMIC)
+			iomi.flags |= IOMAP_ATOMIC;
 
 		/* for data sync or sync, we need sync completion processing */
 		if (iocb_is_dsync(iocb)) {
+1 −1
@@ -99,7 +99,7 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
 	{ IOMAP_FAULT,		"FAULT" }, \
 	{ IOMAP_DIRECT,		"DIRECT" }, \
 	{ IOMAP_NOWAIT,		"NOWAIT" }, \
-	{ IOMAP_ATOMIC_HW,	"ATOMIC_HW" }
+	{ IOMAP_ATOMIC,		"ATOMIC" }
 
 #define IOMAP_F_FLAGS_STRINGS \
 	{ IOMAP_F_NEW,		"NEW" }, \
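The trace hunk only renames one entry of a { flag, "name" } table. As a rough illustration of how such a table turns a flag word into readable trace output, here is a hypothetical userspace decoder; the bit values are made up and `demo_decode_flags()` is not the kernel's trace helper, though joining set names with "|" mirrors the usual trace-event style.

```c
#include <stddef.h>
#include <string.h>

struct demo_flag_name {
	unsigned mask;
	const char *name;
};

/* Illustrative bit values; only the names come from the hunk. */
static const struct demo_flag_name demo_iomap_flags[] = {
	{ 1u << 0, "FAULT" },
	{ 1u << 1, "DIRECT" },
	{ 1u << 2, "NOWAIT" },
	{ 1u << 3, "ATOMIC" },
};

/* Write the names of all set flags, joined by '|', into buf (size >= 1). */
static void demo_decode_flags(unsigned flags, char *buf, size_t size)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(demo_iomap_flags) / sizeof(demo_iomap_flags[0]); i++) {
		if (!(flags & demo_iomap_flags[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", size - strlen(buf) - 1);
		strncat(buf, demo_iomap_flags[i].name, size - strlen(buf) - 1);
	}
}
```

With this table, a direct atomic write would be reported as `DIRECT|ATOMIC` rather than the old `ATOMIC_HW` spelling.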
+4 −0
@@ -828,6 +828,10 @@ xfs_direct_write_iomap_begin(
 	if (offset + length > i_size_read(inode))
 		iomap_flags |= IOMAP_F_DIRTY;
 
+	/* HW-offload atomics are always used in this path */
+	if (flags & IOMAP_ATOMIC)
+		iomap_flags |= IOMAP_F_ATOMIC_BIO;
+
 	/*
 	 * COW writes may allocate delalloc space or convert unwritten COW
 	 * extents, so we need to make sure to take the lock exclusively here.