Commit 4abb9052 authored by Carlos Maiolino

Merge tag 'atomic-writes-6.16_2025-05-07' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into atomic_writes

large atomic writes for xfs [v12.1]

Currently atomic write support for xfs is limited to writing a single
block as we have no way to guarantee alignment and that the write covers
a single extent.

This series introduces a software-based method for issuing atomic
writes.

The software-based method is used as a fallback when an atomic write
would span misaligned or multiple extents.

For xfs, this support is based on reflink CoW support.

The basic idea of this CoW method is to allocate a range in the CoW fork,
write the data, and atomically update the mapping.

Initial MySQL performance testing has shown this method to perform
acceptably. However, that testing used only 16K atomic writes (with a 4K
block size), so typically, and thankfully, this software fallback method
won't be used often.

For other filesystems which want large atomic writes and don't support
CoW, I think that they can follow the example in [0].

Catherine is currently working on further xfstests for this feature,
which we hope to share soon.

Patch 17/17 could perhaps be omitted, as there is no strong demand to
have it included.

Based on bfecc409 (xfs/next-rc, xfs/for-next) xfs: allow ro mounts
if rtdev or logdev are read-only

[0] https://lore.kernel.org/linux-xfs/20250102140411.14617-1-john.g.garry@oracle.com/



Differences to v12:
- add more review tags

Differences to v11:
- split "xfs: ignore ..." patch
- inline sync_blockdev() in xfs_alloc_buftarg() (Christoph)
- fix xfs_calc_rtgroup_awu_max() for 0 block count (Darrick)
- Add RB tag from Christoph (thanks!)

Differences to v10:
- add "xfs: only call xfs_setsize_buftarg once ..." by Darrick
- symbol renames in "xfs: ignore HW which cannot..." by Darrick

Differences to v9:
- rework "ignore HW which cannot .." patch by Darrick
- Ensure power-of-2 max always for unit min/max when no HW support

With a bit of luck, this should all go splendidly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
parents 23be716b 4528b905
+11 −0
@@ -151,6 +151,17 @@ When mounting an XFS filesystem, the following options are accepted.
	optional, and the log section can be separate from the data
	section or contained within it.

  max_atomic_write=value
	Set the maximum size of an atomic write.  The size may be
	specified in bytes, in kilobytes with a "k" suffix, in megabytes
	with a "m" suffix, or in gigabytes with a "g" suffix.  The size
	cannot be larger than the maximum write size, larger than the
	size of any allocation group, or larger than the size of a
	remapping operation that the log can complete atomically.

	The default value is to set the maximum I/O completion size
	to allow each CPU to handle one at a time.

  max_open_zones=value
	Specify the max number of zones to keep open for writing on a
	zoned rt device. Many open zones aids file data separation
+2 −1
@@ -1336,7 +1336,8 @@ void bdev_statx(struct path *path, struct kstat *stat,

		generic_fill_statx_atomic_writes(stat,
			queue_atomic_write_unit_min_bytes(bd_queue),
-			queue_atomic_write_unit_max_bytes(bd_queue));
+			queue_atomic_write_unit_max_bytes(bd_queue),
+			0);
	}

	stat->blksize = bdev_io_min(bdev);
+1 −1
@@ -5692,7 +5692,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
			awu_max = sbi->s_awu_max;
		}

-		generic_fill_statx_atomic_writes(stat, awu_min, awu_max);
+		generic_fill_statx_atomic_writes(stat, awu_min, awu_max, 0);
	}

	flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
+5 −1
@@ -136,13 +136,15 @@ EXPORT_SYMBOL(generic_fill_statx_attr);
 * @stat:	Where to fill in the attribute flags
 * @unit_min:	Minimum supported atomic write length in bytes
 * @unit_max:	Maximum supported atomic write length in bytes
 * @unit_max_opt: Optimised maximum supported atomic write length in bytes
 *
 * Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from
 * atomic write unit_min and unit_max values.
 */
void generic_fill_statx_atomic_writes(struct kstat *stat,
				      unsigned int unit_min,
-				      unsigned int unit_max)
+				      unsigned int unit_max,
+				      unsigned int unit_max_opt)
{
	/* Confirm that the request type is known */
	stat->result_mask |= STATX_WRITE_ATOMIC;
@@ -153,6 +155,7 @@ void generic_fill_statx_atomic_writes(struct kstat *stat,
	if (unit_min) {
		stat->atomic_write_unit_min = unit_min;
		stat->atomic_write_unit_max = unit_max;
		stat->atomic_write_unit_max_opt = unit_max_opt;
		/* Initially only allow 1x segment */
		stat->atomic_write_segments_max = 1;

@@ -732,6 +735,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
	tmp.stx_atomic_write_segments_max = stat->atomic_write_segments_max;
	tmp.stx_atomic_write_unit_max_opt = stat->atomic_write_unit_max_opt;

	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
}
+5 −0
@@ -3312,6 +3312,11 @@ xfs_bmap_compute_alignments(
		align = xfs_get_cowextsz_hint(ap->ip);
	else if (ap->datatype & XFS_ALLOC_USERDATA)
		align = xfs_get_extsz_hint(ap->ip);

	/* Try to align start block to any minimum allocation alignment */
	if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN))
		args->alignment = align;

	if (align) {
		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
					ap->eof, 0, ap->conv, &ap->offset,