Commit ab73b29e authored by David Hildenbrand, committed by Claudio Imbrenda

s390/uv: Improve splitting of large folios that cannot be split while dirty

Currently, starting a PV VM on an iomap-based filesystem with large
folio support, such as XFS, will not work. We'll be stuck in
unpack_one()->gmap_make_secure(), because we can't seem to make progress
splitting the large folio.

The problem is that we require a writable PTE, but on such filesystems a
writable PTE implies a dirty folio.

So whenever we have a writable PTE, we'll have a dirty folio, and dirty
iomap folios cannot currently get split, because
split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio()
will fail in iomap_release_folio().

So we will not make any progress splitting such large folios.

Until dirty folios can be split more reliably, let's manually trigger
writeback of the problematic folio using
filemap_write_and_wait_range(), and retry the split immediately
afterwards exactly once, before looking up the folio again.

Should this logic be part of split_folio()? Likely not; most split users
don't have to split so eagerly to make any progress.

For now, this seems to affect xfs, zonefs and erofs, and this patch
makes it work again (tested on xfs only).

This could be considered a fix for commit 67958013 ("xfs: Support
large folios"), commit df2f9708 ("zonefs: enable support for large
folios") and commit ce529cc2 ("erofs: enable large folios for iomap
mode"). However, before commit eef88fe4 ("s390/uv: Split large folios in
gmap_make_secure()"), we did not try splitting large folios at all, so this
is really part of making SE (Secure Execution) compatible with file systems
that support large folios. To have some "Fixes:" tag, let's just use eef88fe4.

Not CCing stable, because there are a lot of dependencies, and the feature
simply not working is not critical in stable kernels.

Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-58218
Fixes: eef88fe4 ("s390/uv: Split large folios in gmap_make_secure()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250516123946.1648026-4-david@redhat.com
Message-ID: <20250516123946.1648026-4-david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
parent bd428b8c
+60 −6
@@ -15,6 +15,7 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/pagewalk.h>
+#include <linux/backing-dev.h>
 #include <asm/facility.h>
 #include <asm/sections.h>
 #include <asm/uv.h>
@@ -338,22 +339,75 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
  */
 static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
 {
-	int rc;
+	int rc, tried_splits;
 
 	lockdep_assert_not_held(&mm->mmap_lock);
 	folio_wait_writeback(folio);
 	lru_add_drain_all();
 
-	if (folio_test_large(folio)) {
+	if (!folio_test_large(folio))
+		return 0;
+
+	for (tried_splits = 0; tried_splits < 2; tried_splits++) {
+		struct address_space *mapping;
+		loff_t lstart, lend;
+		struct inode *inode;
+
 		folio_lock(folio);
 		rc = split_folio(folio);
+		if (rc != -EBUSY) {
 			folio_unlock(folio);
-
-		if (rc != -EBUSY)
 			return rc;
-		return -EAGAIN;
+		}
-	return 0;
+
+		/*
+		 * Splitting with -EBUSY can fail for various reasons, but we
+		 * have to handle one case explicitly for now: some mappings
+		 * don't allow for splitting dirty folios; writeback will
+		 * mark them clean again, including marking all page table
+		 * entries mapping the folio read-only, to catch future write
+		 * attempts.
+		 *
+		 * While the system should be writing back dirty folios in the
+		 * background, we obtained this folio by looking up a writable
+		 * page table entry. On these problematic mappings, writable
+		 * page table entries imply dirty folios, preventing the
+		 * split in the first place.
+		 *
+		 * To prevent a livelock when trigger writeback manually and
+		 * letting the caller look up the folio again in the page
+		 * table (turning it dirty), immediately try to split again.
+		 *
+		 * This is only a problem for some mappings (e.g., XFS);
+		 * mappings that do not support writeback (e.g., shmem) do not
+		 * apply.
+		 */
+		if (!folio_test_dirty(folio) || folio_test_anon(folio) ||
+		    !folio->mapping || !mapping_can_writeback(folio->mapping)) {
+			folio_unlock(folio);
+			break;
+		}
+
+		/*
+		 * Ideally, we'd only trigger writeback on this exact folio. But
+		 * there is no easy way to do that, so we'll stabilize the
+		 * mapping while we still hold the folio lock, so we can drop
+		 * the folio lock to trigger writeback on the range currently
+		 * covered by the folio instead.
+		 */
+		mapping = folio->mapping;
+		lstart = folio_pos(folio);
+		lend = lstart + folio_size(folio) - 1;
+		inode = igrab(mapping->host);
+		folio_unlock(folio);
+
+		if (unlikely(!inode))
+			break;
+
+		filemap_write_and_wait_range(mapping, lstart, lend);
+		iput(mapping->host);
 	}
+	return -EAGAIN;
 }
 
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb)