Commit 9651fced authored by Jason A. Donenfeld's avatar Jason A. Donenfeld
Browse files

mm: add MAP_DROPPABLE for designating always lazily freeable mappings



The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fallback to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.

Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.

Finally, the provided self test ensures that this is working as desired.

Cc: linux-mm@kvack.org
Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
parent 8a18fda0
Loading
Loading
Loading
Loading
+1 −0
Original line number Diff line number Diff line
@@ -708,6 +708,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
		[ilog2(VM_SHADOW_STACK)] = "ss",
#endif
#ifdef CONFIG_64BIT
		[ilog2(VM_DROPPABLE)] = "dp",
		[ilog2(VM_SEALED)] = "sl",
#endif
	};
+7 −0
Original line number Diff line number Diff line
@@ -406,6 +406,13 @@ extern unsigned int kobjsize(const void *objp);
#define VM_ALLOW_ANY_UNCACHED		VM_NONE
#endif

#ifdef CONFIG_64BIT
#define VM_DROPPABLE_BIT	40
#define VM_DROPPABLE		BIT(VM_DROPPABLE_BIT)
#else
#define VM_DROPPABLE		VM_NONE
#endif

#ifdef CONFIG_64BIT
/* VM is sealed, in vm_flags */
#define VM_SEALED	_BITUL(63)
+3 −0
Original line number Diff line number Diff line
@@ -218,6 +218,9 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
{
	vm_flags &= __VM_UFFD_FLAGS;

	if (vm_flags & VM_DROPPABLE)
		return false;

	if ((vm_flags & VM_UFFD_MINOR) &&
	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
		return false;
+7 −0
Original line number Diff line number Diff line
@@ -165,6 +165,12 @@ IF_HAVE_PG_ARCH_X(arch_3)
# define IF_HAVE_UFFD_MINOR(flag, name)
#endif

#ifdef CONFIG_64BIT
# define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name},
#else
# define IF_HAVE_VM_DROPPABLE(flag, name)
#endif

#define __def_vmaflag_names						\
	{VM_READ,			"read"		},		\
	{VM_WRITE,			"write"		},		\
@@ -197,6 +203,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" ) \
	{VM_MIXEDMAP,			"mixedmap"	},		\
	{VM_HUGEPAGE,			"hugepage"	},		\
	{VM_NOHUGEPAGE,			"nohugepage"	},		\
IF_HAVE_VM_DROPPABLE(VM_DROPPABLE,	"droppable"	)		\
	{VM_MERGEABLE,			"mergeable"	}		\

#define show_vma_flags(flags)						\
+1 −0
Original line number Diff line number Diff line
@@ -17,6 +17,7 @@
#define MAP_SHARED	0x01		/* Share changes */
#define MAP_PRIVATE	0x02		/* Changes are private */
#define MAP_SHARED_VALIDATE 0x03	/* share + validate extension flags */
#define MAP_DROPPABLE	0x08		/* Zero memory under memory pressure. */

/*
 * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
Loading