mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT) (74949222) · Commits · git / linux-net

Documentation/admin-guide/cgroup-v1/memory.rst

+4 −0

Original line number	Diff line number	Diff line
		@@ -609,6 +609,10 @@ memory.stat file includes following statistics:

		'rss + mapped_file" will give you resident set size of cgroup.

		Note that some kernel configurations might account complete larger
		allocations (e.g., THP) towards 'rss' and 'mapped_file', even if
		only some, but not all that memory is mapped.

		(Note: file and shmem may be shared among other cgroups. In that case,
		mapped_file is accounted only when the memory cgroup is owner of page
		cache.)

Documentation/admin-guide/cgroup-v2.rst

+8 −2

Original line number	Diff line number	Diff line
		@@ -1440,7 +1440,10 @@ The following nested keys are defined.

		anon
		Amount of memory used in anonymous mappings such as
		brk(), sbrk(), and mmap(MAP_ANONYMOUS)
		brk(), sbrk(), and mmap(MAP_ANONYMOUS). Note that
		some kernel configurations might account complete larger
		allocations (e.g., THP) if only some, but not all the
		memory of such an allocation is mapped anymore.

		file
		Amount of memory used to cache filesystem data,
		@@ -1483,7 +1486,10 @@ The following nested keys are defined.
		Amount of application memory swapped out to zswap.

		file_mapped
		Amount of cached filesystem data mapped with mmap()
		Amount of cached filesystem data mapped with mmap(). Note
		that some kernel configurations might account complete
		larger allocations (e.g., THP) if only some, but not
		not all the memory of such an allocation is mapped.

		file_dirty
		Amount of cached filesystem data that was modified but

Documentation/filesystems/proc.rst

+8 −2

Original line number	Diff line number	Diff line
		@@ -1153,9 +1153,15 @@ Dirty
		Writeback
		Memory which is actively being written back to the disk
		AnonPages
		Non-file backed pages mapped into userspace page tables
		Non-file backed pages mapped into userspace page tables. Note that
		some kernel configurations might consider all pages part of a
		larger allocation (e.g., THP) as "mapped", as soon as a single
		page is mapped.
		Mapped
		files which have been mmapped, such as libraries
		files which have been mmapped, such as libraries. Note that some
		kernel configurations might consider all pages part of a larger
		allocation (e.g., THP) as "mapped", as soon as a single page is
		mapped.
		Shmem
		Total memory used by shared memory (shmem) and tmpfs
		KReclaimable

Documentation/mm/transhuge.rst

+23 −8

Original line number	Diff line number	Diff line
		@@ -116,23 +116,28 @@ pages:
		succeeds on tail pages.

		- map/unmap of a PMD entry for the whole THP increment/decrement
		folio->_entire_mapcount, increment/decrement folio->_large_mapcount
		and also increment/decrement folio->_nr_pages_mapped by ENTIRELY_MAPPED
		when _entire_mapcount goes from -1 to 0 or 0 to -1.
		folio->_entire_mapcount and folio->_large_mapcount.

		We also maintain the two slots for tracking MM owners (MM ID and
		corresponding mapcount), and the current status ("maybe mapped shared" vs.
		"mapped exclusively").

		With CONFIG_PAGE_MAPCOUNT, we also increment/decrement
		folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount goes
		from -1 to 0 or 0 to -1.

		- map/unmap of individual pages with PTE entry increment/decrement
		page->_mapcount, increment/decrement folio->_large_mapcount and also
		increment/decrement folio->_nr_pages_mapped when page->_mapcount goes
		from -1 to 0 or 0 to -1 as this counts the number of pages mapped by PTE.
		folio->_large_mapcount.

		We also maintain the two slots for tracking MM owners (MM ID and
		corresponding mapcount), and the current status ("maybe mapped shared" vs.
		"mapped exclusively").

		With CONFIG_PAGE_MAPCOUNT, we also increment/decrement
		page->_mapcount and increment/decrement folio->_nr_pages_mapped when
		page->_mapcount goes from -1 to 0 or 0 to -1 as this counts the number
		of pages mapped by PTE.

		split_huge_page internally has to distribute the refcounts in the head
		page to the tail pages before clearing all PG_head/tail bits from the page
		structures. It can be done easily for refcounts taken by page table
		@@ -159,8 +164,8 @@ clear where references should go after split: it will stay on the head page.
		Note that split_huge_pmd() doesn't have any limitations on refcounting:
		pmd can be split at any point and never fails.

		Partial unmap and deferred_split_folio()
		========================================
		Partial unmap and deferred_split_folio() (anon THP only)
		========================================================

		Unmapping part of THP (with munmap() or other way) is not going to free
		memory immediately. Instead, we detect that a subpage of THP is not in use
		@@ -175,3 +180,13 @@ a THP crosses a VMA boundary.
		The function deferred_split_folio() is used to queue a folio for splitting.
		The splitting itself will happen when we get memory pressure via shrinker
		interface.

		With CONFIG_PAGE_MAPCOUNT, we reliably detect partial mappings based on
		folio->_nr_pages_mapped.

		With CONFIG_NO_PAGE_MAPCOUNT, we detect partial mappings based on the
		average per-page mapcount in a THP: if the average is < 1, an anon THP is
		certainly partially mapped. As long as only a single process maps a THP,
		this detection is reliable. With long-running child processes, there can
		be scenarios where partial mappings can currently not be detected, and
		might need asynchronous detection during memory reclaim in the future.

include/linux/rmap.h

+29 −6

Original line number	Diff line number	Diff line
		@@ -240,7 +240,7 @@ static __always_inline void folio_set_large_mapcount(struct folio *folio,
		folio_set_mm_id(folio, 0, vma->vm_mm->mm_id);
		}

		static __always_inline void folio_add_large_mapcount(struct folio *folio,
		static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
		int diff, struct vm_area_struct *vma)
		{
		const mm_id_t mm_id = vma->vm_mm->mm_id;
		@@ -286,9 +286,11 @@ static __always_inline void folio_add_large_mapcount(struct folio *folio,
		folio->_mm_ids \|= FOLIO_MM_IDS_SHARED_BIT;
		}
		folio_unlock_large_mapcount(folio);
		return new_mapcount_val + 1;
		}
		#define folio_add_large_mapcount folio_add_return_large_mapcount

		static __always_inline void folio_sub_large_mapcount(struct folio *folio,
		static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
		int diff, struct vm_area_struct *vma)
		{
		const mm_id_t mm_id = vma->vm_mm->mm_id;
		@@ -331,7 +333,9 @@ static __always_inline void folio_sub_large_mapcount(struct folio *folio,
		folio->_mm_ids &= ~FOLIO_MM_IDS_SHARED_BIT;
		out:
		folio_unlock_large_mapcount(folio);
		return new_mapcount_val + 1;
		}
		#define folio_sub_large_mapcount folio_sub_return_large_mapcount
		#else /* !CONFIG_MM_ID */
		/*
		* See __folio_rmap_sanity_checks(), we might map large folios even without
		@@ -350,17 +354,33 @@ static inline void folio_add_large_mapcount(struct folio *folio,
		atomic_add(diff, &folio->_large_mapcount);
		}

		static inline int folio_add_return_large_mapcount(struct folio *folio,
		int diff, struct vm_area_struct *vma)
		{
		BUILD_BUG();
		}

		static inline void folio_sub_large_mapcount(struct folio *folio,
		int diff, struct vm_area_struct *vma)
		{
		atomic_sub(diff, &folio->_large_mapcount);
		}

		static inline int folio_sub_return_large_mapcount(struct folio *folio,
		int diff, struct vm_area_struct *vma)
		{
		BUILD_BUG();
		}
		#endif /* CONFIG_MM_ID */

		#define folio_inc_large_mapcount(folio, vma) \
		folio_add_large_mapcount(folio, 1, vma)
		#define folio_inc_return_large_mapcount(folio, vma) \
		folio_add_return_large_mapcount(folio, 1, vma)
		#define folio_dec_large_mapcount(folio, vma) \
		folio_sub_large_mapcount(folio, 1, vma)
		#define folio_dec_return_large_mapcount(folio, vma) \
		folio_sub_return_large_mapcount(folio, 1, vma)

		/* RMAP flags, currently only relevant for some anon rmap operations. */
		typedef int __bitwise rmap_t;
		@@ -538,9 +558,11 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
		break;
		}

		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
		do {
		atomic_inc(&page->_mapcount);
		} while (page++, --nr_pages > 0);
		}
		folio_add_large_mapcount(folio, orig_nr_pages, dst_vma);
		break;
		case RMAP_LEVEL_PMD:
		@@ -638,6 +660,7 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
		do {
		if (PageAnonExclusive(page))
		ClearPageAnonExclusive(page);
		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
		atomic_inc(&page->_mapcount);
		} while (page++, --nr_pages > 0);
		folio_add_large_mapcount(folio, orig_nr_pages, dst_vma);