Commit 3be306cc authored by Kyle Meyer's avatar Kyle Meyer Committed by Andrew Morton
Browse files

mm/memory-failure: fix redundant updates for already poisoned pages

Duplicate memory errors can be reported by multiple sources.

Passing an already poisoned page to action_result() causes issues:

* The amount of hardware corrupted memory is incorrectly updated.
* Per NUMA node MF stats are incorrectly updated.
* Redundant "already poisoned" messages are printed.

Avoid those issues by:

* Skipping hardware corrupted memory updates for already poisoned pages.
* Skipping per NUMA node MF stats updates for already poisoned pages.
* Dropping redundant "already poisoned" messages.

Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and
make calls to action_result() consistent for already poisoned normal pages
and huge pages.

Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com


Fixes: b8b9488d ("mm/memory-failure: improve memory failure action_result messages")
Signed-off-by: default avatarKyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: default avatarJiaqi Yan <jiaqiyan@google.com>
Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
Reviewed-by: default avatarJane Chu <jane.chu@oracle.com>
Acked-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
parent e67f0bd0
Loading
Loading
Loading
Loading
+6 −7
Original line number Diff line number Diff line
@@ -956,7 +956,7 @@ static const char * const action_page_types[] = {
	[MF_MSG_BUDDY]			= "free buddy page",
	[MF_MSG_DAX]			= "dax page",
	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
	[MF_MSG_ALREADY_POISONED]	= "already poisoned",
	[MF_MSG_ALREADY_POISONED]	= "already poisoned page",
	[MF_MSG_UNKNOWN]		= "unknown page",
};

@@ -1349,9 +1349,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
{
	trace_memory_failure_event(pfn, type, result);

	if (type != MF_MSG_ALREADY_POISONED) {
		num_poisoned_pages_inc(pfn);

		update_per_node_mf_stats(pfn, result);
	}

	pr_err("%#lx: recovery action for %s: %s\n",
		pfn, action_page_types[type], action_name[result]);
@@ -2094,12 +2095,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
		*hugetlb = 0;
		return 0;
	} else if (res == -EHWPOISON) {
		pr_err("%#lx: already hardware poisoned\n", pfn);
		if (flags & MF_ACTION_REQUIRED) {
			folio = page_folio(p);
			res = kill_accessing_process(current, folio_pfn(folio), flags);
			action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
		}
		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
		return res;
	} else if (res == -EBUSY) {
		if (!(flags & MF_NO_RETRY)) {
@@ -2285,7 +2285,6 @@ int memory_failure(unsigned long pfn, int flags)
		goto unlock_mutex;

	if (TestSetPageHWPoison(p)) {
		pr_err("%#lx: already hardware poisoned\n", pfn);
		res = -EHWPOISON;
		if (flags & MF_ACTION_REQUIRED)
			res = kill_accessing_process(current, pfn, flags);