Commit 0ea680ed authored by Linus Torvalds
Pull slab updates from Vlastimil Babka:

 - Freelist loading optimization (Chengming Zhou)

   When the per-cpu slab is depleted and a new one loaded from the cpu
   partial list, optimize the loading to avoid an irq enable/disable
   cycle. This results in a 3.5% performance improvement on the "perf
   bench sched messaging" test.

 - Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)

   Due to two different main slab implementations we've had boot
   parameters prefixed either slab_ or slub_, with some later becoming
   aliases as both implementations gained the same functionality (e.g.
   slab_nomerge vs slub_nomerge). In order to eventually get rid of the
   implementation-specific names, the canonical and documented
   parameters are now all prefixed slab_ and the slub_ variants become
   deprecated but still working aliases.

 - SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka)

   The flags had hardcoded #define values which became tedious and
   error-prone when adding new ones. Assign the values via an enum that
   takes care of providing unique bit numbers. Also deprecate
   SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
   SLAB removal. Assign it an explicit zero value. The removals of the
   flag usage are handled independently in the respective subsystems,
   with a final removal of any leftover usage planned for the next
   release.

 - Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian)

   Includes removal of unused code or function parameters and a fix of a
   memleak.
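
The SLAB_ flags cleanup described above can be illustrated with a small standalone sketch. It is not the kernel header: every DEMO_ name is hypothetical, and DEMO_HAVE_TRACE merely stands in for a CONFIG_ option. The point is that the enum hands out consecutive bit numbers, so a flag that is compiled out leaves no gap and no two flags can share a bit, while a deprecated flag can be given an explicit zero value and becomes a no-op when OR-ed in.

```c
#include <stdint.h>

/* Hypothetical feature switch standing in for a kernel CONFIG_ option. */
#define DEMO_HAVE_TRACE 1

/* The enum assigns each flag a unique, consecutive bit number. */
enum demo_flag_bits {
	DEMO_BIT_CHECKS,
	DEMO_BIT_RED_ZONE,
#if DEMO_HAVE_TRACE
	DEMO_BIT_TRACE,
#endif
	DEMO_BIT_NO_MERGE,
	DEMO_FLAGS_LAST_BIT	/* number of bits actually in use */
};

#define DEMO_FLAG_BIT(nr)	((uint32_t)1U << (nr))
#define DEMO_FLAG_UNUSED	((uint32_t)0U)

#define DEMO_CHECKS	DEMO_FLAG_BIT(DEMO_BIT_CHECKS)
#define DEMO_RED_ZONE	DEMO_FLAG_BIT(DEMO_BIT_RED_ZONE)
#if DEMO_HAVE_TRACE
# define DEMO_TRACE	DEMO_FLAG_BIT(DEMO_BIT_TRACE)
#else
# define DEMO_TRACE	DEMO_FLAG_UNUSED
#endif
#define DEMO_NO_MERGE	DEMO_FLAG_BIT(DEMO_BIT_NO_MERGE)

/* A deprecated flag keeps its name but carries no bit. */
#define DEMO_MEM_SPREAD	DEMO_FLAG_UNUSED

/* Compile-time guard: all bit numbers must fit in the 32-bit flag word. */
_Static_assert(DEMO_FLAGS_LAST_BIT <= 32, "too many flag bits");

/* Returns nonzero if every bit of 'flag' is set in 'flags'. */
static int demo_has_flag(uint32_t flags, uint32_t flag)
{
	return flag != 0 && (flags & flag) == flag;
}
```

Adding a flag in this scheme means adding one enum member and one #define; no numeric constant has to be picked by hand.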

* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: remove PARTIAL_NODE slab_state
  mm, slab: remove memcg_from_slab_obj()
  mm, slab: remove the corner case of inc_slabs_node()
  mm/slab: Fix a kmemleak in kmem_cache_destroy()
  mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
  mm, slab: use an enum to define SLAB_ cache creation flags
  mm, slab: deprecate SLAB_MEM_SPREAD flag
  mm, slab: fix the comment of cpu partial list
  mm, slab: remove unused object_size parameter in kmem_cache_flags()
  mm/slub: remove parameter 'flags' in create_kmalloc_caches()
  mm/slub: remove unused parameter in next_freelist_entry()
  mm/slub: remove full list manipulation for non-debug slab
  mm/slub: directly load freelist from cpu partial slab in the likely case
  mm/slub: make the description of slab_min_objects helpful in doc
  mm/slub: replace slub_$params with slab_$params in slub.rst
  mm/slub: unify all sl[au]b parameters with "slab_$param"
  Documentation: kernel-parameters: remove noaliencache
parents cc4a875c 1a1c4e45
+32 −43
@@ -3771,10 +3771,6 @@
	no5lvl		[X86-64,RISCV,EARLY] Disable 5-level paging mode. Forces
			kernel to use 4-level paging instead.

	noaliencache	[MM, NUMA, SLAB] Disables the allocation of alien
			caches in the slab allocator.  Saves per-node memory,
			but will impact performance.

	noalign		[KNL,ARM]

	noaltinstr	[S390,EARLY] Disables alternative instructions
@@ -5930,65 +5926,58 @@
	simeth=		[IA-64]
	simscsi=

	slram=		[HW,MTD]

	slab_merge	[MM]
			Enable merging of slabs with similar size when the
			kernel is built without CONFIG_SLAB_MERGE_DEFAULT.

	slab_nomerge	[MM]
			Disable merging of slabs with similar size. May be
			necessary if there is some reason to distinguish
			allocs to different slabs, especially in hardened
			environments where the risk of heap overflows and
			layout control by attackers can usually be
			frustrated by disabling merging. This will reduce
			most of the exposure of a heap attack to a single
			cache (risks via metadata attacks are mostly
			unchanged). Debug options disable merging on their
			own.
			For more information see Documentation/mm/slub.rst.

	slab_max_order=	[MM, SLAB]
			Determines the maximum allowed order for slabs.
			A high setting may cause OOMs due to memory
			fragmentation.  Defaults to 1 for systems with
			more than 32MB of RAM, 0 otherwise.

	slub_debug[=options[,slabs][;[options[,slabs]]...]	[MM, SLUB]
			Enabling slub_debug allows one to determine the
	slab_debug[=options[,slabs][;[options[,slabs]]...]	[MM]
			Enabling slab_debug allows one to determine the
			culprit if slab objects become corrupted. Enabling
			slub_debug can create guard zones around objects and
			slab_debug can create guard zones around objects and
			may poison objects when not in use. Also tracks the
			last alloc / free. For more information see
			Documentation/mm/slub.rst.
			(slub_debug legacy name also accepted for now)

	slub_max_order= [MM, SLUB]
	slab_max_order= [MM]
			Determines the maximum allowed order for slabs.
			A high setting may cause OOMs due to memory
			fragmentation. For more information see
			Documentation/mm/slub.rst.
			(slub_max_order legacy name also accepted for now)

	slab_merge	[MM]
			Enable merging of slabs with similar size when the
			kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
			(slub_merge legacy name also accepted for now)

	slub_min_objects=	[MM, SLUB]
	slab_min_objects=	[MM]
			The minimum number of objects per slab. SLUB will
			increase the slab order up to slub_max_order to
			increase the slab order up to slab_max_order to
			generate a sufficiently large slab able to contain
			the number of objects indicated. The higher the number
			of objects the smaller the overhead of tracking slabs
			and the less frequently locks need to be acquired.
			For more information see Documentation/mm/slub.rst.
			(slub_min_objects legacy name also accepted for now)

	slub_min_order=	[MM, SLUB]
	slab_min_order=	[MM]
			Determines the minimum page order for slabs. Must be
			lower than slub_max_order.
			For more information see Documentation/mm/slub.rst.
			lower or equal to slab_max_order. For more information see
			Documentation/mm/slub.rst.
			(slub_min_order legacy name also accepted for now)

	slub_merge	[MM, SLUB]
			Same with slab_merge.
	slab_nomerge	[MM]
			Disable merging of slabs with similar size. May be
			necessary if there is some reason to distinguish
			allocs to different slabs, especially in hardened
			environments where the risk of heap overflows and
			layout control by attackers can usually be
			frustrated by disabling merging. This will reduce
			most of the exposure of a heap attack to a single
			cache (risks via metadata attacks are mostly
			unchanged). Debug options disable merging on their
			own.
			For more information see Documentation/mm/slub.rst.
			(slub_nomerge legacy name also accepted for now)

	slub_nomerge	[MM, SLUB]
			Same with slab_nomerge. This is supported for legacy.
			See slab_nomerge for more information.
	slram=		[HW,MTD]

	smart2=		[HW]
			Format: <io1>[,<io2>[,...,<io8>]]
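
The documentation hunk above makes slab_ the canonical prefix and keeps each slub_ spelling as a legacy alias. The sketch below shows that naming relationship in plain userspace C. It is an illustration only: canonical_param_name() is a hypothetical helper, and the kernel handles the aliases through its own boot-parameter setup machinery, which is not reproduced here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the canonical spelling of a boot parameter
 * name into 'out'. A legacy "slub_" prefix is rewritten to "slab_";
 * any other name is copied unchanged.
 * Returns 1 if the name used the legacy prefix, 0 otherwise.
 */
static int canonical_param_name(const char *name, char *out, size_t out_len)
{
	static const char legacy[] = "slub_";
	const size_t legacy_len = sizeof(legacy) - 1;
	int was_legacy = strncmp(name, legacy, legacy_len) == 0;

	if (was_legacy)
		snprintf(out, out_len, "slab_%s", name + legacy_len);
	else
		snprintf(out, out_len, "%s", name);
	return was_legacy;
}
```

With this helper, "slub_min_order" maps to "slab_min_order", while "slab_nomerge" passes through untouched. A caller could use the return value to note that a deprecated spelling was seen.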
+30 −30
@@ -9,7 +9,7 @@ SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

In order to switch debugging on one can add an option ``slub_debug``
In order to switch debugging on one can add an option ``slab_debug``
to the kernel command line. That will enable full debugging for
all slabs.

@@ -26,16 +26,16 @@ be enabled on the command line. F.e. no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.

Some more sophisticated uses of slub_debug:
Some more sophisticated uses of slab_debug:
-------------------------------------------

Parameters may be given to ``slub_debug``. If none is specified then full
Parameters may be given to ``slab_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
slab_debug=<Debug-Options>
	Enable options for all slabs

slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
slab_debug=<Debug-Options>,<slab name1>,<slab name2>,...
	Enable options only for select slabs (no spaces
	after a comma)

@@ -60,23 +60,23 @@ Possible debug options are::

F.e. in order to boot just with sanity checks and red zoning one would specify::

	slub_debug=FZ
	slab_debug=FZ

Trying to find an issue in the dentry cache? Try::

	slub_debug=,dentry
	slab_debug=,dentry

to only enable debugging on the dentry cache.  You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix.  For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

	slub_debug=P,kmalloc-*,dentry
	slab_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab.  We can just apply sanity checks
to the dentry cache with::

	slub_debug=F,dentry
	slab_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
@@ -84,20 +84,20 @@ sizes). This has a higher liklihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory.  To
switch off debugging for such caches by default, use::

	slub_debug=O
	slab_debug=O

You can apply different options to different list of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::

	slub_debug=Z,dentry;U,kmalloc-*
	slab_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

	slub_debug=FZ;-,zs_handle,zspage
	slab_debug=FZ;-,zs_handle,zspage
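
The block syntax used in the examples above (blocks separated by ";", options first, then an optional comma-separated slab list, a trailing asterisk as prefix wildcard) can be sketched as a small lookup routine. This is a simplified userspace illustration, not the kernel's parser: both function names are hypothetical, the first block that names the slab wins, and a block without a slab list is treated as the global default.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'name' matches the pattern; a trailing '*' matches any suffix. */
static bool slab_name_matches(const char *pat, size_t pat_len, const char *name)
{
	if (pat_len > 0 && pat[pat_len - 1] == '*')
		return strncmp(pat, name, pat_len - 1) == 0;
	return strlen(name) == pat_len && strncmp(pat, name, pat_len) == 0;
}

/* Copy at most out_len - 1 option letters and terminate the string. */
static void copy_opts(char *out, size_t out_len, const char *src, size_t n)
{
	if (out_len == 0)
		return;
	if (n >= out_len)
		n = out_len - 1;
	memcpy(out, src, n);
	out[n] = '\0';
}

/*
 * Find the option letters that 'spec' selects for 'slab'.
 * Returns false (and an empty string) if neither a named block
 * nor a global block applies.
 */
static bool debug_options_for(const char *spec, const char *slab,
			      char *out, size_t out_len)
{
	const char *block = spec;
	const char *global = NULL;
	size_t global_len = 0;

	while (*block) {
		size_t block_len = strcspn(block, ";");
		size_t opts_len = strcspn(block, ",;");

		if (opts_len == block_len) {
			/* No slab list: remember as the global default. */
			global = block;
			global_len = opts_len;
		} else {
			const char *p = block + opts_len + 1;
			const char *end = block + block_len;

			while (p < end) {
				size_t n = strcspn(p, ",;");

				if (slab_name_matches(p, n, slab)) {
					copy_opts(out, out_len, block, opts_len);
					return true;
				}
				p += n;
				if (p < end)
					p++;	/* skip the comma */
			}
		}
		block += block_len;
		if (*block == ';')
			block++;
	}
	if (global) {
		copy_opts(out, out_len, global, global_len);
		return true;
	}
	copy_opts(out, out_len, "", 0);
	return false;
}
```

For the spec Z,dentry;U,kmalloc-* this yields "Z" for dentry and "U" for kmalloc-64, and reports no match for any other cache. For FZ;-,zs_handle,zspage it yields "-" for zspage and the global "FZ" for everything not listed.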

The state of each debug option for a slab can be found in the respective files
under::
@@ -105,7 +105,7 @@ under::
	/sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::
options from the ``slab_debug`` parameter translate to the following files::

	F	sanity_checks
	Z	red_zone
@@ -129,7 +129,7 @@ in order to reduce overhead and increase cache hotness of objects.
Slab validation
===============

SLUB can validate all object if the kernel was booted with slub_debug. In
SLUB can validate all object if the kernel was booted with slab_debug. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::

@@ -150,29 +150,29 @@ list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:

.. slub_min_objects=x		(default 4)
.. slub_min_order=x		(default 0)
.. slub_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))
.. slab_min_objects=x		(default: automatically scaled by number of cpus)
.. slab_min_order=x		(default 0)
.. slab_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slub_min_objects``
``slab_min_objects``
	allows to specify how many objects must at least fit into one
	slab in order for the allocation order to be acceptable.  In
	general slub will be able to perform this number of
	allocations on a slab without consulting centralized resources
	(list_lock) where contention may occur.

``slub_min_order``
``slab_min_order``
	specifies a minimum order of slabs. A similar effect like
	``slub_min_objects``.
	``slab_min_objects``.

``slub_max_order``
	specified the order at which ``slub_min_objects`` should no
``slab_max_order``
	specified the order at which ``slab_min_objects`` should no
	longer be checked. This is useful to avoid SLUB trying to
	generate super large order pages to fit ``slub_min_objects``
	generate super large order pages to fit ``slab_min_objects``
	of a slab cache with large object sizes into one high order
	page. Setting command line parameter
	``debug_guardpage_minorder=N`` (N > 0), forces setting
	``slub_max_order`` to 0, what cause minimum possible order of
	``slab_max_order`` to 0, what cause minimum possible order of
	slabs allocation.
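
The interplay of the three parameters described above can be sketched as a simple search for the smallest acceptable order. This is a deliberately simplified model, not the kernel's actual order calculation (which weighs additional factors): pick_slab_order() and SKETCH_PAGE_SIZE are hypothetical names, and a 4 KiB page is an assumption.

```c
#include <stddef.h>

/* Assumed page size for the sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Pick the smallest order in [min_order, max_order] whose slab holds at
 * least min_objects objects. If none does, stop checking min_objects at
 * max_order and only grow further until a single object fits.
 */
static unsigned int pick_slab_order(size_t object_size,
				    unsigned int min_objects,
				    unsigned int min_order,
				    unsigned int max_order)
{
	unsigned int order;

	if (object_size == 0)
		return min_order;

	for (order = min_order; order <= max_order; order++) {
		size_t slab_bytes = (size_t)SKETCH_PAGE_SIZE << order;

		if (slab_bytes / object_size >= min_objects)
			return order;
	}

	/* min_objects cannot be met: fall back to max_order. */
	order = max_order;
	while (((size_t)SKETCH_PAGE_SIZE << order) < object_size)
		order++;
	return order;
}
```

Under these assumptions, 2048-byte objects with a minimum of 4 objects per slab need order 1 (8192 bytes), whereas 8192-byte objects with a minimum of 16 are capped at a max order of 3 even though only 4 objects fit.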

SLUB Debug output
@@ -219,7 +219,7 @@ Here is a sample of slub debug output::
 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
to be booted with slab_debug) then the following output will be dumped
into the syslog:

1. Description of the problem encountered
@@ -239,7 +239,7 @@ into the syslog:
	pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. slub_debug sets that option)
   set for the slab. slab_debug sets that option)

2. The object contents if an object was involved.

@@ -262,7 +262,7 @@ into the syslog:
	the object boundary.

	(Redzone information is only available if SLAB_RED_ZONE is set.
	slub_debug sets that option)
	slab_debug sets that option)

   Padding <address> : <bytes>
	Unused data to fill up the space in order to get the next object
@@ -296,7 +296,7 @@ Emergency operations

Minimal debugging (sanity checks alone) can be enabled by booting with::

	slub_debug=F
	slab_debug=F

This will be generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component will
@@ -311,13 +311,13 @@ and enabling debugging only for that cache

I.e.::

	slub_debug=F,dentry
	slab_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

	slub_debug=FZ,dentry
	slab_debug=FZ,dentry
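
The idea behind red zoning can be shown with a userspace analogy: place a guard region filled with a known byte directly after the object and check later whether it is still intact. The sketch below is not SLUB's implementation. guarded_alloc(), redzone_intact() and REDZONE_SIZE are hypothetical, and the fill value 0xcc is borrowed from the sample debug output in this document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_BYTE 0xcc	/* fill value seen in the sample output */
#define REDZONE_SIZE 8		/* hypothetical guard size for the sketch */

/* Allocate 'size' usable bytes followed by a filled guard region. */
static unsigned char *guarded_alloc(size_t size)
{
	unsigned char *obj = malloc(size + REDZONE_SIZE);

	if (obj)
		memset(obj + size, REDZONE_BYTE, REDZONE_SIZE);
	return obj;
}

/* True if no byte of the guard region after the object was overwritten. */
static bool redzone_intact(const unsigned char *obj, size_t size)
{
	for (size_t i = 0; i < REDZONE_SIZE; i++) {
		if (obj[size + i] != REDZONE_BYTE)
			return false;
	}
	return true;
}
```

A write one byte past the end of a 16-byte object lands in the guard region, so a subsequent redzone_intact() check reports the overflow instead of letting it silently corrupt a neighbouring object.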

Extended slabinfo mode and plotting
===================================
+1 −1
@@ -48,7 +48,7 @@ static void lkdtm_VMALLOC_LINEAR_OVERFLOW(void)
 * correctly.
 *
 * This should get caught by either memory tagging, KASan, or by using
 * CONFIG_SLUB_DEBUG=y and slub_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y).
 * CONFIG_SLUB_DEBUG=y and slab_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y).
 */
static void lkdtm_SLAB_LINEAR_OVERFLOW(void)
{
+0 −6
@@ -429,7 +429,6 @@ struct kasan_cache {
};

size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object);
slab_flags_t kasan_never_merge(void);
void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
			slab_flags_t *flags);

@@ -446,11 +445,6 @@ static inline size_t kasan_metadata_size(struct kmem_cache *cache,
{
	return 0;
}
/* And thus nothing prevents cache merging. */
static inline slab_flags_t kasan_never_merge(void)
{
	return 0;
}
/* And no cache-related metadata initialization is required. */
static inline void kasan_cache_create(struct kmem_cache *cache,
				      unsigned int *size,
+69 −28
@@ -21,29 +21,69 @@
#include <linux/cleanup.h>
#include <linux/hash.h>

enum _slab_flag_bits {
	_SLAB_CONSISTENCY_CHECKS,
	_SLAB_RED_ZONE,
	_SLAB_POISON,
	_SLAB_KMALLOC,
	_SLAB_HWCACHE_ALIGN,
	_SLAB_CACHE_DMA,
	_SLAB_CACHE_DMA32,
	_SLAB_STORE_USER,
	_SLAB_PANIC,
	_SLAB_TYPESAFE_BY_RCU,
	_SLAB_TRACE,
#ifdef CONFIG_DEBUG_OBJECTS
	_SLAB_DEBUG_OBJECTS,
#endif
	_SLAB_NOLEAKTRACE,
	_SLAB_NO_MERGE,
#ifdef CONFIG_FAILSLAB
	_SLAB_FAILSLAB,
#endif
#ifdef CONFIG_MEMCG_KMEM
	_SLAB_ACCOUNT,
#endif
#ifdef CONFIG_KASAN_GENERIC
	_SLAB_KASAN,
#endif
	_SLAB_NO_USER_FLAGS,
#ifdef CONFIG_KFENCE
	_SLAB_SKIP_KFENCE,
#endif
#ifndef CONFIG_SLUB_TINY
	_SLAB_RECLAIM_ACCOUNT,
#endif
	_SLAB_OBJECT_POISON,
	_SLAB_CMPXCHG_DOUBLE,
	_SLAB_FLAGS_LAST_BIT
};

#define __SLAB_FLAG_BIT(nr)	((slab_flags_t __force)(1U << (nr)))
#define __SLAB_FLAG_UNUSED	((slab_flags_t __force)(0U))

/*
 * Flags to pass to kmem_cache_create().
 * The ones marked DEBUG need CONFIG_SLUB_DEBUG enabled, otherwise are no-op
 */
/* DEBUG: Perform (expensive) checks on alloc/free */
#define SLAB_CONSISTENCY_CHECKS	((slab_flags_t __force)0x00000100U)
#define SLAB_CONSISTENCY_CHECKS	__SLAB_FLAG_BIT(_SLAB_CONSISTENCY_CHECKS)
/* DEBUG: Red zone objs in a cache */
#define SLAB_RED_ZONE		((slab_flags_t __force)0x00000400U)
#define SLAB_RED_ZONE		__SLAB_FLAG_BIT(_SLAB_RED_ZONE)
/* DEBUG: Poison objects */
#define SLAB_POISON		((slab_flags_t __force)0x00000800U)
#define SLAB_POISON		__SLAB_FLAG_BIT(_SLAB_POISON)
/* Indicate a kmalloc slab */
#define SLAB_KMALLOC		((slab_flags_t __force)0x00001000U)
#define SLAB_KMALLOC		__SLAB_FLAG_BIT(_SLAB_KMALLOC)
/* Align objs on cache lines */
#define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
#define SLAB_HWCACHE_ALIGN	__SLAB_FLAG_BIT(_SLAB_HWCACHE_ALIGN)
/* Use GFP_DMA memory */
#define SLAB_CACHE_DMA		((slab_flags_t __force)0x00004000U)
#define SLAB_CACHE_DMA		__SLAB_FLAG_BIT(_SLAB_CACHE_DMA)
/* Use GFP_DMA32 memory */
#define SLAB_CACHE_DMA32	((slab_flags_t __force)0x00008000U)
#define SLAB_CACHE_DMA32	__SLAB_FLAG_BIT(_SLAB_CACHE_DMA32)
/* DEBUG: Store the last owner for bug hunting */
#define SLAB_STORE_USER		((slab_flags_t __force)0x00010000U)
#define SLAB_STORE_USER		__SLAB_FLAG_BIT(_SLAB_STORE_USER)
/* Panic if kmem_cache_create() fails */
#define SLAB_PANIC		((slab_flags_t __force)0x00040000U)
#define SLAB_PANIC		__SLAB_FLAG_BIT(_SLAB_PANIC)
/*
 * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
 *
@@ -95,21 +135,19 @@
 * Note that SLAB_TYPESAFE_BY_RCU was originally named SLAB_DESTROY_BY_RCU.
 */
/* Defer freeing slabs to RCU */
#define SLAB_TYPESAFE_BY_RCU	((slab_flags_t __force)0x00080000U)
/* Spread some memory over cpuset */
#define SLAB_MEM_SPREAD		((slab_flags_t __force)0x00100000U)
#define SLAB_TYPESAFE_BY_RCU	__SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU)
/* Trace allocations and frees */
#define SLAB_TRACE		((slab_flags_t __force)0x00200000U)
#define SLAB_TRACE		__SLAB_FLAG_BIT(_SLAB_TRACE)

/* Flag to prevent checks on free */
#ifdef CONFIG_DEBUG_OBJECTS
# define SLAB_DEBUG_OBJECTS	((slab_flags_t __force)0x00400000U)
# define SLAB_DEBUG_OBJECTS	__SLAB_FLAG_BIT(_SLAB_DEBUG_OBJECTS)
#else
# define SLAB_DEBUG_OBJECTS	0
# define SLAB_DEBUG_OBJECTS	__SLAB_FLAG_UNUSED
#endif

/* Avoid kmemleak tracing */
#define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
#define SLAB_NOLEAKTRACE	__SLAB_FLAG_BIT(_SLAB_NOLEAKTRACE)

/*
 * Prevent merging with compatible kmem caches. This flag should be used
@@ -121,25 +159,25 @@
 * - performance critical caches, should be very rare and consulted with slab
 *   maintainers, and not used together with CONFIG_SLUB_TINY
 */
#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
#define SLAB_NO_MERGE		__SLAB_FLAG_BIT(_SLAB_NO_MERGE)

/* Fault injection mark */
#ifdef CONFIG_FAILSLAB
# define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
# define SLAB_FAILSLAB		__SLAB_FLAG_BIT(_SLAB_FAILSLAB)
#else
# define SLAB_FAILSLAB		0
# define SLAB_FAILSLAB		__SLAB_FLAG_UNUSED
#endif
/* Account to memcg */
#ifdef CONFIG_MEMCG_KMEM
# define SLAB_ACCOUNT		((slab_flags_t __force)0x04000000U)
# define SLAB_ACCOUNT		__SLAB_FLAG_BIT(_SLAB_ACCOUNT)
#else
# define SLAB_ACCOUNT		0
# define SLAB_ACCOUNT		__SLAB_FLAG_UNUSED
#endif

#ifdef CONFIG_KASAN_GENERIC
#define SLAB_KASAN		((slab_flags_t __force)0x08000000U)
#define SLAB_KASAN		__SLAB_FLAG_BIT(_SLAB_KASAN)
#else
#define SLAB_KASAN		0
#define SLAB_KASAN		__SLAB_FLAG_UNUSED
#endif

/*
@@ -147,23 +185,26 @@
 * Intended for caches created for self-tests so they have only flags
 * specified in the code and other flags are ignored.
 */
#define SLAB_NO_USER_FLAGS	((slab_flags_t __force)0x10000000U)
#define SLAB_NO_USER_FLAGS	__SLAB_FLAG_BIT(_SLAB_NO_USER_FLAGS)

#ifdef CONFIG_KFENCE
#define SLAB_SKIP_KFENCE	((slab_flags_t __force)0x20000000U)
#define SLAB_SKIP_KFENCE	__SLAB_FLAG_BIT(_SLAB_SKIP_KFENCE)
#else
#define SLAB_SKIP_KFENCE	0
#define SLAB_SKIP_KFENCE	__SLAB_FLAG_UNUSED
#endif

/* The following flags affect the page allocator grouping pages by mobility */
/* Objects are reclaimable */
#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0x00020000U)
#define SLAB_RECLAIM_ACCOUNT	__SLAB_FLAG_BIT(_SLAB_RECLAIM_ACCOUNT)
#else
#define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0)
#define SLAB_RECLAIM_ACCOUNT	__SLAB_FLAG_UNUSED
#endif
#define SLAB_TEMPORARY		SLAB_RECLAIM_ACCOUNT	/* Objects are short-lived */

/* Obsolete unused flag, to be removed */
#define SLAB_MEM_SPREAD		__SLAB_FLAG_UNUSED

/*
 * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
 *