+15
−115
Loading
Switch to kmalloc_nolock() universally in local storage. Socket local
storage didn't move to kmalloc_nolock() when BPF memory allocator was
replaced by it for performance reasons. Now that kfree_rcu() supports
freeing memory allocated by kmalloc_nolock(), we can move the remaining
local storages to use kmalloc_nolock() and cleanup the cluttered free
paths.
Use kfree() instead of kfree_nolock() in bpf_selem_free_trace_rcu() and
bpf_local_storage_free_trace_rcu(). Both callbacks run in process context
where spinning is allowed, so kfree_nolock() is unnecessary.
Benchmark:
./bench -p 1 local-storage-create --storage-type socket \
--batch-size {16,32,64}
The benchmark is a microbenchmark stress-testing how fast local storage
can be created. There is no measurable throughput change for socket local
storage after switching from kzalloc() to kmalloc_nolock().
Socket local storage
batch creation speed diff
--------------- ---- ------------------ ----
Baseline 16 433.9 ± 0.6 k/s
32 434.3 ± 1.4 k/s
64 434.2 ± 0.7 k/s
After 16 439.0 ± 1.9 k/s +1.2%
32 437.3 ± 2.0 k/s +0.7%
64 435.8 ± 2.5k/s +0.4%
Also worth noting that the baseline got a 5% throughput boost when sheaf
replaces percpu partial slab recently [0].
[0] https://lore.kernel.org/bpf/20260123-sheaves-for-all-v4-0-041323d506f7@suse.cz/
Signed-off-by:
Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260411015419.114016-3-ameryhung@gmail.com
Signed-off-by:
Alexei Starovoitov <ast@kernel.org>