Commit 6785aa9d authored by Jakub Kicinski

Merge tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-11-18

1) Relax a lock contention bottleneck to improve IPsec crypto
   offload performance. From Jianbo Liu.

2) Deprecate pfkey, the interface will be removed in 2027.

3) Update xfrm documentation and move it to ipsec maintenance.
   From Bagas Sanjaya.

* tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  MAINTAINERS: Add entry for XFRM documentation
  net: Move XFRM documentation into its own subdirectory
  Documentation: xfrm_sync: Number the fifth section
  Documentation: xfrm_sysctl: Trim trailing colon in section heading
  Documentation: xfrm_sync: Trim excess section heading characters
  Documentation: xfrm_sync: Properly reindent list text
  Documentation: xfrm_device: Separate hardware offload sublists
  Documentation: xfrm_device: Use numbered list for offloading steps
  Documentation: xfrm_device: Wrap iproute2 snippets in literal code block
  pfkey: Deprecate pfkey
  xfrm: Skip redundant replay recheck for the hardware offload path
  xfrm: Refactor xfrm_input lock to reduce contention with RSS
====================

Link: https://patch.msgid.link/20251118092610.2223552-1-steffen.klassert@secunet.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents f4e3402f 939ba8c5
+1 −4
@@ -131,10 +131,7 @@ Contents:
   vxlan
   x25
   x25-iface
   xfrm_device
   xfrm_proc
   xfrm_sync
   xfrm_sysctl
   xfrm/index
   xdp-rx-metadata
   xsk-tx-metadata

+13 −0
.. SPDX-License-Identifier: GPL-2.0

==============
XFRM Framework
==============

.. toctree::
   :maxdepth: 2

   xfrm_device
   xfrm_proc
   xfrm_sync
   xfrm_sysctl
+12 −8
@@ -20,11 +20,15 @@ can radically increase throughput and decrease CPU utilization. The XFRM
Device interface allows NIC drivers to offer to the stack access to the
hardware offload.

Right now, there are two types of hardware offload that kernel supports.
Right now, there are two types of hardware offload that kernel supports:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC have SA and policy in-sync
@@ -34,7 +38,7 @@ Right now, there are two types of hardware offload that kernel supports.
Userland access to the offload is typically through a system such as
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload:
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
@@ -42,7 +46,7 @@ like this for crypto offload:
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload
and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
@@ -153,26 +157,26 @@ the packet's skb. At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

	find and hold the SA that was used to the Rx skb::
1. Find and hold the SA that was used to the Rx skb::

		get spi, protocol, and destination IP from packet headers
		/* get spi, protocol, and destination IP from packet headers */
		xs = find xs from (spi, protocol, dest_IP)
		xfrm_state_hold(xs);

	store the state information into the skb::
2. Store the state information into the skb::

		sp = secpath_set(skb);
		if (!sp) return;
		sp->xvec[sp->len++] = xs;
		sp->olen++;

	indicate the success and/or error status of the offload::
3. Indicate the success and/or error status of the offload::

		xo = xfrm_offload(skb);
		xo->flags = CRYPTO_DONE;
		xo->status = crypto_status;

	hand the packet to napi_gro_receive() as usual
4. Hand the packet to napi_gro_receive() as usual.

In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn() for RX, and xfrm_replay_overflow_offload_esn for TX.
+50 −47
.. SPDX-License-Identifier: GPL-2.0

====
XFRM
====
=========
XFRM sync
=========

The sync patches work is based on initial patches from
Krisztian <hidden@balabit.hu> and others and additional patches
@@ -36,7 +36,7 @@ is not driven by packet arrival.
- the replay sequence for both inbound and outbound

1) Message Structure
----------------------
--------------------

nlmsghdr:aevent_id:optional-TLVs.

@@ -83,8 +83,8 @@ when going from kernel to user space)
A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS
to get notified of these events.
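
Editorial sketch, not part of the patch: the subscription described above could look roughly like this from user space in Python. The numeric constants are assumptions copied from the kernel uapi headers (``NETLINK_XFRM`` and ``SOL_NETLINK`` from ``<linux/netlink.h>``, ``XFRMNLGRP_AEVENTS`` from ``enum xfrm_nlgroups`` in ``<linux/xfrm.h>``) and should be checked against the headers of the target kernel. The helper names ``open_aevent_socket`` and ``parse_nlmsghdr`` are hypothetical.

```python
import socket
import struct

# Assumed values, taken from the kernel uapi headers; verify before relying on them.
NETLINK_XFRM = 6             # <linux/netlink.h>
SOL_NETLINK = 270            # <linux/netlink.h>
NETLINK_ADD_MEMBERSHIP = 1   # <linux/netlink.h>
XFRMNLGRP_AEVENTS = 5        # enum xfrm_nlgroups in <linux/xfrm.h>

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid (host byte order)
NLMSG_HDR = struct.Struct("=IHHII")


def open_aevent_socket():
    """Open an XFRM netlink socket and join the aevent multicast group.

    Joining the group normally requires elevated privileges.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_XFRM)
    sock.bind((0, 0))  # (pid, groups): let the kernel assign the port id
    sock.setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, XFRMNLGRP_AEVENTS)
    return sock


def parse_nlmsghdr(buf):
    """Decode the leading nlmsghdr of a received netlink message."""
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(buf)
    return {"len": length, "type": msg_type, "flags": flags, "seq": seq, "pid": pid}
```

A listener would call ``open_aevent_socket()``, then loop on ``sock.recv(65536)`` and pass each buffer to ``parse_nlmsghdr()`` before decoding the ``aevent_id`` and TLVs that follow the header.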

2) TLVS reflect the different parameters:
-----------------------------------------
2) TLVS reflect the different parameters
----------------------------------------

a) byte value (XFRMA_LTIME_VAL)

@@ -106,8 +106,8 @@ d) expiry timer (XFRMA_ETIMER_THRESH)
   This is a timer value in milliseconds which is used as the nagle
   value to rate limit the events.

3) Default configurations for the parameters:
---------------------------------------------
3) Default configurations for the parameters
--------------------------------------------

By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
@@ -121,11 +121,13 @@ in case they are not specified.
the two sysctls/proc entries are:

a) /proc/sys/net/core/sysctl_xfrm_aevent_etime
used to provide default values for the XFRMA_ETIMER_THRESH in incremental

   Used to provide default values for the XFRMA_ETIMER_THRESH in incremental
   units of time of 100ms. The default is 10 (1 second)

b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth
used to provide default values for XFRMA_REPLAY_THRESH parameter

   Used to provide default values for XFRMA_REPLAY_THRESH parameter
   in incremental packet count. The default is two packets.
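
Editorial sketch, not part of the patch: the unit conversion described above can be illustrated with a small Python helper. The path is passed in by the caller and is quoted as written in the text; the file name on a running system may differ. The helper names ``read_sysctl_int`` and ``etime_units_to_ms`` are hypothetical.

```python
from pathlib import Path

ETIME_UNIT_MS = 100  # the text states the timer threshold is in units of 100ms


def read_sysctl_int(path, default):
    """Read an integer sysctl value, falling back to `default` if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return default


def etime_units_to_ms(units):
    """Convert the aevent timer threshold from 100ms units to milliseconds."""
    return units * ETIME_UNIT_MS


# Path as quoted in the document; falls back to the documented default of 10.
etime_units = read_sysctl_int("/proc/sys/net/core/sysctl_xfrm_aevent_etime", 10)
etime_ms = etime_units_to_ms(etime_units)
```

With the documented default of 10 units, ``etime_units_to_ms(10)`` gives 1000 milliseconds, matching the "1 second" stated above.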

4) Message types
@@ -138,6 +140,7 @@ The response is a XFRM_MSG_NEWAE which is formatted based on what
   XFRM_MSG_GETAE queried for.

   The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

     * if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
     * if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved

@@ -176,8 +179,8 @@ happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

Exceptions to threshold settings
--------------------------------
5) Exceptions to threshold settings
-----------------------------------

If you have an SA that is getting hit by traffic in bursts such that
there is a period where the timer threshold expires with no packets