Commit cc1b6251 authored by Jakub Kicinski

Merge branch 'mptcp-memcg-accounting-for-passive-sockets-backlog-processing'

Matthieu Baerts says:

====================
mptcp: memcg accounting for passive sockets & backlog processing

This series is split in two: the first 4 patches are linked to memcg
accounting for passive sockets, and the rest introduces backlog
processing. They are sent together because the first part turned out to
be needed to get the second one fully working.

The second part includes RX path improvements built around backlog
processing. The main goals are improving RX performance _and_
increasing long-term maintainability.

- Patches 1-3: preparation work to ease the introduction of the next
  patch.

- Patch 4: fix memcg accounting for passive sockets. Note that this is a
  (non-urgent) fix, but it depends on material that is currently only in
  net-next, e.g. commit 4a997d49 ("tcp: Save lock_sock() for memcg
  in inet_csk_accept().").

- Patches 5-6: preparation of the stack for backlog processing, removing
  assumptions that will not hold true any more after the backlog
  introduction.

- Patches 7, 8, 10, 11 and 12: more cleanups that make the backlog
  patch a little smaller.

- Patch 9: a somewhat unrelated cleanup, included here so that it is
  not forgotten.

- Patches 13-14: these do the real work. Patch 13 introduces the
  helpers needed to manipulate the msk-level backlog, and the data
  struct itself, without any actual functional change. Patch 14 finally
  uses the backlog for RX skb processing. Note that MPTCP can't use the
  sk_backlog, as the MPTCP release callback can also release and
  re-acquire the msk-level spinlock, and core backlog processing works
  under the assumption that such an event is not possible.
  A relevant point is memory accounting for skbs in the backlog. It's
  somewhat "original" due to MPTCP constraints. Such skbs use space from
  the incoming subflow receive buffer and do not explicitly use any
  forward allocated memory, as we can't update the msk fwd mem while
  enqueuing, nor do we want to acquire the ssk socket lock again while
  processing the skbs. Instead, the msk borrows memory from the subflow
  and reserves it for the backlog; see patches 5 and 14 for the gory
  details.
====================
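
The msk-level backlog described in the cover letter can be pictured with a small userspace model. This is only a sketch under stated assumptions: every name below (`toy_msk`, `toy_rx`, `toy_release`, ...) is hypothetical, the "lock" is a plain flag that merely checks lock discipline in a single thread, and the splice-then-process loop is one plausible shape for such a backlog, not the code from patches 13-14.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a userspace toy, not kernel code. */
struct toy_skb {
	int seq;
	struct toy_skb *next;
};

struct toy_msk {
	int data_locked;		/* stands in for the msk-level spinlock */
	int owned;			/* set while a user context owns the socket */
	struct toy_skb *backlog_head;	/* private, msk-level backlog */
	struct toy_skb **backlog_tail;
	int delivered[16];		/* stands in for the receive queue */
	int n_delivered;
};

static void toy_data_lock(struct toy_msk *msk)
{
	assert(!msk->data_locked);	/* single-threaded model: no nesting */
	msk->data_locked = 1;
}

static void toy_data_unlock(struct toy_msk *msk)
{
	assert(msk->data_locked);
	msk->data_locked = 0;
}

static void toy_msk_init(struct toy_msk *msk)
{
	msk->data_locked = 0;
	msk->owned = 0;
	msk->backlog_head = NULL;
	msk->backlog_tail = &msk->backlog_head;
	msk->n_delivered = 0;
}

static void toy_deliver(struct toy_msk *msk, struct toy_skb *skb)
{
	assert(msk->n_delivered < 16);
	msk->delivered[msk->n_delivered++] = skb->seq;
}

/* RX path: queue on the private backlog while the socket is owned. */
static void toy_rx(struct toy_msk *msk, struct toy_skb *skb)
{
	toy_data_lock(msk);
	if (msk->owned) {
		skb->next = NULL;
		*msk->backlog_tail = skb;
		msk->backlog_tail = &skb->next;
	} else {
		toy_deliver(msk, skb);
	}
	toy_data_unlock(msk);
}

/* Owner side: detach the backlog under the lock, process it unlocked. */
static void toy_release(struct toy_msk *msk)
{
	for (;;) {
		struct toy_skb *list;

		toy_data_lock(msk);
		list = msk->backlog_head;
		msk->backlog_head = NULL;
		msk->backlog_tail = &msk->backlog_head;
		if (!list)
			msk->owned = 0;	/* nothing pending: drop ownership */
		toy_data_unlock(msk);
		if (!list)
			return;

		while (list) {
			struct toy_skb *next = list->next;

			toy_deliver(msk, list);
			list = next;
		}
	}
}
```

Because the owner re-checks the backlog every time it retakes the lock, dropping and re-acquiring the lock while processing is harmless in this model, which is the property the cover letter says the core sk_backlog cannot offer to MPTCP.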

Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents f296b73d 6228efe0
+2 −0
@@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
	sk_mem_reclaim(sk);
}

+void __sk_charge(struct sock *sk, gfp_t gfp);
+
#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
static inline void sk_owner_set(struct sock *sk, struct module *owner)
{
+18 −0
@@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
}
EXPORT_SYMBOL(__sk_mem_reclaim);

+void __sk_charge(struct sock *sk, gfp_t gfp)
+{
+	int amt;
+
+	gfp |= __GFP_NOFAIL;
+	if (mem_cgroup_from_sk(sk)) {
+		/* The socket has not been accepted yet, no need
+		 * to look at newsk->sk_wmem_queued.
+		 */
+		amt = sk_mem_pages(sk->sk_forward_alloc +
+				   atomic_read(&sk->sk_rmem_alloc));
+		if (amt)
+			mem_cgroup_sk_charge(sk, amt, gfp);
+	}
+
+	kmem_cache_charge(sk, gfp);
+}
+
int sk_set_peek_off(struct sock *sk, int val)
{
	WRITE_ONCE(sk->sk_peek_off, val);
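
The amount `__sk_charge()` hands to the memory cgroup is a page count, not a byte count: `sk_mem_pages()` rounds the sum of `sk_forward_alloc` and `sk_rmem_alloc` up to whole pages, and nothing is charged when that sum is zero. A minimal userspace sketch of the arithmetic follows. It assumes 4 KiB pages, and the `toy_` names are hypothetical stand-ins rather than kernel symbols.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. */
#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1 << TOY_PAGE_SHIFT)

/* Same rounding as sk_mem_pages(): bytes to whole pages, rounding up. */
static int toy_mem_pages(int amt)
{
	return (amt + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
}

/*
 * Hypothetical helper: number of pages that would be charged for a
 * not-yet-accepted socket holding the given receive-side memory.
 * The model only covers non-negative inputs.
 */
static int toy_charge_pages(int forward_alloc, int rmem_alloc)
{
	assert(forward_alloc >= 0 && rmem_alloc >= 0);
	return toy_mem_pages(forward_alloc + rmem_alloc);
}
```

For instance, `toy_charge_pages(3000, 1097)` covers 4097 bytes and therefore rounds up to 2 pages, while `toy_charge_pages(0, 0)` yields 0, which corresponds to the `if (amt)` guard skipping the charge.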
+1 −16
@@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect);
void __inet_accept(struct socket *sock, struct socket *newsock, struct sock *newsk)
{
	if (mem_cgroup_sockets_enabled) {
-		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
-
		mem_cgroup_sk_alloc(newsk);
-
-		if (mem_cgroup_from_sk(newsk)) {
-			int amt;
-
-			/* The socket has not been accepted yet, no need
-			 * to look at newsk->sk_wmem_queued.
-			 */
-			amt = sk_mem_pages(newsk->sk_forward_alloc +
-					   atomic_read(&newsk->sk_rmem_alloc));
-			if (amt)
-				mem_cgroup_sk_charge(newsk, amt, gfp);
-		}
-
-		kmem_cache_charge(newsk, gfp);
+		__sk_charge(newsk, GFP_KERNEL);
	}

	sock_rps_record_flow(newsk);
+3 −1
@@ -32,7 +32,8 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
	/* dequeue the skb from sk receive queue */
	__skb_unlink(skb, &ssk->sk_receive_queue);
	skb_ext_reset(skb);
-	skb_orphan(skb);
+
+	mptcp_subflow_lend_fwdmem(subflow, skb);

	/* We copy the fastopen data, but that don't belong to the mptcp sequence
	 * space, need to offset it in the subflow sequence, see mptcp_subflow_get_map_offset()
@@ -50,6 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
	mptcp_data_lock(sk);
	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));

+	mptcp_borrow_fwdmem(sk, skb);
	skb_set_owner_r(skb, sk);
	__skb_queue_tail(&sk->sk_receive_queue, skb);
	mptcp_sk(sk)->bytes_received += skb->len;
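
The hunk above stops orphaning the skb and instead hands its memory from the subflow to the msk before `skb_set_owner_r()`. The bodies of `mptcp_subflow_lend_fwdmem()` and `mptcp_borrow_fwdmem()` are not part of this excerpt, so the following is only a toy model of the idea stated in the cover letter ("the msk borrows memory from the subflow"). `toy_set_owner_r()` mirrors the usual accounting effect of `skb_set_owner_r()`; `toy_borrow_fwdmem()` is an assumption about what a borrow step could look like, and the subflow-side bookkeeping of the lend step is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the socket accounting fields. */
struct toy_sock {
	int forward_alloc;	/* pre-reserved bytes, like sk_forward_alloc */
	int rmem_alloc;		/* bytes held by queued skbs, like sk_rmem_alloc */
};

struct toy_skb {
	int truesize;
	struct toy_sock *sk;	/* current owner, NULL if none */
};

/* Accounting effect of skb_set_owner_r(): consume forward-allocated
 * memory and account the skb as receive memory of its new owner.
 */
static void toy_set_owner_r(struct toy_skb *skb, struct toy_sock *sk)
{
	assert(skb->sk == NULL);
	skb->sk = sk;
	sk->rmem_alloc += skb->truesize;
	sk->forward_alloc -= skb->truesize;
}

/* Assumed borrow step: the msk takes over the memory backing the skb,
 * and the subflow stops accounting it as its own receive memory.
 */
static void toy_borrow_fwdmem(struct toy_sock *msk, struct toy_skb *skb)
{
	struct toy_sock *ssk = skb->sk;

	msk->forward_alloc += skb->truesize;
	if (!ssk)
		return;

	ssk->rmem_alloc -= skb->truesize;
	skb->sk = NULL;
}
```

In this model, borrowing followed by `toy_set_owner_r()` leaves the msk forward allocation unchanged and moves the skb's bytes from the subflow's receive memory to the msk's, without the msk having to reserve memory of its own while enqueuing.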
+0 −1
@@ -71,7 +71,6 @@ static const struct snmp_mib mptcp_snmp_list[] = {
	SNMP_MIB_ITEM("MPFastcloseRx", MPTCP_MIB_MPFASTCLOSERX),
	SNMP_MIB_ITEM("MPRstTx", MPTCP_MIB_MPRSTTX),
	SNMP_MIB_ITEM("MPRstRx", MPTCP_MIB_MPRSTRX),
-	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),