Commit f38805c5 authored by Jason Xing, committed by Jakub Kicinski

tcp: support TCP_RTO_MIN_US for set/getsockopt use

Support setting and reading the minimum RTO (RTO MIN) at the socket level through setsockopt()/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN: it leaves the
RTAX_RTO_MIN route metric (configured with ip route ...) untouched, and a
locked route metric still takes precedence. Since the BPF option was
implemented before this patch, a standalone new option is needed for plain
TCP setsockopt()/getsockopt() use.
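
As a rough illustration of where the new option sits in the lookup order, the sketch below models the selection made by tcp_rto_min() in the diff: a locked RTAX_RTO_MIN route metric wins; otherwise the per-socket value is used, which starts from the sysctl default until TCP_BPF_RTO_MIN or TCP_RTO_MIN_US overwrites it. The struct and function names are hypothetical, and all values are plain microseconds rather than jiffies, so treat this as a simplified model, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the state consulted when choosing
 * the minimum RTO. All values are in microseconds for readability. */
struct rto_min_model {
	bool     route_metric_locked; /* RTAX_RTO_MIN locked via ip route */
	uint32_t route_rto_min_us;    /* value of the locked route metric */
	uint32_t sock_rto_min_us;     /* stands in for icsk_rto_min */
};

/* At socket creation the per-socket value is seeded from the sysctl. */
static void model_init(struct rto_min_model *m, uint32_t sysctl_rto_min_us)
{
	m->route_metric_locked = false;
	m->route_rto_min_us = 0;
	m->sock_rto_min_us = sysctl_rto_min_us;
}

/* TCP_BPF_RTO_MIN and TCP_RTO_MIN_US both overwrite the same per-socket
 * field, so whichever write happened last is the one that counts. */
static void model_set_sockopt(struct rto_min_model *m, uint32_t us)
{
	m->sock_rto_min_us = us;
}

/* Same shape as tcp_rto_min(): a locked route metric takes precedence
 * over the per-socket value. */
static uint32_t model_effective_rto_min(const struct rto_min_model *m)
{
	if (m->route_metric_locked)
		return m->route_rto_min_us;
	return m->sock_rto_min_us;
}
```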

When a socket is created, its icsk_rto_min is set to the default value
controlled by sysctl_tcp_rto_min_us. If the application then calls
setsockopt() with the TCP_RTO_MIN_US option and a valid value,
icsk_rto_min is overridden with that value converted to jiffies.
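
The "valid value" and "jiffies" details interact: the setsockopt() hunk converts the microsecond value with usecs_to_jiffies() and then checks the result against TCP_TIMEOUT_MIN and TCP_RTO_MIN. The user-space model below reproduces that check under a loudly stated assumption of HZ=1000. It also assumes the current kernel definitions TCP_RTO_MIN = HZ/5 and TCP_TIMEOUT_MIN = 2 jiffies. All `model_*` names are hypothetical, and the model only takes unsigned input, so it ignores how the kernel handles negative values.

```c
#include <stdbool.h>

/* Assumption: HZ=1000, i.e. 1 jiffy = 1000 us. HZ is a kernel build
 * option, so the real granularity differs between kernels. */
#define MODEL_HZ        1000U
#define USEC_PER_JIFFY  (1000000U / MODEL_HZ)

/* Limits in jiffies: TCP_RTO_MIN is HZ/5 (200 ms), TCP_TIMEOUT_MIN is 2. */
#define MODEL_TCP_RTO_MIN     (MODEL_HZ / 5)
#define MODEL_TCP_TIMEOUT_MIN 2U

/* Like usecs_to_jiffies() for this HZ: round up to whole jiffies.
 * (No overflow handling; fine for the small values used here.) */
static unsigned int model_usecs_to_jiffies(unsigned int us)
{
	return (us + USEC_PER_JIFFY - 1) / USEC_PER_JIFFY;
}

/* Like jiffies_to_usecs(): what getsockopt() reports back. */
static unsigned int model_jiffies_to_usecs(unsigned int j)
{
	return j * USEC_PER_JIFFY;
}

/* Returns true and stores the jiffies value if the request would pass the
 * TCP_RTO_MIN_US range check; false corresponds to -EINVAL. */
static bool model_accept_rto_min_us(unsigned int us, unsigned int *jiffies_out)
{
	unsigned int j = model_usecs_to_jiffies(us);

	if (j > MODEL_TCP_RTO_MIN || j < MODEL_TCP_TIMEOUT_MIN)
		return false;
	*jiffies_out = j;
	return true;
}
```

Under this assumption, 1000 us rounds to a single jiffy and is rejected, while 1500 us is accepted as 2 jiffies and reads back as 2000 us. The value returned by getsockopt() is therefore the stored jiffies value converted back, not necessarily the exact number that was written.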

This patch also adds WRITE_ONCE()/READ_ONCE() annotations to avoid data
races on icsk_rto_min.
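
The annotations matter because, in the do_tcp_setsockopt() hunk, the TCP_RTO_MIN_US case is handled before sockopt_lock_sock() is called. That means icsk_rto_min can be written while other paths, such as tcp_rto_min(), read it without the socket lock. User-space C has no READ_ONCE()/WRITE_ONCE(). The closest standard analogue is a relaxed atomic load/store, which the sketch below applies to a hypothetical struct. This is an analogy only, not the kernel macros.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the socket field. A relaxed atomic
 * makes concurrent, unlocked access well defined, which is roughly the
 * role READ_ONCE()/WRITE_ONCE() play for icsk_rto_min in the kernel. */
struct conn_state {
	_Atomic unsigned int rto_min_jiffies;
};

/* Writer side, analogous to WRITE_ONCE(icsk->icsk_rto_min, v). */
static void conn_set_rto_min(struct conn_state *c, unsigned int jiffies)
{
	atomic_store_explicit(&c->rto_min_jiffies, jiffies, memory_order_relaxed);
}

/* Reader side, analogous to READ_ONCE(icsk->icsk_rto_min) on a path that
 * does not hold the lock the writer would otherwise take. */
static unsigned int conn_get_rto_min(struct conn_state *c)
{
	return atomic_load_explicit(&c->rto_min_jiffies, memory_order_relaxed);
}
```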

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parent 98b2c048
@@ -1229,8 +1229,8 @@ tcp_pingpong_thresh - INTEGER
 tcp_rto_min_us - INTEGER
 	Minimal TCP retransmission timeout (in microseconds). Note that the
 	rto_min route option has the highest precedence for configuring this
-	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
-	this tcp_rto_min_us sysctl.
+	setting, followed by the TCP_BPF_RTO_MIN and TCP_RTO_MIN_US socket
+	options, followed by this tcp_rto_min_us sysctl.
 
 	The recommended practice is to use a value less or equal to 200000
 	microseconds.
@@ -844,7 +844,7 @@ u32 tcp_delack_max(const struct sock *sk);
 static inline u32 tcp_rto_min(const struct sock *sk)
 {
 	const struct dst_entry *dst = __sk_dst_get(sk);
-	u32 rto_min = inet_csk(sk)->icsk_rto_min;
+	u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
 
 	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
 		rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
@@ -140,6 +140,7 @@ enum {
 
 #define TCP_IS_MPTCP		43	/* Is MPTCP being used? */
 #define TCP_RTO_MAX_MS		44	/* max rto time in ms */
+#define TCP_RTO_MIN_US		45	/* min rto time in us */
 
 #define TCP_REPAIR_ON		1
 #define TCP_REPAIR_OFF		0
@@ -3352,7 +3352,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_probes_out = 0;
 	icsk->icsk_probes_tstamp = 0;
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	icsk->icsk_rto_min = TCP_RTO_MIN;
+	WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
 	icsk->icsk_delack_max = TCP_DELACK_MAX;
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
@@ -3833,6 +3833,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 			return -EINVAL;
 		WRITE_ONCE(inet_csk(sk)->icsk_rto_max, msecs_to_jiffies(val));
 		return 0;
+	case TCP_RTO_MIN_US: {
+		int rto_min = usecs_to_jiffies(val);
+
+		if (rto_min > TCP_RTO_MIN || rto_min < TCP_TIMEOUT_MIN)
+			return -EINVAL;
+		WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
+		return 0;
+	}
 	}
 
 	sockopt_lock_sock(sk);
@@ -4672,6 +4680,9 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_RTO_MAX_MS:
 		val = jiffies_to_msecs(tcp_rto_max(sk));
 		break;
+	case TCP_RTO_MIN_US:
+		val = jiffies_to_usecs(READ_ONCE(inet_csk(sk)->icsk_rto_min));
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
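
From user space, the option is used through ordinary setsockopt()/getsockopt() calls at the IPPROTO_TCP level. The helpers below are a minimal sketch, not part of the patch. Their names are hypothetical, and the fallback `#define` assumes the libc headers may not yet expose TCP_RTO_MIN_US, so it reuses the value 45 that this patch assigns.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Older libc headers may lack the constant; 45 is the value this patch
 * assigns to TCP_RTO_MIN_US. */
#ifndef TCP_RTO_MIN_US
#define TCP_RTO_MIN_US 45
#endif

/* Set the per-socket minimum RTO in microseconds.
 * Returns 0 on success or -errno: -EINVAL if the converted value is out
 * of range, -ENOPROTOOPT on kernels that do not know the option. */
static int set_rto_min_us(int fd, int usecs)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US, &usecs, sizeof(usecs)) < 0)
		return -errno;
	return 0;
}

/* Read the value back. The kernel stores jiffies and converts them back
 * with jiffies_to_usecs(), so the result can differ from what was set. */
static int get_rto_min_us(int fd, int *usecs)
{
	socklen_t len = sizeof(*usecs);

	if (getsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US, usecs, &len) < 0)
		return -errno;
	return 0;
}
```

A caller can treat -ENOPROTOOPT as "kernel too old" and simply keep the sysctl-controlled default.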