Commit 35527de5 authored by Jakub Kicinski

Merge branch 'geneve-introduce-double-tunnel-gso-gro-support'

Paolo Abeni says:

====================
geneve: introduce double tunnel GSO/GRO support

This is the [belated] incarnation of the topic discussed at the last Netconf
[1].

In container orchestration in virtual environments there is consistent
usage of double UDP tunneling, specifically geneve. Such setups lack
support for GRO and GSO for inter-VM traffic.

After commit b430f6c3 ("Merge branch 'virtio_udp_tunnel_08_07_2025'
of https://github.com/pabeni/linux-devel") and the qemu counterpart, VMs
are able to send/receive GSO over UDP aggregated packets.

This series introduces the missing bit for full end-to-end aggregation
in the above mentioned scenario. Specifically:

- introduces a new netdev feature set to generalize the existing per device
driver GSO admission check.
- adds GSO partial support for the geneve and vxlan drivers
- introduces and uses a geneve option to assist double tunnel GRO
- adds some simple functional tests for the above.

The new device feature set is not strictly needed for the following
work, but it avoids introducing a trivial `ndo_features_check` callback to
support GSO partial, and thus a possible performance regression due to the
additional indirect call. Such a feature set could be leveraged by a
number of existing drivers (intel, meta and possibly wangxun) to avoid
duplicate code/tests. That part has been omitted here to keep the series
small.
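
A minimal userspace sketch of the idea, not the kernel code: the device
declares which features depend on MANGLEID, and one generic check clears
them together with MANGLEID, so no per-driver callback is needed. The
names `model_dev`, `model_features_check`, the `F_*` bit values and the
`can_mangle_id` parameter are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical bit positions, chosen for this sketch only. */
#define F_SG             (1ULL << 0)
#define F_TSO_MANGLEID   (1ULL << 1)
#define F_GSO_PARTIAL    (1ULL << 2)
#define F_GSO_UDP_TUNNEL (1ULL << 3)

struct model_dev {
	features_t features;          /* currently enabled features */
	features_t mangleid_features; /* features that require MANGLEID */
};

/*
 * Generic admission check: when the IP ID of this packet must not be
 * mangled, drop MANGLEID and every feature the device declared as
 * depending on it. All other features pass through unchanged.
 */
static features_t model_features_check(const struct model_dev *dev,
				       features_t features,
				       bool can_mangle_id)
{
	if (!can_mangle_id)
		features &= ~(F_TSO_MANGLEID | dev->mangleid_features);
	return features;
}
```

With `mangleid_features` set to `F_GSO_PARTIAL`, a packet that cannot have
its IP ID mangled loses both `F_TSO_MANGLEID` and `F_GSO_PARTIAL` in a
single mask operation.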

Both GSO partial support and double GRO support have some downsides.
With the first in place, GSO partial packets will traverse the network
stack 'downstream' of the outer geneve UDP tunnel and will be visible to
the UDP/IP/IPv6 layers and to netfilter. Currently only H/W NICs implement
GSO partial support and such packets are visible only via software taps.

Double UDP tunnel GRO will build 'GSO partial'-like aggregate packets,
i.e. the inner UDP encapsulation header set will still carry the
wire-level lengths and csum, so that segmentation treating such
headers as part of a giant, constant encapsulation header will yield the
correct result.
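
The property described above can be illustrated with a small userspace
sketch, which is a simplified model and not the kernel segmentation path.
It treats the first `hdr_len` bytes of the aggregate as one opaque header
that already holds per-segment (wire-level) values, and copies it verbatim
in front of each `mss`-sized payload chunk. The function name
`model_segment` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split an aggregate into wire packets. The header is never parsed or
 * rewritten: because it already carries wire-level lengths, a verbatim
 * copy is correct for every segment.
 *
 * Returns the number of segments written to out, or -1 if the payload is
 * empty, is not a whole number of mss-sized chunks, or out is too small.
 */
static int model_segment(const unsigned char *agg, size_t agg_len,
			 size_t hdr_len, size_t mss,
			 unsigned char *out, size_t out_cap)
{
	size_t payload_len, nsegs, i;

	if (mss == 0 || hdr_len > agg_len)
		return -1;

	payload_len = agg_len - hdr_len;
	if (payload_len == 0 || payload_len % mss != 0)
		return -1;

	nsegs = payload_len / mss;
	if (nsegs * (hdr_len + mss) > out_cap)
		return -1;

	for (i = 0; i < nsegs; i++) {
		unsigned char *seg = out + i * (hdr_len + mss);

		memcpy(seg, agg, hdr_len);                           /* constant header */
		memcpy(seg + hdr_len, agg + hdr_len + i * mss, mss); /* payload chunk */
	}
	return (int)nsegs;
}
```

For example, a 2-byte header whose second byte holds the wire length 6,
followed by 8 payload bytes and segmented with `mss` 4, yields two 6-byte
packets whose headers are identical and already correct.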

The correct GSO packet layout is applied when the packet traverses the
outermost geneve encapsulation.

Both GSO partial and double UDP encap are disabled by default and must
be explicitly enabled via ethtool and geneve device configuration,
respectively.

Finally note that the GSO partial feature could potentially be applied
to all the other UDP tunnels, but this series limits its usage to geneve
and vxlan devices.

Link: https://netdev.bots.linux.dev/netconf/2024/paolo.pdf [1]
====================

Link: https://patch.msgid.link/cover.1769011015.git.pabeni@redhat.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 8811df1d 40146bf7
+3 −0
@@ -1914,6 +1914,9 @@ attribute-sets:
         name: port-range
         type: binary
         struct: ifla-geneve-port-range
+      -
+        name: gro-hint
+        type: flag
   -
     name: linkinfo-hsr-attrs
     name-prefix: ifla-hsr-
+523 −34

File changed.

Preview size limit exceeded, changes collapsed.

+13 −3
@@ -2183,11 +2183,12 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 			   struct vxlan_metadata *md, u32 vxflags,
 			   bool udp_sum)
 {
+	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	__be16 inner_protocol = htons(ETH_P_TEB);
 	struct vxlanhdr *vxh;
+	bool double_encap;
 	int min_headroom;
 	int err;
-	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
-	__be16 inner_protocol = htons(ETH_P_TEB);

 	if ((vxflags & VXLAN_F_REMCSUM_TX) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -2208,6 +2209,7 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 	if (unlikely(err))
 		return err;

+	double_encap = udp_tunnel_handle_partial(skb);
 	err = iptunnel_handle_offloads(skb, type);
 	if (err)
 		return err;
@@ -2238,7 +2240,7 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 		inner_protocol = skb->protocol;
 	}

-	skb_set_inner_protocol(skb, inner_protocol);
+	udp_tunnel_set_inner_protocol(skb, double_encap, inner_protocol);
 	return 0;
 }

@@ -3348,10 +3350,18 @@ static void vxlan_setup(struct net_device *dev)
 	dev->features   |= NETIF_F_RXCSUM;
 	dev->features   |= NETIF_F_GSO_SOFTWARE;

+	/* Partial features are disabled by default. */
 	dev->vlan_features = dev->features;
 	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_FRAGLIST;
 	dev->hw_features |= NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+	dev->hw_features |= UDP_TUNNEL_PARTIAL_FEATURES;
+	dev->hw_features |= NETIF_F_GSO_PARTIAL;
+
+	dev->hw_enc_features = dev->hw_features;
+	dev->gso_partial_features = UDP_TUNNEL_PARTIAL_FEATURES;
+	dev->mangleid_features = NETIF_F_GSO_PARTIAL;
+
 	netif_keep_dst(dev);
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->change_proto_down = true;
+3 −0
@@ -1831,6 +1831,8 @@ enum netdev_reg_state {
  *
  *	@mpls_features:	Mask of features inheritable by MPLS
  *	@gso_partial_features: value(s) from NETIF_F_GSO\*
+ *	@mangleid_features:	Mask of features requiring MANGLEID, will be
+ *				disabled together with the latter.
  *
  *	@ifindex:	interface index
  *	@group:		The group the device belongs to
@@ -2219,6 +2221,7 @@ struct net_device {
 	netdev_features_t	vlan_features;
 	netdev_features_t	hw_enc_features;
 	netdev_features_t	mpls_features;
+	netdev_features_t	mangleid_features;

 	unsigned int		min_mtu;
 	unsigned int		max_mtu;
+32 −0

File changed.

Preview size limit exceeded, changes collapsed.
