Commit bf02ba6d authored by Jakub Kicinski

Merge branch 'netdev-add-per-queue-statistics'

Jakub Kicinski says:

====================
netdev: add per-queue statistics

Per queue stats keep coming up, so it's about time someone laid
the foundation. This series adds the uAPI, a handful of stats
and a sample support for bnxt. It's not very comprehensive in
terms of stat types or driver support. The expectation is that
the support will grow organically. If we have the basic pieces
in place it will be easy for reviewers to request new stats,
or use of the API in place of ethtool -S.

See patch 3 for sample output.

v2: https://lore.kernel.org/all/20240229010221.2408413-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20240226211015.1244807-1-kuba@kernel.org/
rfc: https://lore.kernel.org/all/20240222223629.158254-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20240306195509.1502746-1-kuba@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents e8bb2ccf af7b3b4a
+91 −0
@@ -74,6 +74,10 @@ definitions:
    name: queue-type
    type: enum
    entries: [ rx, tx ]
  -
    name: qstats-scope
    type: flags
    entries: [ queue ]

attribute-sets:
  -
@@ -265,6 +269,73 @@ attribute-sets:
        doc: ID of the NAPI instance which services this queue.
        type: u32

  -
    name: qstats
    doc: |
      Get device statistics, scoped to a device or a queue.
      These statistics extend (and partially duplicate) statistics available
      in struct rtnl_link_stats64.
      Value of the `scope` attribute determines how statistics are
      aggregated. When aggregated for the entire device the statistics
      represent the total number of events since last explicit reset of
      the device (i.e. not a reconfiguration like changing queue count).
      When reported per-queue, however, the statistics may not add
      up to the total number of events, will only be reported for currently
      active objects, and will likely report the number of events since last
      reconfiguration.
    attributes:
      -
        name: ifindex
        doc: ifindex of the netdevice to which stats belong.
        type: u32
        checks:
          min: 1
      -
        name: queue-type
        doc: Queue type as rx, tx, for queue-id.
        type: u32
        enum: queue-type
      -
        name: queue-id
        doc: Queue ID, if stats are scoped to a single queue instance.
        type: u32
      -
        name: scope
        doc: |
          What object type should be used to iterate over the stats.
        type: uint
        enum: qstats-scope
      -
        name: rx-packets
        doc: |
          Number of wire packets successfully received and passed to the stack.
          For drivers supporting XDP, XDP is considered the first layer
          of the stack, so packets consumed by XDP are still counted here.
        type: uint
        value: 8 # reserve some attr ids in case we need more metadata later
      -
        name: rx-bytes
        doc: Successfully received bytes, see `rx-packets`.
        type: uint
      -
        name: tx-packets
        doc: |
          Number of wire packets successfully sent. Packet is considered to be
          successfully sent once it is in device memory (usually this means
          the device has issued a DMA completion for the packet).
        type: uint
      -
        name: tx-bytes
        doc: Successfully sent bytes, see `tx-packets`.
        type: uint
      -
        name: rx-alloc-fail
        doc: |
          Number of times skb or buffer allocation failed on the Rx datapath.
          Allocation failure may, or may not result in a packet drop, depending
          on driver implementation and whether system recovers quickly.
        type: uint

operations:
  list:
    -
@@ -405,6 +476,26 @@ operations:
          attributes:
            - ifindex
        reply: *napi-get-op
    -
      name: qstats-get
      doc: |
        Get / dump fine grained statistics. Which statistics are reported
        depends on the device and the driver, and whether the driver stores
        software counters per-queue.
      attribute-set: qstats
      dump:
        request:
          attributes:
            - scope
        reply:
          attributes:
            - ifindex
            - queue-type
            - queue-id
            - rx-packets
            - rx-bytes
            - tx-packets
            - tx-bytes

mcast-groups:
  list:
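
The `value: 8` annotation on `rx-packets` in the spec hunk above leaves a gap in the attribute numbering. A minimal sketch of the resulting id layout follows, assuming ids are assigned sequentially starting from 1 and that an explicit `value` restarts the sequence. The `EX_` names and the helper are hypothetical and are not the generated uAPI header.

```c
/* Hypothetical mirror of the qstats attribute-set ids (illustration only).
 * Assumption: id 0 is unused and ids increase by one per attribute.
 */
enum ex_qstats_attr {
	EX_QSTATS_IFINDEX = 1,
	EX_QSTATS_QUEUE_TYPE,
	EX_QSTATS_QUEUE_ID,
	EX_QSTATS_SCOPE,
	/* ids 5..7 are skipped: reserved in case more metadata is needed */
	EX_QSTATS_RX_PACKETS = 8,
	EX_QSTATS_RX_BYTES,
	EX_QSTATS_TX_PACKETS,
	EX_QSTATS_TX_BYTES,
	EX_QSTATS_RX_ALLOC_FAIL,
};

/* Return 1 if the id names a counter, 0 if it is metadata or reserved. */
static int ex_qstats_attr_is_counter(int id)
{
	return id >= EX_QSTATS_RX_PACKETS && id <= EX_QSTATS_RX_ALLOC_FAIL;
}
```

Keeping metadata (ifindex, queue type, queue id, scope) below the gap and counters above it means new metadata attributes can be added later without renumbering the counters.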
+15 −0
@@ -41,6 +41,15 @@ If `-s` is specified once the detailed errors won't be shown.

`ip` supports JSON formatting via the `-j` option.

Queue statistics
~~~~~~~~~~~~~~~~

Queue statistics are accessible via the netdev netlink family.

Currently no widely distributed CLI exists to access those statistics.
Kernel development tools (ynl) can be used to experiment with them,
see `Documentation/userspace-api/netlink/intro-specs.rst`.

Protocol-specific statistics
----------------------------

@@ -147,6 +156,12 @@ Statistics are reported both in the responses to link information
requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).

netdev (netlink)
~~~~~~~~~~~~~~~~

`netdev` generic netlink family allows accessing page pool and per queue
statistics.

ethtool
-------

+65 −0
@@ -14523,6 +14523,70 @@ static const struct net_device_ops bnxt_netdev_ops = {
	.ndo_bridge_setlink	= bnxt_bridge_setlink,
};

static void bnxt_get_queue_stats_rx(struct net_device *dev, int i,
				    struct netdev_queue_stats_rx *stats)
{
	struct bnxt *bp = netdev_priv(dev);
	struct bnxt_cp_ring_info *cpr;
	u64 *sw;

	cpr = &bp->bnapi[i]->cp_ring;
	sw = cpr->stats.sw_stats;

	stats->packets = 0;
	stats->packets += BNXT_GET_RING_STATS64(sw, rx_ucast_pkts);
	stats->packets += BNXT_GET_RING_STATS64(sw, rx_mcast_pkts);
	stats->packets += BNXT_GET_RING_STATS64(sw, rx_bcast_pkts);

	stats->bytes = 0;
	stats->bytes += BNXT_GET_RING_STATS64(sw, rx_ucast_bytes);
	stats->bytes += BNXT_GET_RING_STATS64(sw, rx_mcast_bytes);
	stats->bytes += BNXT_GET_RING_STATS64(sw, rx_bcast_bytes);

	stats->alloc_fail = cpr->sw_stats.rx.rx_oom_discards;
}

static void bnxt_get_queue_stats_tx(struct net_device *dev, int i,
				    struct netdev_queue_stats_tx *stats)
{
	struct bnxt *bp = netdev_priv(dev);
	struct bnxt_napi *bnapi;
	u64 *sw;

	bnapi = bp->tx_ring[bp->tx_ring_map[i]].bnapi;
	sw = bnapi->cp_ring.stats.sw_stats;

	stats->packets = 0;
	stats->packets += BNXT_GET_RING_STATS64(sw, tx_ucast_pkts);
	stats->packets += BNXT_GET_RING_STATS64(sw, tx_mcast_pkts);
	stats->packets += BNXT_GET_RING_STATS64(sw, tx_bcast_pkts);

	stats->bytes = 0;
	stats->bytes += BNXT_GET_RING_STATS64(sw, tx_ucast_bytes);
	stats->bytes += BNXT_GET_RING_STATS64(sw, tx_mcast_bytes);
	stats->bytes += BNXT_GET_RING_STATS64(sw, tx_bcast_bytes);
}

static void bnxt_get_base_stats(struct net_device *dev,
				struct netdev_queue_stats_rx *rx,
				struct netdev_queue_stats_tx *tx)
{
	struct bnxt *bp = netdev_priv(dev);

	rx->packets = bp->net_stats_prev.rx_packets;
	rx->bytes = bp->net_stats_prev.rx_bytes;
	rx->alloc_fail = bp->ring_err_stats_prev.rx_total_oom_discards;

	tx->packets = bp->net_stats_prev.tx_packets;
	tx->bytes = bp->net_stats_prev.tx_bytes;
}

static const struct netdev_stat_ops bnxt_stat_ops = {
	.get_queue_stats_rx	= bnxt_get_queue_stats_rx,
	.get_queue_stats_tx	= bnxt_get_queue_stats_tx,
	.get_base_stats		= bnxt_get_base_stats,
};

static void bnxt_remove_one(struct pci_dev *pdev)
{
	struct net_device *dev = pci_get_drvdata(pdev);
@@ -14970,6 +15034,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
		goto init_err_free;

	dev->netdev_ops = &bnxt_netdev_ops;
	dev->stat_ops = &bnxt_stat_ops;
	dev->watchdog_timeo = BNXT_TX_TIMEOUT;
	dev->ethtool_ops = &bnxt_ethtool_ops;
	pci_set_drvdata(pdev, dev);
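
In the hunk above, `bnxt_get_base_stats()` reports the `*_prev` counters while the per-queue callbacks report live ring counters. The user-space sketch below illustrates one way such a split can keep the device total stable across a queue-count change. It assumes the driver folds the live counters into a "previous" accumulator when rings are torn down, which this diff does not show. All `ex_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define EX_MAX_QUEUES 8

struct ex_dev {
	unsigned int num_queues;
	uint64_t queue_rx_packets[EX_MAX_QUEUES];	/* live per-queue counters */
	uint64_t prev_rx_packets;			/* history of retired queues */
};

/* Change the queue count: fold live counters into the base value,
 * then restart every per-queue counter from zero.
 */
static void ex_reconfigure(struct ex_dev *d, unsigned int new_num_queues)
{
	unsigned int i;

	for (i = 0; i < d->num_queues; i++)
		d->prev_rx_packets += d->queue_rx_packets[i];
	memset(d->queue_rx_packets, 0, sizeof(d->queue_rx_packets));

	if (new_num_queues > EX_MAX_QUEUES)
		new_num_queues = EX_MAX_QUEUES;
	d->num_queues = new_num_queues;
}

/* Device-level total: base (retired history) plus all live queues. */
static uint64_t ex_device_rx_packets(const struct ex_dev *d)
{
	uint64_t total = d->prev_rx_packets;
	unsigned int i;

	for (i = 0; i < d->num_queues; i++)
		total += d->queue_rx_packets[i];
	return total;
}
```

With two queues holding 10 and 20 packets, `ex_device_rx_packets()` returns 30 both before and after `ex_reconfigure(&d, 4)`, while each per-queue counter reads 0 right after the reconfiguration.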
+3 −0
@@ -1955,6 +1955,7 @@ enum netdev_reg_state {
 *
 *	@sysfs_rx_queue_group:	Space for optional per-rx queue attributes
 *	@rtnl_link_ops:	Rtnl_link_ops
 *	@stat_ops:	Optional ops for queue-aware statistics
 *
 *	@gso_max_size:	Maximum size of generic segmentation offload
 *	@tso_max_size:	Device (as in HW) limit on the max TSO request size
@@ -2335,6 +2336,8 @@ struct net_device {

	const struct rtnl_link_ops *rtnl_link_ops;

	const struct netdev_stat_ops *stat_ops;

	/* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SEGS		65535u
#define GSO_LEGACY_MAX_SIZE	65536u
+56 −0
@@ -4,6 +4,62 @@

#include <linux/netdevice.h>

/* See the netdev.yaml spec for definition of each statistic */
struct netdev_queue_stats_rx {
	u64 bytes;
	u64 packets;
	u64 alloc_fail;
};

struct netdev_queue_stats_tx {
	u64 bytes;
	u64 packets;
};

/**
 * struct netdev_stat_ops - netdev ops for fine grained stats
 * @get_queue_stats_rx:	get stats for a given Rx queue
 * @get_queue_stats_tx:	get stats for a given Tx queue
 * @get_base_stats:	get base stats (not belonging to any live instance)
 *
 * Query stats for a given object. The values of the statistics are undefined
 * on entry (specifically they are *not* zero-initialized). Drivers should
 * assign values only to the statistics they collect. Statistics which are not
 * collected must be left undefined.
 *
 * Queue objects are not necessarily persistent, and only currently active
 * queues are queried by the per-queue callbacks. This means that per-queue
 * statistics will not generally add up to the total number of events for
 * the device. The @get_base_stats callback allows filling in the delta
 * between events for currently live queues and overall device history.
 * When the statistics for the entire device are queried, first @get_base_stats
 * is issued to collect the delta, and then a series of per-queue callbacks.
 * Only statistics which are set in @get_base_stats will be reported
 * at the device level, meaning that unlike in queue callbacks, setting
 * a statistic to zero in @get_base_stats is a legitimate thing to do.
 * This is because @get_base_stats has a second function of designating which
 * statistics are in fact correct for the entire device (e.g. when history
 * for some of the events is not maintained, and reliable "total" cannot
 * be provided).
 *
 * Device drivers can assume that when collecting total device stats,
 * the @get_base_stats and subsequent per-queue calls are performed
 * "atomically" (without releasing the rtnl_lock).
 *
 * Device drivers are encouraged to reset the per-queue statistics when
 * number of queues change. This is because the primary use case for
 * per-queue statistics is currently to detect traffic imbalance.
 */
struct netdev_stat_ops {
	void (*get_queue_stats_rx)(struct net_device *dev, int idx,
				   struct netdev_queue_stats_rx *stats);
	void (*get_queue_stats_tx)(struct net_device *dev, int idx,
				   struct netdev_queue_stats_tx *stats);
	void (*get_base_stats)(struct net_device *dev,
			       struct netdev_queue_stats_rx *rx,
			       struct netdev_queue_stats_tx *tx);
};

/**
 * DOC: Lockless queue stopping / waking helpers.
 *
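
The kernel-doc for `struct netdev_stat_ops` above describes device-level collection as one `@get_base_stats` call followed by a series of per-queue calls, with only the statistics set by `@get_base_stats` reported for the device. The user-space sketch below illustrates that flow under stated assumptions: an all-ones value marks a statistic as "not set", and a queue value is skipped when either side is unset. The sentinel, the `ex_` names and the demo callbacks are hypothetical, not the core implementation.

```c
#include <stdint.h>
#include <string.h>

#define EX_STAT_UNSET UINT64_MAX	/* assumption: all-ones means "not set" */

/* User-space stand-in for struct netdev_queue_stats_rx. */
struct ex_stats_rx {
	uint64_t bytes;
	uint64_t packets;
	uint64_t alloc_fail;
};

typedef void (*ex_get_base_fn)(struct ex_stats_rx *stats);
typedef void (*ex_get_queue_fn)(int idx, struct ex_stats_rx *stats);

/* Accumulate only if the base designated the stat and the queue set it. */
static void ex_add(uint64_t *sum, uint64_t val)
{
	if (*sum != EX_STAT_UNSET && val != EX_STAT_UNSET)
		*sum += val;
}

static void ex_collect_device_rx(ex_get_base_fn get_base,
				 ex_get_queue_fn get_queue,
				 int num_queues, struct ex_stats_rx *sum)
{
	struct ex_stats_rx q;
	int i;

	/* Values are undefined on entry, so mark everything unset first. */
	memset(sum, 0xff, sizeof(*sum));
	get_base(sum);

	for (i = 0; i < num_queues; i++) {
		memset(&q, 0xff, sizeof(q));
		get_queue(i, &q);
		ex_add(&sum->bytes, q.bytes);
		ex_add(&sum->packets, q.packets);
		ex_add(&sum->alloc_fail, q.alloc_fail);
	}
}

/* Demo driver: keeps history for packets and bytes, but not alloc_fail. */
static void ex_demo_base(struct ex_stats_rx *stats)
{
	stats->packets = 100;
	stats->bytes = 1000;
	/* alloc_fail deliberately left unset: no reliable device total */
}

static void ex_demo_queue(int idx, struct ex_stats_rx *stats)
{
	stats->packets = 10 * (uint64_t)(idx + 1);
	stats->bytes = 100 * (uint64_t)(idx + 1);
	stats->alloc_fail = 1;
}
```

Calling `ex_collect_device_rx(ex_demo_base, ex_demo_queue, 2, &sum)` leaves `sum.alloc_fail` unset even though each queue reports it, because the base callback never designated it as valid for the whole device. This is why setting a statistic to zero in the base callback is meaningful: zero marks it as reportable, whereas leaving it untouched suppresses it.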