Commit 092e3350 authored by Linus Torvalds
Pull rdma updates from Jason Gunthorpe:

 - Usual minor updates and fixes for bnxt_re, hfi1, rxe, mana, iser,
   mlx5, vmw_pvrdma, hns

 - Make rxe work on tun devices

 - mana gains more standard verbs as it moves toward supporting
   in-kernel verbs

 - DMABUF support for mana

 - Fix page size calculations when memory registration exceeds 4G

 - On Demand Paging support for rxe

 - mlx5 support for RDMA TRANSPORT flow tables and a new ucap mechanism
   to control access to them

 - Optional RDMA_TX/RX counters per QP in mlx5

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (73 commits)
  IB/mad: Check available slots before posting receive WRs
  RDMA/mana_ib: Fix integer overflow during queue creation
  RDMA/mlx5: Fix calculation of total invalidated pages
  RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow
  RDMA/mlx5: Fix page_size variable overflow
  RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc()
  RDMA/mlx5: Fix cache entry update on dereg error
  RDMA/mlx5: Fix MR cache initialization error flow
  RDMA/mlx5: Support optional-counters binding for QPs
  RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config
  RDMA/core: Pass port to counter bind/unbind operations
  RDMA/core: Add support to optional-counters binding configuration
  RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()
  RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes
  RDMA/core: Fix use-after-free when rename device name
  RDMA/bnxt_re: Support perf management counters
  RDMA/rxe: Fix incorrect return value of rxe_odp_atomic_op()
  RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
  RDMA/mana_ib: Handle net event for pointing to the current netdev
  net: mana: Change the function signature of mana_get_primary_netdev_rcu
  ...
parents 0ccff074 37826f0a
+1 −0
@@ -12,6 +12,7 @@ InfiniBand
   opa_vnic
   sysfs
   tag_matching
   ucaps
   user_mad
   user_verbs

+71 −0
=================================
InfiniBand Userspace Capabilities
=================================

   User CAPabilities (UCAPs) provide fine-grained control over specific
   firmware features in InfiniBand (IB) devices. This approach offers
   more granular control than the existing Linux capabilities, which
   may be too generic for certain firmware features.

   Each user capability is represented as a character device with root
   read-write access. Root processes can grant users special privileges
   by allowing access to these character devices (e.g., using chown).

Usage
=====

   UCAPs allow control over specific features of an IB device using file
   descriptors of UCAP character devices. Here is how a user enables
   specific features of an IB device:

      * A root process grants the user access to the UCAP files that
        represent the capabilities (e.g., using chown).
      * The user opens the UCAP files, obtaining file descriptors.
      * When opening an IB device, the user includes an array of the
        UCAP file descriptors as an attribute.
      * The ib_uverbs driver recognizes the UCAP file descriptors and enables
        the corresponding capabilities for the IB device.
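
The user-side steps above can be sketched roughly as follows. This is a minimal sketch, not the rdma-core implementation: `open_ucap_fds` is a hypothetical helper, the O_RDWR open mode is an assumption, and handing the resulting array to the device-open call is left out because the text does not name that attribute.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open each named UCAP character device under `dir`
 * and store the descriptors in `fds`. Returns the number of descriptors
 * opened, or -1 if any open fails (descriptors opened so far are closed).
 */
static int open_ucap_fds(const char *dir, const char *const *names,
			 size_t count, int *fds)
{
	char path[256];
	size_t i;

	for (i = 0; i < count; i++) {
		int len = snprintf(path, sizeof(path), "%s/%s", dir, names[i]);

		if (len < 0 || (size_t)len >= sizeof(path))
			goto err;
		/* Assumption: read-write open; access depends on what root granted. */
		fds[i] = open(path, O_RDWR);
		if (fds[i] < 0)
			goto err;
	}
	return (int)count;

err:
	/* Undo the opens that succeeded before the failure. */
	while (i-- > 0)
		close(fds[i]);
	return -1;
}
```

A caller would pass "/dev/infiniband" as `dir` and the UCAP names it was granted, then supply the filled `fds` array when opening the IB device.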

Creating UCAPs
==============

   To create a new UCAP, drivers must first define a type in the
   rdma_user_cap enum in rdma/ib_ucaps.h. The name of the UCAP character
   device should be added to the ucap_names array in
   drivers/infiniband/core/ucaps.c. Then, the driver can create the UCAP
   character device by calling the ib_create_ucap API with the UCAP
   type.

   A reference count is stored for each UCAP to track creations and
   removals of the UCAP device. If multiple creation calls are made with
   the same type (e.g., for two IB devices), the UCAP character device
   is created during the first call and subsequent calls increment the
   reference count.

   The UCAP character device is created under /dev/infiniband, and its
   permissions are set to allow root read and write access only.

Removing UCAPs
==============

   Each removal decrements the reference count of the UCAP. The UCAP
   character device is removed from the filesystem only when the
   reference count is decreased to 0.
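
The create/remove reference counting described in these two sections can be modelled in plain C as below. This is a userspace sketch under stated assumptions, not the kernel code: every `model_` name is hypothetical, the boolean stands in for the character device, and a real implementation would also need locking.

```c
#include <stdbool.h>

/* Hypothetical capability types; the real enum lives in rdma/ib_ucaps.h. */
enum model_ucap_type {
	MODEL_UCAP_A,
	MODEL_UCAP_B,
	MODEL_UCAP_MAX
};

struct model_ucap {
	unsigned int refcount;	/* number of outstanding create calls */
	bool node_present;	/* stands in for the character device */
};

static struct model_ucap model_ucaps[MODEL_UCAP_MAX];

/* The first create call for a type makes the node; later calls only count. */
static int model_create_ucap(enum model_ucap_type type)
{
	if ((unsigned int)type >= MODEL_UCAP_MAX)
		return -1;
	if (model_ucaps[type].refcount++ == 0)
		model_ucaps[type].node_present = true;
	return 0;
}

/* Each removal drops one reference; the node goes away when it reaches 0. */
static void model_remove_ucap(enum model_ucap_type type)
{
	if ((unsigned int)type >= MODEL_UCAP_MAX ||
	    model_ucaps[type].refcount == 0)
		return;
	if (--model_ucaps[type].refcount == 0)
		model_ucaps[type].node_present = false;
}
```

With two IB devices of the same type, two create calls leave the count at 2, and the node survives until the second removal.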

/dev and /sys/class files
=========================

   The class::

      /sys/class/infiniband_ucaps

   is created when the first UCAP character device is created.

   The UCAP character device is created under /dev/infiniband.

   For example, if mlx5_ib adds the rdma_user_cap
   RDMA_UCAP_MLX5_CTRL_LOCAL with name "mlx5_perm_ctrl_local", this will
   create the device node::

      /dev/infiniband/mlx5_perm_ctrl_local
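
The mapping from a capability type to its device node can be illustrated with a small table lookup. This is a sketch only: the `sketch_` enum, array, and function are hypothetical stand-ins for rdma_user_cap and ucap_names, and only the single name given in the example above is used.

```c
#include <stddef.h>
#include <stdio.h>

enum sketch_user_cap {
	SKETCH_UCAP_MLX5_CTRL_LOCAL,	/* mirrors RDMA_UCAP_MLX5_CTRL_LOCAL */
	SKETCH_UCAP_MAX			/* sentinel, an assumption of this sketch */
};

/* Indexed by capability type, in the spirit of the ucap_names array. */
static const char *const sketch_ucap_names[SKETCH_UCAP_MAX] = {
	[SKETCH_UCAP_MLX5_CTRL_LOCAL] = "mlx5_perm_ctrl_local",
};

/* Write the device node path for `type` into `buf`; 0 on success, -1 on error. */
static int sketch_ucap_path(enum sketch_user_cap type, char *buf, size_t len)
{
	int n;

	if ((unsigned int)type >= SKETCH_UCAP_MAX)
		return -1;
	n = snprintf(buf, len, "/dev/infiniband/%s", sketch_ucap_names[type]);
	/* Treat truncation as an error rather than returning a partial path. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```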
+2 −1
@@ -39,6 +39,7 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
				uverbs_std_types_async_fd.o \
				uverbs_std_types_srq.o \
				uverbs_std_types_wq.o \
-				uverbs_std_types_qp.o
+				uverbs_std_types_qp.o \
+				ucaps.o
ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
+6 −0
@@ -1501,6 +1501,12 @@ ib_cache_update(struct ib_device *device, u32 port, bool update_gids,
		device->port_data[port].cache.pkey = pkey_cache;
	}
	device->port_data[port].cache.lmc = tprops->lmc;

	if (device->port_data[port].cache.port_state != IB_PORT_NOP &&
	    device->port_data[port].cache.port_state != tprops->state)
		ibdev_info(device, "Port: %d Link %s\n", port,
			   ib_port_state_to_str(tprops->state));

	device->port_data[port].cache.port_state = tprops->state;

	device->port_data[port].cache.subnet_prefix = tprops->subnet_prefix;
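
The condition added in this hunk logs a link message only when a cached state exists and differs from the new one. The standalone sketch below models that rule. All `sketch_` names are hypothetical, and reading IB_PORT_NOP as "cache not populated yet" is an assumption drawn from the check itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the port states used by the hunk. */
enum sketch_port_state {
	SKETCH_PORT_NOP,	/* assumed: no state cached yet */
	SKETCH_PORT_DOWN,
	SKETCH_PORT_ACTIVE
};

/*
 * Store `new_state` in `*cached` and report whether the transition should
 * be logged: only when a previous state was recorded and it differs.
 */
static bool sketch_update_port_state(enum sketch_port_state *cached,
				     enum sketch_port_state new_state)
{
	bool log = *cached != SKETCH_PORT_NOP && *cached != new_state;

	*cached = new_state;
	return log;
}
```

Under this rule the first cache fill is silent, and repeated updates with an unchanged state produce no message.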
+19 −5
@@ -739,6 +739,19 @@ cma_validate_port(struct ib_device *device, u32 port,
		goto out;
	}

	/*
	 * An RXE device should work with both TUN devices and normal Ethernet
	 * devices. Use driver_id to check whether the device is an RXE device.
	 * ARPHRD_NONE means a TUN device.
	 */
	if (device->ops.driver_id == RDMA_DRIVER_RXE) {
		if ((dev_type == ARPHRD_NONE || dev_type == ARPHRD_ETHER)
			&& rdma_protocol_roce(device, port)) {
			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
			if (!ndev)
				goto out;
		}
	} else {
		if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
			if (!ndev)
@@ -746,6 +759,7 @@ cma_validate_port(struct ib_device *device, u32 port,
		} else {
			gid_type = IB_GID_TYPE_IB;
		}
	}

	sgid_attr = rdma_find_gid_by_port(device, gid, gid_type, port, ndev);
	dev_put(ndev);
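
The branch structure added to cma_validate_port() can be summarized as a small decision function. This is a sketch based only on the lines visible in the hunk: the `sketch_` names are hypothetical, the two constants mirror ARPHRD_ETHER and ARPHRD_NONE from linux/if_arp.h, and the real function performs the netdev lookup itself and does not return an action code.

```c
#include <stdbool.h>

#define SKETCH_ARPHRD_ETHER	1	/* same value as ARPHRD_ETHER */
#define SKETCH_ARPHRD_NONE	0xFFFE	/* same value as ARPHRD_NONE (TUN) */

enum sketch_action {
	SKETCH_LOOKUP_NETDEV,		/* resolve ndev from bound_if_index */
	SKETCH_KEEP_GID_TYPE,		/* RXE path: leave gid_type untouched */
	SKETCH_FORCE_IB_GID_TYPE	/* non-RXE path: gid_type = IB_GID_TYPE_IB */
};

/* Decide what the port validation does for a given device and link type. */
static enum sketch_action sketch_port_action(bool is_rxe,
					     unsigned int dev_type,
					     bool is_roce)
{
	if (is_rxe) {
		/* RXE accepts TUN devices in addition to Ethernet. */
		if ((dev_type == SKETCH_ARPHRD_NONE ||
		     dev_type == SKETCH_ARPHRD_ETHER) && is_roce)
			return SKETCH_LOOKUP_NETDEV;
		return SKETCH_KEEP_GID_TYPE;
	}
	if (dev_type == SKETCH_ARPHRD_ETHER && is_roce)
		return SKETCH_LOOKUP_NETDEV;
	return SKETCH_FORCE_IB_GID_TYPE;
}
```

The only behavioral difference between the two paths for a RoCE port is the TUN case, which is what lets rxe run on tun devices.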