Commit a353e726 authored by Linus Torvalds
Pull virtio updates from Michael Tsirkin:

 - in-order support in virtio core

 - multiple address space support in vduse

 - fixes, cleanups all over the place, notably dma alignment fixes for
   non-cache-coherent systems

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (59 commits)
  vduse: avoid adding implicit padding
  vhost: fix caching attributes of MMIO regions by setting them explicitly
  vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr()
  vdpa/mlx5: reuse common function for MAC address updates
  vdpa/mlx5: update mlx_features with driver state check
  crypto: virtio: Replace package id with numa node id
  crypto: virtio: Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req
  crypto: virtio: Add spinlock protection with virtqueue notification
  Documentation: Add documentation for VDUSE Address Space IDs
  vduse: bump version number
  vduse: add vq group asid support
  vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
  vduse: take out allocations from vduse_dev_alloc_coherent
  vduse: remove unused vaddr parameter of vduse_domain_free_coherent
  vduse: refactor vdpa_dev_add for goto err handling
  vhost: forbid change vq groups ASID if DRIVER_OK is set
  vdpa: document set_group_asid thread safety
  vduse: return internal vq group struct as map token
  vduse: add vq group support
  vduse: add v1 API definition
  ...
parents cb557386 ebcff9da
+52 −0
@@ -146,6 +146,58 @@ What about block I/O and networking buffers? The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.

__dma_from_device_group_begin/end annotations
=============================================

As explained previously, when a structure contains a DMA_FROM_DEVICE /
DMA_BIDIRECTIONAL buffer (device writes to memory) alongside fields that the
CPU writes to, cache line sharing between the DMA buffer and CPU-written fields
can cause data corruption on CPUs with DMA-incoherent caches.

The ``__dma_from_device_group_begin(GROUP)/__dma_from_device_group_end(GROUP)``
macros ensure proper alignment to prevent this::

	struct my_device {
		spinlock_t lock1;
		__dma_from_device_group_begin();
		char dma_buffer1[16];
		char dma_buffer2[16];
		__dma_from_device_group_end();
		spinlock_t lock2;
	};

To isolate a DMA buffer from adjacent fields, use
``__dma_from_device_group_begin(GROUP)`` before the first DMA buffer
field and ``__dma_from_device_group_end(GROUP)`` after the last DMA
buffer field (with the same GROUP name). This protects both the head
and tail of the buffer from cache line sharing.

The GROUP parameter is an optional identifier that names the DMA buffer group
(in case you have several in the same structure)::

	struct my_device {
		spinlock_t lock1;
		__dma_from_device_group_begin(buffer1);
		char dma_buffer1[16];
		__dma_from_device_group_end(buffer1);
		spinlock_t lock2;
		__dma_from_device_group_begin(buffer2);
		char dma_buffer2[16];
		__dma_from_device_group_end(buffer2);
	};

On cache-coherent platforms these macros expand to zero-length array markers.
On non-coherent platforms, they also ensure the minimal DMA alignment, which
can be as large as 128 bytes.

.. note::

        It is allowed (though somewhat fragile) to include extra fields, not
        intended for DMA from the device, within the group (in order to pack the
        structure tightly) - but only as long as the CPU does not write these
        fields while any fields in the group are mapped for DMA_FROM_DEVICE or
        DMA_BIDIRECTIONAL.
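
The layout effect described above can be reproduced in userspace. The sketch below is not the kernel implementation: ``demo_dma_group_begin()``/``demo_dma_group_end()`` and ``DEMO_DMA_MINALIGN`` are hypothetical stand-ins, and it assumes a GNU-compatible compiler (zero-length arrays, ``__attribute__((aligned))``) and the worst-case 128-byte alignment mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the worst-case DMA alignment quoted in the text above. */
#define DEMO_DMA_MINALIGN 128

/* Hypothetical stand-ins for the kernel macros: aligned zero-length markers. */
#define demo_dma_group_begin(GROUP) \
	char demo_dma_begin_##GROUP[0] __attribute__((aligned(DEMO_DMA_MINALIGN)))
#define demo_dma_group_end(GROUP) \
	char demo_dma_end_##GROUP[0] __attribute__((aligned(DEMO_DMA_MINALIGN)))

struct demo_device {
	int lock1;			/* CPU-written field before the group */
	demo_dma_group_begin(buf);
	char dma_buffer1[16];
	char dma_buffer2[16];
	demo_dma_group_end(buf);
	int lock2;			/* CPU-written field after the group */
};

/* The first buffer must start on an alignment boundary. */
static_assert(offsetof(struct demo_device, dma_buffer1) % DEMO_DMA_MINALIGN == 0,
	      "DMA group must start on an alignment boundary");

/* Index of the alignment-sized line that contains a given byte offset. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_DMA_MINALIGN;
}

/* Returns 1 when neither CPU-written field shares a line with the buffers. */
static int demo_group_is_isolated(void)
{
	size_t first = demo_line_of(offsetof(struct demo_device, dma_buffer1));
	size_t last = demo_line_of(offsetof(struct demo_device, dma_buffer2) +
				   sizeof(((struct demo_device *)0)->dma_buffer2) - 1);
	size_t lock1_end = offsetof(struct demo_device, lock1) + sizeof(int) - 1;

	return demo_line_of(lock1_end) < first &&
	       demo_line_of(offsetof(struct demo_device, lock2)) > last;
}
```

In this sketch the begin marker pushes the buffers to the next 128-byte boundary, and the end marker pushes ``lock2`` onto the boundary after them, so both the head and the tail of the group are padded.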

DMA addressing capabilities
===========================

+9 −0
@@ -148,3 +148,12 @@ DMA_ATTR_MMIO is appropriate.
For architectures that require cache flushing for DMA coherence
DMA_ATTR_MMIO will not perform any cache flushing. The address
provided must never be mapped cacheable into the CPU.

DMA_ATTR_CPU_CACHE_CLEAN
------------------------

This attribute indicates the CPU will not dirty any cacheline overlapping this
DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
multiple small buffers to safely share a cacheline without risk of data
corruption, suppressing DMA debug warnings about overlapping mappings.
All mappings sharing a cacheline should have this attribute.
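
To make "sharing a cacheline" concrete, here is a small userspace sketch that tests whether two buffers touch a common cache line. It only illustrates the condition and is not how the DMA debug code detects overlaps; ``demo_share_cacheline()`` and the 64-byte ``DEMO_CACHELINE`` are assumptions chosen for the example, and both lengths must be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real systems may differ. */
#define DEMO_CACHELINE 64

/*
 * Returns true when the byte ranges [a, a + a_len) and [b, b + b_len)
 * touch at least one common cache line. Both lengths must be non-zero.
 */
static bool demo_share_cacheline(uintptr_t a, size_t a_len,
				 uintptr_t b, size_t b_len)
{
	uintptr_t a_first = a / DEMO_CACHELINE;
	uintptr_t a_last = (a + a_len - 1) / DEMO_CACHELINE;
	uintptr_t b_first = b / DEMO_CACHELINE;
	uintptr_t b_last = (b + b_len - 1) / DEMO_CACHELINE;

	/* Two closed intervals of line indices overlap. */
	return a_first <= b_last && b_first <= a_last;
}
```

Two 16-byte buffers placed back to back would report true here; that is the case where every mapping involved is expected to carry the attribute.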
+53 −0
@@ -230,4 +230,57 @@ able to start the dataplane processing as follows:
5. Inject an interrupt for specific virtqueue with the VDUSE_INJECT_VQ_IRQ ioctl
   after the used ring is filled.

Enabling ASID (API version 1)
------------------------------

VDUSE supports per-address-space identifiers (ASIDs) starting with API
version 1. Set it up with ioctl(VDUSE_SET_API_VERSION) on `/dev/vduse/control`
and pass `VDUSE_API_VERSION_1` before creating a new VDUSE instance with
ioctl(VDUSE_CREATE_DEV).
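
A minimal sketch of that first step, assuming a Linux userspace build. ``demo_open_vduse_v1()`` is a hypothetical helper name. The fallback definitions are assumptions for building against headers that predate this series: in particular the value 1 for ``VDUSE_API_VERSION_1`` is assumed, and the real definitions in include/uapi/linux/vduse.h take precedence when available.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

#if defined(__has_include)
# if __has_include(<linux/vduse.h>)
#  include <linux/vduse.h>
# endif
#endif

/* Assumed fallbacks for older headers; check them against the real uAPI. */
#ifndef VDUSE_SET_API_VERSION
#define VDUSE_SET_API_VERSION _IOW(0x81, 0x01, uint64_t)
#endif
#ifndef VDUSE_API_VERSION_1
#define VDUSE_API_VERSION_1 1
#endif

/*
 * Opens the VDUSE control node and selects API version 1.
 * Returns the open file descriptor, or -1 on failure.
 */
static int demo_open_vduse_v1(const char *control_path)
{
	uint64_t version = VDUSE_API_VERSION_1;
	int fd = open(control_path, O_RDWR);

	if (fd < 0)
		return -1;

	if (ioctl(fd, VDUSE_SET_API_VERSION, &version) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

The returned descriptor would then be used for ioctl(VDUSE_CREATE_DEV), with ``/dev/vduse/control`` passed as ``control_path``.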

Afterwards, you can use the member asid of ioctl(VDUSE_VQ_SETUP) argument to
select the address space of the IOTLB you are querying.  The driver could
change the address space of any virtqueue group by using the
VDUSE_SET_VQ_GROUP_ASID VDUSE message type, and the VDUSE instance needs to
reply with VDUSE_REQ_RESULT_OK if it was possible to change it.

Similarly, you can use ioctl(VDUSE_IOTLB_GET_FD2) to obtain the file descriptor
describing an IOVA region of a specific ASID. Example usage:

.. code-block:: c

	static void *iova_to_va(int dev_fd, uint32_t asid, uint64_t iova,
	                        uint64_t *len)
	{
		int fd;
		void *addr;
		size_t size;
		struct vduse_iotlb_entry_v2 entry = { 0 };

		entry.v1.start = iova;
		entry.v1.last = iova;
		entry.asid = asid;

		fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD2, &entry);
		if (fd < 0)
			return NULL;

		size = entry.v1.last - entry.v1.start + 1;
		*len = entry.v1.last - iova + 1;
		addr = mmap(0, size, perm_to_prot(entry.v1.perm), MAP_SHARED,
			    fd, entry.v1.offset);
		close(fd);
		if (addr == MAP_FAILED)
			return NULL;

		/*
		 * Using some data structures such as linked list to store
		 * the iotlb mapping. The munmap(2) should be called for the
		 * cached mapping when the corresponding VDUSE_UPDATE_IOTLB
		 * message is received or the device is reset.
		 */

		return addr + iova - entry.v1.start;
	}

For more details on the uAPI, please see include/uapi/linux/vduse.h.
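
The ``iova_to_va()`` example above calls ``perm_to_prot()``, which is not part of this hunk. A minimal sketch of such a helper follows, assuming it maps the VDUSE access permissions onto mmap(2) protection flags. The fallback constants are assumptions for builds where ``<linux/vduse.h>`` is unavailable and should be checked against the uAPI header.

```c
#include <stdint.h>
#include <sys/mman.h>

#if defined(__has_include)
# if __has_include(<linux/vduse.h>)
#  include <linux/vduse.h>
# endif
#endif

/* Assumed fallbacks mirroring the uAPI values. */
#ifndef VDUSE_ACCESS_RO
#define VDUSE_ACCESS_RO 0x1
#define VDUSE_ACCESS_WO 0x2
#define VDUSE_ACCESS_RW 0x3
#endif

/* Translate an IOTLB entry's perm field into mmap() protection bits. */
static int perm_to_prot(uint8_t perm)
{
	int prot = 0;

	switch (perm) {
	case VDUSE_ACCESS_WO:
		prot |= PROT_WRITE;
		break;
	case VDUSE_ACCESS_RO:
		prot |= PROT_READ;
		break;
	case VDUSE_ACCESS_RW:
		prot |= PROT_READ | PROT_WRITE;
		break;
	}

	/* Unknown values yield 0, which equals PROT_NONE on Linux. */
	return prot;
}
```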
+3 −0
@@ -11,6 +11,7 @@
#include <linux/spinlock.h>
#include <linux/virtio.h>
#include <linux/virtio_rng.h>
#include <linux/dma-mapping.h>
#include <linux/module.h>
#include <linux/slab.h>

@@ -28,11 +29,13 @@ struct virtrng_info {
	unsigned int data_avail;
	unsigned int data_idx;
	/* minimal size returned by rng_buffer_size() */
	__dma_from_device_group_begin();
#if SMP_CACHE_BYTES < 32
	u8 data[32];
#else
	u8 data[SMP_CACHE_BYTES];
#endif
	__dma_from_device_group_end();
};

static void random_recv_done(struct virtqueue *vq)
+11 −4
@@ -10,6 +10,7 @@
 */

#include <linux/completion.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/gpio/driver.h>
#include <linux/io.h>
@@ -24,9 +25,13 @@
struct virtio_gpio_line {
	struct mutex lock; /* Protects line operation */
	struct completion completion;
	struct virtio_gpio_request req ____cacheline_aligned;
	struct virtio_gpio_response res ____cacheline_aligned;

	unsigned int rxlen;

	__dma_from_device_group_begin();
	struct virtio_gpio_request req;
	struct virtio_gpio_response res;
	__dma_from_device_group_end();
};

struct vgpio_irq_line {
@@ -37,8 +42,10 @@ struct vgpio_irq_line {
	bool update_pending;
	bool queue_pending;

	struct virtio_gpio_irq_request ireq ____cacheline_aligned;
	struct virtio_gpio_irq_response ires ____cacheline_aligned;
	__dma_from_device_group_begin();
	struct virtio_gpio_irq_request ireq;
	struct virtio_gpio_irq_response ires;
	__dma_from_device_group_end();
};

struct virtio_gpio {