This patch introduces virtqueue ops which is a set of callbacks
that will be called for different queue layout or features. This would
help to avoid branches for split/packed and will ease the future
implementation like in order.
Note that in order to eliminate the indirect calls this patch uses
global array of const ops to allow compiler to avoid indirect
branches.
Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
were noticed.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-14-jasowang@redhat.com>
Switch to use unsigned int for virtqueue_poll_packed() to match
virtqueue_poll() and virtqueue_poll_split() and to ease the
abstraction of the virtqueue ops.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-13-jasowang@redhat.com>
Since the return value of vring_unmap_one_split() is exactly
vq->split.desc_extra[i].next, 'i = vq->split.desc_extra[i].next' is
redundant. Assign vring_unmap_one_split() to i instead.
Since vq->split.desc_extra is assigned to extra, use extra[i].next
instead of vq->split.desc_extra[i].next to improve readability.
No change in functionality.
Signed-off-by: zhangdongchuan <zhangdongchuan@eswincomputing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <202511261140162936986@eswincomputing.com>
Documentation build reported:
WARNING: ./drivers/virtio/virtio_ring.c:3174 function parameter 'vaddr' not described in 'virtqueue_map_free_coherent'
WARNING: ./drivers/virtio/virtio_ring.c:3308 expecting prototype for virtqueue_mapping_error(). Prototype was for virtqueue_map_mapping_error() instead
The kernel-doc block for virtqueue_map_free_coherent() omitted the @vaddr parameter, and
the kernel-doc header for virtqueue_map_mapping_error() used the wrong function name
(virtqueue_mapping_error) instead of the actual function name.
This change updates:
- the function name in the comment to virtqueue_map_mapping_error()
- adds the missing @vaddr description in the comment for virtqueue_map_free_coherent()
Fixes: b41cb3bcf6 ("virtio: rename dma helpers")
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251110202920.2250244-1-kriish.sharma2006@gmail.com>
Pull virtio updates from Michael Tsirkin:
"Just fixes and cleanups this time around. The mapping cleanups are
preparing the ground for new features, though"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-vdpa: Drop redundant conversion to bool
vduse: Use fixed 4KB bounce pages for non-4KB page size
vduse: switch to use virtio map API instead of DMA API
vdpa: introduce map ops
vdpa: support virtio_map
virtio: introduce map ops in virtio core
virtio_ring: rename dma_handle to map_handle
virtio: introduce virtio_map container union
virtio: rename dma helpers
virtio_ring: switch to use dma_{map|unmap}_page()
virtio_ring: constify virtqueue pointer for DMA helpers
virtio_balloon: Remove redundant __GFP_NOWARN
vhost: vringh: Fix copy_to_iter return value check
vhost: vringh: Modify the return value check
This patch introduces map operations for virtio device. Virtio used to
use DMA API which is not necessarily the case since some devices
doesn't do DMA. Instead of using tricks and abusing DMA API, let's
simply abstract the current mapping logic into a virtio specific
mapping operations. For the device or transport that doesn't do DMA,
they can implement their own mapping logic without the need to trick
DMA core. In this case the mapping metadata is opaque to the virtio
core that will be passed back to the transport or device specific map
operations. For other devices, DMA API will still be used, so map
token will still be the dma device to minimize the changeset and
performance impact.
The mapping operations are abstracted as a independent structure
instead of reusing virtio_config_ops. This allows the transport can
simply reuse the structure for lower layers like vDPA.
A set of new mapping helpers were introduced for the device that want
to do mapping by themselves.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20250821064641.5025-7-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Following patch will introduce the mapping operations for virtio
device. In order to achieve this, besides the dma device, virtio core
needs to support a transport or device specific mapping metadata as well.
So this patch introduces a union container of a dma device. The idea
is the allow the transport layer to pass device specific mapping
metadata which will be used as a parameter for the virtio mapping
operations. For the transport or device that is using DMA, dma device
is still being used.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20250821064641.5025-5-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
The virtqueue_resize() function was not correctly propagating error codes
from its internal resize helper functions, specifically
virtqueue_resize_packet() and virtqueue_resize_split(). If these helpers
returned an error, but the subsequent call to virtqueue_enable_after_reset()
succeeded, the original error from the resize operation would be masked.
Consequently, virtqueue_resize() could incorrectly report success to its
caller despite an underlying resize failure.
This change restores the original code behavior:
if (vdev->config->enable_vq_after_reset(_vq))
return -EBUSY;
return err;
Fix: commit ad48d53b5b ("virtio_ring: separate the logic of reset/enable from virtqueue_resize")
Cc: xuanzhuo@linux.alibaba.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-2-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
syzbot reports a data-race when accessing the event_triggered, here is the
simplified stack when the issue occurred:
==================================================================
BUG: KCSAN: data-race in virtqueue_disable_cb / virtqueue_enable_cb_delayed
write to 0xffff8881025bc452 of 1 bytes by task 3288 on cpu 0:
virtqueue_enable_cb_delayed+0x42/0x3c0 drivers/virtio/virtio_ring.c:2653
start_xmit+0x230/0x1310 drivers/net/virtio_net.c:3264
__netdev_start_xmit include/linux/netdevice.h:5151 [inline]
netdev_start_xmit include/linux/netdevice.h:5160 [inline]
xmit_one net/core/dev.c:3800 [inline]
read to 0xffff8881025bc452 of 1 bytes by interrupt on cpu 1:
virtqueue_disable_cb_split drivers/virtio/virtio_ring.c:880 [inline]
virtqueue_disable_cb+0x92/0x180 drivers/virtio/virtio_ring.c:2566
skb_xmit_done+0x5f/0x140 drivers/net/virtio_net.c:777
vring_interrupt+0x161/0x190 drivers/virtio/virtio_ring.c:2715
__handle_irq_event_percpu+0x95/0x490 kernel/irq/handle.c:158
handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
value changed: 0x01 -> 0x00
==================================================================
When the data race occurs, the function virtqueue_enable_cb_delayed() sets
event_triggered to false, and virtqueue_disable_cb_split/packed() reads it
as false due to the race condition. Since event_triggered is an unreliable
hint used for optimization, this should only cause the driver temporarily
suggest that the device not send an interrupt notification when the event
index is used.
Fix this KCSAN reported data-race issue by explicitly tagging the access as
data_racy.
Reported-by: syzbot+efe683d57990864b8c8e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67c7761a.050a0220.15b4b9.0018.GAE@google.com/
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Message-Id: <20250312130412.3516307-1-quic_zhonhan@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when it really occurs.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when the recycle really occurs.
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_vdpa: remove redundant check on desc
virtio_fs: store actual queue index in mq_map
virtio_fs: add informative log for new tag discovery
virtio: Make vring_new_virtqueue support packed vring
virtio_pmem: Add freeze/restore callbacks
vdpa/mlx5: Fix suboptimal range on iotlb iteration
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.
And consider that some sgs are not generated by the virtio driver,
that may be passed from the block stack. So we can not change the
sgs, new APIs are the better way.
So we pass the new argument 'premapped' to indicate the buffers
submitted to virtio are premapped in advance. Additionally,
DMA unmap operations for these buffers will be bypassed.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setting event_triggered from the interrupt handler
is fundamentally racy. There are races of 2 types:
1. vq processing can read false value while interrupt
triggered and set it to true.
result will be a bit of extra work when disabling cbs, no big deal.
1. vq processing can set false value then interrupt
immediately sets true value
since interrupt then triggers a callback which will
process buffers, this is also not an issue.
However, looks like KCSAN can not figure all this out, and warns about
the race between the write and the read. Tag the access data_racy for
now. We should probably look at ways to make this more
straight-forwardly correct.
Message-ID: <6bdd771a4fb7625a9227971b3cf4745c34c31a32.1726153334.git.mst@redhat.com>
Reported-by: syzbot+8a02104389c2e0ef5049@syzkaller.appspotmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Marco Elver <elver@google.com>
With ARCH=sh, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio_ring.o
Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Message-Id: <20240702-md-sh-drivers-virtio-v1-1-cf7325ab6ccc@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When use_dma_api and premapped are true, then the do_unmap is false.
Because the do_unmap is false, vring_unmap_extra_packed is not called by
detach_buf_packed.
if (unlikely(vq->do_unmap)) {
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
&vq->packed.desc_extra[curr]);
curr = vq->packed.desc_extra[curr].next;
}
}
So the indirect desc table is not unmapped. This causes the unmap leak.
So here, we check vq->use_dma_api instead. Synchronously, dma info is
updated based on use_dma_api judgment
This bug does not occur, because no driver use the premapped with
indirect.
Fixes: b319940f83 ("virtio_ring: skip unmap for premapped")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In current packed virtqueue implementation, the avail_wrap_counter won't
flip, in the case when the driver supplies a descriptor chain with a
length equals to the queue size; total_sg == vq->packed.vring.num.
Let’s assume the following situation:
vq->packed.vring.num=4
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
Then the driver adds a descriptor chain containing 4 descriptors.
We expect the following result with avail_wrap_counter flipped:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 1
But, the current implementation gives the following result:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
To reproduce the bug, you can set a packed queue size as small as
possible, so that the driver is more likely to provide a descriptor
chain with a length equal to the packed queue size. For example, in
qemu run following commands:
sudo qemu-system-x86_64 \
-enable-kvm \
-nographic \
-kernel "path/to/kernel_image" \
-m 1G \
-drive file="path/to/rootfs",if=none,id=disk \
-device virtio-blk,drive=disk \
-drive file="path/to/disk_image",if=none,id=rwdisk \
-device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
indirect_desc=off \
-append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
Inside the VM, create a directory and mount the rwdisk device on it. The
rwdisk will hang and mount operation will not complete.
This commit fixes the wrap counter error by flipping the
packed.avail_wrap_counter, when start of descriptor chain equals to the
end of descriptor chain (head == i).
Fixes: 1ce9e6055f ("virtio_ring: introduce packed ring support")
Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These API has been introduced:
* virtqueue_dma_need_sync
* virtqueue_dma_sync_single_range_for_cpu
* virtqueue_dma_sync_single_range_for_device
These APIs can be used together with the premapped mechanism to sync the
DMA address.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now we add a case where we skip dma unmap, the vq->premapped is true.
We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced the "do_unmap". By default, it
is the same as use_dma_api. If the driver is configured with premapped,
then do_unmap is false.
So as long as do_unmap is false, for addr of desc, we should skip dma
unmap operation.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>