Commit 0c00ed30 authored by Linus Torvalds

Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Support for batch request processing for ublk, improving the
   efficiency of the kernel/ublk server communication. This can yield
   nice 7-12% performance improvements

 - Support for integrity data for ublk

 - Various other ublk improvements and additions, including a ton of
   selftest additions and updates

 - Move the handling of blk-crypto software fallback from below the
   block layer to above it. This reduces the complexity of dealing with
   bio splitting

 - Series fixing a number of potential deadlocks in blk-mq related to
   the queue usage counter and writeback throttling and rq-qos debugfs
   handling

 - Add an async_depth queue attribute, to resolve a performance
   regression that's been around for a while related to the scheduler
   depth handling

 - Only use task_work for IOPOLL completions on NVMe, if it is necessary
   to do so. An earlier fix for an issue resulted in all these
   completions being punted to task_work, to guarantee that completions
   were only run for a given io_uring ring when it was local to that
   ring. With the new changes, we can detect if it's necessary to use
   task_work or not, and avoid it if possible.

 - rnbd fixes:
      - Fix refcount underflow in device unmap path
      - Handle PREFLUSH and NOUNMAP flags properly in protocol
      - Fix server-side bi_size for special IOs
      - Zero response buffer before use
      - Fix trace format for flags
      - Add .release to rnbd_dev_ktype

 - MD pull requests via Yu Kuai
      - Fix raid5_run() to return error when log_init() fails
      - Fix IO hang with degraded array with llbitmap
      - Fix percpu_ref not resurrected on suspend timeout in llbitmap
      - Fix GPF in write_page caused by resize race
      - Fix NULL pointer dereference in process_metadata_update
      - Fix hang when stopping arrays with metadata through dm-raid
      - Fix any_working flag handling in raid10_sync_request
      - Refactor sync/recovery code path, improve error handling for
        badblocks, and remove unused recovery_disabled field
      - Consolidate mddev boolean fields into mddev_flags
      - Use mempool to allocate stripe_request_ctx and make sure
        max_sectors is not less than io_opt in raid5
      - Fix return value of mddev_trylock
      - Fix memory leak in raid1_run()
      - Add Li Nan as mdraid reviewer

 - Move phys_vec definitions to the kernel types, mostly in preparation
   for some VFIO and RDMA changes

 - Improve the speed for secure erase for some devices

 - Various little rust updates

 - Various other minor fixes, improvements, and cleanups

* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
  blk-mq: ABI/sysfs-block: fix docs build warnings
  selftests: ublk: organize test directories by test ID
  block: decouple secure erase size limit from discard size limit
  block: remove redundant kill_bdev() call in set_blocksize()
  blk-mq: add documentation for new queue attribute async_dpeth
  block, bfq: convert to use request_queue->async_depth
  mq-deadline: covert to use request_queue->async_depth
  kyber: covert to use request_queue->async_depth
  blk-mq: add a new queue sysfs attribute async_depth
  blk-mq: factor out a helper blk_mq_limit_depth()
  blk-mq-sched: unify elevators checking for async requests
  block: convert nr_requests to unsigned int
  block: don't use strcpy to copy blockdev name
  blk-mq-debugfs: warn about possible deadlock
  blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
  blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
  blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  blk-rq-qos: fix possible debugfs_mutex deadlock
  blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
  blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
  ...
parents 591beb0e 72f4d6fc
+45 −0
@@ -609,6 +609,51 @@ Description:
		enabled, and whether tags are shared.


What:		/sys/block/<disk>/queue/async_depth
Date:		August 2025
Contact:	linux-block@vger.kernel.org
Description:
		[RW] Controls how many asynchronous requests may be allocated
		in the block layer. The value is always capped at nr_requests.

		When no elevator is active (none):

		- async_depth is always equal to nr_requests.

		For bfq scheduler:

		- By default, async_depth is set to 75% of nr_requests.
		  Internal limits are then derived from this value:

		  * Sync writes: limited to async_depth (≈75% of nr_requests).
		  * Async I/O: limited to ~2/3 of async_depth (≈50% of
		    nr_requests).

		  If a bfq_queue is weight-raised:

		  * Sync writes: limited to ~1/2 of async_depth (≈37% of
		    nr_requests).
		  * Async I/O: limited to ~1/4 of async_depth (≈18% of
		    nr_requests).

		- If the user writes a custom value to async_depth, BFQ will
		  recompute these limits proportionally based on the new value.

		For Kyber:

		- By default async_depth is set to 75% of nr_requests.
		- If the user writes a custom value to async_depth, then it
		  overrides the default and directly controls the limit for
		  writes and async I/O.

		For mq-deadline:

		- By default async_depth is set to nr_requests.
		- If the user writes a custom value to async_depth, then it
		  overrides the default and directly controls the limit for
		  writes and async I/O.
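
The proportions in the async_depth description above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not kernel code: the function names are hypothetical, integer floor division is assumed for rounding, and the lower bound of 1 is an assumption that the text above does not state.

```python
def default_async_depth(nr_requests: int, elevator: str) -> int:
    """Default async_depth for a scheduler, following the description above."""
    if elevator in ("none", "mq-deadline"):
        return nr_requests
    if elevator in ("bfq", "kyber"):
        # 75% of nr_requests; the floor of 1 is an assumption of this sketch.
        return max(1, nr_requests * 3 // 4)
    raise ValueError(f"unknown elevator: {elevator!r}")


def clamp_async_depth(requested: int, nr_requests: int) -> int:
    """A user-written value is always capped at nr_requests."""
    return min(requested, nr_requests)


def bfq_limits(async_depth: int) -> dict:
    """Approximate BFQ limits derived from async_depth (rounding assumed)."""
    return {
        "sync_writes": async_depth,
        "async_io": max(1, async_depth * 2 // 3),
        "weight_raised_sync_writes": max(1, async_depth // 2),
        "weight_raised_async_io": max(1, async_depth // 4),
    }


if __name__ == "__main__":
    depth = default_async_depth(256, "bfq")
    print(depth, bfq_limits(depth))
```

With nr_requests = 256 this gives async_depth = 192 and derived limits of 192, 128, 96 and 48, which line up with the 75%, 50%, 37% and 18% figures quoted above. The kernel's exact rounding may differ from this sketch.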


What:		/sys/block/<disk>/queue/nr_zones
Date:		November 2018
Contact:	Damien Le Moal <damien.lemoal@wdc.com>
+0 −1
@@ -135,7 +135,6 @@ Usage of helpers:
	bio_first_bvec_all()
	bio_first_page_all()
	bio_first_folio_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segment. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::
+6 −0
@@ -206,6 +206,12 @@ it to a bio, given the blk_crypto_key and the data unit number that will be used
for en/decryption.  Users don't need to worry about freeing the bio_crypt_ctx
later, as that happens automatically when the bio is freed or reset.

To submit a bio that uses inline encryption, users must call
``blk_crypto_submit_bio()`` instead of the usual ``submit_bio()``.  This will
submit the bio to the underlying driver if it supports inline crypto, or else
call the blk-crypto fallback routines before submitting normal bios to the
underlying drivers.
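
The dispatch rule in the paragraph above can be illustrated with a toy model. This is a Python sketch of the control flow only, not kernel code: `Device`, `Bio`, `software_fallback()` and `crypto_submit()` are hypothetical names, and the XOR transform is a placeholder that stands in for real encryption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bio:
    data: bytes
    crypt_key: Optional[bytes] = None  # None models a bio without a crypt context


@dataclass
class Device:
    supports_inline_crypto: bool
    submitted: List[Bio] = field(default_factory=list)


def software_fallback(bio: Bio) -> Bio:
    """Placeholder for the fallback path: returns a plain bio with transformed data."""
    key = bio.crypt_key
    out = bytes(b ^ key[i % len(key)] for i, b in enumerate(bio.data))
    return Bio(data=out, crypt_key=None)


def crypto_submit(dev: Device, bio: Bio) -> None:
    """Models the submit-time choice between inline crypto and the fallback."""
    if bio.crypt_key is None or dev.supports_inline_crypto:
        # The driver receives the bio unchanged, crypt context included.
        dev.submitted.append(bio)
    else:
        # The fallback runs first, so the driver only ever sees a normal bio.
        dev.submitted.append(software_fallback(bio))
```

The point of the model is the ordering: when the fallback is needed, it runs before the bio reaches the driver, so a driver without inline crypto support never sees a crypt context.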

Finally, when done using inline encryption with a blk_crypto_key on a
block_device, users must call ``blk_crypto_evict_key()``.  This ensures that
the key is evicted from all keyslots it may be programmed into and unlinked from
+60 −4
@@ -260,9 +260,12 @@ The following IO commands are communicated via io_uring passthrough command,
and each command is only for forwarding the IO and committing the result
with specified IO tag in the command data:

- ``UBLK_IO_FETCH_REQ``
Traditional Per-I/O Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Sent from the server IO pthread for fetching future incoming IO requests
- ``UBLK_U_IO_FETCH_REQ``

  Sent from the server I/O pthread for fetching future incoming I/O requests
  destined to ``/dev/ublkb*``. This command is sent only once from the server
  IO pthread for ublk driver to setup IO forward environment.

@@ -278,7 +281,7 @@ with specified IO tag in the command data:
  supported by the driver, daemons must be per-queue instead - i.e. all I/Os
  associated to a single qid must be handled by the same task.

- ``UBLK_IO_COMMIT_AND_FETCH_REQ``
- ``UBLK_U_IO_COMMIT_AND_FETCH_REQ``

  When an IO request is destined to ``/dev/ublkb*``, the driver stores
  the IO's ``ublksrv_io_desc`` to the specified mapped area; then the
@@ -293,7 +296,7 @@ with specified IO tag in the command data:
  requests with the same IO tag. That is, ``UBLK_IO_COMMIT_AND_FETCH_REQ``
  is reused for both fetching request and committing back IO result.

- ``UBLK_IO_NEED_GET_DATA``
- ``UBLK_U_IO_NEED_GET_DATA``

  With ``UBLK_F_NEED_GET_DATA`` enabled, the WRITE request will be firstly
  issued to ublk server without data copy. Then, IO backend of ublk server
@@ -322,6 +325,59 @@ with specified IO tag in the command data:
  ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
  the server buffer (pages) read to the IO request pages.

Batch I/O Commands (UBLK_F_BATCH_IO)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``UBLK_F_BATCH_IO`` feature provides an alternative high-performance
I/O handling model that replaces the traditional per-I/O commands with
per-queue batch commands. This significantly reduces communication overhead
and enables better load balancing across multiple server tasks.

Key differences from traditional mode:

- **Per-queue vs Per-I/O**: Commands operate on queues rather than individual I/Os
- **Batch processing**: Multiple I/Os are handled in single operations
- **Multishot commands**: Use io_uring multishot for reduced submission overhead
- **Flexible task assignment**: Any task can handle any I/O (no per-I/O daemons)
- **Better load balancing**: Tasks can adjust their workload dynamically

Batch I/O Commands:

- ``UBLK_U_IO_PREP_IO_CMDS``

  Prepares multiple I/O commands in batch. The server provides a buffer
  containing multiple I/O descriptors that will be processed together.
  This reduces the number of individual command submissions required.

- ``UBLK_U_IO_COMMIT_IO_CMDS``

  Commits results for multiple I/O operations in batch, and prepares the
  I/O descriptors to accept new requests. The server provides a buffer
  containing the results of multiple completed I/Os, allowing efficient
  bulk completion of requests.

- ``UBLK_U_IO_FETCH_IO_CMDS``

  **Multishot command** for fetching I/O commands in batch. This is the key
  command that enables high-performance batch processing:

  * Uses io_uring multishot capability for reduced submission overhead
  * Single command can fetch multiple I/O requests over time
  * Buffer size determines maximum batch size per operation
  * Multiple fetch commands can be submitted for load balancing
  * Only one fetch command is active at any time per queue
  * Supports dynamic load balancing across multiple server tasks

  It is one typical multishot io_uring request with provided buffer, and it
  won't be completed until any failure is triggered.

  Each task can submit ``UBLK_U_IO_FETCH_IO_CMDS`` with different buffer
  sizes to control how much work it handles. This enables sophisticated
  load balancing strategies in multi-threaded servers.

Migration: Applications using traditional commands (``UBLK_U_IO_FETCH_REQ``,
``UBLK_U_IO_COMMIT_AND_FETCH_REQ``) cannot use batch mode simultaneously.
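
To make the "reduced communication overhead" claim above concrete, the following Python sketch counts how many commands are needed to commit a set of completed I/Os. It is illustrative arithmetic only: the function names and the 8-byte per-I/O element size are assumptions made for the example, since the actual buffer layout is not shown in this excerpt.

```python
import math


def max_batch(buffer_bytes: int, elem_bytes: int) -> int:
    """Largest number of I/O elements that fit in one batch buffer."""
    if elem_bytes <= 0:
        raise ValueError("elem_bytes must be positive")
    return buffer_bytes // elem_bytes


def commands_needed(n_ios: int, batch: int) -> int:
    """Commands required to cover n_ios when each command carries `batch` I/Os."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    return math.ceil(n_ios / batch)


if __name__ == "__main__":
    n_ios = 128
    batch = max_batch(buffer_bytes=512, elem_bytes=8)  # element size is hypothetical
    print("per-I/O commands:", commands_needed(n_ios, 1))
    print("batch commands:  ", commands_needed(n_ios, batch))
```

Under these assumed sizes, 128 completions need 128 per-I/O commands but only 2 batch commands. A task that registers a smaller buffer gets a smaller maximum batch, which is how the buffer size can act as a per-task workload control.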

Zero copy
---------

+1 −0
@@ -24276,6 +24276,7 @@ F: include/linux/property.h
SOFTWARE RAID (Multiple Disks) SUPPORT
M:	Song Liu <song@kernel.org>
M:	Yu Kuai <yukuai@fnnas.com>
R:	Li Nan <linan122@huawei.com>
L:	linux-raid@vger.kernel.org
S:	Supported
Q:	https://patchwork.kernel.org/project/linux-raid/list/