Commit 6f59de9b authored by Linus Torvalds

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
parents 3e406741 533c87e2
+15 −0
@@ -547,6 +547,21 @@ Description:
		[RO] Maximum size in bytes of a single element in a DMA
		scatter/gather list.

What:		/sys/block/<disk>/queue/max_write_streams
Date:		November 2024
Contact:	linux-block@vger.kernel.org
Description:
		[RO] Maximum number of write streams supported, 0 if not
		supported. If supported, valid values are 1 through
		max_write_streams, inclusive.

What:		/sys/block/<disk>/queue/write_stream_granularity
Date:		November 2024
Contact:	linux-block@vger.kernel.org
Description:
		[RO] Granularity of a write stream in bytes.  The granularity
		of a write stream is the size that should be discarded or
		overwritten together to avoid write amplification in the device.
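
As a rough illustration of how these two attributes might be read from a script, the sketch below prints a queue attribute for one disk. The function name read_queue_attr and its optional sysfs root argument are hypothetical, introduced only so that the helper can be tried against a scratch directory. Printing 0 when the file is absent is an assumption of this sketch, chosen to match the "0 if not supported" convention of max_write_streams.

```shell
# Hypothetical helper: print a block queue attribute, or 0 if the attribute
# file does not exist (assumption: treat a missing file as "not supported").
# $1 = disk name, $2 = attribute name, $3 = sysfs root (assumed default: /sys)
read_queue_attr() {
    attr_file="${3:-/sys}/block/$1/queue/$2"
    if [ -r "$attr_file" ]; then
        cat "$attr_file"
    else
        echo 0
    fi
}
```

A call such as read_queue_attr nvme0n1 max_write_streams would print the attribute value for a disk named nvme0n1, a name used here purely as an example.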

What:		/sys/block/<disk>/queue/max_segments
Date:		March 2010
+1 −0
@@ -11,6 +11,7 @@ Block Devices
   nbd
   paride
   ramdisk
   zoned_loop
   zram

   drbd/index
+169 −0
.. SPDX-License-Identifier: GPL-2.0

=======================
Zoned Loop Block Device
=======================

.. Contents:

	1) Overview
	2) Creating a Zoned Device
	3) Deleting a Zoned Device
	4) Example


1) Overview
-----------

The zoned loop block device driver (zloop) allows a user to create a zoned block
device using one regular file per zone as backing storage. This driver does not
directly control any hardware and uses read, write and truncate operations to
regular files of a file system to emulate a zoned block device.

Using zloop, zoned block devices with a configurable capacity, zone size and
number of conventional zones can be created. The storage for each zone of the
device is implemented using a regular file with a maximum size equal to the zone
size. The size of a file backing a conventional zone is always equal to the zone
size. The size of a file backing a sequential zone indicates the amount of data
sequentially written to the file, that is, the size of the file directly
indicates the position of the write pointer of the zone.
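
As a rough illustration of this relationship, the sketch below derives the write pointer offset of a sequential zone from the size of its backing file. The function name ``zone_wp_offset_sectors`` is hypothetical, the offset is counted in 512-byte sectors from the start of the zone, and ``stat -c`` assumes GNU coreutils.

```shell
# Hypothetical helper: print the write pointer offset of a sequential zone,
# in 512-byte sectors from the zone start, derived from its backing file size.
# $1 = path to the zone backing file. Assumes GNU coreutils stat(1).
zone_wp_offset_sectors() {
    size_bytes=$(stat -c %s "$1") || return 1
    echo $(( size_bytes / 512 ))
}
```

For a backing file holding 1 MiB of sequentially written data, the helper prints 2048.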

When resetting a sequential zone, its backing file size is truncated to zero.
Conversely, for a zone finish operation, the backing file is truncated to the
zone size. With this, the maximum capacity of a zloop zoned block device can be
configured to be larger than the storage space available on the backing file
system. Of course, for such a configuration, writing more data than
the storage space available on the backing file system will result in write
errors.
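
The file size changes described above can be mimicked by hand on a scratch file with ``truncate``. This is only an illustration of the effect on file sizes, not the driver's code path, and it should not be run against the backing files of a live device. The function names are hypothetical and GNU coreutils is assumed.

```shell
# Hypothetical helper: mimic a zone reset by truncating the file to zero bytes.
# $1 = scratch file
mimic_zone_reset() {
    truncate -s 0 "$1"
}

# Hypothetical helper: mimic a zone finish by truncating the file to the
# zone size. $1 = scratch file, $2 = zone size in MiB
# Extending a file this way creates a sparse file on most file systems, so
# the apparent size grows without data blocks being written.
mimic_zone_finish() {
    truncate -s "$(( $2 * 1024 * 1024 ))" "$1"
}
```

Comparing ``ls -l`` (apparent size) with ``du`` (allocated space) on the scratch file after ``mimic_zone_finish`` gives a feel for why the device capacity can exceed the free space of the backing file system.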

The zoned loop block device driver implements a complete zone transition state
machine. That is, zones can be empty, implicitly opened, explicitly opened,
closed or full. The current implementation does not support any limits on the
maximum number of open and active zones.

No user tools are necessary to create and delete zloop devices.

2) Creating a Zoned Device
--------------------------

Once the zloop module is loaded (or if zloop is compiled in the kernel), the
character device file /dev/zloop-control can be used to add a zloop device.
This is done by writing an "add" command directly to the /dev/zloop-control
device::

        $ modprobe zloop
        $ ls -l /dev/zloop*
        crw-------. 1 root root 10, 123 Jan  6 19:18 /dev/zloop-control

        $ mkdir -p <base directory>/<device ID>
        $ echo "add [options]" > /dev/zloop-control

The options available for the add command can be listed by reading the
/dev/zloop-control device::

        $ cat /dev/zloop-control
        add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io
        remove id=%d

In more detail, the options that can be used with the "add" command are as
follows.

================   ===========================================================
id                 Device number (the X in /dev/zloopX).
                   Default: automatically assigned.
capacity_mb        Device total capacity in MiB. This is always rounded up to
                   the nearest higher multiple of the zone size.
                   Default: 16384 MiB (16 GiB).
zone_size_mb       Device zone size in MiB. Default: 256 MiB.
zone_capacity_mb   Device zone capacity in MiB (must always be equal to or lower
                   than the zone size). Default: zone size.
conv_zones         Total number of conventional zones starting from sector 0.
                   Default: 8.
base_dir           Path to the base directory where to create the directory
                   containing the zone files of the device.
                   Default: /var/local/zloop.
                   The device directory containing the zone files is always
                   named with the device ID. E.g. the default zone file
                   directory for /dev/zloop0 is /var/local/zloop/0.
nr_queues          Number of I/O queues of the zoned block device. This value is
                   always capped by the number of online CPUs.
                   Default: 1.
queue_depth        Maximum I/O queue depth per I/O queue.
                   Default: 64.
buffered_io        Do buffered I/Os instead of direct I/Os. Default: false.
================   ===========================================================
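
The rules stated in the table for ``capacity_mb`` and ``zone_capacity_mb`` can be sketched in a few lines of shell. Both function names are hypothetical and the checks only restate the table; the driver performs its own validation when the command is written to /dev/zloop-control.

```shell
# Round a requested capacity up to the nearest multiple of the zone size,
# as described for capacity_mb. $1 = capacity in MiB, $2 = zone size in MiB
round_capacity_mb() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Hypothetical helper: print an "add" command string for /dev/zloop-control.
# $1 = capacity_mb, $2 = zone_size_mb, $3 = zone_capacity_mb
# Rejects a zone capacity larger than the zone size, as the table requires.
zloop_add_cmd() {
    if [ "$3" -gt "$2" ]; then
        echo "zone_capacity_mb ($3) exceeds zone_size_mb ($2)" >&2
        return 1
    fi
    echo "add capacity_mb=$1,zone_size_mb=$2,zone_capacity_mb=$3"
}
```

The string printed by ``zloop_add_cmd`` could then be redirected to /dev/zloop-control by a privileged user. Options left out of the string keep the defaults listed in the table.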

3) Deleting a Zoned Device
--------------------------

Deleting an unused zoned loop block device is done by issuing the "remove"
command to /dev/zloop-control, specifying the ID of the device to remove::

        $ echo "remove id=X" > /dev/zloop-control

The remove command takes no option other than the device ID.

A zoned device that was removed can be re-added without any change to the
state of the device zones: the device zones are restored to their last state
before the device was removed. Re-adding a zoned device after it was removed
must always be done using the same configuration as when the device was first
added. If a zone configuration change is detected, an error will be returned and
the zoned device will not be created.

To fully delete a zoned device, after executing the remove operation, the device
base directory containing the backing files of the device zones must be deleted.

4) Example
----------

The following sequence of commands creates a 2 GiB zoned device with zones of
64 MiB and a zone capacity of 63 MiB::

        $ modprobe zloop
        $ mkdir -p /var/local/zloop/0
        $ echo "add capacity_mb=2048,zone_size_mb=64,zone_capacity_mb=63" > /dev/zloop-control

For the device created (/dev/zloop0), the zone backing files are all created
under the default base directory (/var/local/zloop)::

        $ ls -l /var/local/zloop/0
        total 0
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000000
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000001
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000002
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000003
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000004
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000005
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000006
        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000007
        -rw-------. 1 root root        0 Jan  6 22:23 seq-000008
        -rw-------. 1 root root        0 Jan  6 22:23 seq-000009
        ...
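
The naming pattern visible in this listing can be sketched as a small helper. The ``cnv-`` and ``seq-`` prefixes and the six-digit zone number are inferred from the listing itself, and the function name is hypothetical.

```shell
# Hypothetical helper: print the backing file name expected for a zone.
# $1 = zone number, $2 = number of conventional zones (conv_zones)
# Zones below conv_zones are conventional, all others are sequential.
zone_file_name() {
    if [ "$1" -lt "$2" ]; then
        printf 'cnv-%06d\n' "$1"
    else
        printf 'seq-%06d\n' "$1"
    fi
}
```

With the default of 8 conventional zones, ``zone_file_name 7 8`` prints ``cnv-000007`` and ``zone_file_name 8 8`` prints ``seq-000008``.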

The zoned device created (/dev/zloop0) can then be used normally::

        $ lsblk -z
        NAME   ZONED        ZONE-SZ ZONE-NR ZONE-AMAX ZONE-OMAX ZONE-APP ZONE-WGRAN
        zloop0 host-managed     64M      32         0         0       1M         4K
        $ blkzone report /dev/zloop0
          start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
          start: 0x000100000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
          start: 0x000120000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
          ...
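
The ``len`` and ``cap`` fields in this report are counts of 512-byte sectors. A minimal sketch for converting them to MiB, using a hypothetical function name, is:

```shell
# Convert a count of 512-byte sectors to MiB.
# $1 = sector count, decimal or 0x-prefixed hexadecimal
sectors_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}
```

``sectors_to_mib 0x020000`` prints 64 and ``sectors_to_mib 0x01f800`` prints 63, matching the zone size and zone capacity requested when the device was created.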

Deleting this device is done using the command::

        $ echo "remove id=0" > /dev/zloop-control

The removed device can be re-added using the same "add" command as when
the device was first created. To fully delete a zoned device, its backing files
should also be deleted after executing the remove command::

        $ rm -r /var/local/zloop/0
+8 −0
@@ -26894,6 +26894,14 @@ L: linux-kernel@vger.kernel.org
S:	Maintained
F:	arch/x86/kernel/cpu/zhaoxin.c

ZONED LOOP DEVICE
M:	Damien Le Moal <dlemoal@kernel.org>
R:	Christoph Hellwig <hch@lst.de>
L:	linux-block@vger.kernel.org
S:	Maintained
F:	Documentation/admin-guide/blockdev/zoned_loop.rst
F:	drivers/block/zloop.c

ZONEFS FILESYSTEM
M:	Damien Le Moal <dlemoal@kernel.org>
M:	Naohiro Aota <naohiro.aota@wdc.com>
+0 −1
@@ -13,7 +13,6 @@ CONFIG_MIPS_CMDLINE_DTB_EXTEND=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BOUNCE is not set
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y