Commit 3e781988 authored by Linus Torvalds

Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Keith:
     - Device initialization memory leak fixes (Keith)
     - More constants defined (Weiwen)
     - Target debugfs support (Hannes)
     - PCIe subsystem reset enhancements (Keith)
     - Queue-depth multipath policy (Redhat and PureStorage)
     - Implement get_unique_id (Christoph)
     - Authentication error fixes (Gaosheng)

 - MD updates via Song
     - sync_action fix and refactoring (Yu Kuai)
     - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
       Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)

 - Fix loop detach/open race (Gulam)

 - Fix lower control limit for blk-throttle (Yu)

 - Add module descriptions to various drivers (Jeff)

 - Add support for atomic writes for block devices, and statx reporting
   for same. Includes SCSI and NVMe (John, Prasad, Alan)

 - Add IO priority information to block trace points (Dongliang)

 - Various zone improvements and tweaks (Damien)

 - mq-deadline tag reservation improvements (Bart)

 - Ignore direct reclaim swap writes in writeback throttling (Baokun)

 - Block integrity improvements and fixes (Anuj)

 - Add basic support for rust based block drivers. Has a dummy null_blk
   variant for now (Andreas)

 - Series converting driver settings to queue limits, and cleanups and
   fixes related to that (Christoph)

 - Cleanup for poking too deeply into the bvec internals, in preparation
   for DMA mapping API changes (Christoph)

 - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
   Ming, Zhu, Damien, Christophe, Chaitanya)

* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
  floppy: add missing MODULE_DESCRIPTION() macro
  loop: add missing MODULE_DESCRIPTION() macro
  ublk_drv: add missing MODULE_DESCRIPTION() macro
  xen/blkback: add missing MODULE_DESCRIPTION() macro
  block/rnbd: Constify struct kobj_type
  block: take offset into account in blk_bvec_map_sg again
  block: fix get_max_segment_size() warning
  loop: Don't bother validating blocksize
  virtio_blk: Don't bother validating blocksize
  null_blk: Don't bother validating blocksize
  block: Validate logical block size in blk_validate_limits()
  virtio_blk: Fix default logical block size fallback
  nvmet-auth: fix nvmet_auth hash error handling
  nvme: implement ->get_unique_id
  block: pass a phys_addr_t to get_max_segment_size
  block: add a bvec_phys helper
  blk-lib: check for kill signal in ioctl BLKZEROOUT
  block: limit the Write Zeroes to manually writing zeroes fallback
  block: refacto blkdev_issue_zeroout
  block: move read-only and supported checks into (__)blkdev_issue_zeroout
  ...
parents 3a56e241 3c1743a6
+1 −0
@@ -690,6 +690,7 @@ Vivien Didelot <vivien.didelot@gmail.com> <vivien.didelot@savoirfairelinux.com>
Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@parallels.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@virtuozzo.com>
+Weiwen Hu <huweiwen@linux.alibaba.com> <sehuww@mail.scut.edu.cn>
WeiXiong Liao <gmpy.liaowx@gmail.com> <liaoweixiong@allwinnertech.com>
Wen Gong <quic_wgong@quicinc.com> <wgong@codeaurora.org>
Wesley Cheng <quic_wcheng@quicinc.com> <wcheng@codeaurora.org>
+53 −0
@@ -21,6 +21,59 @@ Description:
		device is offset from the internal allocation unit's
		natural alignment.

+What:		/sys/block/<disk>/atomic_write_max_bytes
+Date:		February 2024
+Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
+Description:
+		[RO] This parameter specifies the maximum atomic write
+		size reported by the device. This parameter is relevant
+		for merging of writes, where a merged atomic write
+		operation must not exceed this number of bytes.
+		This parameter may be greater than the value in
+		atomic_write_unit_max_bytes as
+		atomic_write_unit_max_bytes will be rounded down to a
+		power-of-two and atomic_write_unit_max_bytes may also be
+		limited by some other queue limits, such as max_segments.
+		This parameter - along with atomic_write_unit_min_bytes
+		and atomic_write_unit_max_bytes - will not be larger than
+		max_hw_sectors_kb, but may be larger than max_sectors_kb.
+
+
+What:		/sys/block/<disk>/atomic_write_unit_min_bytes
+Date:		February 2024
+Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
+Description:
+		[RO] This parameter specifies the smallest block which can
+		be written atomically with an atomic write operation. All
+		atomic write operations must begin at a
+		atomic_write_unit_min boundary and must be multiples of
+		atomic_write_unit_min. This value must be a power-of-two.
+
+
+What:		/sys/block/<disk>/atomic_write_unit_max_bytes
+Date:		February 2024
+Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
+Description:
+		[RO] This parameter defines the largest block which can be
+		written atomically with an atomic write operation. This
+		value must be a multiple of atomic_write_unit_min and must
+		be a power-of-two. This value will not be larger than
+		atomic_write_max_bytes.
+
+
+What:		/sys/block/<disk>/atomic_write_boundary_bytes
+Date:		February 2024
+Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
+Description:
+		[RO] A device may need to internally split an atomic write I/O
+		which straddles a given logical block address boundary. This
+		parameter specifies the size in bytes of the atomic boundary if
+		one is reported by the device. This value must be a
+		power-of-two and at least the size as in
+		atomic_write_unit_max_bytes.
+		Any attempt to merge atomic write I/Os must not result in a
+		merged I/O which crosses this boundary (if any).
+

What:		/sys/block/<disk>/diskseq
Date:		February 2021
+3 −46
@@ -153,18 +153,11 @@ bio_free() will automatically free the bip.
4.2 Block Device
----------------

-Because the format of the protection data is tied to the physical
-disk, each block device has been extended with a block integrity
-profile (struct blk_integrity).  This optional profile is registered
-with the block layer using blk_integrity_register().
-
-The profile contains callback functions for generating and verifying
-the protection data, as well as getting and setting application tags.
-The profile also contains a few constants to aid in completing,
-merging and splitting the integrity metadata.
+Block devices can set up the integrity information in the integrity
+sub-struture of the queue_limits structure.

Layered block devices will need to pick a profile that's appropriate
-for all subdevices.  blk_integrity_compare() can help with that.  DM
+for all subdevices.  queue_limits_stack_integrity() can help with that.  DM
and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
will require extra work due to the application tag.

@@ -250,42 +243,6 @@ will require extra work due to the application tag.
      integrity upon completion.


-5.4 Registering A Block Device As Capable Of Exchanging Integrity Metadata
---------------------------------------------------------------------------
-
-    To enable integrity exchange on a block device the gendisk must be
-    registered as capable:
-
-    `int blk_integrity_register(gendisk, blk_integrity);`
-
-      The blk_integrity struct is a template and should contain the
-      following::
-
-        static struct blk_integrity my_profile = {
-            .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
-            .generate_fn            = my_generate_fn,
-	    .verify_fn              = my_verify_fn,
-	    .tuple_size             = sizeof(struct my_tuple_size),
-	    .tag_size               = <tag bytes per hw sector>,
-        };
-
-      'name' is a text string which will be visible in sysfs.  This is
-      part of the userland API so chose it carefully and never change
-      it.  The format is standards body-type-variant.
-      E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
-
-      'generate_fn' generates appropriate integrity metadata (for WRITE).
-
-      'verify_fn' verifies that the data buffer matches the integrity
-      metadata.
-
-      'tuple_size' must be set to match the size of the integrity
-      metadata per sector.  I.e. 8 for DIF and EPP.
-
-      'tag_size' must be set to identify how many bytes of tag space
-      are available per hardware sector.  For DIF this is either 2 or
-      0 depending on the value of the Control Mode Page ATO bit.
-
----------------------------------------------------------------------

2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
+38 −29
@@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.

+Feature settings for block drivers
+----------------------------------

-Implementation details for bio based block drivers
---------------------------------------------------------------
+For devices that do not support volatile write caches there is no driver
+support required, the block layer completes empty REQ_PREFLUSH requests before
+entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
+requests that have a payload.

-These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
-directly below the submit_bio interface.  For remapping drivers the REQ_FUA
-bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
-drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
-on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
-data can be completed successfully without doing any work.  Drivers for
-devices with volatile caches need to implement the support for these
-flags themselves without any help from the block layer.
+For devices with volatile write caches the driver needs to tell the block layer
+that it supports flushing caches by setting the

+   BLK_FEAT_WRITE_CACHE

-Implementation details for request_fn based block drivers
----------------------------------------------------------
+flag in the queue_limits feature field.  For devices that also support the FUA
+bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
+the

-For devices that do not support volatile write caches there is no driver
-support required, the block layer completes empty REQ_PREFLUSH requests before
-entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
-requests that have a payload.  For devices with volatile write caches the
-driver needs to tell the block layer that it supports flushing caches by
-doing::
+   BLK_FEAT_FUA

+flag in the features field of the queue_limits structure.

+Implementation details for bio based block drivers
+--------------------------------------------------

+For bio based drivers the REQ_PREFLUSH and REQ_FUA bit are simply passed on to
+the driver if the driver sets the BLK_FEAT_WRITE_CACHE flag and the driver
+needs to handle them.

+*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flags is
+_not_ set.  Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
+handle REQ_FUA.

-	blk_queue_write_cache(sdkp->disk->queue, true, false);
+For remapping drivers the REQ_FUA bits need to be propagated to underlying
+devices, and a global flush needs to be implemented for bios with the
+REQ_PREFLUSH bit set.

-and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
-REQ_PREFLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_OP_FLUSH request followed by the actual write by the block
-layer.  For devices that also support the FUA bit the block layer needs
-to be told to pass through the REQ_FUA bit using::
+Implementation details for blk-mq drivers
+-----------------------------------------

-	blk_queue_write_cache(sdkp->disk->queue, true, true);
+When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
+with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
+request followed by the actual write by the block layer.

-and the driver must handle write requests that have the REQ_FUA bit set
-in prep_fn/request_fn.  If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_OP_FLUSH request after the actual write.
+When the BLK_FEAT_FUA flags is set, the REQ_FUA bit is simply passed on for the
+REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
+after the completion of the write request for bio submissions with the REQ_FUA
+bit set.
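The blk-mq behaviour described in the new text can be modelled in a few lines of user-space C. This is a sketch for reasoning about the rules only: the MODEL_* constants are stand-ins with made-up bit values (not the kernel's definitions), struct write_plan and plan_blk_mq_write() are hypothetical names, and treating a device that sets only the FUA feature without the write cache feature as cacheless is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; they are not the kernel's definitions. */
#define MODEL_FEAT_WRITE_CACHE	(1u << 0)	/* stands in for BLK_FEAT_WRITE_CACHE */
#define MODEL_FEAT_FUA		(1u << 1)	/* stands in for BLK_FEAT_FUA */

#define MODEL_REQ_PREFLUSH	(1u << 0)	/* stands in for REQ_PREFLUSH */
#define MODEL_REQ_FUA		(1u << 1)	/* stands in for REQ_FUA */

/* What a blk-mq driver would see for one write that carries a payload. */
struct write_plan {
	bool flush_before;	/* REQ_OP_FLUSH issued before the write */
	bool fua_on_write;	/* REQ_FUA left set on the write itself */
	bool flush_after;	/* REQ_OP_FLUSH issued after the write completes */
};

static struct write_plan plan_blk_mq_write(unsigned int features,
					    unsigned int req_flags)
{
	struct write_plan plan = { false, false, false };

	/* The model only knows the two request flags defined above. */
	assert((req_flags & ~(MODEL_REQ_PREFLUSH | MODEL_REQ_FUA)) == 0);

	/* No volatile write cache: both bits are stripped from the write. */
	if (!(features & MODEL_FEAT_WRITE_CACHE))
		return plan;

	/* A write with PREFLUSH becomes a flush followed by the write. */
	if (req_flags & MODEL_REQ_PREFLUSH)
		plan.flush_before = true;

	/* FUA is passed through if supported, else emulated by a flush. */
	if (req_flags & MODEL_REQ_FUA) {
		if (features & MODEL_FEAT_FUA)
			plan.fua_on_write = true;
		else
			plan.flush_after = true;
	}
	return plan;
}
```

For a device that sets only the write cache feature, a write carrying both request flags comes out in this model as flush, write, flush. Once the FUA feature is also set, the trailing flush disappears and the write keeps its FUA bit.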
+14 −0
@@ -3759,6 +3759,20 @@ F: include/linux/blk*
F:	kernel/trace/blktrace.c
F:	lib/sbitmap.c
+BLOCK LAYER DEVICE DRIVER API [RUST]
+M:	Andreas Hindborg <a.hindborg@samsung.com>
+R:	Boqun Feng <boqun.feng@gmail.com>
+L:	linux-block@vger.kernel.org
+L:	rust-for-linux@vger.kernel.org
+S:	Supported
+W:	https://rust-for-linux.com
+B:	https://github.com/Rust-for-Linux/linux/issues
+C:	https://rust-for-linux.zulipchat.com/#narrow/stream/Block
+T:	git https://github.com/Rust-for-Linux/linux.git rust-block-next
+F:	drivers/block/rnull.rs
+F:	rust/kernel/block.rs
+F:	rust/kernel/block/
BLOCK2MTD DRIVER
M:	Joern Engel <joern@lazybastard.org>
L:	linux-mtd@lists.infradead.org