Commit 1122c0c1 authored by Christoph Hellwig, committed by Jens Axboe

block: move cache control settings out of queue->flags



Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.

Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer.  Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.

The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0.  The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.
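
The inheritance described above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel's blk_stack_limits: the struct demo_limits type, the demo_stack_features helper, the DEMO_INHERIT_MASK name and all bit values are hypothetical stand-ins chosen for illustration.

```c
/* Placeholder bit values; the real definitions live in the kernel headers. */
#define BLK_FEAT_WRITE_CACHE	(1u << 0)
#define BLK_FEAT_FUA		(1u << 1)
#define DEMO_FEAT_OTHER		(1u << 2)	/* hypothetical non-inherited feature */

/* Hypothetical mask of the features a stacked device picks up from below. */
#define DEMO_INHERIT_MASK	(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA)

/* Hypothetical, heavily reduced stand-in for struct queue_limits. */
struct demo_limits {
	unsigned int features;
};

/*
 * Model of stacking: the top device keeps its own features and additionally
 * inherits the cache-related features of the bottom device.
 */
static void demo_stack_features(struct demo_limits *top,
				const struct demo_limits *bottom)
{
	top->features |= bottom->features & DEMO_INHERIT_MASK;
}
```

In this model a stacked device advertises a write cache as soon as any underlying device does, which is the behavior change noted for dm-switch and dm-unstripe.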

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de


Signed-off-by: Jens Axboe <axboe@kernel.dk>
parent 70905f87
+38 −29
@@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.

Feature settings for block drivers
----------------------------------

Implementation details for bio based block drivers
--------------------------------------------------------------
For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.
For devices with volatile write caches the driver needs to tell the block layer
that it supports flushing caches by setting the

   BLK_FEAT_WRITE_CACHE

Implementation details for request_fn based block drivers
---------------------------------------------------------
flag in the queue_limits feature field.  For devices that also support the FUA
bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
the

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::
   BLK_FEAT_FUA

flag in the features field of the queue_limits structure.

Implementation details for bio based block drivers
--------------------------------------------------

For bio based drivers the REQ_PREFLUSH and REQ_FUA bit are simplify passed on
to the driver if the drivers sets the BLK_FEAT_WRITE_CACHE flag and the drivers
needs to handle them.

*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flags is
_not_ set.  Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
handle REQ_FUA.

	blk_queue_write_cache(sdkp->disk->queue, true, false);
For remapping drivers the REQ_FUA bits need to be propagated to underlying
devices, and a global flush needs to be implemented for bios with the
REQ_PREFLUSH bit set.

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer.  For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::
Implementation details for blk-mq drivers
-----------------------------------------

	blk_queue_write_cache(sdkp->disk->queue, true, true);
When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
request followed by the actual write by the block layer.

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
When the BLK_FEAT_FUA flags is set, the REQ_FUA bit simplify passed on for the
REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
after the completion of the write request for bio submissions with the REQ_FUA
bit set.
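
The rules above for write requests can be summarized in a small userspace model. It is a sketch only, assuming placeholder bit values and a hypothetical plan_write helper and struct flush_plan; it mirrors the checks described in the text rather than reproducing the block layer's code.

```c
#include <stdbool.h>

/* Placeholder bit values; the real definitions live in the kernel headers. */
#define BLK_FEAT_WRITE_CACHE	(1u << 0)
#define BLK_FEAT_FUA		(1u << 1)
#define REQ_PREFLUSH		(1u << 0)
#define REQ_FUA			(1u << 1)

/* Hypothetical markers for the extra flushes the block layer sequences. */
#define SEQ_PREFLUSH		(1u << 0)
#define SEQ_POSTFLUSH		(1u << 1)

struct flush_plan {
	unsigned int seq;	/* extra REQ_OP_FLUSH steps around the write */
	unsigned int drv_flags;	/* REQ_* bits left on the write for the driver */
};

/* Decide how a write with the given REQ_* flags is handled. */
static struct flush_plan plan_write(unsigned int features,
				    unsigned int req_flags)
{
	struct flush_plan plan = { 0, req_flags };
	bool supports_fua = features & BLK_FEAT_FUA;

	/* No volatile write cache: both bits are stripped, nothing to flush. */
	if (!(features & BLK_FEAT_WRITE_CACHE)) {
		plan.drv_flags &= ~(REQ_PREFLUSH | REQ_FUA);
		return plan;
	}

	/* A preflush becomes a separate flush issued before the write. */
	if (req_flags & REQ_PREFLUSH)
		plan.seq |= SEQ_PREFLUSH;
	plan.drv_flags &= ~REQ_PREFLUSH;

	/* Without native FUA, emulate it with a flush after the write. */
	if ((req_flags & REQ_FUA) && !supports_fua) {
		plan.seq |= SEQ_POSTFLUSH;
		plan.drv_flags &= ~REQ_FUA;
	}
	return plan;
}
```

For example, plan_write(BLK_FEAT_WRITE_CACHE, REQ_PREFLUSH | REQ_FUA) asks for a flush before and after the write and leaves no flag bits for the driver, while adding BLK_FEAT_FUA keeps REQ_FUA on the write and drops the post-flush.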
+1 −1
@@ -835,6 +835,7 @@ static int ubd_add(int n, char **error_out)
	struct queue_limits lim = {
		.max_segments		= MAX_SG,
		.seg_boundary_mask	= PAGE_SIZE - 1,
		.features		= BLK_FEAT_WRITE_CACHE,
	};
	struct gendisk *disk;
	int err = 0;
@@ -882,7 +883,6 @@ static int ubd_add(int n, char **error_out)
	}

	blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
	blk_queue_write_cache(disk->queue, true, false);
	disk->major = UBD_MAJOR;
	disk->first_minor = n << UBD_SHIFT;
	disk->minors = 1 << UBD_SHIFT;
+1 −1
@@ -782,7 +782,7 @@ void submit_bio_noacct(struct bio *bio)
		if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
				 bio_op(bio) != REQ_OP_ZONE_APPEND))
			goto end_io;
		if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
		if (!bdev_write_cache(bdev)) {
			bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
			if (!bio_sectors(bio)) {
				status = BLK_STS_OK;
+4 −5
@@ -381,8 +381,8 @@ static void blk_rq_init_flush(struct request *rq)
bool blk_insert_flush(struct request *rq)
{
	struct request_queue *q = rq->q;
	unsigned long fflags = q->queue_flags;	/* may change, cache */
	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
	bool supports_fua = q->limits.features & BLK_FEAT_FUA;
	unsigned int policy = 0;

	/* FLUSH/FUA request must never be merged */
@@ -394,11 +394,10 @@ bool blk_insert_flush(struct request *rq)
	/*
	 * Check which flushes we need to sequence for this operation.
	 */
	if (fflags & (1UL << QUEUE_FLAG_WC)) {
	if (blk_queue_write_cache(q)) {
		if (rq->cmd_flags & REQ_PREFLUSH)
			policy |= REQ_FSEQ_PREFLUSH;
		if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
		    (rq->cmd_flags & REQ_FUA))
		if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
			policy |= REQ_FSEQ_POSTFLUSH;
	}

@@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)
	 * REQ_PREFLUSH and FUA for the driver.
	 */
	rq->cmd_flags &= ~REQ_PREFLUSH;
	if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
	if (!supports_fua)
		rq->cmd_flags &= ~REQ_FUA;

	/*
+0 −2
@@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
	QUEUE_FLAG_NAME(INIT_DONE),
	QUEUE_FLAG_NAME(STABLE_WRITES),
	QUEUE_FLAG_NAME(POLL),
	QUEUE_FLAG_NAME(WC),
	QUEUE_FLAG_NAME(FUA),
	QUEUE_FLAG_NAME(DAX),
	QUEUE_FLAG_NAME(STATS),
	QUEUE_FLAG_NAME(REGISTERED),