Commit a5f99809 authored by Linus Torvalds

Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Benjamin Marzinski:
 "There are fixes for some corner case crashes in dm-cache and
  dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor
  fixes and cleanups, especially to dm-verity.

  dm-vdo:
   - Make dm-vdo able to format the device itself, like other dm
     targets, instead of needing a userspace formatting program
   - Add some sanity checks and code cleanup

  dm-cache:
   - Fix crashes and hangs when operating in passthrough mode (which
     have been around, unnoticed, since 4.12), as well as a late
     arriving fix for an error path bug in the passthrough fix
   - Fix a corner case memory leak

  dm-verity:
   - Another set of minor bugfixes and code cleanups to the forward
     error correction code

  dm-mirror:
   - Fix minor initialization bug
   - Fix overflow crash on large devices with small region sizes

  dm-crypt:
   - Reimplement elephant diffuser using AES library and minor cleanups

  dm-core:
   - Claude found a buffer overflow in /dev/mapper/control ioctl handling
   - Make dm_mod.wait_for correctly wait for partitions
   - Minor code fixes and cleanups"

* tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
  dm cache: fix missing return in invalidate_committed's error path
  dm: fix a buffer overflow in ioctl processing
  dm-crypt: Make crypt_iv_operations::post return void
  dm vdo: Fix spelling mistake "postive" -> "positive"
  dm: provide helper to set stacked limits
  dm-integrity: always set the io hints
  dm-integrity: fix mismatched queue limits
  dm-bufio: use kzalloc_flex
  dm vdo: save the formatted metadata to disk
  dm vdo: add formatting logic and initialization
  dm vdo: add synchronous metadata I/O submission helper
  dm vdo: add geometry block structure
  dm vdo: add geometry block encoding
  dm vdo: add upfront validation for logical size
  dm vdo: add formatting parameters to table line
  dm vdo: add super block initialization to encodings.c
  dm vdo: add geometry block initialization to encodings.c
  dm-crypt: Make crypt_iv_operations::wipe return void
  dm-crypt: Reimplement elephant diffuser using AES library
  dm-verity-fec: warn even when there were no errors
  ...
parents f1d26d72 8c0ee19d
+102 −20
@@ -102,29 +102,42 @@ ignore_zero_blocks
    that are not guaranteed to contain zeroes.

use_fec_from_device <fec_dev>
    Use forward error correction (FEC) to recover from corruption if hash
    verification fails. Use encoding data from the specified device. This
    may be the same device where data and hash blocks reside, in which case
    fec_start must be outside data and hash areas.
    Use forward error correction (FEC) parity data from the specified device to
    try to automatically recover from corruption and I/O errors.

    If the encoding data covers additional metadata, it must be accessible
    on the hash device after the hash blocks.
    If this option is given, then <fec_roots> and <fec_blocks> must also be
    given.  <hash_block_size> must also be equal to <data_block_size>.

    Note: block sizes for data and hash devices must match. Also, if the
    verity <dev> is encrypted the <fec_dev> should be too.
    <fec_dev> can be the same as <dev>, in which case <fec_start> must be
    outside the data area.  It can also be the same as <hash_dev>, in which case
    <fec_start> must be outside the hash and optional additional metadata areas.

    If the data <dev> is encrypted, the <fec_dev> should be too.

    For more information, see `Forward error correction`_.

fec_roots <num>
    Number of generator roots. This equals to the number of parity bytes in
    the encoding data. For example, in RS(M, N) encoding, the number of roots
    is M-N.
    The number of parity bytes in each 255-byte Reed-Solomon codeword.  The
    Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.

    The supported values are 2 through 24 inclusive.  Higher values provide
    stronger error correction.  However, the minimum value of 2 already provides
    strong error correction due to the use of interleaving, so 2 is the
    recommended value for most users.  fec_roots=2 corresponds to an
    RS(255, 253) code, which has a space overhead of about 0.8%.
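
The arithmetic behind these numbers can be checked with a short C sketch. The function names below are illustrative only and are not taken from the kernel; the supported range of 2 through 24 comes from the text above.

```c
#include <assert.h>

#define RS_CODEWORD_LEN 255	/* bytes per Reed-Solomon codeword */

/* Message bytes per codeword: k = 255 - fec_roots. */
static int rs_message_bytes(int fec_roots)
{
	assert(fec_roots >= 2 && fec_roots <= 24);	/* supported range */
	return RS_CODEWORD_LEN - fec_roots;
}

/* Parity overhead relative to the protected data, in percent. */
static double rs_overhead_percent(int fec_roots)
{
	return 100.0 * fec_roots / rs_message_bytes(fec_roots);
}
```

For fec_roots=2 this gives k = 253 and an overhead of roughly 0.79%; for the maximum fec_roots=24 it gives k = 231 and roughly 10.4%.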

fec_blocks <num>
    The number of encoding data blocks on the FEC device. The block size for
    the FEC device is <data_block_size>.
    The total number of <data_block_size> blocks that are error-checked using
    FEC.  This must be at least the sum of <num_data_blocks> and the number of
    blocks needed by the hash tree.  It can include additional metadata blocks,
    which are assumed to be accessible on <hash_dev> following the hash blocks.

    Note that this is *not* the number of parity blocks.  The number of parity
    blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.
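
As a sketch of how the parity block count could be inferred, the following C function applies the layout described under "Forward error correction": the message is padded to a multiple of k blocks, and every k message bytes gain fec_roots parity bytes. The function name is hypothetical, and the formula is derived from that description rather than copied from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of <data_block_size> parity blocks for a given fec_blocks and
 * fec_roots.  The block size cancels out when counting in blocks; multiply
 * the result by data_block_size to get the size in bytes.
 */
static uint64_t fec_parity_blocks(uint64_t fec_blocks, unsigned int fec_roots)
{
	uint64_t k = 255 - fec_roots;
	uint64_t rounds = (fec_blocks + k - 1) / k;	/* ceil(fec_blocks / k) */

	assert(fec_roots >= 2 && fec_roots <= 24);
	return rounds * fec_roots;
}
```

With fec_roots=2, 253 blocks of message data need 2 parity blocks, while 254 blocks need 4, because the message is padded up to the next multiple of k blocks.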

fec_start <offset>
    This is the offset, in <data_block_size> blocks, from the start of the
    FEC device to the beginning of the encoding data.
    This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
    to the beginning of the parity data.

check_at_most_once
    Verify data blocks only the first time they are read from the data device,
@@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read
into the page cache. Block hashes are stored linearly, aligned to the nearest
block size.

If forward error correction (FEC) support is enabled any recovery of
corrupted data will be verified using the cryptographic hash of the
corresponding data. This is why combining error correction with
integrity checking is essential.

Hash Tree
---------

@@ -212,6 +220,80 @@ The tree looks something like:
           / ... \             /   . . .  \             /           \
     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767

Forward error correction
------------------------

dm-verity's optional forward error correction (FEC) support adds strong error
correction capabilities to dm-verity.  It allows systems that would be rendered
inoperable by errors to continue operating, albeit with reduced performance.

FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.

dm-verity validates any FEC-corrected block against the wanted hash before using
it.  Therefore, FEC doesn't affect the security properties of dm-verity.

The integration of FEC with dm-verity provides significant benefits over a
separate error correction layer:

- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
  or the block cannot be read at all.  As a result, FEC doesn't add overhead to
  the common case where no error occurs.

- dm-verity hashes are also used to identify erasure locations for RS decoding.
  This allows correcting twice as many errors.

FEC uses an RS(255, k) code where k = 255 - fec_roots.  fec_roots is usually 2.
This means that each k (usually 253) message bytes have fec_roots (usually 2)
bytes of parity data added to get a 255-byte codeword.  (Many external sources
call RS codewords "blocks".  Since dm-verity already uses the term "block" to
mean something else, we'll use the clearer term "RS codeword".)

FEC checks fec_blocks blocks of message data in total, consisting of:

1. The data blocks from the data device
2. The hash blocks from the hash device
3. Optional additional metadata that follows the hash blocks on the hash device

dm-verity assumes that the FEC parity data was computed as if the following
procedure were followed:

1. Concatenate the message data from the above sources.
2. Zero-pad to the next multiple of k blocks.  Let msg be the resulting byte
   array, and msglen its length in bytes.
3. For 0 <= i < msglen / k (for each RS codeword):
     a. Select msg[i + j * msglen / k] for 0 <= j < k.
        Consider these to be the 'k' message bytes of an RS codeword.
     b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
        and concatenate them to the FEC parity data.

Step 3a interleaves the RS codewords across the entire device using an
interleaving degree of data_block_size * ceil(fec_blocks / k).  This is the
maximal interleaving, such that the message data consists of a region containing
byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
codewords, and so on up to the region for byte 'k - 1'.  Note that the number of
codewords is set to a multiple of data_block_size; thus, the regions are
block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
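
The index arithmetic of step 3a can be sketched in C as follows. The structure and function names are hypothetical and exist only for this illustration; the formulas follow the procedure above, with msglen / k equal to the number of RS codewords.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of an FEC layout; not a kernel structure. */
struct fec_layout {
	uint64_t fec_blocks;		/* blocks covered by FEC */
	uint32_t data_block_size;	/* bytes per block */
	uint32_t fec_roots;		/* parity bytes per codeword */
};

/* Number of RS codewords: data_block_size * ceil(fec_blocks / k). */
static uint64_t fec_num_codewords(const struct fec_layout *l)
{
	uint64_t k = 255 - l->fec_roots;

	return (uint64_t)l->data_block_size * ((l->fec_blocks + k - 1) / k);
}

/* Offset in the padded message of byte j (0 <= j < k) of codeword i. */
static uint64_t fec_msg_offset(const struct fec_layout *l, uint64_t i,
			       uint32_t j)
{
	assert(i < fec_num_codewords(l));
	assert(j < 255 - l->fec_roots);
	return i + (uint64_t)j * fec_num_codewords(l);
}
```

For example, with fec_blocks=1000, a 4096-byte block size and fec_roots=2, there are 16384 codewords, so consecutive bytes of one codeword lie 16384 bytes (4 blocks) apart. A run of adjacent corrupt bytes therefore touches many different codewords rather than many bytes of one codeword.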

This interleaving allows long bursts of errors to be corrected.  It provides
much stronger error correction than storage devices typically provide, while
keeping the space overhead low.

The cost is slow decoding: correcting a single block usually requires reading
254 extra blocks spread evenly across the device(s).  However, that is
acceptable because dm-verity uses FEC only when there is actually an error.
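
The figure of 254 extra blocks follows from the layout: the codewords passing through one block also pass through k - 1 other message blocks, plus fec_roots blocks of parity. The C sketch below counts these reads. It assumes that padding blocks past fec_blocks are implicit zeroes that need not be read, which is why the count can be lower than 254; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the blocks that must be read, besides block b itself, to decode the
 * codewords passing through message block b.
 */
static unsigned int fec_extra_reads(uint64_t fec_blocks,
				    unsigned int fec_roots, uint64_t b)
{
	uint64_t k = 255 - fec_roots;
	uint64_t rounds = (fec_blocks + k - 1) / k;	/* ceil(fec_blocks / k) */
	uint64_t first = b % rounds;	/* peer block in the byte-0 region */
	unsigned int reads = fec_roots;	/* parity blocks */
	uint64_t j;

	assert(b < fec_blocks);
	for (j = 0; j < k; j++) {
		uint64_t peer = first + j * rounds;

		/* Skip the failing block and any implicit padding block. */
		if (peer != b && peer < fec_blocks)
			reads++;
	}
	return reads;
}
```

With fec_roots=2 and fec_blocks=1012 (an exact multiple of 253), every block needs 252 + 2 = 254 extra reads. With fec_blocks=1000, block 3 needs only 251, because three of its peers fall in the zero padding.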

The list below contains additional details about the RS codes used by
dm-verity's FEC.  Userspace programs that generate the parity data need to use
these parameters for the parity data to match exactly:

- Field used is GF(256)
- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
  through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
- The codes used are systematic, BCH-view codes
- Primitive element alpha is 'x'
- First consecutive root of code generator polynomial is 'x^0'
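
To make the parameters above concrete, here is a minimal C sketch of GF(256) multiplication with the listed field polynomial, and of the code generator polynomial built from consecutive powers of alpha starting at alpha^0. It is an illustration under those stated parameters only; it does not reproduce the kernel's Reed-Solomon library, and it says nothing about the byte order in which parity is stored. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GF_POLY 0x11d	/* x^8 + x^4 + x^3 + x^2 + 1 */

/* Multiply two GF(256) elements: carry-less multiply reduced by GF_POLY. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	unsigned int acc = 0, x = a;

	while (b) {
		if (b & 1)
			acc ^= x;
		x <<= 1;
		if (x & 0x100)
			x ^= GF_POLY;
		b >>= 1;
	}
	return (uint8_t)acc;
}

/* alpha^n, where alpha is the element 'x' (byte value 2). */
static uint8_t gf_alpha_pow(unsigned int n)
{
	uint8_t r = 1;

	while (n--)
		r = gf_mul(r, 2);
	return r;
}

/*
 * Build g(X) = (X - alpha^0) * ... * (X - alpha^(nroots - 1)).
 * g[i] is the coefficient of X^i; g must have room for nroots + 1 entries.
 */
static void rs_generator(uint8_t *g, unsigned int nroots)
{
	unsigned int i, j;

	assert(nroots >= 1);
	g[0] = 1;
	for (i = 0; i < nroots; i++) {
		uint8_t root = gf_alpha_pow(i);

		g[i + 1] = 0;
		/* Multiply g by (X + root); in GF(2^8), minus equals XOR. */
		for (j = i + 1; j > 0; j--)
			g[j] = g[j - 1] ^ gf_mul(g[j], root);
		g[0] = gf_mul(g[0], root);
	}
}
```

For fec_roots=2 the generator is (X + 1)(X + 2) = X^2 + 3X + 2, so rs_generator() fills g with {2, 3, 1}.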

On-disk format
==============
+2 −0
@@ -226,6 +226,7 @@ config BLK_DEV_DM
	select BLOCK_HOLDER_DEPRECATED if SYSFS
	select BLK_DEV_DM_BUILTIN
	select BLK_MQ_STACKING
	select CRYPTO_LIB_SHA256 if IMA
	depends on DAX || DAX=n
	help
	  Device-mapper is a low level volume manager.  It works by allowing
@@ -299,6 +300,7 @@ config DM_CRYPT
	select CRYPTO
	select CRYPTO_CBC
	select CRYPTO_ESSIV
	select CRYPTO_LIB_AES
	select CRYPTO_LIB_MD5 # needed by lmk IV mode
	help
	  This device-mapper target allows you to create a device that
+2 −2
@@ -391,7 +391,7 @@ struct dm_buffer_cache {
	 */
	unsigned int num_locks;
	bool no_sleep;
	struct buffer_tree trees[];
	struct buffer_tree trees[] __counted_by(num_locks);
};

static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled);
@@ -2511,7 +2511,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
	}

	num_locks = dm_num_hash_locks();
	c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL);
	c = kzalloc_flex(*c, cache.trees, num_locks);
	if (!c) {
		r = -ENOMEM;
		goto bad_client;
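
The hunk above replaces an open-coded size computation with a helper that is given the object, its flexible array member and the element count. The userspace C sketch below illustrates that general pattern only. It is not the kernel helper's implementation; the struct and function names are hypothetical, and the overflow check is an assumption about what such a helper is meant to provide.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct tree {
	int dummy;
};

struct cache {
	unsigned int num_locks;	/* number of valid entries in trees[] */
	struct tree trees[];	/* flexible array member */
};

/* Allocate a zeroed struct cache with room for num_locks trees. */
static struct cache *cache_alloc(unsigned int num_locks)
{
	struct cache *c;

	/* Refuse counts whose total size would overflow size_t. */
	if (num_locks > (SIZE_MAX - sizeof(*c)) / sizeof(c->trees[0]))
		return NULL;

	c = calloc(1, sizeof(*c) + (size_t)num_locks * sizeof(c->trees[0]));
	if (c)
		c->num_locks = num_locks;	/* keep the count next to the array */
	return c;
}
```

Storing the count in the same structure as the array is also what lets an annotation such as __counted_by(num_locks) tie the array bounds to that field.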
+17 −16
@@ -1023,6 +1023,12 @@ static bool cmd_write_lock(struct dm_cache_metadata *cmd)
			return;			\
	} while (0)

#define WRITE_LOCK_OR_GOTO(cmd, label)		\
	do {					\
		if (!cmd_write_lock((cmd)))	\
			goto label;		\
	} while (0)

#define WRITE_UNLOCK(cmd) \
	up_write(&(cmd)->root_lock)

@@ -1714,17 +1720,6 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
	return r;
}

int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
{
	int r;

	READ_LOCK(cmd);
	r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
	READ_UNLOCK(cmd);

	return r;
}

void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
{
	WRITE_LOCK_VOID(cmd);
@@ -1791,11 +1786,8 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
	new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
					 CACHE_MAX_CONCURRENT_LOCKS);

	WRITE_LOCK(cmd);
	if (cmd->fail_io) {
		WRITE_UNLOCK(cmd);
		goto out;
	}
	/* cmd_write_lock() already checks fail_io with cmd->root_lock held */
	WRITE_LOCK_OR_GOTO(cmd, out);

	__destroy_persistent_data_objects(cmd, false);
	old_bm = cmd->bm;
@@ -1824,3 +1816,12 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)

	return r;
}

int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result)
{
	READ_LOCK(cmd);
	*result = cmd->clean_when_opened;
	READ_UNLOCK(cmd);

	return 0;
}
+5 −5
@@ -135,17 +135,17 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
 */
int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);

/*
 * Query method.  Are all the blocks in the cache clean?
 */
int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);

int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result);
int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);

/*
 * Query method.  Was the metadata cleanly shut down when opened?
 */
int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result);

/*----------------------------------------------------------------*/

#endif /* DM_CACHE_METADATA_H */