Commit 62deb67f authored by Jakub Kicinski's avatar Jakub Kicinski
Browse files


Cross-merge networking fixes after downstream PR (net-6.16-rc3).

No conflicts or adjacent changes.

Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parents afc783fa 5c8013ae
Loading
Loading
Loading
Loading
+6 −3
Original line number Diff line number Diff line
@@ -197,6 +197,7 @@ Daniel Borkmann <daniel@iogearbox.net> <daniel.borkmann@tik.ee.ethz.ch>
Daniel Borkmann <daniel@iogearbox.net> <dborkmann@redhat.com>
Daniel Borkmann <daniel@iogearbox.net> <dborkman@redhat.com>
Daniel Borkmann <daniel@iogearbox.net> <dxchgb@gmail.com>
Danilo Krummrich <dakr@kernel.org> <dakr@redhat.com>
David Brownell <david-b@pacbell.net>
David Collins <quic_collinsd@quicinc.com> <collinsd@codeaurora.org>
David Heidelberg <david@ixit.cz> <d.okias@gmail.com>
@@ -282,6 +283,7 @@ Gustavo Padovan <gustavo@las.ic.unicamp.br>
Gustavo Padovan <padovan@profusion.mobi>
Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> <hamza.mahfooz@amd.com>
Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
Hans de Goede <hansg@kernel.org> <hdegoede@redhat.com>
Hans Verkuil <hverkuil@xs4all.nl> <hansverk@cisco.com>
Hans Verkuil <hverkuil@xs4all.nl> <hverkuil-cisco@xs4all.nl>
Harry Yoo <harry.yoo@oracle.com> <42.hyeyoo@gmail.com>
@@ -691,9 +693,10 @@ Serge Hallyn <sergeh@kernel.org> <serge.hallyn@canonical.com>
Serge Hallyn <sergeh@kernel.org> <serue@us.ibm.com>
Seth Forshee <sforshee@kernel.org> <seth.forshee@canonical.com>
Shakeel Butt <shakeel.butt@linux.dev> <shakeelb@google.com>
Shannon Nelson <shannon.nelson@amd.com> <snelson@pensando.io>
Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@intel.com>
Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@oracle.com>
Shannon Nelson <sln@onemain.com> <shannon.nelson@amd.com>
Shannon Nelson <sln@onemain.com> <snelson@pensando.io>
Shannon Nelson <sln@onemain.com> <shannon.nelson@intel.com>
Shannon Nelson <sln@onemain.com> <shannon.nelson@oracle.com>
Sharath Chandra Vurukala <quic_sharathv@quicinc.com> <sharathv@codeaurora.org>
Shiraz Hashim <shiraz.linux.kernel@gmail.com> <shiraz.hashim@st.com>
Shuah Khan <shuah@kernel.org> <shuahkhan@gmail.com>
+2 −0
Original line number Diff line number Diff line
@@ -270,6 +270,8 @@ configured for Unix Extensions (and the client has not disabled
illegal Windows/NTFS/SMB characters to a remap range (this mount parameter
is the default for SMB3). This remap (``mapposix``) range is also
compatible with Mac (and "Services for Mac" on some older Windows).
When POSIX Extensions for SMB 3.1.1 are negotiated, remapping is automatically
disabled.

CIFS VFS Mount Options
======================
+77 −0
Original line number Diff line number Diff line
@@ -352,6 +352,83 @@ For reaching best IO performance, ublk server should align its segment
parameter of `struct ublk_param_segment` with backend for avoiding
unnecessary IO split, which usually hurts io_uring performance.

Auto Buffer Registration
------------------------

The ``UBLK_F_AUTO_BUF_REG`` feature automatically handles buffer registration
and unregistration for I/O requests, which simplifies the buffer management
process and reduces overhead in the ublk server implementation.

This is another feature flag for using zero copy, and it is compatible with
``UBLK_F_SUPPORT_ZERO_COPY``.

Feature Overview
~~~~~~~~~~~~~~~~

This feature automatically registers request buffers to the io_uring context
before delivering I/O commands to the ublk server and unregisters them when
completing I/O commands. This eliminates the need for manual buffer
registration/unregistration via ``UBLK_IO_REGISTER_IO_BUF`` and
``UBLK_IO_UNREGISTER_IO_BUF`` commands, then IO handling in ublk server
can avoid dependency on the two uring_cmd operations.

IOs can't be issued concurrently to io_uring if there is any dependency
among these IOs. So this way not only simplifies ublk server implementation,
but also makes concurrent IO handling becomes possible by removing the
dependency on buffer registration & unregistration commands.

Usage Requirements
~~~~~~~~~~~~~~~~~~

1. The ublk server must create a sparse buffer table on the same ``io_ring_ctx``
   used for ``UBLK_IO_FETCH_REQ`` and ``UBLK_IO_COMMIT_AND_FETCH_REQ``. If
   uring_cmd is issued on a different ``io_ring_ctx``, manual buffer
   unregistration is required.

2. Buffer registration data must be passed via uring_cmd's ``sqe->addr`` with the
   following structure::

    struct ublk_auto_buf_reg {
        __u16 index;      /* Buffer index for registration */
        __u8 flags;       /* Registration flags */
        __u8 reserved0;   /* Reserved for future use */
        __u32 reserved1;  /* Reserved for future use */
    };

   ublk_auto_buf_reg_to_sqe_addr() is for converting the above structure into
   ``sqe->addr``.

3. All reserved fields in ``ublk_auto_buf_reg`` must be zeroed.

4. Optional flags can be passed via ``ublk_auto_buf_reg.flags``.

Fallback Behavior
~~~~~~~~~~~~~~~~~

If auto buffer registration fails:

1. When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled:

   - The uring_cmd is completed
   - ``UBLK_IO_F_NEED_REG_BUF`` is set in ``ublksrv_io_desc.op_flags``
   - The ublk server must manually deal with the failure, such as, register
     the buffer manually, or using user copy feature for retrieving the data
     for handling ublk IO

2. If fallback is not enabled:

   - The ublk I/O request fails silently
   - The uring_cmd won't be completed

Limitations
~~~~~~~~~~~

- Requires same ``io_ring_ctx`` for all operations
- May require manual buffer management in fallback cases
- io_ring_ctx buffer table has a max size of 16K, which may not be enough
  in case that too many ublk devices are handled by this single io_ring_ctx
  and each one has very large queue depth

References
==========

+0 −65
Original line number Diff line number Diff line
Device-tree bindings for persistent memory regions
-----------------------------------------------------

Persistent memory refers to a class of memory devices that are:

	a) Usable as main system memory (i.e. cacheable), and
	b) Retain their contents across power failure.

Given b) it is best to think of persistent memory as a kind of memory mapped
storage device. To ensure data integrity the operating system needs to manage
persistent regions separately to the normal memory pool. To aid with that this
binding provides a standardised interface for discovering where persistent
memory regions exist inside the physical address space.

Bindings for the region nodes:
-----------------------------

Required properties:
	- compatible = "pmem-region"

	- reg = <base, size>;
		The reg property should specify an address range that is
		translatable to a system physical address range. This address
		range should be mappable as normal system memory would be
		(i.e cacheable).

		If the reg property contains multiple address ranges
		each address range will be treated as though it was specified
		in a separate device node. Having multiple address ranges in a
		node implies no special relationship between the two ranges.

Optional properties:
	- Any relevant NUMA associativity properties for the target platform.

	- volatile; This property indicates that this region is actually
	  backed by non-persistent memory. This lets the OS know that it
	  may skip the cache flushes required to ensure data is made
	  persistent after a write.

	  If this property is absent then the OS must assume that the region
	  is backed by non-volatile memory.

Examples:
--------------------

	/*
	 * This node specifies one 4KB region spanning from
	 * 0x5000 to 0x5fff that is backed by non-volatile memory.
	 */
	pmem@5000 {
		compatible = "pmem-region";
		reg = <0x00005000 0x00001000>;
	};

	/*
	 * This node specifies two 4KB regions that are backed by
	 * volatile (normal) memory.
	 */
	pmem@6000 {
		compatible = "pmem-region";
		reg = < 0x00006000 0x00001000
			0x00008000 0x00001000 >;
		volatile;
	};
+48 −0
Original line number Diff line number Diff line
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/pmem-region.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

maintainers:
  - Oliver O'Halloran <oohall@gmail.com>

title: Persistent Memory Regions

description: |
  Persistent memory refers to a class of memory devices that are:

    a) Usable as main system memory (i.e. cacheable), and
    b) Retain their contents across power failure.

  Given b) it is best to think of persistent memory as a kind of memory mapped
  storage device. To ensure data integrity the operating system needs to manage
  persistent regions separately to the normal memory pool. To aid with that this
  binding provides a standardised interface for discovering where persistent
  memory regions exist inside the physical address space.

properties:
  compatible:
    const: pmem-region

  reg:
    maxItems: 1

  volatile:
    description:
      Indicates the region is volatile (non-persistent) and the OS can skip
      cache flushes for writes
    type: boolean

required:
  - compatible
  - reg

additionalProperties: false

examples:
  - |
    pmem@5000 {
        compatible = "pmem-region";
        reg = <0x00005000 0x00001000>;
    };
Loading