Commit 63eb28bb authored by Linus Torvalds
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x:

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocate the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
parents 7d767a95 196d9e72
+41 −0
@@ -223,6 +223,47 @@ Before jumping into the kernel, the following conditions must be met:

    - SCR_EL3.HCE (bit 8) must be initialised to 0b1.

  For systems with a GICv5 interrupt controller to be used in v5 mode:

  - If the kernel is entered at EL1 and EL2 is present:

      - ICH_HFGRTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_PPI_HMRn_EL1 (bit 16) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_IAFFIDR_EL1 (bit 7) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_HPPIR_EL1 (bit 4) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_HAPR_EL1 (bit 3) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_IDRn_EL1 (bit 1) must be initialised to 0b1.
      - ICH_HFGRTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.

      - ICH_HFGWTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
      - ICH_HFGWTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.

      - ICH_HFGITR_EL2.GICRCDNMIA (bit 10) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICRCDIA (bit 9) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDDI (bit 8) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDEOI (bit 7) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDHM (bit 6) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDRCFG (bit 5) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDPEND (bit 4) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDAFF (bit 3) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDPRI (bit 2) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDDIS (bit 1) must be initialised to 0b1.
      - ICH_HFGITR_EL2.GICCDEN (bit 0) must be initialised to 0b1.

  - The DT or ACPI tables must describe a GICv5 interrupt controller.
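
The bit positions listed above can be folded into one mask per register. The following is a minimal sketch, assuming the bit numbers exactly as given in the list; the helper and function names are hypothetical and are not kernel or firmware APIs. Real boot code would write these values with system register accesses at EL2, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit mask from a list of bit positions terminated by -1. */
static uint64_t mask_from_bits(const int *bits)
{
	uint64_t mask = 0;

	for (; *bits >= 0; bits++) {
		assert(*bits < 64);
		mask |= UINT64_C(1) << *bits;
	}
	return mask;
}

/* Bit positions copied from the ICH_HFGRTR_EL2 entries in the list. */
static const int hfgrtr_bits[] = {
	20, 19, 18, 17, 16, 7, 6, 5, 4, 3, 2, 1, 0, -1
};

/* Bit positions copied from the ICH_HFGWTR_EL2 entries in the list. */
static const int hfgwtr_bits[] = { 20, 19, 18, 17, 6, 5, 2, 0, -1 };

/* Bit positions copied from the ICH_HFGITR_EL2 entries in the list. */
static const int hfgitr_bits[] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1 };

/* Hypothetical accessors: value each register must have set at entry. */
uint64_t gicv5_hfgrtr_required(void) { return mask_from_bits(hfgrtr_bits); }
uint64_t gicv5_hfgwtr_required(void) { return mask_from_bits(hfgwtr_bits); }
uint64_t gicv5_hfgitr_required(void) { return mask_from_bits(hfgitr_bits); }
```

A boot-time check could read each register back and verify that `(value & required) == required` before entering the kernel at EL1.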

  For systems with a GICv3 interrupt controller to be used in v3 mode:
  - If EL3 is present:

+78 −0
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/interrupt-controller/arm,gic-v5-iwb.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: ARM Generic Interrupt Controller, version 5 Interrupt Wire Bridge (IWB)

maintainers:
  - Lorenzo Pieralisi <lpieralisi@kernel.org>
  - Marc Zyngier <maz@kernel.org>

description: |
  The GICv5 architecture defines the guidelines to implement GICv5
  compliant interrupt controllers for AArch64 systems.

  The GICv5 specification can be found at
  https://developer.arm.com/documentation/aes0070

  GICv5 has zero or more Interrupt Wire Bridges (IWB) that are responsible
  for translating wire signals into interrupt messages to the GICv5 ITS.

allOf:
  - $ref: /schemas/interrupt-controller.yaml#

properties:
  compatible:
    const: arm,gic-v5-iwb

  reg:
    items:
      - description: IWB control frame

  "#address-cells":
    const: 0

  "#interrupt-cells":
    description: |
      The 1st cell corresponds to the IWB wire.

      The 2nd cell is the flags, encoded as follows:
      bits[3:0] trigger type and level flags.

      1 = low-to-high edge triggered
      2 = high-to-low edge triggered
      4 = active high level-sensitive
      8 = active low level-sensitive

    const: 2

  interrupt-controller: true

  msi-parent:
    maxItems: 1

required:
  - compatible
  - reg
  - "#interrupt-cells"
  - interrupt-controller
  - msi-parent

additionalProperties: false

examples:
  - |
    interrupt-controller@2f000000 {
      compatible = "arm,gic-v5-iwb";
      reg = <0x2f000000 0x10000>;

      #address-cells = <0>;

      #interrupt-cells = <2>;
      interrupt-controller;

      msi-parent = <&its0 64>;
    };
...
+267 −0
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/interrupt-controller/arm,gic-v5.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: ARM Generic Interrupt Controller, version 5

maintainers:
  - Lorenzo Pieralisi <lpieralisi@kernel.org>
  - Marc Zyngier <maz@kernel.org>

description: |
  The GICv5 architecture defines the guidelines to implement GICv5
  compliant interrupt controllers for AArch64 systems.

  The GICv5 specification can be found at
  https://developer.arm.com/documentation/aes0070

  The GICv5 architecture is composed of multiple components:
    - one or more IRS (Interrupt Routing Service)
    - zero or more ITS (Interrupt Translation Service)

  The architecture defines:
    - PE-Private Peripheral Interrupts (PPI)
    - Shared Peripheral Interrupts (SPI)
    - Logical Peripheral Interrupts (LPI)

allOf:
  - $ref: /schemas/interrupt-controller.yaml#

properties:
  compatible:
    const: arm,gic-v5

  "#address-cells":
    enum: [ 1, 2 ]

  "#size-cells":
    enum: [ 1, 2 ]

  ranges: true

  "#interrupt-cells":
    description: |
      The 1st cell corresponds to the INTID.Type field in the INTID; 1 for PPI,
      3 for SPI. LPI interrupts must not be described in the bindings since
      they are allocated dynamically by the software component managing them.

      The 2nd cell contains the interrupt INTID.ID field.

      The 3rd cell is the flags, encoded as follows:
      bits[3:0] trigger type and level flags.

        1 = low-to-high edge triggered
        2 = high-to-low edge triggered
        4 = active high level-sensitive
        8 = active low level-sensitive

    const: 3

  interrupt-controller: true

  interrupts:
    description:
      The VGIC maintenance interrupt.
    maxItems: 1

required:
  - compatible
  - "#address-cells"
  - "#size-cells"
  - ranges
  - "#interrupt-cells"
  - interrupt-controller

patternProperties:
  "^irs@[0-9a-f]+$":
    type: object
    description:
      GICv5 has one or more Interrupt Routing Services (IRS) that are
      responsible for handling IRQ state and routing.

    additionalProperties: false

    properties:
      compatible:
        const: arm,gic-v5-irs

      reg:
        minItems: 1
        items:
          - description: IRS config frames
          - description: IRS setlpi frames

      reg-names:
        description:
          Describe config and setlpi frames that are present.
          "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
          and "el3-" for EL3.
        minItems: 1
        maxItems: 8
        items:
          enum: [ ns-config, s-config, realm-config, el3-config, ns-setlpi,
                  s-setlpi, realm-setlpi, el3-setlpi ]

      "#address-cells":
        enum: [ 1, 2 ]

      "#size-cells":
        enum: [ 1, 2 ]

      ranges: true

      dma-noncoherent:
        description:
          Present if the GIC IRS permits programming shareability and
          cacheability attributes but is connected to a non-coherent
          downstream interconnect.

      cpus:
        description:
          CPUs managed by the IRS.

      arm,iaffids:
        $ref: /schemas/types.yaml#/definitions/uint16-array
        description:
          Interrupt AFFinity ID (IAFFID) associated with the CPU whose
          CPU node phandle is at the same index in the cpus array.

    patternProperties:
      "^its@[0-9a-f]+$":
        type: object
        description:
          GICv5 has zero or more Interrupt Translation Services (ITS) that are
          used to route Message Signalled Interrupts (MSI) to the CPUs. Each
          ITS is connected to an IRS.
        additionalProperties: false

        properties:
          compatible:
            const: arm,gic-v5-its

          reg:
            items:
              - description: ITS config frames

          reg-names:
            description:
              Describe config frames that are present.
              "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
              and "el3-" for EL3.
            minItems: 1
            maxItems: 4
            items:
              enum: [ ns-config, s-config, realm-config, el3-config ]

          "#address-cells":
            enum: [ 1, 2 ]

          "#size-cells":
            enum: [ 1, 2 ]

          ranges: true

          dma-noncoherent:
            description:
              Present if the GIC ITS permits programming shareability and
              cacheability attributes but is connected to a non-coherent
              downstream interconnect.

        patternProperties:
          "^msi-controller@[0-9a-f]+$":
            type: object
            description:
              GICv5 ITS has one or more translate register frames.
            additionalProperties: false

            properties:
              reg:
                items:
                  - description: ITS translate frames

              reg-names:
                description:
                  Describe translate frames that are present.
                  "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
                  and "el3-" for EL3.
                minItems: 1
                maxItems: 4
                items:
                  enum: [ ns-translate, s-translate, realm-translate, el3-translate ]

              "#msi-cells":
                description:
                  The single msi-cell is the DeviceID of the device which will
                  generate the MSI.
                const: 1

              msi-controller: true

            required:
              - reg
              - reg-names
              - "#msi-cells"
              - msi-controller

        required:
          - compatible
          - reg
          - reg-names

    required:
      - compatible
      - reg
      - reg-names
      - cpus
      - arm,iaffids

additionalProperties: false

examples:
  - |
    interrupt-controller {
      compatible = "arm,gic-v5";

      #interrupt-cells = <3>;
      interrupt-controller;

      #address-cells = <1>;
      #size-cells = <1>;
      ranges;

      interrupts = <1 25 4>;

      irs@2f1a0000 {
        compatible = "arm,gic-v5-irs";
        reg = <0x2f1a0000 0x10000>;  // IRS_CONFIG_FRAME
        reg-names = "ns-config";

        #address-cells = <1>;
        #size-cells = <1>;
        ranges;

        cpus = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>, <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
        arm,iaffids = /bits/ 16 <0 1 2 3 4 5 6 7>;

        its@2f120000 {
          compatible = "arm,gic-v5-its";
          reg = <0x2f120000 0x10000>;   // ITS_CONFIG_FRAME
          reg-names = "ns-config";

          #address-cells = <1>;
          #size-cells = <1>;
          ranges;

          msi-controller@2f130000 {
            reg = <0x2f130000 0x10000>;   // ITS_TRANSLATE_FRAME
            reg-names = "ns-translate";

            #msi-cells = <1>;
            msi-controller;
          };
        };
      };
    };
...
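
To illustrate the three-cell interrupt specifier described in the binding above, here is a minimal decoding sketch. It assumes the encoding stated in the "#interrupt-cells" description (type 1 for PPI, 3 for SPI, trigger in flags bits[3:0]); the struct and function names are hypothetical, and rejecting flags with more than one trigger bit set is an assumption of this sketch rather than something the binding states.

```c
#include <stdbool.h>
#include <stdint.h>

/* INTID.Type values allowed in the binding (LPIs are never described). */
enum gicv5_dt_type {
	GICV5_DT_PPI = 1,
	GICV5_DT_SPI = 3,
};

/* Hypothetical container for a decoded specifier. */
struct gicv5_dt_irq {
	uint32_t type;    /* 1st cell: INTID.Type */
	uint32_t id;      /* 2nd cell: INTID.ID */
	uint32_t trigger; /* 3rd cell, bits[3:0]: 1, 2, 4 or 8 */
};

/* Return true and fill *out if the three cells form a valid specifier. */
bool gicv5_dt_decode(const uint32_t cells[3], struct gicv5_dt_irq *out)
{
	uint32_t trigger = cells[2] & 0xf;

	if (cells[0] != GICV5_DT_PPI && cells[0] != GICV5_DT_SPI)
		return false;

	/* Assumption: exactly one of the four trigger values is set. */
	if (trigger == 0 || (trigger & (trigger - 1)) != 0)
		return false;

	out->type = cells[0];
	out->id = cells[1];
	out->trigger = trigger;
	return true;
}
```

With the example node above, `interrupts = <1 25 4>` decodes as PPI 25, active high level-sensitive.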
+37 −3
@@ -2006,7 +2006,7 @@ frequency is KHz.

If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also
be used as a vm ioctl to set the initial tsc frequency of subsequently
created vCPUs.
created vCPUs.  Note, the vm ioctl is only allowed prior to creating vCPUs.

For TSC protected Confidential Computing (CoCo) VMs where TSC frequency
is configured once at VM scope and remains unchanged during VM's
@@ -7851,6 +7851,7 @@ Valid bits in args[0] are::
  #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
  #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
  #define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
  #define KVM_X86_DISABLE_EXITS_APERFMPERF       (1 << 4)

Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
@@ -7861,6 +7862,28 @@ all such vmexits.

Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.

Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
than just disabling APERF/MPERF exits. While both Intel and AMD
document strict usage conditions for these MSRs--emphasizing that only
the ratio of their deltas over a time interval (T0 to T1) is
architecturally defined--simply passing through the MSRs can still
produce an incorrect ratio.

This erroneous ratio can occur if, between T0 and T1:

1. The vCPU thread migrates between logical processors.
2. Live migration or suspend/resume operations take place.
3. Another task shares the vCPU's logical processor.
4. C-states lower than C0 are emulated (e.g., via HLT interception).
5. The guest TSC frequency doesn't match the host TSC frequency.

Due to these complexities, KVM does not automatically associate this
passthrough capability with the guest CPUID bit,
``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
mechanism adequate for virtualizing the ``IA32_APERF`` and
``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
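
As a rough illustration of the ratio discussed above, the sketch below estimates an effective frequency from two APERF/MPERF samples taken at T0 and T1. The function name and the `base_khz` parameter are assumptions made for this example, not part of KVM's API. The result is only meaningful if none of the listed events occurred between the two samples.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a base frequency by delta(APERF)/delta(MPERF).
 * Returns 0 if MPERF did not advance, since the ratio is then undefined.
 * Unsigned subtraction keeps the deltas correct across counter wraparound.
 * Assumes a short interval so that base_khz * d_aperf fits in 64 bits.
 */
uint64_t aperfmperf_effective_khz(uint64_t base_khz,
				  uint64_t aperf_t0, uint64_t mperf_t0,
				  uint64_t aperf_t1, uint64_t mperf_t1)
{
	uint64_t d_aperf = aperf_t1 - aperf_t0;
	uint64_t d_mperf = mperf_t1 - mperf_t0;

	if (d_mperf == 0)
		return 0;

	return base_khz * d_aperf / d_mperf;
}
```

If the vCPU thread migrates between T0 and T1, the two samples come from different physical counters, so the deltas and therefore the ratio are meaningless. That is the first failure case in the list.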


7.14 KVM_CAP_S390_HPAGE_1M
--------------------------

@@ -8387,7 +8410,7 @@ core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
7.36 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------

:Architectures: x86, arm64
:Architectures: x86, arm64, riscv
:Type: vm
:Parameters: args[0] - size of the dirty log ring

@@ -8599,7 +8622,7 @@ ENOSYS for the others.
When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.

7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
-------------------------------------

:Architectures: arm64
@@ -8628,6 +8651,17 @@ given VM.
When this capability is enabled, KVM resets the VCPU when setting
MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.

7.43 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED
-------------------------------------------

:Architectures: arm64
:Target: VM
:Parameters: None

This capability indicates to userspace whether a PFNMAP memory region
can be safely mapped as cacheable. This relies on the presence of
force write back (FWB) feature support on the hardware.

8. Other capabilities.
======================

+72 −5
@@ -78,6 +78,8 @@ Groups:
    -ENXIO   The group or attribute is unknown/unsupported for this device
             or hardware support is missing.
    -EFAULT  Invalid user pointer for attr->addr.
    -EBUSY   Attempt to write a register that is read-only after
             initialization
    =======  =============================================================


@@ -120,6 +122,12 @@ Groups:
    Note that distributor fields are not banked, but return the same value
    regardless of the mpidr used to access the register.

    Userspace is allowed to write the following register fields prior to
    initialization of the VGIC:

      * GICD_IIDR.Revision
      * GICD_TYPER2.nASSGIcap

    GICD_IIDR.Revision is updated when the KVM implementation is changed in a
    way directly observable by the guest or userspace.  Userspace should read
    GICD_IIDR from KVM and write back the read value to confirm its expected
@@ -128,6 +136,12 @@ Groups:
    behavior.


    GICD_TYPER2.nASSGIcap allows userspace to control the support of SGIs
    without an active state. At VGIC creation the field resets to the
    maximum capability of the system. Userspace is expected to read the field
    to determine the supported value(s) before writing to the field.


    The GICD_STATUSR and GICR_STATUSR registers are architecturally defined such
    that a write of a clear bit has no effect, whereas a write with a set bit
    clears that value.  To allow userspace to freely set the values of these two
@@ -202,16 +216,69 @@ Groups:
    KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS accesses the CPU interface registers for the
    CPU specified by the mpidr field.

    CPU interface registers access is not implemented for AArch32 mode.
    Error -ENXIO is returned when accessed in AArch32 mode.
    The available registers are:

    ===============  ====================================================
    ICC_PMR_EL1
    ICC_BPR0_EL1
    ICC_AP0R0_EL1
    ICC_AP0R1_EL1    when the host implements at least 6 bits of priority
    ICC_AP0R2_EL1    when the host implements 7 bits of priority
    ICC_AP0R3_EL1    when the host implements 7 bits of priority
    ICC_AP1R0_EL1
    ICC_AP1R1_EL1    when the host implements at least 6 bits of priority
    ICC_AP1R2_EL1    when the host implements 7 bits of priority
    ICC_AP1R3_EL1    when the host implements 7 bits of priority
    ICC_BPR1_EL1
    ICC_CTLR_EL1
    ICC_SRE_EL1
    ICC_IGRPEN0_EL1
    ICC_IGRPEN1_EL1
    ===============  ====================================================

    When EL2 is available for the guest, these registers are also available:

    =============  ====================================================
    ICH_AP0R0_EL2
    ICH_AP0R1_EL2  when the host implements at least 6 bits of priority
    ICH_AP0R2_EL2  when the host implements 7 bits of priority
    ICH_AP0R3_EL2  when the host implements 7 bits of priority
    ICH_AP1R0_EL2
    ICH_AP1R1_EL2  when the host implements at least 6 bits of priority
    ICH_AP1R2_EL2  when the host implements 7 bits of priority
    ICH_AP1R3_EL2  when the host implements 7 bits of priority
    ICH_HCR_EL2
    ICC_SRE_EL2
    ICH_VTR_EL2
    ICH_VMCR_EL2
    ICH_LR0_EL2
    ICH_LR1_EL2
    ICH_LR2_EL2
    ICH_LR3_EL2
    ICH_LR4_EL2
    ICH_LR5_EL2
    ICH_LR6_EL2
    ICH_LR7_EL2
    ICH_LR8_EL2
    ICH_LR9_EL2
    ICH_LR10_EL2
    ICH_LR11_EL2
    ICH_LR12_EL2
    ICH_LR13_EL2
    ICH_LR14_EL2
    ICH_LR15_EL2
    =============  ====================================================

    CPU interface registers are only described using the AArch64
    encoding.
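
The priority-dependent rows in the tables above follow a simple pattern: one active-priority register per group with fewer than 6 bits of priority, two with 6 bits, and four with 7 bits. The sketch below encodes only that pattern as read from the tables; the function name is hypothetical and is not the helper KVM uses.

```c
/*
 * Hypothetical helper: number of APxRn registers exposed per group
 * (AP0Rn or AP1Rn) for a given number of implemented priority bits.
 */
unsigned int vgic_apr_regs_for_pribits(unsigned int pribits)
{
	if (pribits >= 7)
		return 4;	/* APxR0..APxR3 */
	if (pribits >= 6)
		return 2;	/* APxR0..APxR1 */
	return 1;		/* APxR0 only */
}
```

Userspace could use such a count to decide which of the listed registers to save and restore, and skip the ones that would otherwise return -ENXIO.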

  Errors:

    =======  =====================================================
    -ENXIO   Getting or setting this register is not yet supported
    =======  =================================================
    -ENXIO   Getting or setting this register is not supported
    -EBUSY   VCPU is running
    -EINVAL  Invalid mpidr or register value supplied
    =======  =====================================================
    =======  =================================================


  KVM_DEV_ARM_VGIC_GRP_NR_IRQS