guest_memfd:

* Add support for host userspace mapping of guest_memfd-backed memory for VM
   types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
   precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
   no way to detect private vs. shared).
 
   This lays the groundwork for removal of guest memory from the kernel direct
   map, as well as for limited mmap() for guest_memfd-backed memory.
 
   For more information see:
   * a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
   * https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
     (guest_memfd in Firecracker)
   * https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
     (direct map removal)
   * https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
     (mmap support)
 
 ARM:
 
 * Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 * Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 * Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 * Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 * Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 * Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 * Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 * Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 * Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 * Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 
 LoongArch:
 
 * Detect page table walk feature on new hardware
 
 * Add sign extension with kernel MMIO/IOCSR emulation
 
 * Improve in-kernel IPI emulation
 
 * Improve in-kernel PCH-PIC emulation
 
 * Move kvm_iocsr tracepoint out of generic code
 
 RISC-V:
 
 * Added SBI FWFT extension for Guest/VM with misaligned delegation and
   pointer masking PMLEN features
 
 * Added ONE_REG interface for SBI FWFT extension
 
 * Added Zicbop and bfloat16 extensions for Guest/VM
 
 * Enabled more common KVM selftests for RISC-V
 
 * Added SBI v3.0 PMU enhancements in KVM and perf driver
 
 s390:
 
 * Improve interrupt cpu for wakeup, in particular the heuristic to decide
   which vCPU to deliver a floating interrupt to.
 
 * Clear the PTE when discarding a swapped page because of CMMA; this
   bug was introduced in 6.16 when refactoring gmap code.
 
 x86 selftests:
 
 * Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).
 
 * Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).
 
 * Minor cleanups and improvements
 
 x86 (guest side):
 
 * For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.
 
 * Make kvm_async_pf_task_wake() a local static helper and remove its
   export.
 
 * Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
 
 Generic:
 
 * Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
   now included in GFP_NOWAIT.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
 FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
 6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
 F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
 i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
 FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
 =fyov
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This excludes the bulk of the x86 changes, which I will send
  separately. They have two not complex but relatively unusual conflicts
  so I will wait for other dust to settle.

  guest_memfd:

   - Add support for host userspace mapping of guest_memfd-backed memory
     for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
     (which isn't precisely the same thing as CoCo VMs, since x86's
     SEV-MEM and SEV-ES have no way to detect private vs. shared).

     This lays the groundwork for removal of guest memory from the
     kernel direct map, as well as for limited mmap() for
     guest_memfd-backed memory.

     For more information see:
       - commit a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD")
       - guest_memfd in Firecracker:
           https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
       - direct map removal:
           https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
       - mmap support:
           https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

  ARM:

   - Add support for FF-A 1.2 as the secure memory conduit for pKVM,
     allowing more registers to be used as part of the message payload.

   - Change the way pKVM allocates its VM handles, making sure that the
     privileged hypervisor is never tricked into using uninitialised
     data.

   - Speed up MMIO range registration by avoiding unnecessary RCU
     synchronisation, which results in VMs starting much quicker.

   - Add the dump of the instruction stream when panic-ing in the EL2
     payload, just like the rest of the kernel has always done. This
     will hopefully help debugging non-VHE setups.

   - Add 52bit PA support to the stage-1 page-table walker, and make use
     of it to populate the fault level reported to the guest on failing
     to translate a stage-1 walk.

   - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
     feature parity for guests, irrespective of the host platform.

   - Fix some really ugly architecture problems when dealing with debug
     in a nested VM. This has some bad performance impacts, but is at
     least correct.

   - Add enough infrastructure to be able to disable EL2 features and
     give effective values to the EL2 control registers. This then
     allows a bunch of features to be turned off, which helps cross-host
     migration.

   - Large rework of the selftest infrastructure to allow most tests to
     transparently run at EL2. This is the first step towards enabling
     NV testing.

   - Various fixes and improvements all over the map, including one BE
     fix, just in time for the removal of the feature.

  LoongArch:

   - Detect page table walk feature on new hardware

   - Add sign extension with kernel MMIO/IOCSR emulation

   - Improve in-kernel IPI emulation

   - Improve in-kernel PCH-PIC emulation

   - Move kvm_iocsr tracepoint out of generic code

  RISC-V:

   - Added SBI FWFT extension for Guest/VM with misaligned delegation
     and pointer masking PMLEN features

   - Added ONE_REG interface for SBI FWFT extension

   - Added Zicbop and bfloat16 extensions for Guest/VM

   - Enabled more common KVM selftests for RISC-V

   - Added SBI v3.0 PMU enhancements in KVM and perf driver

  s390:

   - Improve interrupt cpu for wakeup, in particular the heuristic to
     decide which vCPU to deliver a floating interrupt to.

   - Clear the PTE when discarding a swapped page because of CMMA; this
     bug was introduced in 6.16 when refactoring gmap code.

  x86 selftests:

   - Add #DE coverage in the fastops test (the only exception that's
     guest- triggerable in fastop-emulated instructions).

   - Fix PMU selftests errors encountered on Granite Rapids (GNR),
     Sierra Forest (SRF) and Clearwater Forest (CWF).

   - Minor cleanups and improvements

  x86 (guest side):

   - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
     overriding guest MTRR for TDX/SNP to fix an issue where ACPI
     auto-mapping could map devices as WB and prevent the device drivers
     from mapping their devices with UC/UC-.

   - Make kvm_async_pf_task_wake() a local static helper and remove its
     export.

   - Use native qspinlocks when running in a VM with dedicated
     vCPU=>pCPU bindings even when PV_UNHALT is unsupported.

  Generic:

   - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
     __GFP_NOWARN is now included in GFP_NOWAIT.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
  KVM: s390: Fix to clear PTE when discarding a swapped page
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
  KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  selftests/kvm: remove stale TODO in xapic_state_test
  KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
  ...
This commit is contained in:
Linus Torvalds 2025-10-04 08:52:16 -07:00
commit f3826aa996
148 changed files with 4132 additions and 1501 deletions

View File

@ -6414,6 +6414,15 @@ most one mapping per page, i.e. binding multiple memory regions to a single
guest_memfd range is not allowed (any number of memory regions can be bound to guest_memfd range is not allowed (any number of memory regions can be bound to
a single guest_memfd file, but the bound ranges must not overlap). a single guest_memfd file, but the bound ranges must not overlap).
When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
supports GUEST_MEMFD_FLAG_MMAP. Setting this flag on guest_memfd creation
enables mmap() and faulting of guest_memfd memory to host userspace.
When the KVM MMU performs a PFN lookup to service a guest fault and the backing
guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
consumed from guest_memfd, regardless of whether it is a shared or a private
fault.
See KVM_SET_USER_MEMORY_REGION2 for additional details. See KVM_SET_USER_MEMORY_REGION2 for additional details.
4.143 KVM_PRE_FAULT_MEMORY 4.143 KVM_PRE_FAULT_MEMORY

View File

@ -81,6 +81,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff, __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs, __KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs, __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm, __KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu, __KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm, __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,

View File

@ -220,6 +220,20 @@ static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
static inline bool vcpu_el2_amo_is_set(const struct kvm_vcpu *vcpu) static inline bool vcpu_el2_amo_is_set(const struct kvm_vcpu *vcpu)
{ {
/*
* DDI0487L.b Known Issue D22105
*
* When executing at EL2 and HCR_EL2.{E2H,TGE} = {1, 0} it is
* IMPLEMENTATION DEFINED whether the effective value of HCR_EL2.AMO
* is the value programmed or 1.
*
* Make the implementation choice of treating the effective value as 1 as
* we cannot subsequently catch changes to TGE or AMO that would
* otherwise lead to the SError becoming deliverable.
*/
if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu) && !vcpu_el2_tge_is_set(vcpu))
return true;
return ctxt_sys_reg(&vcpu->arch.ctxt, HCR_EL2) & HCR_AMO; return ctxt_sys_reg(&vcpu->arch.ctxt, HCR_EL2) & HCR_AMO;
} }
@ -511,21 +525,29 @@ static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
if (vcpu_mode_is_32bit(vcpu)) { if (vcpu_mode_is_32bit(vcpu)) {
*vcpu_cpsr(vcpu) |= PSR_AA32_E_BIT; *vcpu_cpsr(vcpu) |= PSR_AA32_E_BIT;
} else { } else {
u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1); enum vcpu_sysreg r;
u64 sctlr;
r = vcpu_has_nv(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
sctlr = vcpu_read_sys_reg(vcpu, r);
sctlr |= SCTLR_ELx_EE; sctlr |= SCTLR_ELx_EE;
vcpu_write_sys_reg(vcpu, sctlr, SCTLR_EL1); vcpu_write_sys_reg(vcpu, sctlr, r);
} }
} }
static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu) static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
{ {
enum vcpu_sysreg r;
u64 bit;
if (vcpu_mode_is_32bit(vcpu)) if (vcpu_mode_is_32bit(vcpu))
return !!(*vcpu_cpsr(vcpu) & PSR_AA32_E_BIT); return !!(*vcpu_cpsr(vcpu) & PSR_AA32_E_BIT);
if (vcpu_mode_priv(vcpu)) r = is_hyp_ctxt(vcpu) ? SCTLR_EL2 : SCTLR_EL1;
return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & SCTLR_ELx_EE); bit = vcpu_mode_priv(vcpu) ? SCTLR_ELx_EE : SCTLR_EL1_E0E;
else
return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & SCTLR_EL1_E0E); return vcpu_read_sys_reg(vcpu, r) & bit;
} }
static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu, static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,

View File

@ -252,7 +252,8 @@ struct kvm_protected_vm {
pkvm_handle_t handle; pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc; struct kvm_hyp_memcache teardown_mc;
struct kvm_hyp_memcache stage2_teardown_mc; struct kvm_hyp_memcache stage2_teardown_mc;
bool enabled; bool is_protected;
bool is_created;
}; };
struct kvm_mpidr_data { struct kvm_mpidr_data {
@ -1442,7 +1443,7 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.enabled) #define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.is_protected)
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm) #define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)

View File

@ -83,6 +83,8 @@ extern void check_nested_vcpu_requests(struct kvm_vcpu *vcpu);
extern void kvm_nested_flush_hwstate(struct kvm_vcpu *vcpu); extern void kvm_nested_flush_hwstate(struct kvm_vcpu *vcpu);
extern void kvm_nested_sync_hwstate(struct kvm_vcpu *vcpu); extern void kvm_nested_sync_hwstate(struct kvm_vcpu *vcpu);
extern void kvm_nested_setup_mdcr_el2(struct kvm_vcpu *vcpu);
struct kvm_s2_trans { struct kvm_s2_trans {
phys_addr_t output; phys_addr_t output;
unsigned long block_size; unsigned long block_size;
@ -265,7 +267,7 @@ static inline u64 decode_range_tlbi(u64 val, u64 *range, u16 *asid)
return base; return base;
} }
static inline unsigned int ps_to_output_size(unsigned int ps) static inline unsigned int ps_to_output_size(unsigned int ps, bool pa52bit)
{ {
switch (ps) { switch (ps) {
case 0: return 32; case 0: return 32;
@ -273,7 +275,10 @@ static inline unsigned int ps_to_output_size(unsigned int ps)
case 2: return 40; case 2: return 40;
case 3: return 42; case 3: return 42;
case 4: return 44; case 4: return 44;
case 5: case 5: return 48;
case 6: if (pa52bit)
return 52;
fallthrough;
default: default:
return 48; return 48;
} }
@ -285,13 +290,28 @@ enum trans_regime {
TR_EL2, TR_EL2,
}; };
struct s1_walk_info;
struct s1_walk_context {
struct s1_walk_info *wi;
u64 table_ipa;
int level;
};
struct s1_walk_filter {
int (*fn)(struct s1_walk_context *, void *);
void *priv;
};
struct s1_walk_info { struct s1_walk_info {
struct s1_walk_filter *filter;
u64 baddr; u64 baddr;
enum trans_regime regime; enum trans_regime regime;
unsigned int max_oa_bits; unsigned int max_oa_bits;
unsigned int pgshift; unsigned int pgshift;
unsigned int txsz; unsigned int txsz;
int sl; int sl;
u8 sh;
bool as_el0; bool as_el0;
bool hpd; bool hpd;
bool e0poe; bool e0poe;
@ -299,6 +319,7 @@ struct s1_walk_info {
bool pan; bool pan;
bool be; bool be;
bool s2; bool s2;
bool pa52bit;
}; };
struct s1_walk_result { struct s1_walk_result {
@ -334,6 +355,8 @@ struct s1_walk_result {
int __kvm_translate_va(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, int __kvm_translate_va(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
struct s1_walk_result *wr, u64 va); struct s1_walk_result *wr, u64 va);
int __kvm_find_s1_desc_level(struct kvm_vcpu *vcpu, u64 va, u64 ipa,
int *level);
/* VNCR management */ /* VNCR management */
int kvm_vcpu_allocate_vncr_tlb(struct kvm_vcpu *vcpu); int kvm_vcpu_allocate_vncr_tlb(struct kvm_vcpu *vcpu);

View File

@ -18,6 +18,7 @@
int pkvm_init_host_vm(struct kvm *kvm); int pkvm_init_host_vm(struct kvm *kvm);
int pkvm_create_hyp_vm(struct kvm *kvm); int pkvm_create_hyp_vm(struct kvm *kvm);
bool pkvm_hyp_vm_is_created(struct kvm *kvm);
void pkvm_destroy_hyp_vm(struct kvm *kvm); void pkvm_destroy_hyp_vm(struct kvm *kvm);
int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu); int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);

View File

@ -36,6 +36,7 @@ int kasan_brk_handler(struct pt_regs *regs, unsigned long esr);
int ubsan_brk_handler(struct pt_regs *regs, unsigned long esr); int ubsan_brk_handler(struct pt_regs *regs, unsigned long esr);
int early_brk64(unsigned long addr, unsigned long esr, struct pt_regs *regs); int early_brk64(unsigned long addr, unsigned long esr, struct pt_regs *regs);
void dump_kernel_instr(unsigned long kaddr);
/* /*
* Move regs->pc to next instruction and do necessary setup before it * Move regs->pc to next instruction and do necessary setup before it

View File

@ -94,6 +94,8 @@
#define VNCR_PMSICR_EL1 0x838 #define VNCR_PMSICR_EL1 0x838
#define VNCR_PMSIRR_EL1 0x840 #define VNCR_PMSIRR_EL1 0x840
#define VNCR_PMSLATFR_EL1 0x848 #define VNCR_PMSLATFR_EL1 0x848
#define VNCR_PMSNEVFR_EL1 0x850
#define VNCR_PMSDSFR_EL1 0x858
#define VNCR_TRFCR_EL1 0x880 #define VNCR_TRFCR_EL1 0x880
#define VNCR_MPAM1_EL1 0x900 #define VNCR_MPAM1_EL1 0x900
#define VNCR_MPAMHCR_EL2 0x930 #define VNCR_MPAMHCR_EL2 0x930

View File

@ -2550,6 +2550,15 @@ test_has_mpam_hcr(const struct arm64_cpu_capabilities *entry, int scope)
return idr & MPAMIDR_EL1_HAS_HCR; return idr & MPAMIDR_EL1_HAS_HCR;
} }
static bool
test_has_gicv5_legacy(const struct arm64_cpu_capabilities *entry, int scope)
{
if (!this_cpu_has_cap(ARM64_HAS_GICV5_CPUIF))
return false;
return !!(read_sysreg_s(SYS_ICC_IDR0_EL1) & ICC_IDR0_EL1_GCIE_LEGACY);
}
static const struct arm64_cpu_capabilities arm64_features[] = { static const struct arm64_cpu_capabilities arm64_features[] = {
{ {
.capability = ARM64_ALWAYS_BOOT, .capability = ARM64_ALWAYS_BOOT,
@ -3167,6 +3176,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature, .matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP) ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP)
}, },
{
.desc = "GICv5 Legacy vCPU interface",
.type = ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE,
.capability = ARM64_HAS_GICV5_LEGACY,
.matches = test_has_gicv5_legacy,
},
{}, {},
}; };

View File

@ -105,6 +105,9 @@ KVM_NVHE_ALIAS(__hyp_stub_vectors);
KVM_NVHE_ALIAS(vgic_v2_cpuif_trap); KVM_NVHE_ALIAS(vgic_v2_cpuif_trap);
KVM_NVHE_ALIAS(vgic_v3_cpuif_trap); KVM_NVHE_ALIAS(vgic_v3_cpuif_trap);
/* Static key indicating whether GICv3 has GICv2 compatibility */
KVM_NVHE_ALIAS(vgic_v3_has_v2_compat);
/* Static key which is set if CNTVOFF_EL2 is unusable */ /* Static key which is set if CNTVOFF_EL2 is unusable */
KVM_NVHE_ALIAS(broken_cntvoff_key); KVM_NVHE_ALIAS(broken_cntvoff_key);

View File

@ -149,19 +149,18 @@ pstate_check_t * const aarch32_opcode_cond_checks[16] = {
int show_unhandled_signals = 0; int show_unhandled_signals = 0;
static void dump_kernel_instr(const char *lvl, struct pt_regs *regs) void dump_kernel_instr(unsigned long kaddr)
{ {
unsigned long addr = instruction_pointer(regs);
char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str; char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str;
int i; int i;
if (user_mode(regs)) if (!is_ttbr1_addr(kaddr))
return; return;
for (i = -4; i < 1; i++) { for (i = -4; i < 1; i++) {
unsigned int val, bad; unsigned int val, bad;
bad = aarch64_insn_read(&((u32 *)addr)[i], &val); bad = aarch64_insn_read(&((u32 *)kaddr)[i], &val);
if (!bad) if (!bad)
p += sprintf(p, i == 0 ? "(%08x) " : "%08x ", val); p += sprintf(p, i == 0 ? "(%08x) " : "%08x ", val);
@ -169,7 +168,7 @@ static void dump_kernel_instr(const char *lvl, struct pt_regs *regs)
p += sprintf(p, i == 0 ? "(????????) " : "???????? "); p += sprintf(p, i == 0 ? "(????????) " : "???????? ");
} }
printk("%sCode: %s\n", lvl, str); printk(KERN_EMERG "Code: %s\n", str);
} }
#define S_SMP " SMP" #define S_SMP " SMP"
@ -178,6 +177,7 @@ static int __die(const char *str, long err, struct pt_regs *regs)
{ {
static int die_counter; static int die_counter;
int ret; int ret;
unsigned long addr = instruction_pointer(regs);
pr_emerg("Internal error: %s: %016lx [#%d] " S_SMP "\n", pr_emerg("Internal error: %s: %016lx [#%d] " S_SMP "\n",
str, err, ++die_counter); str, err, ++die_counter);
@ -190,7 +190,10 @@ static int __die(const char *str, long err, struct pt_regs *regs)
print_modules(); print_modules();
show_regs(regs); show_regs(regs);
dump_kernel_instr(KERN_EMERG, regs); if (user_mode(regs))
return ret;
dump_kernel_instr(addr);
return ret; return ret;
} }

View File

@ -37,6 +37,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GUEST_MEMFD
help help
Support hosting virtualized guest machines. Support hosting virtualized guest machines.

View File

@ -170,10 +170,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret) if (ret)
return ret; return ret;
ret = pkvm_init_host_vm(kvm);
if (ret)
goto err_unshare_kvm;
if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL_ACCOUNT)) { if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL_ACCOUNT)) {
ret = -ENOMEM; ret = -ENOMEM;
goto err_unshare_kvm; goto err_unshare_kvm;
@ -184,6 +180,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret) if (ret)
goto err_free_cpumask; goto err_free_cpumask;
if (is_protected_kvm_enabled()) {
/*
* If any failures occur after this is successful, make sure to
* call __pkvm_unreserve_vm to unreserve the VM in hyp.
*/
ret = pkvm_init_host_vm(kvm);
if (ret)
goto err_free_cpumask;
}
kvm_vgic_early_init(kvm); kvm_vgic_early_init(kvm);
kvm_timer_init_vm(kvm); kvm_timer_init_vm(kvm);
@ -2317,8 +2323,9 @@ static int __init init_subsystems(void)
} }
if (kvm_mode == KVM_MODE_NV && if (kvm_mode == KVM_MODE_NV &&
!(vgic_present && kvm_vgic_global_state.type == VGIC_V3)) { !(vgic_present && (kvm_vgic_global_state.type == VGIC_V3 ||
kvm_err("NV support requires GICv3, giving up\n"); kvm_vgic_global_state.has_gcie_v3_compat))) {
kvm_err("NV support requires GICv3 or GICv5 with legacy support, giving up\n");
err = -EINVAL; err = -EINVAL;
goto out; goto out;
} }

View File

@ -28,9 +28,57 @@ static int get_ia_size(struct s1_walk_info *wi)
/* Return true if the IPA is out of the OA range */ /* Return true if the IPA is out of the OA range */
static bool check_output_size(u64 ipa, struct s1_walk_info *wi) static bool check_output_size(u64 ipa, struct s1_walk_info *wi)
{ {
if (wi->pa52bit)
return wi->max_oa_bits < 52 && (ipa & GENMASK_ULL(51, wi->max_oa_bits));
return wi->max_oa_bits < 48 && (ipa & GENMASK_ULL(47, wi->max_oa_bits)); return wi->max_oa_bits < 48 && (ipa & GENMASK_ULL(47, wi->max_oa_bits));
} }
static bool has_52bit_pa(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, u64 tcr)
{
switch (BIT(wi->pgshift)) {
case SZ_64K:
default: /* IMPDEF: treat any other value as 64k */
if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, PARANGE, 52))
return false;
return ((wi->regime == TR_EL2 ?
FIELD_GET(TCR_EL2_PS_MASK, tcr) :
FIELD_GET(TCR_IPS_MASK, tcr)) == 0b0110);
case SZ_16K:
if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN16, 52_BIT))
return false;
break;
case SZ_4K:
if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN4, 52_BIT))
return false;
break;
}
return (tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS));
}
static u64 desc_to_oa(struct s1_walk_info *wi, u64 desc)
{
u64 addr;
if (!wi->pa52bit)
return desc & GENMASK_ULL(47, wi->pgshift);
switch (BIT(wi->pgshift)) {
case SZ_4K:
case SZ_16K:
addr = desc & GENMASK_ULL(49, wi->pgshift);
addr |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, desc) << 50;
break;
case SZ_64K:
default: /* IMPDEF: treat any other value as 64k */
addr = desc & GENMASK_ULL(47, wi->pgshift);
addr |= FIELD_GET(KVM_PTE_ADDR_51_48, desc) << 48;
break;
}
return addr;
}
/* Return the translation regime that applies to an AT instruction */ /* Return the translation regime that applies to an AT instruction */
static enum trans_regime compute_translation_regime(struct kvm_vcpu *vcpu, u32 op) static enum trans_regime compute_translation_regime(struct kvm_vcpu *vcpu, u32 op)
{ {
@ -50,21 +98,26 @@ static enum trans_regime compute_translation_regime(struct kvm_vcpu *vcpu, u32 o
} }
} }
static u64 effective_tcr2(struct kvm_vcpu *vcpu, enum trans_regime regime)
{
if (regime == TR_EL10) {
if (vcpu_has_nv(vcpu) &&
!(__vcpu_sys_reg(vcpu, HCRX_EL2) & HCRX_EL2_TCR2En))
return 0;
return vcpu_read_sys_reg(vcpu, TCR2_EL1);
}
return vcpu_read_sys_reg(vcpu, TCR2_EL2);
}
static bool s1pie_enabled(struct kvm_vcpu *vcpu, enum trans_regime regime) static bool s1pie_enabled(struct kvm_vcpu *vcpu, enum trans_regime regime)
{ {
if (!kvm_has_s1pie(vcpu->kvm)) if (!kvm_has_s1pie(vcpu->kvm))
return false; return false;
switch (regime) { /* Abuse TCR2_EL1_PIE and use it for EL2 as well */
case TR_EL2: return effective_tcr2(vcpu, regime) & TCR2_EL1_PIE;
case TR_EL20:
return vcpu_read_sys_reg(vcpu, TCR2_EL2) & TCR2_EL2_PIE;
case TR_EL10:
return (__vcpu_sys_reg(vcpu, HCRX_EL2) & HCRX_EL2_TCR2En) &&
(__vcpu_sys_reg(vcpu, TCR2_EL1) & TCR2_EL1_PIE);
default:
BUG();
}
} }
static void compute_s1poe(struct kvm_vcpu *vcpu, struct s1_walk_info *wi) static void compute_s1poe(struct kvm_vcpu *vcpu, struct s1_walk_info *wi)
@ -76,23 +129,11 @@ static void compute_s1poe(struct kvm_vcpu *vcpu, struct s1_walk_info *wi)
return; return;
} }
switch (wi->regime) { val = effective_tcr2(vcpu, wi->regime);
case TR_EL2:
case TR_EL20:
val = vcpu_read_sys_reg(vcpu, TCR2_EL2);
wi->poe = val & TCR2_EL2_POE;
wi->e0poe = (wi->regime == TR_EL20) && (val & TCR2_EL2_E0POE);
break;
case TR_EL10:
if (__vcpu_sys_reg(vcpu, HCRX_EL2) & HCRX_EL2_TCR2En) {
wi->poe = wi->e0poe = false;
return;
}
val = __vcpu_sys_reg(vcpu, TCR2_EL1); /* Abuse TCR2_EL1_* for EL2 */
wi->poe = val & TCR2_EL1_POE; wi->poe = val & TCR2_EL1_POE;
wi->e0poe = val & TCR2_EL1_E0POE; wi->e0poe = (wi->regime != TR_EL2) && (val & TCR2_EL1_E0POE);
}
} }
static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
@ -102,14 +143,16 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
unsigned int stride, x; unsigned int stride, x;
bool va55, tbi, lva; bool va55, tbi, lva;
hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
va55 = va & BIT(55); va55 = va & BIT(55);
if (wi->regime == TR_EL2 && va55) if (vcpu_has_nv(vcpu)) {
goto addrsz; hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
wi->s2 = wi->regime == TR_EL10 && (hcr & (HCR_VM | HCR_DC));
wi->s2 = wi->regime == TR_EL10 && (hcr & (HCR_VM | HCR_DC)); } else {
WARN_ON_ONCE(wi->regime != TR_EL10);
wi->s2 = false;
hcr = 0;
}
switch (wi->regime) { switch (wi->regime) {
case TR_EL10: case TR_EL10:
@ -131,6 +174,46 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
BUG(); BUG();
} }
/* Someone was silly enough to encode TG0/TG1 differently */
if (va55 && wi->regime != TR_EL2) {
wi->txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
tg = FIELD_GET(TCR_TG1_MASK, tcr);
switch (tg << TCR_TG1_SHIFT) {
case TCR_TG1_4K:
wi->pgshift = 12; break;
case TCR_TG1_16K:
wi->pgshift = 14; break;
case TCR_TG1_64K:
default: /* IMPDEF: treat any other value as 64k */
wi->pgshift = 16; break;
}
} else {
wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
tg = FIELD_GET(TCR_TG0_MASK, tcr);
switch (tg << TCR_TG0_SHIFT) {
case TCR_TG0_4K:
wi->pgshift = 12; break;
case TCR_TG0_16K:
wi->pgshift = 14; break;
case TCR_TG0_64K:
default: /* IMPDEF: treat any other value as 64k */
wi->pgshift = 16; break;
}
}
wi->pa52bit = has_52bit_pa(vcpu, wi, tcr);
ia_bits = get_ia_size(wi);
/* AArch64.S1StartLevel() */
stride = wi->pgshift - 3;
wi->sl = 3 - (((ia_bits - 1) - wi->pgshift) / stride);
if (wi->regime == TR_EL2 && va55)
goto addrsz;
tbi = (wi->regime == TR_EL2 ? tbi = (wi->regime == TR_EL2 ?
FIELD_GET(TCR_EL2_TBI, tcr) : FIELD_GET(TCR_EL2_TBI, tcr) :
(va55 ? (va55 ?
@ -140,6 +223,12 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
if (!tbi && (u64)sign_extend64(va, 55) != va) if (!tbi && (u64)sign_extend64(va, 55) != va)
goto addrsz; goto addrsz;
wi->sh = (wi->regime == TR_EL2 ?
FIELD_GET(TCR_EL2_SH0_MASK, tcr) :
(va55 ?
FIELD_GET(TCR_SH1_MASK, tcr) :
FIELD_GET(TCR_SH0_MASK, tcr)));
va = (u64)sign_extend64(va, 55); va = (u64)sign_extend64(va, 55);
/* Let's put the MMU disabled case aside immediately */ /* Let's put the MMU disabled case aside immediately */
@ -194,53 +283,20 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
/* R_BVXDG */ /* R_BVXDG */
wi->hpd |= (wi->poe || wi->e0poe); wi->hpd |= (wi->poe || wi->e0poe);
/* Someone was silly enough to encode TG0/TG1 differently */
if (va55) {
wi->txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
tg = FIELD_GET(TCR_TG1_MASK, tcr);
switch (tg << TCR_TG1_SHIFT) {
case TCR_TG1_4K:
wi->pgshift = 12; break;
case TCR_TG1_16K:
wi->pgshift = 14; break;
case TCR_TG1_64K:
default: /* IMPDEF: treat any other value as 64k */
wi->pgshift = 16; break;
}
} else {
wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
tg = FIELD_GET(TCR_TG0_MASK, tcr);
switch (tg << TCR_TG0_SHIFT) {
case TCR_TG0_4K:
wi->pgshift = 12; break;
case TCR_TG0_16K:
wi->pgshift = 14; break;
case TCR_TG0_64K:
default: /* IMPDEF: treat any other value as 64k */
wi->pgshift = 16; break;
}
}
/* R_PLCGL, R_YXNYW */ /* R_PLCGL, R_YXNYW */
if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR2_EL1, ST, 48_47)) { if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR2_EL1, ST, 48_47)) {
if (wi->txsz > 39) if (wi->txsz > 39)
goto transfault_l0; goto transfault;
} else { } else {
if (wi->txsz > 48 || (BIT(wi->pgshift) == SZ_64K && wi->txsz > 47)) if (wi->txsz > 48 || (BIT(wi->pgshift) == SZ_64K && wi->txsz > 47))
goto transfault_l0; goto transfault;
} }
/* R_GTJBY, R_SXWGM */ /* R_GTJBY, R_SXWGM */
switch (BIT(wi->pgshift)) { switch (BIT(wi->pgshift)) {
case SZ_4K: case SZ_4K:
lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN4, 52_BIT);
lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
break;
case SZ_16K: case SZ_16K:
lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN16, 52_BIT); lva = wi->pa52bit;
lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
break; break;
case SZ_64K: case SZ_64K:
lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, VARange, 52); lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, VARange, 52);
@ -248,38 +304,42 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
} }
if ((lva && wi->txsz < 12) || (!lva && wi->txsz < 16)) if ((lva && wi->txsz < 12) || (!lva && wi->txsz < 16))
goto transfault_l0; goto transfault;
ia_bits = get_ia_size(wi);
/* R_YYVYV, I_THCZK */ /* R_YYVYV, I_THCZK */
if ((!va55 && va > GENMASK(ia_bits - 1, 0)) || if ((!va55 && va > GENMASK(ia_bits - 1, 0)) ||
(va55 && va < GENMASK(63, ia_bits))) (va55 && va < GENMASK(63, ia_bits)))
goto transfault_l0; goto transfault;
/* I_ZFSYQ */ /* I_ZFSYQ */
if (wi->regime != TR_EL2 && if (wi->regime != TR_EL2 &&
(tcr & (va55 ? TCR_EPD1_MASK : TCR_EPD0_MASK))) (tcr & (va55 ? TCR_EPD1_MASK : TCR_EPD0_MASK)))
goto transfault_l0; goto transfault;
/* R_BNDVG and following statements */ /* R_BNDVG and following statements */
if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, E0PD, IMP) && if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, E0PD, IMP) &&
wi->as_el0 && (tcr & (va55 ? TCR_E0PD1 : TCR_E0PD0))) wi->as_el0 && (tcr & (va55 ? TCR_E0PD1 : TCR_E0PD0)))
goto transfault_l0; goto transfault;
/* AArch64.S1StartLevel() */
stride = wi->pgshift - 3;
wi->sl = 3 - (((ia_bits - 1) - wi->pgshift) / stride);
ps = (wi->regime == TR_EL2 ? ps = (wi->regime == TR_EL2 ?
FIELD_GET(TCR_EL2_PS_MASK, tcr) : FIELD_GET(TCR_IPS_MASK, tcr)); FIELD_GET(TCR_EL2_PS_MASK, tcr) : FIELD_GET(TCR_IPS_MASK, tcr));
wi->max_oa_bits = min(get_kvm_ipa_limit(), ps_to_output_size(ps)); wi->max_oa_bits = min(get_kvm_ipa_limit(), ps_to_output_size(ps, wi->pa52bit));
/* Compute minimal alignment */ /* Compute minimal alignment */
x = 3 + ia_bits - ((3 - wi->sl) * stride + wi->pgshift); x = 3 + ia_bits - ((3 - wi->sl) * stride + wi->pgshift);
wi->baddr = ttbr & TTBRx_EL1_BADDR; wi->baddr = ttbr & TTBRx_EL1_BADDR;
if (wi->pa52bit) {
/*
* Force the alignment on 64 bytes for top-level tables
* smaller than 8 entries, since TTBR.BADDR[5:2] are used to
* store bits [51:48] of the first level of lookup.
*/
x = max(x, 6);
wi->baddr |= FIELD_GET(GENMASK_ULL(5, 2), ttbr) << 48;
}
/* R_VPBBF */ /* R_VPBBF */
if (check_output_size(wi->baddr, wi)) if (check_output_size(wi->baddr, wi))
@ -289,12 +349,17 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
return 0; return 0;
addrsz: /* Address Size Fault level 0 */ addrsz:
/*
* Address Size Fault level 0 to indicate it comes from TTBR.
* yes, this is an oddity.
*/
fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false); fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false);
return -EFAULT; return -EFAULT;
transfault_l0: /* Translation Fault level 0 */ transfault:
fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(0), false); /* Translation Fault on start level */
fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(wi->sl), false);
return -EFAULT; return -EFAULT;
} }
@ -339,6 +404,17 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
ipa = kvm_s2_trans_output(&s2_trans); ipa = kvm_s2_trans_output(&s2_trans);
} }
if (wi->filter) {
ret = wi->filter->fn(&(struct s1_walk_context)
{
.wi = wi,
.table_ipa = baddr,
.level = level,
}, wi->filter->priv);
if (ret)
return ret;
}
ret = kvm_read_guest(vcpu->kvm, ipa, &desc, sizeof(desc)); ret = kvm_read_guest(vcpu->kvm, ipa, &desc, sizeof(desc));
if (ret) { if (ret) {
fail_s1_walk(wr, ESR_ELx_FSC_SEA_TTW(level), false); fail_s1_walk(wr, ESR_ELx_FSC_SEA_TTW(level), false);
@ -369,7 +445,7 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
wr->PXNTable |= FIELD_GET(PMD_TABLE_PXN, desc); wr->PXNTable |= FIELD_GET(PMD_TABLE_PXN, desc);
} }
baddr = desc & GENMASK_ULL(47, wi->pgshift); baddr = desc_to_oa(wi, desc);
/* Check for out-of-range OA */ /* Check for out-of-range OA */
if (check_output_size(baddr, wi)) if (check_output_size(baddr, wi))
@ -386,11 +462,11 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
switch (BIT(wi->pgshift)) { switch (BIT(wi->pgshift)) {
case SZ_4K: case SZ_4K:
valid_block = level == 1 || level == 2; valid_block = level == 1 || level == 2 || (wi->pa52bit && level == 0);
break; break;
case SZ_16K: case SZ_16K:
case SZ_64K: case SZ_64K:
valid_block = level == 2; valid_block = level == 2 || (wi->pa52bit && level == 1);
break; break;
} }
@ -398,7 +474,8 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
goto transfault; goto transfault;
} }
if (check_output_size(desc & GENMASK(47, va_bottom), wi)) baddr = desc_to_oa(wi, desc);
if (check_output_size(baddr & GENMASK(52, va_bottom), wi))
goto addrsz; goto addrsz;
if (!(desc & PTE_AF)) { if (!(desc & PTE_AF)) {
@ -411,7 +488,7 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
wr->failed = false; wr->failed = false;
wr->level = level; wr->level = level;
wr->desc = desc; wr->desc = desc;
wr->pa = desc & GENMASK(47, va_bottom); wr->pa = baddr & GENMASK(52, va_bottom);
wr->pa |= va & GENMASK_ULL(va_bottom - 1, 0); wr->pa |= va & GENMASK_ULL(va_bottom - 1, 0);
wr->nG = (wi->regime != TR_EL2) && (desc & PTE_NG); wr->nG = (wi->regime != TR_EL2) && (desc & PTE_NG);
@ -640,21 +717,36 @@ static u8 combine_s1_s2_attr(u8 s1, u8 s2)
#define ATTR_OSH 0b10 #define ATTR_OSH 0b10
#define ATTR_ISH 0b11 #define ATTR_ISH 0b11
static u8 compute_sh(u8 attr, u64 desc) static u8 compute_final_sh(u8 attr, u8 sh)
{ {
u8 sh;
/* Any form of device, as well as NC has SH[1:0]=0b10 */ /* Any form of device, as well as NC has SH[1:0]=0b10 */
if (MEMATTR_IS_DEVICE(attr) || attr == MEMATTR(NC, NC)) if (MEMATTR_IS_DEVICE(attr) || attr == MEMATTR(NC, NC))
return ATTR_OSH; return ATTR_OSH;
sh = FIELD_GET(PTE_SHARED, desc);
if (sh == ATTR_RSV) /* Reserved, mapped to NSH */ if (sh == ATTR_RSV) /* Reserved, mapped to NSH */
sh = ATTR_NSH; sh = ATTR_NSH;
return sh; return sh;
} }
static u8 compute_s1_sh(struct s1_walk_info *wi, struct s1_walk_result *wr,
u8 attr)
{
u8 sh;
/*
* non-52bit and LPA have their basic shareability described in the
* descriptor. LPA2 gets it from the corresponding field in TCR,
* conveniently recorded in the walk info.
*/
if (!wi->pa52bit || BIT(wi->pgshift) == SZ_64K)
sh = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_SH, wr->desc);
else
sh = wi->sh;
return compute_final_sh(attr, sh);
}
static u8 combine_sh(u8 s1_sh, u8 s2_sh) static u8 combine_sh(u8 s1_sh, u8 s2_sh)
{ {
if (s1_sh == ATTR_OSH || s2_sh == ATTR_OSH) if (s1_sh == ATTR_OSH || s2_sh == ATTR_OSH)
@ -668,7 +760,7 @@ static u8 combine_sh(u8 s1_sh, u8 s2_sh)
static u64 compute_par_s12(struct kvm_vcpu *vcpu, u64 s1_par, static u64 compute_par_s12(struct kvm_vcpu *vcpu, u64 s1_par,
struct kvm_s2_trans *tr) struct kvm_s2_trans *tr)
{ {
u8 s1_parattr, s2_memattr, final_attr; u8 s1_parattr, s2_memattr, final_attr, s2_sh;
u64 par; u64 par;
/* If S2 has failed to translate, report the damage */ /* If S2 has failed to translate, report the damage */
@ -741,17 +833,19 @@ static u64 compute_par_s12(struct kvm_vcpu *vcpu, u64 s1_par,
!MEMATTR_IS_DEVICE(final_attr)) !MEMATTR_IS_DEVICE(final_attr))
final_attr = MEMATTR(NC, NC); final_attr = MEMATTR(NC, NC);
s2_sh = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_SH, tr->desc);
par = FIELD_PREP(SYS_PAR_EL1_ATTR, final_attr); par = FIELD_PREP(SYS_PAR_EL1_ATTR, final_attr);
par |= tr->output & GENMASK(47, 12); par |= tr->output & GENMASK(47, 12);
par |= FIELD_PREP(SYS_PAR_EL1_SH, par |= FIELD_PREP(SYS_PAR_EL1_SH,
combine_sh(FIELD_GET(SYS_PAR_EL1_SH, s1_par), combine_sh(FIELD_GET(SYS_PAR_EL1_SH, s1_par),
compute_sh(final_attr, tr->desc))); compute_final_sh(final_attr, s2_sh)));
return par; return par;
} }
static u64 compute_par_s1(struct kvm_vcpu *vcpu, struct s1_walk_result *wr, static u64 compute_par_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
enum trans_regime regime) struct s1_walk_result *wr)
{ {
u64 par; u64 par;
@ -764,9 +858,9 @@ static u64 compute_par_s1(struct kvm_vcpu *vcpu, struct s1_walk_result *wr,
} else if (wr->level == S1_MMU_DISABLED) { } else if (wr->level == S1_MMU_DISABLED) {
/* MMU off or HCR_EL2.DC == 1 */ /* MMU off or HCR_EL2.DC == 1 */
par = SYS_PAR_EL1_NSE; par = SYS_PAR_EL1_NSE;
par |= wr->pa & GENMASK_ULL(47, 12); par |= wr->pa & SYS_PAR_EL1_PA;
if (regime == TR_EL10 && if (wi->regime == TR_EL10 && vcpu_has_nv(vcpu) &&
(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_DC)) { (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_DC)) {
par |= FIELD_PREP(SYS_PAR_EL1_ATTR, par |= FIELD_PREP(SYS_PAR_EL1_ATTR,
MEMATTR(WbRaWa, WbRaWa)); MEMATTR(WbRaWa, WbRaWa));
@ -781,14 +875,14 @@ static u64 compute_par_s1(struct kvm_vcpu *vcpu, struct s1_walk_result *wr,
par = SYS_PAR_EL1_NSE; par = SYS_PAR_EL1_NSE;
mair = (regime == TR_EL10 ? mair = (wi->regime == TR_EL10 ?
vcpu_read_sys_reg(vcpu, MAIR_EL1) : vcpu_read_sys_reg(vcpu, MAIR_EL1) :
vcpu_read_sys_reg(vcpu, MAIR_EL2)); vcpu_read_sys_reg(vcpu, MAIR_EL2));
mair >>= FIELD_GET(PTE_ATTRINDX_MASK, wr->desc) * 8; mair >>= FIELD_GET(PTE_ATTRINDX_MASK, wr->desc) * 8;
mair &= 0xff; mair &= 0xff;
sctlr = (regime == TR_EL10 ? sctlr = (wi->regime == TR_EL10 ?
vcpu_read_sys_reg(vcpu, SCTLR_EL1) : vcpu_read_sys_reg(vcpu, SCTLR_EL1) :
vcpu_read_sys_reg(vcpu, SCTLR_EL2)); vcpu_read_sys_reg(vcpu, SCTLR_EL2));
@ -797,9 +891,9 @@ static u64 compute_par_s1(struct kvm_vcpu *vcpu, struct s1_walk_result *wr,
mair = MEMATTR(NC, NC); mair = MEMATTR(NC, NC);
par |= FIELD_PREP(SYS_PAR_EL1_ATTR, mair); par |= FIELD_PREP(SYS_PAR_EL1_ATTR, mair);
par |= wr->pa & GENMASK_ULL(47, 12); par |= wr->pa & SYS_PAR_EL1_PA;
sh = compute_sh(mair, wr->desc); sh = compute_s1_sh(wi, wr, mair);
par |= FIELD_PREP(SYS_PAR_EL1_SH, sh); par |= FIELD_PREP(SYS_PAR_EL1_SH, sh);
} }
@ -873,7 +967,7 @@ static void compute_s1_direct_permissions(struct kvm_vcpu *vcpu,
wxn = (vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_ELx_WXN); wxn = (vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_ELx_WXN);
break; break;
case TR_EL10: case TR_EL10:
wxn = (__vcpu_sys_reg(vcpu, SCTLR_EL1) & SCTLR_ELx_WXN); wxn = (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & SCTLR_ELx_WXN);
break; break;
} }
@ -1186,7 +1280,7 @@ static u64 handle_at_slow(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
fail_s1_walk(&wr, ESR_ELx_FSC_PERM_L(wr.level), false); fail_s1_walk(&wr, ESR_ELx_FSC_PERM_L(wr.level), false);
compute_par: compute_par:
return compute_par_s1(vcpu, &wr, wi.regime); return compute_par_s1(vcpu, &wi, &wr);
} }
/* /*
@ -1202,7 +1296,7 @@ static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
{ {
struct mmu_config config; struct mmu_config config;
struct kvm_s2_mmu *mmu; struct kvm_s2_mmu *mmu;
bool fail; bool fail, mmu_cs;
u64 par; u64 par;
par = SYS_PAR_EL1_F; par = SYS_PAR_EL1_F;
@ -1218,8 +1312,13 @@ static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
* If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
* the right one (as we trapped from vEL2). If not, save the * the right one (as we trapped from vEL2). If not, save the
* full MMU context. * full MMU context.
*
* We are also guaranteed to be in the correct context if
* we're not in a nested VM.
*/ */
if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) mmu_cs = (vcpu_has_nv(vcpu) &&
!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)));
if (!mmu_cs)
goto skip_mmu_switch; goto skip_mmu_switch;
/* /*
@ -1287,7 +1386,7 @@ skip_mmu_switch:
write_sysreg_hcr(HCR_HOST_VHE_FLAGS); write_sysreg_hcr(HCR_HOST_VHE_FLAGS);
if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))) if (mmu_cs)
__mmu_config_restore(&config); __mmu_config_restore(&config);
return par; return par;
@ -1470,3 +1569,68 @@ int __kvm_translate_va(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
return 0; return 0;
} }
struct desc_match {
u64 ipa;
int level;
};
static int match_s1_desc(struct s1_walk_context *ctxt, void *priv)
{
struct desc_match *dm = priv;
u64 ipa = dm->ipa;
/* Use S1 granule alignment */
ipa &= GENMASK(51, ctxt->wi->pgshift);
/* Not the IPA we're looking for? Continue. */
if (ipa != ctxt->table_ipa)
return 0;
/* Note the level and interrupt the walk */
dm->level = ctxt->level;
return -EINTR;
}
int __kvm_find_s1_desc_level(struct kvm_vcpu *vcpu, u64 va, u64 ipa, int *level)
{
struct desc_match dm = {
.ipa = ipa,
};
struct s1_walk_info wi = {
.filter = &(struct s1_walk_filter){
.fn = match_s1_desc,
.priv = &dm,
},
.regime = TR_EL10,
.as_el0 = false,
.pan = false,
};
struct s1_walk_result wr = {};
int ret;
ret = setup_s1_walk(vcpu, &wi, &wr, va);
if (ret)
return ret;
/* We really expect the S1 MMU to be on here... */
if (WARN_ON_ONCE(wr.level == S1_MMU_DISABLED)) {
*level = 0;
return 0;
}
/* Walk the guest's PT, looking for a match along the way */
ret = walk_s1(vcpu, &wi, &wr, va);
switch (ret) {
case -EINTR:
/* We interrupted the walk on a match, return the level */
*level = dm.level;
return 0;
case 0:
/* The walk completed, we failed to find the entry */
return -ENOENT;
default:
/* Any other error... */
return ret;
}
}

View File

@ -7,12 +7,22 @@
#include <linux/kvm_host.h> #include <linux/kvm_host.h>
#include <asm/sysreg.h> #include <asm/sysreg.h>
/*
* Describes the dependencies between a set of bits (or the negation
* of a set of RES0 bits) and a feature. The flags indicate how the
* data is interpreted.
*/
struct reg_bits_to_feat_map { struct reg_bits_to_feat_map {
u64 bits; union {
u64 bits;
u64 *res0p;
};
#define NEVER_FGU BIT(0) /* Can trap, but never UNDEF */ #define NEVER_FGU BIT(0) /* Can trap, but never UNDEF */
#define CALL_FUNC BIT(1) /* Needs to evaluate tons of crap */ #define CALL_FUNC BIT(1) /* Needs to evaluate tons of crap */
#define FIXED_VALUE BIT(2) /* RAZ/WI or RAO/WI in KVM */ #define FIXED_VALUE BIT(2) /* RAZ/WI or RAO/WI in KVM */
#define RES0_POINTER BIT(3) /* Pointer to RES0 value instead of bits */
unsigned long flags; unsigned long flags;
union { union {
@ -28,9 +38,27 @@ struct reg_bits_to_feat_map {
}; };
}; };
#define __NEEDS_FEAT_3(m, f, id, fld, lim) \ /*
* Describes the dependencies for a given register:
*
* @feat_map describes the dependency for the whole register. If the
* features the register depends on are not present, the whole
* register is effectively RES0.
*
* @bit_feat_map describes the dependencies for a set of bits in that
* register. If the features these bits depend on are not present, the
* bits are effectively RES0.
*/
struct reg_feat_map_desc {
const char *name;
const struct reg_bits_to_feat_map feat_map;
const struct reg_bits_to_feat_map *bit_feat_map;
const unsigned int bit_feat_map_sz;
};
#define __NEEDS_FEAT_3(m, f, w, id, fld, lim) \
{ \ { \
.bits = (m), \ .w = (m), \
.flags = (f), \ .flags = (f), \
.regidx = IDREG_IDX(SYS_ ## id), \ .regidx = IDREG_IDX(SYS_ ## id), \
.shift = id ##_## fld ## _SHIFT, \ .shift = id ##_## fld ## _SHIFT, \
@ -39,28 +67,63 @@ struct reg_bits_to_feat_map {
.lo_lim = id ##_## fld ##_## lim \ .lo_lim = id ##_## fld ##_## lim \
} }
#define __NEEDS_FEAT_2(m, f, fun, dummy) \ #define __NEEDS_FEAT_2(m, f, w, fun, dummy) \
{ \ { \
.bits = (m), \ .w = (m), \
.flags = (f) | CALL_FUNC, \ .flags = (f) | CALL_FUNC, \
.fval = (fun), \ .fval = (fun), \
} }
#define __NEEDS_FEAT_1(m, f, fun) \ #define __NEEDS_FEAT_1(m, f, w, fun) \
{ \ { \
.bits = (m), \ .w = (m), \
.flags = (f) | CALL_FUNC, \ .flags = (f) | CALL_FUNC, \
.match = (fun), \ .match = (fun), \
} }
#define __NEEDS_FEAT_FLAG(m, f, w, ...) \
CONCATENATE(__NEEDS_FEAT_, COUNT_ARGS(__VA_ARGS__))(m, f, w, __VA_ARGS__)
#define NEEDS_FEAT_FLAG(m, f, ...) \ #define NEEDS_FEAT_FLAG(m, f, ...) \
CONCATENATE(__NEEDS_FEAT_, COUNT_ARGS(__VA_ARGS__))(m, f, __VA_ARGS__) __NEEDS_FEAT_FLAG(m, f, bits, __VA_ARGS__)
#define NEEDS_FEAT_FIXED(m, ...) \ #define NEEDS_FEAT_FIXED(m, ...) \
NEEDS_FEAT_FLAG(m, FIXED_VALUE, __VA_ARGS__, 0) __NEEDS_FEAT_FLAG(m, FIXED_VALUE, bits, __VA_ARGS__, 0)
#define NEEDS_FEAT_RES0(p, ...) \
__NEEDS_FEAT_FLAG(p, RES0_POINTER, res0p, __VA_ARGS__)
/*
* Declare the dependency between a set of bits and a set of features,
* generating a struct reg_bit_to_feat_map.
*/
#define NEEDS_FEAT(m, ...) NEEDS_FEAT_FLAG(m, 0, __VA_ARGS__) #define NEEDS_FEAT(m, ...) NEEDS_FEAT_FLAG(m, 0, __VA_ARGS__)
/*
* Declare the dependency between a non-FGT register, a set of
* feature, and the set of individual bits it contains. This generates
* a struct reg_feat_map_desc.
*/
#define DECLARE_FEAT_MAP(n, r, m, f) \
struct reg_feat_map_desc n = { \
.name = #r, \
.feat_map = NEEDS_FEAT(~r##_RES0, f), \
.bit_feat_map = m, \
.bit_feat_map_sz = ARRAY_SIZE(m), \
}
/*
* Specialised version of the above for FGT registers that have their
* RES0 masks described as struct fgt_masks.
*/
#define DECLARE_FEAT_MAP_FGT(n, msk, m, f) \
struct reg_feat_map_desc n = { \
.name = #msk, \
.feat_map = NEEDS_FEAT_RES0(&msk.res0, f),\
.bit_feat_map = m, \
.bit_feat_map_sz = ARRAY_SIZE(m), \
}
#define FEAT_SPE ID_AA64DFR0_EL1, PMSVer, IMP #define FEAT_SPE ID_AA64DFR0_EL1, PMSVer, IMP
#define FEAT_SPE_FnE ID_AA64DFR0_EL1, PMSVer, V1P2 #define FEAT_SPE_FnE ID_AA64DFR0_EL1, PMSVer, V1P2
#define FEAT_BRBE ID_AA64DFR0_EL1, BRBE, IMP #define FEAT_BRBE ID_AA64DFR0_EL1, BRBE, IMP
@ -73,6 +136,7 @@ struct reg_bits_to_feat_map {
#define FEAT_AA32EL0 ID_AA64PFR0_EL1, EL0, AARCH32 #define FEAT_AA32EL0 ID_AA64PFR0_EL1, EL0, AARCH32
#define FEAT_AA32EL1 ID_AA64PFR0_EL1, EL1, AARCH32 #define FEAT_AA32EL1 ID_AA64PFR0_EL1, EL1, AARCH32
#define FEAT_AA64EL1 ID_AA64PFR0_EL1, EL1, IMP #define FEAT_AA64EL1 ID_AA64PFR0_EL1, EL1, IMP
#define FEAT_AA64EL2 ID_AA64PFR0_EL1, EL2, IMP
#define FEAT_AA64EL3 ID_AA64PFR0_EL1, EL3, IMP #define FEAT_AA64EL3 ID_AA64PFR0_EL1, EL3, IMP
#define FEAT_AIE ID_AA64MMFR3_EL1, AIE, IMP #define FEAT_AIE ID_AA64MMFR3_EL1, AIE, IMP
#define FEAT_S2POE ID_AA64MMFR3_EL1, S2POE, IMP #define FEAT_S2POE ID_AA64MMFR3_EL1, S2POE, IMP
@ -131,7 +195,6 @@ struct reg_bits_to_feat_map {
#define FEAT_SPMU ID_AA64DFR1_EL1, SPMU, IMP #define FEAT_SPMU ID_AA64DFR1_EL1, SPMU, IMP
#define FEAT_SPE_nVM ID_AA64DFR2_EL1, SPE_nVM, IMP #define FEAT_SPE_nVM ID_AA64DFR2_EL1, SPE_nVM, IMP
#define FEAT_STEP2 ID_AA64DFR2_EL1, STEP, IMP #define FEAT_STEP2 ID_AA64DFR2_EL1, STEP, IMP
#define FEAT_SYSREG128 ID_AA64ISAR2_EL1, SYSREG_128, IMP
#define FEAT_CPA2 ID_AA64ISAR3_EL1, CPA, CPA2 #define FEAT_CPA2 ID_AA64ISAR3_EL1, CPA, CPA2
#define FEAT_ASID2 ID_AA64MMFR4_EL1, ASID2, IMP #define FEAT_ASID2 ID_AA64MMFR4_EL1, ASID2, IMP
#define FEAT_MEC ID_AA64MMFR3_EL1, MEC, IMP #define FEAT_MEC ID_AA64MMFR3_EL1, MEC, IMP
@ -143,7 +206,6 @@ struct reg_bits_to_feat_map {
#define FEAT_LSMAOC ID_AA64MMFR2_EL1, LSM, IMP #define FEAT_LSMAOC ID_AA64MMFR2_EL1, LSM, IMP
#define FEAT_MixedEnd ID_AA64MMFR0_EL1, BIGEND, IMP #define FEAT_MixedEnd ID_AA64MMFR0_EL1, BIGEND, IMP
#define FEAT_MixedEndEL0 ID_AA64MMFR0_EL1, BIGENDEL0, IMP #define FEAT_MixedEndEL0 ID_AA64MMFR0_EL1, BIGENDEL0, IMP
#define FEAT_MTE2 ID_AA64PFR1_EL1, MTE, MTE2
#define FEAT_MTE_ASYNC ID_AA64PFR1_EL1, MTE_frac, ASYNC #define FEAT_MTE_ASYNC ID_AA64PFR1_EL1, MTE_frac, ASYNC
#define FEAT_MTE_STORE_ONLY ID_AA64PFR2_EL1, MTESTOREONLY, IMP #define FEAT_MTE_STORE_ONLY ID_AA64PFR2_EL1, MTESTOREONLY, IMP
#define FEAT_PAN ID_AA64MMFR1_EL1, PAN, IMP #define FEAT_PAN ID_AA64MMFR1_EL1, PAN, IMP
@ -151,7 +213,9 @@ struct reg_bits_to_feat_map {
#define FEAT_SSBS ID_AA64PFR1_EL1, SSBS, IMP #define FEAT_SSBS ID_AA64PFR1_EL1, SSBS, IMP
#define FEAT_TIDCP1 ID_AA64MMFR1_EL1, TIDCP1, IMP #define FEAT_TIDCP1 ID_AA64MMFR1_EL1, TIDCP1, IMP
#define FEAT_FGT ID_AA64MMFR0_EL1, FGT, IMP #define FEAT_FGT ID_AA64MMFR0_EL1, FGT, IMP
#define FEAT_FGT2 ID_AA64MMFR0_EL1, FGT, FGT2
#define FEAT_MTPMU ID_AA64DFR0_EL1, MTPMU, IMP #define FEAT_MTPMU ID_AA64DFR0_EL1, MTPMU, IMP
#define FEAT_HCX ID_AA64MMFR1_EL1, HCX, IMP
static bool not_feat_aa64el3(struct kvm *kvm) static bool not_feat_aa64el3(struct kvm *kvm)
{ {
@ -397,6 +461,10 @@ static const struct reg_bits_to_feat_map hfgrtr_feat_map[] = {
NEVER_FGU, FEAT_AA64EL1), NEVER_FGU, FEAT_AA64EL1),
}; };
static const DECLARE_FEAT_MAP_FGT(hfgrtr_desc, hfgrtr_masks,
hfgrtr_feat_map, FEAT_FGT);
static const struct reg_bits_to_feat_map hfgwtr_feat_map[] = { static const struct reg_bits_to_feat_map hfgwtr_feat_map[] = {
NEEDS_FEAT(HFGWTR_EL2_nAMAIR2_EL1 | NEEDS_FEAT(HFGWTR_EL2_nAMAIR2_EL1 |
HFGWTR_EL2_nMAIR2_EL1, HFGWTR_EL2_nMAIR2_EL1,
@ -461,6 +529,9 @@ static const struct reg_bits_to_feat_map hfgwtr_feat_map[] = {
NEVER_FGU, FEAT_AA64EL1), NEVER_FGU, FEAT_AA64EL1),
}; };
static const DECLARE_FEAT_MAP_FGT(hfgwtr_desc, hfgwtr_masks,
hfgwtr_feat_map, FEAT_FGT);
static const struct reg_bits_to_feat_map hdfgrtr_feat_map[] = { static const struct reg_bits_to_feat_map hdfgrtr_feat_map[] = {
NEEDS_FEAT(HDFGRTR_EL2_PMBIDR_EL1 | NEEDS_FEAT(HDFGRTR_EL2_PMBIDR_EL1 |
HDFGRTR_EL2_PMSLATFR_EL1 | HDFGRTR_EL2_PMSLATFR_EL1 |
@ -528,6 +599,9 @@ static const struct reg_bits_to_feat_map hdfgrtr_feat_map[] = {
NEVER_FGU, FEAT_AA64EL1) NEVER_FGU, FEAT_AA64EL1)
}; };
static const DECLARE_FEAT_MAP_FGT(hdfgrtr_desc, hdfgrtr_masks,
hdfgrtr_feat_map, FEAT_FGT);
static const struct reg_bits_to_feat_map hdfgwtr_feat_map[] = {
	NEEDS_FEAT(HDFGWTR_EL2_PMSLATFR_EL1 |
		   HDFGWTR_EL2_PMSIRR_EL1 |
@@ -588,6 +662,8 @@ static const struct reg_bits_to_feat_map hdfgwtr_feat_map[] = {
	NEEDS_FEAT(HDFGWTR_EL2_TRFCR_EL1, FEAT_TRF),
};

+static const DECLARE_FEAT_MAP_FGT(hdfgwtr_desc, hdfgwtr_masks,
+				   hdfgwtr_feat_map, FEAT_FGT);

static const struct reg_bits_to_feat_map hfgitr_feat_map[] = {
	NEEDS_FEAT(HFGITR_EL2_PSBCSYNC, FEAT_SPEv1p5),
@@ -662,6 +738,9 @@ static const struct reg_bits_to_feat_map hfgitr_feat_map[] = {
		   NEVER_FGU, FEAT_AA64EL1),
};

+static const DECLARE_FEAT_MAP_FGT(hfgitr_desc, hfgitr_masks,
+				   hfgitr_feat_map, FEAT_FGT);

static const struct reg_bits_to_feat_map hafgrtr_feat_map[] = {
	NEEDS_FEAT(HAFGRTR_EL2_AMEVTYPER115_EL0 |
		   HAFGRTR_EL2_AMEVTYPER114_EL0 |
@@ -704,11 +783,17 @@ static const struct reg_bits_to_feat_map hafgrtr_feat_map[] = {
		   FEAT_AMUv1),
};

+static const DECLARE_FEAT_MAP_FGT(hafgrtr_desc, hafgrtr_masks,
+				   hafgrtr_feat_map, FEAT_FGT);

static const struct reg_bits_to_feat_map hfgitr2_feat_map[] = {
	NEEDS_FEAT(HFGITR2_EL2_nDCCIVAPS, FEAT_PoPS),
	NEEDS_FEAT(HFGITR2_EL2_TSBCSYNC, FEAT_TRBEv1p1)
};

+static const DECLARE_FEAT_MAP_FGT(hfgitr2_desc, hfgitr2_masks,
+				   hfgitr2_feat_map, FEAT_FGT2);

static const struct reg_bits_to_feat_map hfgrtr2_feat_map[] = {
	NEEDS_FEAT(HFGRTR2_EL2_nPFAR_EL1, FEAT_PFAR),
	NEEDS_FEAT(HFGRTR2_EL2_nERXGSR_EL1, FEAT_RASv2),
@@ -728,6 +813,9 @@ static const struct reg_bits_to_feat_map hfgrtr2_feat_map[] = {
	NEEDS_FEAT(HFGRTR2_EL2_nRCWSMASK_EL1, FEAT_THE),
};

+static const DECLARE_FEAT_MAP_FGT(hfgrtr2_desc, hfgrtr2_masks,
+				   hfgrtr2_feat_map, FEAT_FGT2);

static const struct reg_bits_to_feat_map hfgwtr2_feat_map[] = {
	NEEDS_FEAT(HFGWTR2_EL2_nPFAR_EL1, FEAT_PFAR),
	NEEDS_FEAT(HFGWTR2_EL2_nACTLRALIAS_EL1 |
@@ -746,6 +834,9 @@ static const struct reg_bits_to_feat_map hfgwtr2_feat_map[] = {
	NEEDS_FEAT(HFGWTR2_EL2_nRCWSMASK_EL1, FEAT_THE),
};

+static const DECLARE_FEAT_MAP_FGT(hfgwtr2_desc, hfgwtr2_masks,
+				   hfgwtr2_feat_map, FEAT_FGT2);

static const struct reg_bits_to_feat_map hdfgrtr2_feat_map[] = {
	NEEDS_FEAT(HDFGRTR2_EL2_nMDSELR_EL1, FEAT_Debugv8p9),
	NEEDS_FEAT(HDFGRTR2_EL2_nPMECR_EL1, feat_ebep_pmuv3_ss),
@@ -776,6 +867,9 @@ static const struct reg_bits_to_feat_map hdfgrtr2_feat_map[] = {
	NEEDS_FEAT(HDFGRTR2_EL2_nTRBMPAM_EL1, feat_trbe_mpam),
};

+static const DECLARE_FEAT_MAP_FGT(hdfgrtr2_desc, hdfgrtr2_masks,
+				   hdfgrtr2_feat_map, FEAT_FGT2);

static const struct reg_bits_to_feat_map hdfgwtr2_feat_map[] = {
	NEEDS_FEAT(HDFGWTR2_EL2_nMDSELR_EL1, FEAT_Debugv8p9),
	NEEDS_FEAT(HDFGWTR2_EL2_nPMECR_EL1, feat_ebep_pmuv3_ss),
@@ -804,6 +898,10 @@ static const struct reg_bits_to_feat_map hdfgwtr2_feat_map[] = {
	NEEDS_FEAT(HDFGWTR2_EL2_nTRBMPAM_EL1, feat_trbe_mpam),
};

+static const DECLARE_FEAT_MAP_FGT(hdfgwtr2_desc, hdfgwtr2_masks,
+				   hdfgwtr2_feat_map, FEAT_FGT2);

static const struct reg_bits_to_feat_map hcrx_feat_map[] = {
	NEEDS_FEAT(HCRX_EL2_PACMEn, feat_pauth_lr),
	NEEDS_FEAT(HCRX_EL2_EnFPM, FEAT_FPMR),
@@ -833,6 +931,10 @@ static const struct reg_bits_to_feat_map hcrx_feat_map[] = {
	NEEDS_FEAT(HCRX_EL2_EnAS0, FEAT_LS64_ACCDATA),
};

+static const DECLARE_FEAT_MAP(hcrx_desc, __HCRX_EL2,
+			       hcrx_feat_map, FEAT_HCX);

static const struct reg_bits_to_feat_map hcr_feat_map[] = {
	NEEDS_FEAT(HCR_EL2_TID0, FEAT_AA32EL0),
	NEEDS_FEAT_FIXED(HCR_EL2_RW, compute_hcr_rw),
@@ -904,6 +1006,9 @@ static const struct reg_bits_to_feat_map hcr_feat_map[] = {
	NEEDS_FEAT_FIXED(HCR_EL2_E2H, compute_hcr_e2h),
};

+static const DECLARE_FEAT_MAP(hcr_desc, HCR_EL2,
+			       hcr_feat_map, FEAT_AA64EL2);

static const struct reg_bits_to_feat_map sctlr2_feat_map[] = {
	NEEDS_FEAT(SCTLR2_EL1_NMEA |
		   SCTLR2_EL1_EASE,
@@ -921,6 +1026,9 @@ static const struct reg_bits_to_feat_map sctlr2_feat_map[] = {
		   FEAT_CPA2),
};

+static const DECLARE_FEAT_MAP(sctlr2_desc, SCTLR2_EL1,
+			       sctlr2_feat_map, FEAT_SCTLR2);

static const struct reg_bits_to_feat_map tcr2_el2_feat_map[] = {
	NEEDS_FEAT(TCR2_EL2_FNG1 |
		   TCR2_EL2_FNG0 |
@@ -943,6 +1051,9 @@ static const struct reg_bits_to_feat_map tcr2_el2_feat_map[] = {
	NEEDS_FEAT(TCR2_EL2_PIE, FEAT_S1PIE),
};

+static const DECLARE_FEAT_MAP(tcr2_el2_desc, TCR2_EL2,
+			       tcr2_el2_feat_map, FEAT_TCR2);

static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
	NEEDS_FEAT(SCTLR_EL1_CP15BEN |
		   SCTLR_EL1_ITD |
@@ -1017,6 +1128,9 @@ static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
		   FEAT_AA64EL1),
};

+static const DECLARE_FEAT_MAP(sctlr_el1_desc, SCTLR_EL1,
+			       sctlr_el1_feat_map, FEAT_AA64EL1);

static const struct reg_bits_to_feat_map mdcr_el2_feat_map[] = {
	NEEDS_FEAT(MDCR_EL2_EBWE, FEAT_Debugv8p9),
	NEEDS_FEAT(MDCR_EL2_TDOSA, FEAT_DoubleLock),
@@ -1048,6 +1162,9 @@ static const struct reg_bits_to_feat_map mdcr_el2_feat_map[] = {
		   FEAT_AA64EL1),
};

+static const DECLARE_FEAT_MAP(mdcr_el2_desc, MDCR_EL2,
+			       mdcr_el2_feat_map, FEAT_AA64EL2);

static void __init check_feat_map(const struct reg_bits_to_feat_map *map,
				  int map_size, u64 res0, const char *str)
{
@@ -1061,32 +1178,36 @@ static void __init check_feat_map(const struct reg_bits_to_feat_map *map,
		str, mask ^ ~res0);
}

+static u64 reg_feat_map_bits(const struct reg_bits_to_feat_map *map)
+{
+	return map->flags & RES0_POINTER ? ~(*map->res0p) : map->bits;
+}
+
+static void __init check_reg_desc(const struct reg_feat_map_desc *r)
+{
+	check_feat_map(r->bit_feat_map, r->bit_feat_map_sz,
+		       ~reg_feat_map_bits(&r->feat_map), r->name);
+}

void __init check_feature_map(void)
{
-	check_feat_map(hfgrtr_feat_map, ARRAY_SIZE(hfgrtr_feat_map),
-		       hfgrtr_masks.res0, hfgrtr_masks.str);
-	check_feat_map(hfgwtr_feat_map, ARRAY_SIZE(hfgwtr_feat_map),
-		       hfgwtr_masks.res0, hfgwtr_masks.str);
-	check_feat_map(hfgitr_feat_map, ARRAY_SIZE(hfgitr_feat_map),
-		       hfgitr_masks.res0, hfgitr_masks.str);
-	check_feat_map(hdfgrtr_feat_map, ARRAY_SIZE(hdfgrtr_feat_map),
-		       hdfgrtr_masks.res0, hdfgrtr_masks.str);
-	check_feat_map(hdfgwtr_feat_map, ARRAY_SIZE(hdfgwtr_feat_map),
-		       hdfgwtr_masks.res0, hdfgwtr_masks.str);
-	check_feat_map(hafgrtr_feat_map, ARRAY_SIZE(hafgrtr_feat_map),
-		       hafgrtr_masks.res0, hafgrtr_masks.str);
-	check_feat_map(hcrx_feat_map, ARRAY_SIZE(hcrx_feat_map),
-		       __HCRX_EL2_RES0, "HCRX_EL2");
-	check_feat_map(hcr_feat_map, ARRAY_SIZE(hcr_feat_map),
-		       HCR_EL2_RES0, "HCR_EL2");
-	check_feat_map(sctlr2_feat_map, ARRAY_SIZE(sctlr2_feat_map),
-		       SCTLR2_EL1_RES0, "SCTLR2_EL1");
-	check_feat_map(tcr2_el2_feat_map, ARRAY_SIZE(tcr2_el2_feat_map),
-		       TCR2_EL2_RES0, "TCR2_EL2");
-	check_feat_map(sctlr_el1_feat_map, ARRAY_SIZE(sctlr_el1_feat_map),
-		       SCTLR_EL1_RES0, "SCTLR_EL1");
-	check_feat_map(mdcr_el2_feat_map, ARRAY_SIZE(mdcr_el2_feat_map),
-		       MDCR_EL2_RES0, "MDCR_EL2");
+	check_reg_desc(&hfgrtr_desc);
+	check_reg_desc(&hfgwtr_desc);
+	check_reg_desc(&hfgitr_desc);
+	check_reg_desc(&hdfgrtr_desc);
+	check_reg_desc(&hdfgwtr_desc);
+	check_reg_desc(&hafgrtr_desc);
+	check_reg_desc(&hfgrtr2_desc);
+	check_reg_desc(&hfgwtr2_desc);
+	check_reg_desc(&hfgitr2_desc);
+	check_reg_desc(&hdfgrtr2_desc);
+	check_reg_desc(&hdfgwtr2_desc);
+	check_reg_desc(&hcrx_desc);
+	check_reg_desc(&hcr_desc);
+	check_reg_desc(&sctlr2_desc);
+	check_reg_desc(&tcr2_el2_desc);
+	check_reg_desc(&sctlr_el1_desc);
+	check_reg_desc(&mdcr_el2_desc);
}

static bool idreg_feat_match(struct kvm *kvm, const struct reg_bits_to_feat_map *map)
@@ -1129,7 +1250,7 @@ static u64 __compute_fixed_bits(struct kvm *kvm,
		match = idreg_feat_match(kvm, &map[i]);
		if (!match || (map[i].flags & FIXED_VALUE))
-			val |= map[i].bits;
+			val |= reg_feat_map_bits(&map[i]);
	}

	return val;
@@ -1145,15 +1266,36 @@ static u64 compute_res0_bits(struct kvm *kvm,
				    require, exclude | FIXED_VALUE);
}

-static u64 compute_fixed_bits(struct kvm *kvm,
-			      const struct reg_bits_to_feat_map *map,
-			      int map_size,
-			      u64 *fixed_bits,
-			      unsigned long require,
-			      unsigned long exclude)
+static u64 compute_reg_res0_bits(struct kvm *kvm,
+				 const struct reg_feat_map_desc *r,
+				 unsigned long require, unsigned long exclude)
{
-	return __compute_fixed_bits(kvm, map, map_size, fixed_bits,
-				    require | FIXED_VALUE, exclude);
+	u64 res0;
+
+	res0 = compute_res0_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
+				 require, exclude);
+
+	/*
+	 * If computing FGUs, don't take RES0 or register existence
+	 * into account -- we're not computing bits for the register
+	 * itself.
+	 */
+	if (!(exclude & NEVER_FGU)) {
+		res0 |= compute_res0_bits(kvm, &r->feat_map, 1, require, exclude);
+		res0 |= ~reg_feat_map_bits(&r->feat_map);
+	}
+
+	return res0;
+}
+
+static u64 compute_reg_fixed_bits(struct kvm *kvm,
+				  const struct reg_feat_map_desc *r,
+				  u64 *fixed_bits, unsigned long require,
+				  unsigned long exclude)
+{
+	return __compute_fixed_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
+				    fixed_bits, require | FIXED_VALUE, exclude);
}
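
(The descriptor rework above can be hard to follow from the hunks alone, so here is a
stand-alone sketch of the idea: bundle a register's bit-to-feature map, its size, its
RES0 mask and its name into one descriptor, and every per-register helper collapses to
a single argument. This is an illustration only, not kernel code; the field names
bit_feat_map, bit_feat_map_sz and name come from the hunks above, while the struct
layout and the demo values are assumed.)

#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel types used above. */
struct bit_to_feat { uint64_t bits; int feat; };

struct reg_desc {
	const char *name;			/* "HFGRTR_EL2", ...		*/
	const struct bit_to_feat *bit_feat_map;	/* which bits need which feature */
	int bit_feat_map_sz;
	uint64_t res0;				/* bits that are RES0 here	*/
};

/* One sanity check now works for every register described this way. */
static void check_desc(const struct reg_desc *r)
{
	uint64_t covered = r->res0;

	for (int i = 0; i < r->bit_feat_map_sz; i++)
		covered |= r->bit_feat_map[i].bits;

	if (covered != UINT64_MAX)
		printf("%s: bits %#llx not described\n", r->name,
		       (unsigned long long)~covered);
}

static const struct bit_to_feat demo_map[] = { { 0x3, 1 }, { 0xc, 2 } };
static const struct reg_desc demo_desc = {
	.name = "DEMO_EL2", .bit_feat_map = demo_map,
	.bit_feat_map_sz = 2, .res0 = ~0xfULL,
};

int main(void)
{
	check_desc(&demo_desc);	/* prints nothing: every bit is accounted for */
	return 0;
}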

void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
@@ -1162,51 +1304,40 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
	switch (fgt) {
	case HFGRTR_GROUP:
-		val |= compute_res0_bits(kvm, hfgrtr_feat_map,
-					 ARRAY_SIZE(hfgrtr_feat_map),
-					 0, NEVER_FGU);
-		val |= compute_res0_bits(kvm, hfgwtr_feat_map,
-					 ARRAY_SIZE(hfgwtr_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgrtr_desc,
+					     0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgwtr_desc,
+					     0, NEVER_FGU);
		break;
	case HFGITR_GROUP:
-		val |= compute_res0_bits(kvm, hfgitr_feat_map,
-					 ARRAY_SIZE(hfgitr_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgitr_desc,
+					     0, NEVER_FGU);
		break;
	case HDFGRTR_GROUP:
-		val |= compute_res0_bits(kvm, hdfgrtr_feat_map,
-					 ARRAY_SIZE(hdfgrtr_feat_map),
-					 0, NEVER_FGU);
-		val |= compute_res0_bits(kvm, hdfgwtr_feat_map,
-					 ARRAY_SIZE(hdfgwtr_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hdfgrtr_desc,
+					     0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hdfgwtr_desc,
+					     0, NEVER_FGU);
		break;
	case HAFGRTR_GROUP:
-		val |= compute_res0_bits(kvm, hafgrtr_feat_map,
-					 ARRAY_SIZE(hafgrtr_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hafgrtr_desc,
+					     0, NEVER_FGU);
		break;
	case HFGRTR2_GROUP:
-		val |= compute_res0_bits(kvm, hfgrtr2_feat_map,
-					 ARRAY_SIZE(hfgrtr2_feat_map),
-					 0, NEVER_FGU);
-		val |= compute_res0_bits(kvm, hfgwtr2_feat_map,
-					 ARRAY_SIZE(hfgwtr2_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgrtr2_desc,
+					     0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgwtr2_desc,
+					     0, NEVER_FGU);
		break;
	case HFGITR2_GROUP:
-		val |= compute_res0_bits(kvm, hfgitr2_feat_map,
-					 ARRAY_SIZE(hfgitr2_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hfgitr2_desc,
+					     0, NEVER_FGU);
		break;
	case HDFGRTR2_GROUP:
-		val |= compute_res0_bits(kvm, hdfgrtr2_feat_map,
-					 ARRAY_SIZE(hdfgrtr2_feat_map),
-					 0, NEVER_FGU);
-		val |= compute_res0_bits(kvm, hdfgwtr2_feat_map,
-					 ARRAY_SIZE(hdfgwtr2_feat_map),
-					 0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hdfgrtr2_desc,
+					     0, NEVER_FGU);
+		val |= compute_reg_res0_bits(kvm, &hdfgwtr2_desc,
+					     0, NEVER_FGU);
		break;
	default:
		BUG();

@@ -1221,109 +1352,74 @@ void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *r
	switch (reg) {
	case HFGRTR_EL2:
-		*res0 = compute_res0_bits(kvm, hfgrtr_feat_map,
-					  ARRAY_SIZE(hfgrtr_feat_map), 0, 0);
-		*res0 |= hfgrtr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgrtr_desc, 0, 0);
		*res1 = HFGRTR_EL2_RES1;
		break;
	case HFGWTR_EL2:
-		*res0 = compute_res0_bits(kvm, hfgwtr_feat_map,
-					  ARRAY_SIZE(hfgwtr_feat_map), 0, 0);
-		*res0 |= hfgwtr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgwtr_desc, 0, 0);
		*res1 = HFGWTR_EL2_RES1;
		break;
	case HFGITR_EL2:
-		*res0 = compute_res0_bits(kvm, hfgitr_feat_map,
-					  ARRAY_SIZE(hfgitr_feat_map), 0, 0);
-		*res0 |= hfgitr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgitr_desc, 0, 0);
		*res1 = HFGITR_EL2_RES1;
		break;
	case HDFGRTR_EL2:
-		*res0 = compute_res0_bits(kvm, hdfgrtr_feat_map,
-					  ARRAY_SIZE(hdfgrtr_feat_map), 0, 0);
-		*res0 |= hdfgrtr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hdfgrtr_desc, 0, 0);
		*res1 = HDFGRTR_EL2_RES1;
		break;
	case HDFGWTR_EL2:
-		*res0 = compute_res0_bits(kvm, hdfgwtr_feat_map,
-					  ARRAY_SIZE(hdfgwtr_feat_map), 0, 0);
-		*res0 |= hdfgwtr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hdfgwtr_desc, 0, 0);
		*res1 = HDFGWTR_EL2_RES1;
		break;
	case HAFGRTR_EL2:
-		*res0 = compute_res0_bits(kvm, hafgrtr_feat_map,
-					  ARRAY_SIZE(hafgrtr_feat_map), 0, 0);
-		*res0 |= hafgrtr_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hafgrtr_desc, 0, 0);
		*res1 = HAFGRTR_EL2_RES1;
		break;
	case HFGRTR2_EL2:
-		*res0 = compute_res0_bits(kvm, hfgrtr2_feat_map,
-					  ARRAY_SIZE(hfgrtr2_feat_map), 0, 0);
-		*res0 |= hfgrtr2_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgrtr2_desc, 0, 0);
		*res1 = HFGRTR2_EL2_RES1;
		break;
	case HFGWTR2_EL2:
-		*res0 = compute_res0_bits(kvm, hfgwtr2_feat_map,
-					  ARRAY_SIZE(hfgwtr2_feat_map), 0, 0);
-		*res0 |= hfgwtr2_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgwtr2_desc, 0, 0);
		*res1 = HFGWTR2_EL2_RES1;
		break;
	case HFGITR2_EL2:
-		*res0 = compute_res0_bits(kvm, hfgitr2_feat_map,
-					  ARRAY_SIZE(hfgitr2_feat_map), 0, 0);
-		*res0 |= hfgitr2_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hfgitr2_desc, 0, 0);
		*res1 = HFGITR2_EL2_RES1;
		break;
	case HDFGRTR2_EL2:
-		*res0 = compute_res0_bits(kvm, hdfgrtr2_feat_map,
-					  ARRAY_SIZE(hdfgrtr2_feat_map), 0, 0);
-		*res0 |= hdfgrtr2_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hdfgrtr2_desc, 0, 0);
		*res1 = HDFGRTR2_EL2_RES1;
		break;
	case HDFGWTR2_EL2:
-		*res0 = compute_res0_bits(kvm, hdfgwtr2_feat_map,
-					  ARRAY_SIZE(hdfgwtr2_feat_map), 0, 0);
-		*res0 |= hdfgwtr2_masks.res0;
+		*res0 = compute_reg_res0_bits(kvm, &hdfgwtr2_desc, 0, 0);
		*res1 = HDFGWTR2_EL2_RES1;
		break;
	case HCRX_EL2:
-		*res0 = compute_res0_bits(kvm, hcrx_feat_map,
-					  ARRAY_SIZE(hcrx_feat_map), 0, 0);
-		*res0 |= __HCRX_EL2_RES0;
+		*res0 = compute_reg_res0_bits(kvm, &hcrx_desc, 0, 0);
		*res1 = __HCRX_EL2_RES1;
		break;
	case HCR_EL2:
-		mask = compute_fixed_bits(kvm, hcr_feat_map,
-					  ARRAY_SIZE(hcr_feat_map), &fixed,
-					  0, 0);
-		*res0 = compute_res0_bits(kvm, hcr_feat_map,
-					  ARRAY_SIZE(hcr_feat_map), 0, 0);
-		*res0 |= HCR_EL2_RES0 | (mask & ~fixed);
+		mask = compute_reg_fixed_bits(kvm, &hcr_desc, &fixed, 0, 0);
+		*res0 = compute_reg_res0_bits(kvm, &hcr_desc, 0, 0);
+		*res0 |= (mask & ~fixed);
		*res1 = HCR_EL2_RES1 | (mask & fixed);
		break;
	case SCTLR2_EL1:
	case SCTLR2_EL2:
-		*res0 = compute_res0_bits(kvm, sctlr2_feat_map,
-					  ARRAY_SIZE(sctlr2_feat_map), 0, 0);
-		*res0 |= SCTLR2_EL1_RES0;
+		*res0 = compute_reg_res0_bits(kvm, &sctlr2_desc, 0, 0);
		*res1 = SCTLR2_EL1_RES1;
		break;
	case TCR2_EL2:
-		*res0 = compute_res0_bits(kvm, tcr2_el2_feat_map,
-					  ARRAY_SIZE(tcr2_el2_feat_map), 0, 0);
-		*res0 |= TCR2_EL2_RES0;
+		*res0 = compute_reg_res0_bits(kvm, &tcr2_el2_desc, 0, 0);
		*res1 = TCR2_EL2_RES1;
		break;
	case SCTLR_EL1:
-		*res0 = compute_res0_bits(kvm, sctlr_el1_feat_map,
-					  ARRAY_SIZE(sctlr_el1_feat_map), 0, 0);
-		*res0 |= SCTLR_EL1_RES0;
+		*res0 = compute_reg_res0_bits(kvm, &sctlr_el1_desc, 0, 0);
		*res1 = SCTLR_EL1_RES1;
		break;
	case MDCR_EL2:
-		*res0 = compute_res0_bits(kvm, mdcr_el2_feat_map,
-					  ARRAY_SIZE(mdcr_el2_feat_map), 0, 0);
-		*res0 |= MDCR_EL2_RES0;
+		*res0 = compute_reg_res0_bits(kvm, &mdcr_el2_desc, 0, 0);
		*res1 = MDCR_EL2_RES1;
		break;
	default:


@@ -56,6 +56,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
	if (!kvm_guest_owns_debug_regs(vcpu))
		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;

+	if (vcpu_has_nv(vcpu))
+		kvm_nested_setup_mdcr_el2(vcpu);
+
	/* Write MDCR_EL2 directly if we're already at EL2 */
	if (has_vhe())
		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
@@ -243,29 +246,29 @@ void kvm_debug_handle_oslar(struct kvm_vcpu *vcpu, u64 val)
	preempt_enable();
}

+static bool skip_trbe_access(bool skip_condition)
+{
+	return (WARN_ON_ONCE(preemptible()) || skip_condition ||
+		is_protected_kvm_enabled() || !is_kvm_arm_initialised());
+}
+
void kvm_enable_trbe(void)
{
-	if (has_vhe() || is_protected_kvm_enabled() ||
-	    WARN_ON_ONCE(preemptible()))
-		return;
-
-	host_data_set_flag(TRBE_ENABLED);
+	if (!skip_trbe_access(has_vhe()))
+		host_data_set_flag(TRBE_ENABLED);
}
EXPORT_SYMBOL_GPL(kvm_enable_trbe);

void kvm_disable_trbe(void)
{
-	if (has_vhe() || is_protected_kvm_enabled() ||
-	    WARN_ON_ONCE(preemptible()))
-		return;
-
-	host_data_clear_flag(TRBE_ENABLED);
+	if (!skip_trbe_access(has_vhe()))
+		host_data_clear_flag(TRBE_ENABLED);
}
EXPORT_SYMBOL_GPL(kvm_disable_trbe);

void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest)
{
-	if (is_protected_kvm_enabled() || WARN_ON_ONCE(preemptible()))
+	if (skip_trbe_access(false))
		return;

	if (has_vhe()) {


@@ -1185,6 +1185,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
	SR_TRAP(SYS_PMSIRR_EL1, CGT_MDCR_TPMS),
	SR_TRAP(SYS_PMSLATFR_EL1, CGT_MDCR_TPMS),
	SR_TRAP(SYS_PMSNEVFR_EL1, CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSDSFR_EL1, CGT_MDCR_TPMS),
	SR_TRAP(SYS_TRFCR_EL1, CGT_MDCR_TTRF),
	SR_TRAP(SYS_TRBBASER_EL1, CGT_MDCR_E2TB),
	SR_TRAP(SYS_TRBLIMITR_EL1, CGT_MDCR_E2TB),


@@ -559,6 +559,9 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
	/* Dump the nVHE hypervisor backtrace */
	kvm_nvhe_dump_backtrace(hyp_offset);

+	/* Dump the faulting instruction */
+	dump_kernel_instr(panic_addr + kaslr_offset());
+
	/*
	 * Hyp has panicked and we're going to handle that by panicking the
	 * kernel. The kernel offset will be revealed in the panic so we're


@@ -29,7 +29,7 @@ struct pkvm_hyp_vcpu {
};

/*
- * Holds the relevant data for running a protected vm.
+ * Holds the relevant data for running a vm in protected mode.
 */
struct pkvm_hyp_vm {
	struct kvm kvm;
@@ -67,6 +67,8 @@ static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)

void pkvm_hyp_vm_table_init(void *tbl);

+int __pkvm_reserve_vm(void);
+void __pkvm_unreserve_vm(pkvm_handle_t handle);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
		   unsigned long pgd_hva);
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,


@@ -12,7 +12,8 @@
#include <asm/kvm_host.h>

#define cpu_reg(ctxt, r)	(ctxt)->regs.regs[r]
#define DECLARE_REG(type, name, ctxt, reg) \
+		__always_unused int ___check_reg_ ## reg; \
		type name = (type)cpu_reg(ctxt, (reg))

#endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
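
(The extra __always_unused dummy declaration exists so that two DECLARE_REG()
invocations naming the same register index in one handler no longer compile: the
token-pasted variable name collides. A minimal user-space sketch of the same trick
follows; the names and the demo macro are illustrative, not the kernel's.)

#include <stdio.h>

/* Paste the register index into a dummy variable name, so a duplicated
 * index in the same scope becomes a redefinition error at build time. */
#define DECLARE_REG(type, name, regs, reg) \
	__attribute__((unused)) int ___check_reg_ ## reg; \
	type name = (type)(regs)[(reg)]

int main(void)
{
	unsigned long regs[8] = { 0, 42, 7 };

	DECLARE_REG(unsigned long, handle, regs, 1);
	DECLARE_REG(unsigned long, flags,  regs, 2);
	/* DECLARE_REG(unsigned long, oops, regs, 1); <- would fail to build:
	 * redefinition of '___check_reg_1' */

	printf("%lu %lu\n", handle, flags);
	return 0;
}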


@@ -27,6 +27,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
+hyp-obj-y += ../../../kernel/smccc-call.o
hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
hyp-obj-y += $(lib-objs)


@ -71,36 +71,68 @@ static u32 hyp_ffa_version;
static bool has_version_negotiated; static bool has_version_negotiated;
static hyp_spinlock_t version_lock; static hyp_spinlock_t version_lock;
static void ffa_to_smccc_error(struct arm_smccc_res *res, u64 ffa_errno) static void ffa_to_smccc_error(struct arm_smccc_1_2_regs *res, u64 ffa_errno)
{ {
*res = (struct arm_smccc_res) { *res = (struct arm_smccc_1_2_regs) {
.a0 = FFA_ERROR, .a0 = FFA_ERROR,
.a2 = ffa_errno, .a2 = ffa_errno,
}; };
} }
static void ffa_to_smccc_res_prop(struct arm_smccc_res *res, int ret, u64 prop) static void ffa_to_smccc_res_prop(struct arm_smccc_1_2_regs *res, int ret, u64 prop)
{ {
if (ret == FFA_RET_SUCCESS) { if (ret == FFA_RET_SUCCESS) {
*res = (struct arm_smccc_res) { .a0 = FFA_SUCCESS, *res = (struct arm_smccc_1_2_regs) { .a0 = FFA_SUCCESS,
.a2 = prop }; .a2 = prop };
} else { } else {
ffa_to_smccc_error(res, ret); ffa_to_smccc_error(res, ret);
} }
} }
static void ffa_to_smccc_res(struct arm_smccc_res *res, int ret) static void ffa_to_smccc_res(struct arm_smccc_1_2_regs *res, int ret)
{ {
ffa_to_smccc_res_prop(res, ret, 0); ffa_to_smccc_res_prop(res, ret, 0);
} }
static void ffa_set_retval(struct kvm_cpu_context *ctxt, static void ffa_set_retval(struct kvm_cpu_context *ctxt,
struct arm_smccc_res *res) struct arm_smccc_1_2_regs *res)
{ {
cpu_reg(ctxt, 0) = res->a0; cpu_reg(ctxt, 0) = res->a0;
cpu_reg(ctxt, 1) = res->a1; cpu_reg(ctxt, 1) = res->a1;
cpu_reg(ctxt, 2) = res->a2; cpu_reg(ctxt, 2) = res->a2;
cpu_reg(ctxt, 3) = res->a3; cpu_reg(ctxt, 3) = res->a3;
cpu_reg(ctxt, 4) = res->a4;
cpu_reg(ctxt, 5) = res->a5;
cpu_reg(ctxt, 6) = res->a6;
cpu_reg(ctxt, 7) = res->a7;
/*
* DEN0028C 2.6: SMC32/HVC32 call from aarch64 must preserve x8-x30.
*
* In FF-A 1.2, we cannot rely on the function ID sent by the caller to
* detect 32-bit calls because the CPU cycle management interfaces (e.g.
* FFA_MSG_WAIT, FFA_RUN) are 32-bit only but can have 64-bit responses.
*
* FFA-1.3 introduces 64-bit variants of the CPU cycle management
* interfaces. Moreover, FF-A 1.3 clarifies that SMC32 direct requests
* complete with SMC32 direct reponses which *should* allow us use the
* function ID sent by the caller to determine whether to return x8-x17.
*
* Note that we also cannot rely on function IDs in the response.
*
* Given the above, assume SMC64 and send back x0-x17 unconditionally
* as the passthrough code (__kvm_hyp_host_forward_smc) does the same.
*/
cpu_reg(ctxt, 8) = res->a8;
cpu_reg(ctxt, 9) = res->a9;
cpu_reg(ctxt, 10) = res->a10;
cpu_reg(ctxt, 11) = res->a11;
cpu_reg(ctxt, 12) = res->a12;
cpu_reg(ctxt, 13) = res->a13;
cpu_reg(ctxt, 14) = res->a14;
cpu_reg(ctxt, 15) = res->a15;
cpu_reg(ctxt, 16) = res->a16;
cpu_reg(ctxt, 17) = res->a17;
} }
static bool is_ffa_call(u64 func_id) static bool is_ffa_call(u64 func_id)
@ -113,82 +145,92 @@ static bool is_ffa_call(u64 func_id)
static int ffa_map_hyp_buffers(u64 ffa_page_count) static int ffa_map_hyp_buffers(u64 ffa_page_count)
{ {
struct arm_smccc_res res; struct arm_smccc_1_2_regs res;
arm_smccc_1_1_smc(FFA_FN64_RXTX_MAP, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
hyp_virt_to_phys(hyp_buffers.tx), .a0 = FFA_FN64_RXTX_MAP,
hyp_virt_to_phys(hyp_buffers.rx), .a1 = hyp_virt_to_phys(hyp_buffers.tx),
ffa_page_count, .a2 = hyp_virt_to_phys(hyp_buffers.rx),
0, 0, 0, 0, .a3 = ffa_page_count,
&res); }, &res);
return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2; return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2;
} }
static int ffa_unmap_hyp_buffers(void) static int ffa_unmap_hyp_buffers(void)
{ {
struct arm_smccc_res res; struct arm_smccc_1_2_regs res;
arm_smccc_1_1_smc(FFA_RXTX_UNMAP, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
HOST_FFA_ID, .a0 = FFA_RXTX_UNMAP,
0, 0, 0, 0, 0, 0, .a1 = HOST_FFA_ID,
&res); }, &res);
return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2; return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2;
} }
static void ffa_mem_frag_tx(struct arm_smccc_res *res, u32 handle_lo, static void ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res, u32 handle_lo,
u32 handle_hi, u32 fraglen, u32 endpoint_id) u32 handle_hi, u32 fraglen, u32 endpoint_id)
{ {
arm_smccc_1_1_smc(FFA_MEM_FRAG_TX, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
handle_lo, handle_hi, fraglen, endpoint_id, .a0 = FFA_MEM_FRAG_TX,
0, 0, 0, .a1 = handle_lo,
res); .a2 = handle_hi,
.a3 = fraglen,
.a4 = endpoint_id,
}, res);
} }
static void ffa_mem_frag_rx(struct arm_smccc_res *res, u32 handle_lo, static void ffa_mem_frag_rx(struct arm_smccc_1_2_regs *res, u32 handle_lo,
u32 handle_hi, u32 fragoff) u32 handle_hi, u32 fragoff)
{ {
arm_smccc_1_1_smc(FFA_MEM_FRAG_RX, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
handle_lo, handle_hi, fragoff, HOST_FFA_ID, .a0 = FFA_MEM_FRAG_RX,
0, 0, 0, .a1 = handle_lo,
res); .a2 = handle_hi,
.a3 = fragoff,
.a4 = HOST_FFA_ID,
}, res);
} }
static void ffa_mem_xfer(struct arm_smccc_res *res, u64 func_id, u32 len, static void ffa_mem_xfer(struct arm_smccc_1_2_regs *res, u64 func_id, u32 len,
u32 fraglen) u32 fraglen)
{ {
arm_smccc_1_1_smc(func_id, len, fraglen, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
0, 0, 0, 0, 0, .a0 = func_id,
res); .a1 = len,
.a2 = fraglen,
}, res);
} }
static void ffa_mem_reclaim(struct arm_smccc_res *res, u32 handle_lo, static void ffa_mem_reclaim(struct arm_smccc_1_2_regs *res, u32 handle_lo,
u32 handle_hi, u32 flags) u32 handle_hi, u32 flags)
{ {
arm_smccc_1_1_smc(FFA_MEM_RECLAIM, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
handle_lo, handle_hi, flags, .a0 = FFA_MEM_RECLAIM,
0, 0, 0, 0, .a1 = handle_lo,
res); .a2 = handle_hi,
.a3 = flags,
}, res);
} }
static void ffa_retrieve_req(struct arm_smccc_res *res, u32 len) static void ffa_retrieve_req(struct arm_smccc_1_2_regs *res, u32 len)
{ {
arm_smccc_1_1_smc(FFA_FN64_MEM_RETRIEVE_REQ, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
len, len, .a0 = FFA_FN64_MEM_RETRIEVE_REQ,
0, 0, 0, 0, 0, .a1 = len,
res); .a2 = len,
}, res);
} }
static void ffa_rx_release(struct arm_smccc_res *res) static void ffa_rx_release(struct arm_smccc_1_2_regs *res)
{ {
arm_smccc_1_1_smc(FFA_RX_RELEASE, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
0, 0, .a0 = FFA_RX_RELEASE,
0, 0, 0, 0, 0, }, res);
res);
} }
static void do_ffa_rxtx_map(struct arm_smccc_res *res, static void do_ffa_rxtx_map(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(phys_addr_t, tx, ctxt, 1); DECLARE_REG(phys_addr_t, tx, ctxt, 1);
@ -267,7 +309,7 @@ err_unmap:
goto out_unlock; goto out_unlock;
} }
static void do_ffa_rxtx_unmap(struct arm_smccc_res *res, static void do_ffa_rxtx_unmap(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, id, ctxt, 1); DECLARE_REG(u32, id, ctxt, 1);
@ -368,7 +410,7 @@ static int ffa_host_unshare_ranges(struct ffa_mem_region_addr_range *ranges,
return ret; return ret;
} }
static void do_ffa_mem_frag_tx(struct arm_smccc_res *res, static void do_ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, handle_lo, ctxt, 1); DECLARE_REG(u32, handle_lo, ctxt, 1);
@ -427,7 +469,7 @@ out:
} }
static void __do_ffa_mem_xfer(const u64 func_id, static void __do_ffa_mem_xfer(const u64 func_id,
struct arm_smccc_res *res, struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, len, ctxt, 1); DECLARE_REG(u32, len, ctxt, 1);
@ -521,7 +563,7 @@ err_unshare:
__do_ffa_mem_xfer((fid), (res), (ctxt)); \ __do_ffa_mem_xfer((fid), (res), (ctxt)); \
} while (0); } while (0);
static void do_ffa_mem_reclaim(struct arm_smccc_res *res, static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, handle_lo, ctxt, 1); DECLARE_REG(u32, handle_lo, ctxt, 1);
@@ -628,13 +670,26 @@ static bool ffa_call_supported(u64 func_id)
	case FFA_RXTX_MAP:
	case FFA_MEM_DONATE:
	case FFA_MEM_RETRIEVE_REQ:
+	/* Optional notification interfaces added in FF-A 1.1 */
+	case FFA_NOTIFICATION_BITMAP_CREATE:
+	case FFA_NOTIFICATION_BITMAP_DESTROY:
+	case FFA_NOTIFICATION_BIND:
+	case FFA_NOTIFICATION_UNBIND:
+	case FFA_NOTIFICATION_SET:
+	case FFA_NOTIFICATION_GET:
+	case FFA_NOTIFICATION_INFO_GET:
+	/* Optional interfaces added in FF-A 1.2 */
+	case FFA_MSG_SEND_DIRECT_REQ2:		/* Optional per 7.5.1 */
+	case FFA_MSG_SEND_DIRECT_RESP2:		/* Optional per 7.5.1 */
+	case FFA_CONSOLE_LOG:			/* Optional per 13.1: not in Table 13.1 */
+	case FFA_PARTITION_INFO_GET_REGS:	/* Optional for virtual instances per 13.1 */
		return false;
	}

	return true;
}
static bool do_ffa_features(struct arm_smccc_res *res, static bool do_ffa_features(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, id, ctxt, 1); DECLARE_REG(u32, id, ctxt, 1);
@ -666,21 +721,25 @@ out_handled:
static int hyp_ffa_post_init(void) static int hyp_ffa_post_init(void)
{ {
size_t min_rxtx_sz; size_t min_rxtx_sz;
struct arm_smccc_res res; struct arm_smccc_1_2_regs res;
arm_smccc_1_1_smc(FFA_ID_GET, 0, 0, 0, 0, 0, 0, 0, &res); arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){
.a0 = FFA_ID_GET,
}, &res);
if (res.a0 != FFA_SUCCESS) if (res.a0 != FFA_SUCCESS)
return -EOPNOTSUPP; return -EOPNOTSUPP;
if (res.a2 != HOST_FFA_ID) if (res.a2 != HOST_FFA_ID)
return -EINVAL; return -EINVAL;
arm_smccc_1_1_smc(FFA_FEATURES, FFA_FN64_RXTX_MAP, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){
0, 0, 0, 0, 0, 0, &res); .a0 = FFA_FEATURES,
.a1 = FFA_FN64_RXTX_MAP,
}, &res);
if (res.a0 != FFA_SUCCESS) if (res.a0 != FFA_SUCCESS)
return -EOPNOTSUPP; return -EOPNOTSUPP;
switch (res.a2) { switch (res.a2 & FFA_FEAT_RXTX_MIN_SZ_MASK) {
case FFA_FEAT_RXTX_MIN_SZ_4K: case FFA_FEAT_RXTX_MIN_SZ_4K:
min_rxtx_sz = SZ_4K; min_rxtx_sz = SZ_4K;
break; break;
@ -700,7 +759,7 @@ static int hyp_ffa_post_init(void)
return 0; return 0;
} }
static void do_ffa_version(struct arm_smccc_res *res, static void do_ffa_version(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, ffa_req_version, ctxt, 1); DECLARE_REG(u32, ffa_req_version, ctxt, 1);
@ -712,7 +771,10 @@ static void do_ffa_version(struct arm_smccc_res *res,
hyp_spin_lock(&version_lock); hyp_spin_lock(&version_lock);
if (has_version_negotiated) { if (has_version_negotiated) {
res->a0 = hyp_ffa_version; if (FFA_MINOR_VERSION(ffa_req_version) < FFA_MINOR_VERSION(hyp_ffa_version))
res->a0 = FFA_RET_NOT_SUPPORTED;
else
res->a0 = hyp_ffa_version;
goto unlock; goto unlock;
} }
@ -721,9 +783,10 @@ static void do_ffa_version(struct arm_smccc_res *res,
* first if TEE supports it. * first if TEE supports it.
*/ */
if (FFA_MINOR_VERSION(ffa_req_version) < FFA_MINOR_VERSION(hyp_ffa_version)) { if (FFA_MINOR_VERSION(ffa_req_version) < FFA_MINOR_VERSION(hyp_ffa_version)) {
arm_smccc_1_1_smc(FFA_VERSION, ffa_req_version, 0, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
0, 0, 0, 0, 0, .a0 = FFA_VERSION,
res); .a1 = ffa_req_version,
}, res);
if (res->a0 == FFA_RET_NOT_SUPPORTED) if (res->a0 == FFA_RET_NOT_SUPPORTED)
goto unlock; goto unlock;
@ -740,7 +803,7 @@ unlock:
hyp_spin_unlock(&version_lock); hyp_spin_unlock(&version_lock);
} }
static void do_ffa_part_get(struct arm_smccc_res *res, static void do_ffa_part_get(struct arm_smccc_1_2_regs *res,
struct kvm_cpu_context *ctxt) struct kvm_cpu_context *ctxt)
{ {
DECLARE_REG(u32, uuid0, ctxt, 1); DECLARE_REG(u32, uuid0, ctxt, 1);
@ -756,9 +819,14 @@ static void do_ffa_part_get(struct arm_smccc_res *res,
goto out_unlock; goto out_unlock;
} }
arm_smccc_1_1_smc(FFA_PARTITION_INFO_GET, uuid0, uuid1, arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
uuid2, uuid3, flags, 0, 0, .a0 = FFA_PARTITION_INFO_GET,
res); .a1 = uuid0,
.a2 = uuid1,
.a3 = uuid2,
.a4 = uuid3,
.a5 = flags,
}, res);
if (res->a0 != FFA_SUCCESS) if (res->a0 != FFA_SUCCESS)
goto out_unlock; goto out_unlock;
@@ -791,7 +859,7 @@ out_unlock:

bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt, u32 func_id)
{
-	struct arm_smccc_res res;
+	struct arm_smccc_1_2_regs res;

	/*
	 * There's no way we can tell what a non-standard SMC call might
@@ -860,13 +928,16 @@ out_handled:

int hyp_ffa_init(void *pages)
{
-	struct arm_smccc_res res;
+	struct arm_smccc_1_2_regs res;
	void *tx, *rx;

	if (kvm_host_psci_config.smccc_version < ARM_SMCCC_VERSION_1_2)
		return 0;

-	arm_smccc_1_1_smc(FFA_VERSION, FFA_VERSION_1_1, 0, 0, 0, 0, 0, 0, &res);
+	arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) {
+			  .a0 = FFA_VERSION,
+			  .a1 = FFA_VERSION_1_2,
+			  }, &res);
	if (res.a0 == FFA_RET_NOT_SUPPORTED)
		return 0;

@@ -886,10 +957,10 @@ int hyp_ffa_init(void *pages)
	if (FFA_MAJOR_VERSION(res.a0) != 1)
		return -EOPNOTSUPP;

-	if (FFA_MINOR_VERSION(res.a0) < FFA_MINOR_VERSION(FFA_VERSION_1_1))
+	if (FFA_MINOR_VERSION(res.a0) < FFA_MINOR_VERSION(FFA_VERSION_1_2))
		hyp_ffa_version = res.a0;
	else
-		hyp_ffa_version = FFA_VERSION_1_1;
+		hyp_ffa_version = FFA_VERSION_1_2;

	tx = pages;
	pages += KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE;
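
(The switch from arm_smccc_1_1_smc() to arm_smccc_1_2_smc() throughout this file is
what lets the FF-A 1.2 payloads use more registers: the 1.1 helper only passes x0-x7
and returns x0-x3, while the 1.2 helper takes and fills a struct covering x0-x17. A
rough user-space model of the difference is below; the struct layout and the fake
conduit are illustrative, not the kernel's definitions.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins: the real conduits trap to EL3/EL2 via SMC/HVC. */
struct regs18 { uint64_t a[18]; };		/* x0..x17, as in SMCCC 1.2 */

static void fake_smc(struct regs18 *r)		/* pretend firmware call */
{
	for (int i = 8; i < 18; i++)		/* fill the "extra" registers */
		r->a[i] = 0xf0 + i;
}

/* SMCCC 1.1 style: x0..x7 in, only x0..x3 come back. */
static void smccc_1_1(uint64_t fid, uint64_t a1, uint64_t res[4])
{
	struct regs18 r = { .a = { fid, a1 } };

	fake_smc(&r);
	memcpy(res, r.a, 4 * sizeof(uint64_t));	/* x4..x17 are lost */
}

/* SMCCC 1.2 style: the whole x0..x17 file goes both ways. */
static void smccc_1_2(const struct regs18 *args, struct regs18 *res)
{
	*res = *args;
	fake_smc(res);
}

int main(void)
{
	uint64_t short_res[4];
	struct regs18 full_res;

	smccc_1_1(0x84000000, 1, short_res);
	smccc_1_2(&(struct regs18){ .a = { 0x84000000, 1 } }, &full_res);

	printf("1.1 keeps 4 result regs, 1.2 keeps x17=%#llx\n",
	       (unsigned long long)full_res.a[17]);
	return 0;
}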


@@ -546,6 +546,18 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
}

+static void handle___pkvm_reserve_vm(struct kvm_cpu_context *host_ctxt)
+{
+	cpu_reg(host_ctxt, 1) = __pkvm_reserve_vm();
+}
+
+static void handle___pkvm_unreserve_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	__pkvm_unreserve_vm(handle);
+}
+
static void handle___pkvm_init_vm(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
@@ -606,6 +618,8 @@ static const hcall_t host_hcall[] = {
	HANDLE_FUNC(__kvm_timer_set_cntvoff),
	HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
	HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+	HANDLE_FUNC(__pkvm_reserve_vm),
+	HANDLE_FUNC(__pkvm_unreserve_vm),
	HANDLE_FUNC(__pkvm_init_vm),
	HANDLE_FUNC(__pkvm_init_vcpu),
	HANDLE_FUNC(__pkvm_teardown_vm),
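
(The two new hypercalls split VM creation into "reserve a handle" and "publish the
initialised VM", so the host learns its handle before donating memory and can back out
cleanly if initialisation fails. The toy program below models the table protocol
behind these handlers: reserve marks a slot with a sentinel, lookups never return a
reserved-but-unpublished slot, and publishing only succeeds into a still-reserved
slot. It is a sketch only: the real code uses the 0xa110ca7ed magic pointer from the
pkvm hunks later in this series and holds vm_table_lock, both of which are omitted
here.)

#include <stdio.h>

#define MAX_VMS		8
#define HANDLE_OFFSET	0x1000

static char reserved_marker;			/* local stand-in for RESERVED_ENTRY */
#define RESERVED_ENTRY	((void *)&reserved_marker)

static void *vm_table[MAX_VMS];

/* Reserve a slot and hand out its handle before any VM state exists. */
static int reserve_vm(void)
{
	for (int i = 0; i < MAX_VMS; i++) {
		if (!vm_table[i]) {
			vm_table[i] = RESERVED_ENTRY;
			return HANDLE_OFFSET + i;
		}
	}
	return -1;
}

/* Publish an initialised VM, but only into a slot that is still reserved. */
static int publish_vm(int handle, void *vm)
{
	int idx = handle - HANDLE_OFFSET;

	if (idx < 0 || idx >= MAX_VMS || vm_table[idx] != RESERVED_ENTRY)
		return -1;
	vm_table[idx] = vm;
	return 0;
}

/* Lookups never hand back a reserved-but-unpublished entry. */
static void *get_vm(int handle)
{
	int idx = handle - HANDLE_OFFSET;

	if (idx < 0 || idx >= MAX_VMS || vm_table[idx] == RESERVED_ENTRY)
		return NULL;
	return vm_table[idx];
}

int main(void)
{
	int dummy, handle = reserve_vm();

	printf("lookup before publish: %p\n", get_vm(handle));	/* NULL */
	publish_vm(handle, &dummy);
	printf("lookup after publish:  %p\n", get_vm(handle));	/* &dummy */
	return 0;
}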


@@ -1010,9 +1010,12 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
		return ret;
	if (!kvm_pte_valid(pte))
		return -ENOENT;
-	if (kvm_granule_size(level) != size)
+	if (size && kvm_granule_size(level) != size)
		return -E2BIG;

+	if (!size)
+		size = kvm_granule_size(level);
+
	state = guest_get_page_state(pte, ipa);
	if (state != PKVM_PAGE_SHARED_BORROWED)
		return -EPERM;
@@ -1100,7 +1103,7 @@ int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_
	if (prot & ~KVM_PGTABLE_PROT_RWX)
		return -EINVAL;

-	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
+	assert_host_shared_guest(vm, ipa, 0);
	guest_lock_component(vm);
	ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
	guest_unlock_component(vm);
@@ -1156,7 +1159,7 @@ int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
	if (pkvm_hyp_vm_is_protected(vm))
		return -EPERM;

-	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
+	assert_host_shared_guest(vm, ipa, 0);
	guest_lock_component(vm);
	kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
	guest_unlock_component(vm);


@ -23,8 +23,8 @@ unsigned int kvm_arm_vmid_bits;
unsigned int kvm_host_sve_max_vl; unsigned int kvm_host_sve_max_vl;
/* /*
* The currently loaded hyp vCPU for each physical CPU. Used only when * The currently loaded hyp vCPU for each physical CPU. Used in protected mode
* protected KVM is enabled, but for both protected and non-protected VMs. * for both protected and non-protected VMs.
*/ */
static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu); static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu);
@ -135,7 +135,7 @@ static int pkvm_check_pvm_cpu_features(struct kvm_vcpu *vcpu)
{ {
struct kvm *kvm = vcpu->kvm; struct kvm *kvm = vcpu->kvm;
/* Protected KVM does not support AArch32 guests. */ /* No AArch32 support for protected guests. */
if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL0, AARCH32) || if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL0, AARCH32) ||
kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL1, AARCH32)) kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL1, AARCH32))
return -EINVAL; return -EINVAL;
@ -192,6 +192,11 @@ static int pkvm_vcpu_init_traps(struct pkvm_hyp_vcpu *hyp_vcpu)
*/ */
#define HANDLE_OFFSET 0x1000 #define HANDLE_OFFSET 0x1000
/*
* Marks a reserved but not yet used entry in the VM table.
*/
#define RESERVED_ENTRY ((void *)0xa110ca7ed)
static unsigned int vm_handle_to_idx(pkvm_handle_t handle) static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
{ {
return handle - HANDLE_OFFSET; return handle - HANDLE_OFFSET;
@ -210,8 +215,8 @@ static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
DEFINE_HYP_SPINLOCK(vm_table_lock); DEFINE_HYP_SPINLOCK(vm_table_lock);
/* /*
* The table of VM entries for protected VMs in hyp. * A table that tracks all VMs in protected mode.
* Allocated at hyp initialization and setup. * Allocated during hyp initialization and setup.
*/ */
static struct pkvm_hyp_vm **vm_table; static struct pkvm_hyp_vm **vm_table;
@ -231,6 +236,10 @@ static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
if (unlikely(idx >= KVM_MAX_PVMS)) if (unlikely(idx >= KVM_MAX_PVMS))
return NULL; return NULL;
/* A reserved entry doesn't represent an initialized VM. */
if (unlikely(vm_table[idx] == RESERVED_ENTRY))
return NULL;
return vm_table[idx]; return vm_table[idx];
} }
@ -401,14 +410,26 @@ static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
} }
static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm, static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
unsigned int nr_vcpus) unsigned int nr_vcpus, pkvm_handle_t handle)
{ {
struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
int idx = vm_handle_to_idx(handle);
hyp_vm->kvm.arch.pkvm.handle = handle;
hyp_vm->host_kvm = host_kvm; hyp_vm->host_kvm = host_kvm;
hyp_vm->kvm.created_vcpus = nr_vcpus; hyp_vm->kvm.created_vcpus = nr_vcpus;
hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr; hyp_vm->kvm.arch.pkvm.is_protected = READ_ONCE(host_kvm->arch.pkvm.is_protected);
hyp_vm->kvm.arch.pkvm.enabled = READ_ONCE(host_kvm->arch.pkvm.enabled); hyp_vm->kvm.arch.pkvm.is_created = true;
hyp_vm->kvm.arch.flags = 0; hyp_vm->kvm.arch.flags = 0;
pkvm_init_features_from_host(hyp_vm, host_kvm); pkvm_init_features_from_host(hyp_vm, host_kvm);
/* VMID 0 is reserved for the host */
atomic64_set(&mmu->vmid.id, idx + 1);
mmu->vtcr = host_mmu.arch.mmu.vtcr;
mmu->arch = &hyp_vm->kvm.arch;
mmu->pgt = &hyp_vm->pgt;
} }
static int pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu) static int pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu)
@ -480,7 +501,7 @@ done:
return ret; return ret;
} }
static int find_free_vm_table_entry(struct kvm *host_kvm) static int find_free_vm_table_entry(void)
{ {
int i; int i;
@ -493,15 +514,13 @@ static int find_free_vm_table_entry(struct kvm *host_kvm)
} }
/* /*
* Allocate a VM table entry and insert a pointer to the new vm. * Reserve a VM table entry.
* *
* Return a unique handle to the protected VM on success, * Return a unique handle to the VM on success,
* negative error code on failure. * negative error code on failure.
*/ */
static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm, static int allocate_vm_table_entry(void)
struct pkvm_hyp_vm *hyp_vm)
{ {
struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
int idx; int idx;
hyp_assert_lock_held(&vm_table_lock); hyp_assert_lock_held(&vm_table_lock);
@ -514,20 +533,57 @@ static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
if (unlikely(!vm_table)) if (unlikely(!vm_table))
return -EINVAL; return -EINVAL;
idx = find_free_vm_table_entry(host_kvm); idx = find_free_vm_table_entry();
if (idx < 0) if (unlikely(idx < 0))
return idx; return idx;
hyp_vm->kvm.arch.pkvm.handle = idx_to_vm_handle(idx); vm_table[idx] = RESERVED_ENTRY;
/* VMID 0 is reserved for the host */ return idx;
atomic64_set(&mmu->vmid.id, idx + 1); }
mmu->arch = &hyp_vm->kvm.arch; static int __insert_vm_table_entry(pkvm_handle_t handle,
mmu->pgt = &hyp_vm->pgt; struct pkvm_hyp_vm *hyp_vm)
{
unsigned int idx;
hyp_assert_lock_held(&vm_table_lock);
/*
* Initializing protected state might have failed, yet a malicious
* host could trigger this function. Thus, ensure that 'vm_table'
* exists.
*/
if (unlikely(!vm_table))
return -EINVAL;
idx = vm_handle_to_idx(handle);
if (unlikely(idx >= KVM_MAX_PVMS))
return -EINVAL;
if (unlikely(vm_table[idx] != RESERVED_ENTRY))
return -EINVAL;
vm_table[idx] = hyp_vm; vm_table[idx] = hyp_vm;
return hyp_vm->kvm.arch.pkvm.handle;
return 0;
}
/*
* Insert a pointer to the initialized VM into the VM table.
*
* Return 0 on success, or negative error code on failure.
*/
static int insert_vm_table_entry(pkvm_handle_t handle,
struct pkvm_hyp_vm *hyp_vm)
{
int ret;
hyp_spin_lock(&vm_table_lock);
ret = __insert_vm_table_entry(handle, hyp_vm);
hyp_spin_unlock(&vm_table_lock);
return ret;
} }
/* /*
@ -594,10 +650,45 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
} }
/* /*
* Initialize the hypervisor copy of the protected VM state using the * Reserves an entry in the hypervisor for a new VM in protected mode.
* memory donated by the host.
* *
* Unmaps the donated memory from the host at stage 2. * Return a unique handle to the VM on success, negative error code on failure.
*/
int __pkvm_reserve_vm(void)
{
int ret;
hyp_spin_lock(&vm_table_lock);
ret = allocate_vm_table_entry();
hyp_spin_unlock(&vm_table_lock);
if (ret < 0)
return ret;
return idx_to_vm_handle(ret);
}
/*
* Removes a reserved entry, but only if is hasn't been used yet.
* Otherwise, the VM needs to be destroyed.
*/
void __pkvm_unreserve_vm(pkvm_handle_t handle)
{
unsigned int idx = vm_handle_to_idx(handle);
if (unlikely(!vm_table))
return;
hyp_spin_lock(&vm_table_lock);
if (likely(idx < KVM_MAX_PVMS && vm_table[idx] == RESERVED_ENTRY))
remove_vm_table_entry(handle);
hyp_spin_unlock(&vm_table_lock);
}
/*
* Initialize the hypervisor copy of the VM state using host-donated memory.
*
* Unmap the donated memory from the host at stage 2.
* *
* host_kvm: A pointer to the host's struct kvm. * host_kvm: A pointer to the host's struct kvm.
* vm_hva: The host va of the area being donated for the VM state. * vm_hva: The host va of the area being donated for the VM state.
@ -606,8 +697,7 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
* the VM. Must be page aligned. Its size is implied by the VM's * the VM. Must be page aligned. Its size is implied by the VM's
* VTCR. * VTCR.
* *
* Return a unique handle to the protected VM on success, * Return 0 success, negative error code on failure.
* negative error code on failure.
*/ */
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva, int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva) unsigned long pgd_hva)
@ -615,6 +705,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
struct pkvm_hyp_vm *hyp_vm = NULL; struct pkvm_hyp_vm *hyp_vm = NULL;
size_t vm_size, pgd_size; size_t vm_size, pgd_size;
unsigned int nr_vcpus; unsigned int nr_vcpus;
pkvm_handle_t handle;
void *pgd = NULL; void *pgd = NULL;
int ret; int ret;
@ -628,6 +719,12 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
goto err_unpin_kvm; goto err_unpin_kvm;
} }
handle = READ_ONCE(host_kvm->arch.pkvm.handle);
if (unlikely(handle < HANDLE_OFFSET)) {
ret = -EINVAL;
goto err_unpin_kvm;
}
vm_size = pkvm_get_hyp_vm_size(nr_vcpus); vm_size = pkvm_get_hyp_vm_size(nr_vcpus);
pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.mmu.vtcr); pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.mmu.vtcr);
@ -641,24 +738,19 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
if (!pgd) if (!pgd)
goto err_remove_mappings; goto err_remove_mappings;
init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus); init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus, handle);
hyp_spin_lock(&vm_table_lock);
ret = insert_vm_table_entry(host_kvm, hyp_vm);
if (ret < 0)
goto err_unlock;
ret = kvm_guest_prepare_stage2(hyp_vm, pgd); ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
if (ret) if (ret)
goto err_remove_vm_table_entry; goto err_remove_mappings;
hyp_spin_unlock(&vm_table_lock);
return hyp_vm->kvm.arch.pkvm.handle; /* Must be called last since this publishes the VM. */
ret = insert_vm_table_entry(handle, hyp_vm);
if (ret)
goto err_remove_mappings;
return 0;
err_remove_vm_table_entry:
remove_vm_table_entry(hyp_vm->kvm.arch.pkvm.handle);
err_unlock:
hyp_spin_unlock(&vm_table_lock);
err_remove_mappings: err_remove_mappings:
unmap_donated_memory(hyp_vm, vm_size); unmap_donated_memory(hyp_vm, vm_size);
unmap_donated_memory(pgd, pgd_size); unmap_donated_memory(pgd, pgd_size);
@ -668,10 +760,9 @@ err_unpin_kvm:
} }
/* /*
* Initialize the hypervisor copy of the protected vCPU state using the * Initialize the hypervisor copy of the vCPU state using host-donated memory.
* memory donated by the host.
* *
* handle: The handle for the protected vm. * handle: The hypervisor handle for the vm.
* host_vcpu: A pointer to the corresponding host vcpu. * host_vcpu: A pointer to the corresponding host vcpu.
* vcpu_hva: The host va of the area being donated for the vcpu state. * vcpu_hva: The host va of the area being donated for the vcpu state.
* Must be page aligned. The size of the area must be equal to * Must be page aligned. The size of the area must be equal to


@ -192,6 +192,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
enum pkvm_page_state state; enum pkvm_page_state state;
struct hyp_page *page; struct hyp_page *page;
phys_addr_t phys; phys_addr_t phys;
enum kvm_pgtable_prot prot;
if (!kvm_pte_valid(ctx->old)) if (!kvm_pte_valid(ctx->old))
return 0; return 0;
@ -210,11 +211,18 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
* configured in the hypervisor stage-1, and make sure to propagate them * configured in the hypervisor stage-1, and make sure to propagate them
* to the hyp_vmemmap state. * to the hyp_vmemmap state.
*/ */
state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old)); prot = kvm_pgtable_hyp_pte_prot(ctx->old);
state = pkvm_getstate(prot);
switch (state) { switch (state) {
case PKVM_PAGE_OWNED: case PKVM_PAGE_OWNED:
set_hyp_state(page, PKVM_PAGE_OWNED); set_hyp_state(page, PKVM_PAGE_OWNED);
return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP); /* hyp text is RO in the host stage-2 to be inspected on panic. */
if (prot == PAGE_HYP_EXEC) {
set_host_state(page, PKVM_NOPAGE);
return host_stage2_idmap_locked(phys, PAGE_SIZE, KVM_PGTABLE_PROT_R);
} else {
return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
}
case PKVM_PAGE_SHARED_OWNED: case PKVM_PAGE_SHARED_OWNED:
set_hyp_state(page, PKVM_PAGE_SHARED_OWNED); set_hyp_state(page, PKVM_PAGE_SHARED_OWNED);
set_host_state(page, PKVM_PAGE_SHARED_BORROWED); set_host_state(page, PKVM_PAGE_SHARED_BORROWED);


@@ -295,12 +295,8 @@ void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if)
		}
	}

-	/*
-	 * GICv5 BET0 FEAT_GCIE_LEGACY doesn't include ICC_SRE_EL2. This is due
-	 * to be relaxed in a future spec release, at which point this in
-	 * condition can be dropped.
-	 */
-	if (!cpus_have_final_cap(ARM64_HAS_GICV5_CPUIF)) {
+	/* Only disable SRE if the host implements the GICv2 interface */
+	if (static_branch_unlikely(&vgic_v3_has_v2_compat)) {
		/*
		 * Prevent the guest from touching the ICC_SRE_EL1 system
		 * register. Note that this may not have any effect, as
@@ -329,19 +325,16 @@ void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
	}

-	/*
-	 * Can be dropped in the future when GICv5 spec is relaxed. See comment
-	 * above.
-	 */
-	if (!cpus_have_final_cap(ARM64_HAS_GICV5_CPUIF)) {
+	/* Only restore SRE if the host implements the GICv2 interface */
+	if (static_branch_unlikely(&vgic_v3_has_v2_compat)) {
		val = read_gicreg(ICC_SRE_EL2);
		write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
-	}

		if (!cpu_if->vgic_sre) {
			/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
			isb();
			write_gicreg(1, ICC_SRE_EL1);
+		}
	}
}
/* /*


@@ -95,6 +95,13 @@ static u64 __compute_hcr(struct kvm_vcpu *vcpu)
			/* Force NV2 in case the guest is forgetful... */
			guest_hcr |= HCR_NV2;
		}
+
+		/*
+		 * Exclude the guest's TWED configuration if it hasn't set TWE
+		 * to avoid potentially delaying traps for the host.
+		 */
+		if (!(guest_hcr & HCR_TWE))
+			guest_hcr &= ~(HCR_EL2_TWEDEn | HCR_EL2_TWEDEL);
	}

	BUG_ON(host_data_test_flag(VCPU_IN_HYP_CONTEXT) &&


@@ -106,7 +106,30 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
{
	unsigned long cpsr = *vcpu_cpsr(vcpu);
	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
-	u64 esr = 0;
+	u64 esr = 0, fsc;
+	int level;
+
+	/*
+	 * If injecting an abort from a failed S1PTW, rewalk the S1 PTs to
+	 * find the failing level. If we can't find it, assume the error was
+	 * transient and restart without changing the state.
+	 */
+	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		u64 hpfar = kvm_vcpu_get_fault_ipa(vcpu);
+		int ret;
+
+		if (hpfar == INVALID_GPA)
+			return;
+
+		ret = __kvm_find_s1_desc_level(vcpu, addr, hpfar, &level);
+		if (ret)
+			return;
+
+		WARN_ON_ONCE(level < -1 || level > 3);
+		fsc = ESR_ELx_FSC_SEA_TTW(level);
+	} else {
+		fsc = ESR_ELx_FSC_EXTABT;
+	}

	/* This delight is brought to you by FEAT_DoubleFault2. */
	if (effective_sctlr2_ease(vcpu))
@@ -133,7 +156,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
	if (!is_iabt)
		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;

-	esr |= ESR_ELx_FSC_EXTABT;
+	esr |= fsc;

	vcpu_write_sys_reg(vcpu, addr, exception_far_elx(vcpu));
	vcpu_write_sys_reg(vcpu, esr, exception_esr_elx(vcpu));
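
(The point of the rewalk above is to report a level-specific fault status code instead
of a generic external abort. The small program below shows the FSC values this maps
to, under the assumption, taken from my reading of the Arm ARM, that "external abort
on translation table walk, level n" encodes as 0x14 + n for levels 0..3 and 0x13 for
level -1; the kernel's ESR_ELx_FSC_SEA_TTW(level) macro is assumed to produce the same
values.)

#include <stdio.h>

/* Assumed DFSC encoding for a synchronous external abort on a table walk. */
static unsigned int sea_ttw_fsc(int level)
{
	return level < 0 ? 0x13 : 0x14 + level;
}

int main(void)
{
	for (int level = -1; level <= 3; level++)
		printf("S1 walk failed at level %2d -> FSC %#x\n",
		       level, sea_ttw_fsc(level));
	return 0;
}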


@@ -1431,11 +1431,8 @@ static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
 * able to see the page's tags and therefore they must be initialised first. If
 * PG_mte_tagged is set, tags have already been initialised.
 *
- * The race in the test/set of the PG_mte_tagged flag is handled by:
- * - preventing VM_SHARED mappings in a memslot with MTE preventing two VMs
- *   racing to santise the same page
- * - mmap_lock protects between a VM faulting a page in and the VMM performing
- *   an mprotect() to add VM_MTE
+ * Must be called with kvm->mmu_lock held to ensure the memory remains mapped
+ * while the tags are zeroed.
 */
static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
			      unsigned long size)
@ -1482,13 +1479,132 @@ static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
} }
} }
static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
void **memcache)
{
int min_pages;
if (!is_protected_kvm_enabled())
*memcache = &vcpu->arch.mmu_page_cache;
else
*memcache = &vcpu->arch.pkvm_memcache;
if (!topup_memcache)
return 0;
min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
if (!is_protected_kvm_enabled())
return kvm_mmu_topup_memory_cache(*memcache, min_pages);
return topup_hyp_memcache(*memcache, min_pages);
}
/*
* Potentially reduce shadow S2 permissions to match the guest's own S2. For
* exec faults, we'd only reach this point if the guest actually allowed it (see
* kvm_s2_handle_perm_fault).
*
* Also encode the level of the original translation in the SW bits of the leaf
* entry as a proxy for the span of that translation. This will be retrieved on
* TLB invalidation from the guest and used to limit the invalidation scope if a
* TTL hint or a range isn't provided.
*/
static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
enum kvm_pgtable_prot *prot,
bool *writable)
{
*writable &= kvm_s2_trans_writable(nested);
if (!kvm_s2_trans_readable(nested))
*prot &= ~KVM_PGTABLE_PROT_R;
*prot |= kvm_encode_nested_level(nested);
}
#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, bool is_perm)
{
bool write_fault, exec_fault, writable;
enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
unsigned long mmu_seq;
struct page *page;
struct kvm *kvm = vcpu->kvm;
void *memcache;
kvm_pfn_t pfn;
gfn_t gfn;
int ret;
ret = prepare_mmu_memcache(vcpu, true, &memcache);
if (ret)
return ret;
if (nested)
gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
else
gfn = fault_ipa >> PAGE_SHIFT;
write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
VM_WARN_ON_ONCE(write_fault && exec_fault);
mmu_seq = kvm->mmu_invalidate_seq;
/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
smp_rmb();
ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
if (ret) {
kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
write_fault, exec_fault, false);
return ret;
}
writable = !(memslot->flags & KVM_MEM_READONLY);
if (nested)
adjust_nested_fault_perms(nested, &prot, &writable);
if (writable)
prot |= KVM_PGTABLE_PROT_W;
if (exec_fault ||
(cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
(!nested || kvm_s2_trans_executable(nested))))
prot |= KVM_PGTABLE_PROT_X;
kvm_fault_lock(kvm);
if (mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
goto out_unlock;
}
ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
__pfn_to_phys(pfn), prot,
memcache, flags);
out_unlock:
kvm_release_faultin_page(kvm, page, !!ret, writable);
kvm_fault_unlock(kvm);
if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
return ret != -EAGAIN ? ret : 0;
}
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested, struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva, struct kvm_memory_slot *memslot, unsigned long hva,
bool fault_is_perm) bool fault_is_perm)
{ {
int ret = 0; int ret = 0;
bool write_fault, writable, force_pte = false; bool topup_memcache;
bool write_fault, writable;
bool exec_fault, mte_allowed, is_vma_cacheable; bool exec_fault, mte_allowed, is_vma_cacheable;
bool s2_force_noncacheable = false, vfio_allow_any_uc = false; bool s2_force_noncacheable = false, vfio_allow_any_uc = false;
unsigned long mmu_seq; unsigned long mmu_seq;
@ -1500,23 +1616,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn_t gfn; gfn_t gfn;
kvm_pfn_t pfn; kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot); bool logging_active = memslot_is_logging(memslot);
bool force_pte = logging_active;
long vma_pagesize, fault_granule; long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt; struct kvm_pgtable *pgt;
struct page *page; struct page *page;
vm_flags_t vm_flags; vm_flags_t vm_flags;
enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED; enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
if (fault_is_perm) if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu); fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
write_fault = kvm_is_write_fault(vcpu); write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu); exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
VM_BUG_ON(write_fault && exec_fault); VM_WARN_ON_ONCE(write_fault && exec_fault);
if (!is_protected_kvm_enabled())
memcache = &vcpu->arch.mmu_page_cache;
else
memcache = &vcpu->arch.pkvm_memcache;
/* /*
* Permission faults just need to update the existing leaf entry, * Permission faults just need to update the existing leaf entry,
@ -1524,17 +1636,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* only exception to this is when dirty logging is enabled at runtime * only exception to this is when dirty logging is enabled at runtime
* and a write fault needs to collapse a block entry into a table. * and a write fault needs to collapse a block entry into a table.
*/ */
if (!fault_is_perm || (logging_active && write_fault)) { topup_memcache = !fault_is_perm || (logging_active && write_fault);
int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu); ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
if (ret)
if (!is_protected_kvm_enabled()) return ret;
ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
else
ret = topup_hyp_memcache(memcache, min_pages);
if (ret)
return ret;
}
/* /*
* Let's check if we will get back a huge page backed by hugetlbfs, or * Let's check if we will get back a huge page backed by hugetlbfs, or
@ -1548,16 +1653,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT; return -EFAULT;
} }
/* if (force_pte)
* logging_active is guaranteed to never be true for VM_PFNMAP
* memslots.
*/
if (logging_active) {
force_pte = true;
vma_shift = PAGE_SHIFT; vma_shift = PAGE_SHIFT;
} else { else
vma_shift = get_vma_page_shift(vma, hva); vma_shift = get_vma_page_shift(vma, hva);
}
switch (vma_shift) { switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED #ifndef __PAGETABLE_PMD_FOLDED
@ -1609,7 +1708,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
max_map_size = PAGE_SIZE; max_map_size = PAGE_SIZE;
force_pte = (max_map_size == PAGE_SIZE); force_pte = (max_map_size == PAGE_SIZE);
vma_pagesize = min(vma_pagesize, (long)max_map_size); vma_pagesize = min_t(long, vma_pagesize, max_map_size);
} }
/* /*
@ -1642,7 +1741,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
* with the smp_wmb() in kvm_mmu_invalidate_end(). * with the smp_wmb() in kvm_mmu_invalidate_end().
*/ */
mmu_seq = vcpu->kvm->mmu_invalidate_seq; mmu_seq = kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm); mmap_read_unlock(current->mm);
pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0, pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
@ -1673,7 +1772,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* cache maintenance. * cache maintenance.
*/ */
if (!kvm_supports_cacheable_pfnmap()) if (!kvm_supports_cacheable_pfnmap())
return -EFAULT; ret = -EFAULT;
} else { } else {
/* /*
* If the page was identified as device early by looking at * If the page was identified as device early by looking at
@ -1696,27 +1795,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
} }
if (exec_fault && s2_force_noncacheable) if (exec_fault && s2_force_noncacheable)
return -ENOEXEC; ret = -ENOEXEC;
/* if (ret) {
* Potentially reduce shadow S2 permissions to match the guest's own kvm_release_page_unused(page);
* S2. For exec faults, we'd only reach this point if the guest return ret;
* actually allowed it (see kvm_s2_handle_perm_fault).
*
* Also encode the level of the original translation in the SW bits
* of the leaf entry as a proxy for the span of that translation.
* This will be retrieved on TLB invalidation from the guest and
* used to limit the invalidation scope if a TTL hint or a range
* isn't provided.
*/
if (nested) {
writable &= kvm_s2_trans_writable(nested);
if (!kvm_s2_trans_readable(nested))
prot &= ~KVM_PGTABLE_PROT_R;
prot |= kvm_encode_nested_level(nested);
} }
if (nested)
adjust_nested_fault_perms(nested, &prot, &writable);
kvm_fault_lock(kvm); kvm_fault_lock(kvm);
pgt = vcpu->arch.hw_mmu->pgt; pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_invalidate_retry(kvm, mmu_seq)) { if (mmu_invalidate_retry(kvm, mmu_seq)) {
@ -1985,8 +2073,15 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock; goto out_unlock;
} }
ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva, VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
esr_fsc_is_permission_fault(esr)); !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
if (kvm_slot_has_gmem(memslot))
ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
esr_fsc_is_permission_fault(esr));
else
ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
esr_fsc_is_permission_fault(esr));
if (ret == 0) if (ret == 0)
ret = 1; ret = 1;
out: out:
@ -2218,6 +2313,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT)) if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
return -EFAULT; return -EFAULT;
/*
* Only support guest_memfd backed memslots with mappable memory, since
* there aren't any CoCo VMs that support only private memory on arm64.
*/
if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new))
return -EINVAL;
hva = new->userspace_addr; hva = new->userspace_addr;
reg_end = hva + (new->npages << PAGE_SHIFT); reg_end = hva + (new->npages << PAGE_SHIFT);
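
For context, the kind of slot this check accepts is set up from userspace roughly as below. This is a hedged sketch: it assumes uapi definitions for KVM_CREATE_GUEST_MEMFD, GUEST_MEMFD_FLAG_MMAP, KVM_MEM_GUEST_MEMFD and KVM_SET_USER_MEMORY_REGION2 as exposed by recent kernels; the flag names may differ on a given tree and all error handling is omitted.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);

	struct kvm_create_guest_memfd gmem = {
		.size  = 0x200000,			/* 2MiB of guest RAM */
		.flags = GUEST_MEMFD_FLAG_MMAP,		/* host-mappable flavour (assumed name) */
	};
	int gmem_fd = ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);

	/* The same fd provides both the guest backing and the host mapping. */
	void *hva = mmap(NULL, gmem.size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, gmem_fd, 0);

	struct kvm_userspace_memory_region2 slot = {
		.slot            = 0,
		.flags           = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = 0x80000000,
		.memory_size     = gmem.size,
		.userspace_addr  = (unsigned long)hva,
		.guest_memfd     = gmem_fd,
	};
	int ret = ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &slot);

	printf("gmem_fd=%d hva=%p memslot ret=%d\n", gmem_fd, hva, ret);
	return 0;
}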


@ -349,7 +349,7 @@ static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr); wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
/* Global limit for now, should eventually be per-VM */ /* Global limit for now, should eventually be per-VM */
wi->max_oa_bits = min(get_kvm_ipa_limit(), wi->max_oa_bits = min(get_kvm_ipa_limit(),
ps_to_output_size(FIELD_GET(VTCR_EL2_PS_MASK, vtcr))); ps_to_output_size(FIELD_GET(VTCR_EL2_PS_MASK, vtcr), false));
} }
int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa, int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
@ -1172,8 +1172,9 @@ static u64 read_vncr_el2(struct kvm_vcpu *vcpu)
return (u64)sign_extend64(__vcpu_sys_reg(vcpu, VNCR_EL2), 48); return (u64)sign_extend64(__vcpu_sys_reg(vcpu, VNCR_EL2), 48);
} }
static int kvm_translate_vncr(struct kvm_vcpu *vcpu) static int kvm_translate_vncr(struct kvm_vcpu *vcpu, bool *is_gmem)
{ {
struct kvm_memory_slot *memslot;
bool write_fault, writable; bool write_fault, writable;
unsigned long mmu_seq; unsigned long mmu_seq;
struct vncr_tlb *vt; struct vncr_tlb *vt;
@ -1216,10 +1217,25 @@ static int kvm_translate_vncr(struct kvm_vcpu *vcpu)
smp_rmb(); smp_rmb();
gfn = vt->wr.pa >> PAGE_SHIFT; gfn = vt->wr.pa >> PAGE_SHIFT;
pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable, &page); memslot = gfn_to_memslot(vcpu->kvm, gfn);
if (is_error_noslot_pfn(pfn) || (write_fault && !writable)) if (!memslot)
return -EFAULT; return -EFAULT;
*is_gmem = kvm_slot_has_gmem(memslot);
if (!*is_gmem) {
pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
&writable, &page);
if (is_error_noslot_pfn(pfn) || (write_fault && !writable))
return -EFAULT;
} else {
ret = kvm_gmem_get_pfn(vcpu->kvm, memslot, gfn, &pfn, &page, NULL);
if (ret) {
kvm_prepare_memory_fault_exit(vcpu, vt->wr.pa, PAGE_SIZE,
write_fault, false, false);
return ret;
}
}
scoped_guard(write_lock, &vcpu->kvm->mmu_lock) { scoped_guard(write_lock, &vcpu->kvm->mmu_lock) {
if (mmu_invalidate_retry(vcpu->kvm, mmu_seq)) if (mmu_invalidate_retry(vcpu->kvm, mmu_seq))
return -EAGAIN; return -EAGAIN;
@ -1295,23 +1311,36 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
if (esr_fsc_is_permission_fault(esr)) { if (esr_fsc_is_permission_fault(esr)) {
inject_vncr_perm(vcpu); inject_vncr_perm(vcpu);
} else if (esr_fsc_is_translation_fault(esr)) { } else if (esr_fsc_is_translation_fault(esr)) {
bool valid; bool valid, is_gmem = false;
int ret; int ret;
scoped_guard(read_lock, &vcpu->kvm->mmu_lock) scoped_guard(read_lock, &vcpu->kvm->mmu_lock)
valid = kvm_vncr_tlb_lookup(vcpu); valid = kvm_vncr_tlb_lookup(vcpu);
if (!valid) if (!valid)
ret = kvm_translate_vncr(vcpu); ret = kvm_translate_vncr(vcpu, &is_gmem);
else else
ret = -EPERM; ret = -EPERM;
switch (ret) { switch (ret) {
case -EAGAIN: case -EAGAIN:
case -ENOMEM:
/* Let's try again... */ /* Let's try again... */
break; break;
case -ENOMEM:
/*
* For guest_memfd, this indicates that it failed to
* create a folio to back the memory. Inform userspace.
*/
if (is_gmem)
return 0;
/* Otherwise, let's try again... */
break;
case -EFAULT: case -EFAULT:
case -EIO:
case -EHWPOISON:
if (is_gmem)
return 0;
fallthrough;
case -EINVAL: case -EINVAL:
case -ENOENT: case -ENOENT:
case -EACCES: case -EACCES:
@ -1462,9 +1491,16 @@ u64 limit_nv_id_reg(struct kvm *kvm, u32 reg, u64 val)
case SYS_ID_AA64PFR1_EL1: case SYS_ID_AA64PFR1_EL1:
/* Only support BTI, SSBS, CSV2_frac */ /* Only support BTI, SSBS, CSV2_frac */
val &= (ID_AA64PFR1_EL1_BT | val &= ~(ID_AA64PFR1_EL1_PFAR |
ID_AA64PFR1_EL1_SSBS | ID_AA64PFR1_EL1_MTEX |
ID_AA64PFR1_EL1_CSV2_frac); ID_AA64PFR1_EL1_THE |
ID_AA64PFR1_EL1_GCS |
ID_AA64PFR1_EL1_MTE_frac |
ID_AA64PFR1_EL1_NMI |
ID_AA64PFR1_EL1_SME |
ID_AA64PFR1_EL1_RES0 |
ID_AA64PFR1_EL1_MPAM_frac |
ID_AA64PFR1_EL1_MTE);
break; break;
case SYS_ID_AA64MMFR0_EL1: case SYS_ID_AA64MMFR0_EL1:
@ -1517,12 +1553,11 @@ u64 limit_nv_id_reg(struct kvm *kvm, u32 reg, u64 val)
break; break;
case SYS_ID_AA64MMFR1_EL1: case SYS_ID_AA64MMFR1_EL1:
val &= (ID_AA64MMFR1_EL1_HCX | val &= ~(ID_AA64MMFR1_EL1_CMOW |
ID_AA64MMFR1_EL1_PAN | ID_AA64MMFR1_EL1_nTLBPA |
ID_AA64MMFR1_EL1_LO | ID_AA64MMFR1_EL1_ETS |
ID_AA64MMFR1_EL1_HPDS | ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_VH | ID_AA64MMFR1_EL1_HAFDBS);
ID_AA64MMFR1_EL1_VMIDBits);
/* FEAT_E2H0 implies no VHE */ /* FEAT_E2H0 implies no VHE */
if (test_bit(KVM_ARM_VCPU_HAS_EL2_E2H0, kvm->arch.vcpu_features)) if (test_bit(KVM_ARM_VCPU_HAS_EL2_E2H0, kvm->arch.vcpu_features))
val &= ~ID_AA64MMFR1_EL1_VH; val &= ~ID_AA64MMFR1_EL1_VH;
@ -1564,14 +1599,22 @@ u64 limit_nv_id_reg(struct kvm *kvm, u32 reg, u64 val)
case SYS_ID_AA64DFR0_EL1: case SYS_ID_AA64DFR0_EL1:
/* Only limited support for PMU, Debug, BPs, WPs, and HPMN0 */ /* Only limited support for PMU, Debug, BPs, WPs, and HPMN0 */
val &= (ID_AA64DFR0_EL1_PMUVer | val &= ~(ID_AA64DFR0_EL1_ExtTrcBuff |
ID_AA64DFR0_EL1_WRPs | ID_AA64DFR0_EL1_BRBE |
ID_AA64DFR0_EL1_BRPs | ID_AA64DFR0_EL1_MTPMU |
ID_AA64DFR0_EL1_DebugVer| ID_AA64DFR0_EL1_TraceBuffer |
ID_AA64DFR0_EL1_HPMN0); ID_AA64DFR0_EL1_TraceFilt |
ID_AA64DFR0_EL1_PMSVer |
ID_AA64DFR0_EL1_CTX_CMPs |
ID_AA64DFR0_EL1_SEBEP |
ID_AA64DFR0_EL1_PMSS |
ID_AA64DFR0_EL1_TraceVer);
/* Cap Debug to ARMv8.1 */ /*
val = ID_REG_LIMIT_FIELD_ENUM(val, ID_AA64DFR0_EL1, DebugVer, VHE); * FEAT_Debugv8p9 requires support for extended breakpoints /
* watchpoints.
*/
val = ID_REG_LIMIT_FIELD_ENUM(val, ID_AA64DFR0_EL1, DebugVer, V8P8);
break; break;
} }
@ -1796,3 +1839,33 @@ void kvm_nested_sync_hwstate(struct kvm_vcpu *vcpu)
if (unlikely(vcpu_test_and_clear_flag(vcpu, NESTED_SERROR_PENDING))) if (unlikely(vcpu_test_and_clear_flag(vcpu, NESTED_SERROR_PENDING)))
kvm_inject_serror_esr(vcpu, vcpu_get_vsesr(vcpu)); kvm_inject_serror_esr(vcpu, vcpu_get_vsesr(vcpu));
} }
/*
* KVM unconditionally sets most of these traps anyway but use an allowlist
* to document the guest hypervisor traps that may take precedence and guard
* against future changes to the non-nested trap configuration.
*/
#define NV_MDCR_GUEST_INCLUDE (MDCR_EL2_TDE | \
MDCR_EL2_TDA | \
MDCR_EL2_TDRA | \
MDCR_EL2_TTRF | \
MDCR_EL2_TPMS | \
MDCR_EL2_TPM | \
MDCR_EL2_TPMCR | \
MDCR_EL2_TDCC | \
MDCR_EL2_TDOSA)
void kvm_nested_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
u64 guest_mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
/*
* In yet another example where FEAT_NV2 is fscking broken, accesses
* to MDSCR_EL1 are redirected to the VNCR despite having an effect
* at EL2. Use a big hammer to apply sanity.
*/
if (is_hyp_ctxt(vcpu))
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
else
vcpu->arch.mdcr_el2 |= (guest_mdcr & NV_MDCR_GUEST_INCLUDE);
}


@ -85,16 +85,23 @@ void __init kvm_hyp_reserve(void)
hyp_mem_base); hyp_mem_base);
} }
static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm) static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{ {
if (host_kvm->arch.pkvm.handle) { if (pkvm_hyp_vm_is_created(kvm)) {
WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm, WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
host_kvm->arch.pkvm.handle)); kvm->arch.pkvm.handle));
} else if (kvm->arch.pkvm.handle) {
/*
* The VM could have been reserved but hyp initialization has
* failed. Make sure to unreserve it.
*/
kvm_call_hyp_nvhe(__pkvm_unreserve_vm, kvm->arch.pkvm.handle);
} }
host_kvm->arch.pkvm.handle = 0; kvm->arch.pkvm.handle = 0;
free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc); kvm->arch.pkvm.is_created = false;
free_hyp_memcache(&host_kvm->arch.pkvm.stage2_teardown_mc); free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
free_hyp_memcache(&kvm->arch.pkvm.stage2_teardown_mc);
} }
static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu) static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
@ -129,16 +136,16 @@ static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
* *
* Return 0 on success, negative error code on failure. * Return 0 on success, negative error code on failure.
*/ */
static int __pkvm_create_hyp_vm(struct kvm *host_kvm) static int __pkvm_create_hyp_vm(struct kvm *kvm)
{ {
size_t pgd_sz, hyp_vm_sz; size_t pgd_sz, hyp_vm_sz;
void *pgd, *hyp_vm; void *pgd, *hyp_vm;
int ret; int ret;
if (host_kvm->created_vcpus < 1) if (kvm->created_vcpus < 1)
return -EINVAL; return -EINVAL;
pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.mmu.vtcr); pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
/* /*
* The PGD pages will be reclaimed using a hyp_memcache which implies * The PGD pages will be reclaimed using a hyp_memcache which implies
@ -152,7 +159,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
/* Allocate memory to donate to hyp for vm and vcpu pointers. */ /* Allocate memory to donate to hyp for vm and vcpu pointers. */
hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE, hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
size_mul(sizeof(void *), size_mul(sizeof(void *),
host_kvm->created_vcpus))); kvm->created_vcpus)));
hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT); hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT);
if (!hyp_vm) { if (!hyp_vm) {
ret = -ENOMEM; ret = -ENOMEM;
@ -160,12 +167,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
} }
/* Donate the VM memory to hyp and let hyp initialize it. */ /* Donate the VM memory to hyp and let hyp initialize it. */
ret = kvm_call_hyp_nvhe(__pkvm_init_vm, host_kvm, hyp_vm, pgd); ret = kvm_call_hyp_nvhe(__pkvm_init_vm, kvm, hyp_vm, pgd);
if (ret < 0) if (ret)
goto free_vm; goto free_vm;
host_kvm->arch.pkvm.handle = ret; kvm->arch.pkvm.is_created = true;
host_kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2; kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE); kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE);
return 0; return 0;
@ -176,14 +183,19 @@ free_pgd:
return ret; return ret;
} }
int pkvm_create_hyp_vm(struct kvm *host_kvm) bool pkvm_hyp_vm_is_created(struct kvm *kvm)
{
return READ_ONCE(kvm->arch.pkvm.is_created);
}
int pkvm_create_hyp_vm(struct kvm *kvm)
{ {
int ret = 0; int ret = 0;
mutex_lock(&host_kvm->arch.config_lock); mutex_lock(&kvm->arch.config_lock);
if (!host_kvm->arch.pkvm.handle) if (!pkvm_hyp_vm_is_created(kvm))
ret = __pkvm_create_hyp_vm(host_kvm); ret = __pkvm_create_hyp_vm(kvm);
mutex_unlock(&host_kvm->arch.config_lock); mutex_unlock(&kvm->arch.config_lock);
return ret; return ret;
} }
@ -200,15 +212,31 @@ int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
return ret; return ret;
} }
void pkvm_destroy_hyp_vm(struct kvm *host_kvm) void pkvm_destroy_hyp_vm(struct kvm *kvm)
{ {
mutex_lock(&host_kvm->arch.config_lock); mutex_lock(&kvm->arch.config_lock);
__pkvm_destroy_hyp_vm(host_kvm); __pkvm_destroy_hyp_vm(kvm);
mutex_unlock(&host_kvm->arch.config_lock); mutex_unlock(&kvm->arch.config_lock);
} }
int pkvm_init_host_vm(struct kvm *host_kvm) int pkvm_init_host_vm(struct kvm *kvm)
{ {
int ret;
if (pkvm_hyp_vm_is_created(kvm))
return -EINVAL;
/* VM is already reserved, no need to proceed. */
if (kvm->arch.pkvm.handle)
return 0;
/* Reserve the VM in hyp and obtain a hyp handle for the VM. */
ret = kvm_call_hyp_nvhe(__pkvm_reserve_vm);
if (ret < 0)
return ret;
kvm->arch.pkvm.handle = ret;
return 0; return 0;
} }


@ -32,23 +32,23 @@ static const struct ptdump_prot_bits stage2_pte_bits[] = {
.set = " ", .set = " ",
.clear = "F", .clear = "F",
}, { }, {
.mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | PTE_VALID, .mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R,
.val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | PTE_VALID, .val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R,
.set = "R", .set = "R",
.clear = " ", .clear = " ",
}, { }, {
.mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | PTE_VALID, .mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
.val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | PTE_VALID, .val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
.set = "W", .set = "W",
.clear = " ", .clear = " ",
}, { }, {
.mask = KVM_PTE_LEAF_ATTR_HI_S2_XN | PTE_VALID, .mask = KVM_PTE_LEAF_ATTR_HI_S2_XN,
.val = PTE_VALID, .val = KVM_PTE_LEAF_ATTR_HI_S2_XN,
.set = " ", .set = "NX",
.clear = "X", .clear = "x ",
}, { }, {
.mask = KVM_PTE_LEAF_ATTR_LO_S2_AF | PTE_VALID, .mask = KVM_PTE_LEAF_ATTR_LO_S2_AF,
.val = KVM_PTE_LEAF_ATTR_LO_S2_AF | PTE_VALID, .val = KVM_PTE_LEAF_ATTR_LO_S2_AF,
.set = "AF", .set = "AF",
.clear = " ", .clear = " ",
}, { }, {


@ -1757,7 +1757,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ID_AA64ISAR2_EL1_WFxT; val &= ~ID_AA64ISAR2_EL1_WFxT;
break; break;
case SYS_ID_AA64ISAR3_EL1: case SYS_ID_AA64ISAR3_EL1:
val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX; val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_LSFE |
ID_AA64ISAR3_EL1_FAMINMAX;
break; break;
case SYS_ID_AA64MMFR2_EL1: case SYS_ID_AA64MMFR2_EL1:
val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK; val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
@ -1997,6 +1998,26 @@ static u64 sanitise_id_aa64dfr0_el1(const struct kvm_vcpu *vcpu, u64 val)
return val; return val;
} }
/*
* Older versions of KVM erroneously claim support for FEAT_DoubleLock with
* NV-enabled VMs on unsupporting hardware. Silently ignore the incorrect
* value if it is consistent with the bug.
*/
static bool ignore_feat_doublelock(struct kvm_vcpu *vcpu, u64 val)
{
u8 host, user;
if (!vcpu_has_nv(vcpu))
return false;
host = SYS_FIELD_GET(ID_AA64DFR0_EL1, DoubleLock,
read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1));
user = SYS_FIELD_GET(ID_AA64DFR0_EL1, DoubleLock, val);
return host == ID_AA64DFR0_EL1_DoubleLock_NI &&
user == ID_AA64DFR0_EL1_DoubleLock_IMP;
}
static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, const struct sys_reg_desc *rd,
u64 val) u64 val)
@ -2028,6 +2049,11 @@ static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
if (debugver < ID_AA64DFR0_EL1_DebugVer_IMP) if (debugver < ID_AA64DFR0_EL1_DebugVer_IMP)
return -EINVAL; return -EINVAL;
if (ignore_feat_doublelock(vcpu, val)) {
val &= ~ID_AA64DFR0_EL1_DoubleLock;
val |= SYS_FIELD_PREP_ENUM(ID_AA64DFR0_EL1, DoubleLock, NI);
}
return set_id_reg(vcpu, rd, val); return set_id_reg(vcpu, rd, val);
} }
@ -2148,16 +2174,29 @@ static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu,
return set_id_reg(vcpu, rd, user_val); return set_id_reg(vcpu, rd, user_val);
} }
/*
* Allow userspace to de-feature a stage-2 translation granule but prevent it
* from claiming the impossible.
*/
#define tgran2_val_allowed(tg, safe, user) \
({ \
u8 __s = SYS_FIELD_GET(ID_AA64MMFR0_EL1, tg, safe); \
u8 __u = SYS_FIELD_GET(ID_AA64MMFR0_EL1, tg, user); \
\
__s == __u || __u == ID_AA64MMFR0_EL1_##tg##_NI; \
})
static int set_id_aa64mmfr0_el1(struct kvm_vcpu *vcpu, static int set_id_aa64mmfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, u64 user_val) const struct sys_reg_desc *rd, u64 user_val)
{ {
u64 sanitized_val = kvm_read_sanitised_id_reg(vcpu, rd); u64 sanitized_val = kvm_read_sanitised_id_reg(vcpu, rd);
u64 tgran2_mask = ID_AA64MMFR0_EL1_TGRAN4_2_MASK |
ID_AA64MMFR0_EL1_TGRAN16_2_MASK |
ID_AA64MMFR0_EL1_TGRAN64_2_MASK;
if (vcpu_has_nv(vcpu) && if (!vcpu_has_nv(vcpu))
((sanitized_val & tgran2_mask) != (user_val & tgran2_mask))) return set_id_reg(vcpu, rd, user_val);
if (!tgran2_val_allowed(TGRAN4_2, sanitized_val, user_val) ||
!tgran2_val_allowed(TGRAN16_2, sanitized_val, user_val) ||
!tgran2_val_allowed(TGRAN64_2, sanitized_val, user_val))
return -EINVAL; return -EINVAL;
return set_id_reg(vcpu, rd, user_val); return set_id_reg(vcpu, rd, user_val);
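
The rule enforced by tgran2_val_allowed() above can be modelled outside the kernel with plain integers; 1 stands in for the _NI encoding here, the exact field values are not the point.

#include <assert.h>
#include <stdbool.h>

#define TGRAN2_NI 1	/* stands in for ID_AA64MMFR0_EL1_TGRANx_2_NI */

static bool tgran2_val_allowed(unsigned int safe, unsigned int user)
{
	return safe == user || user == TGRAN2_NI;
}

int main(void)
{
	assert(tgran2_val_allowed(2, 2));		/* keep what the host reports */
	assert(tgran2_val_allowed(2, TGRAN2_NI));	/* de-feature the granule */
	assert(!tgran2_val_allowed(TGRAN2_NI, 2));	/* cannot invent support */
	return 0;
}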
@ -3141,6 +3180,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64ISAR2_EL1_APA3 | ID_AA64ISAR2_EL1_APA3 |
ID_AA64ISAR2_EL1_GPA3)), ID_AA64ISAR2_EL1_GPA3)),
ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT | ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
ID_AA64ISAR3_EL1_LSFE |
ID_AA64ISAR3_EL1_FAMINMAX)), ID_AA64ISAR3_EL1_FAMINMAX)),
ID_UNALLOCATED(6,4), ID_UNALLOCATED(6,4),
ID_UNALLOCATED(6,5), ID_UNALLOCATED(6,5),
@ -3152,8 +3192,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
~(ID_AA64MMFR0_EL1_RES0 | ~(ID_AA64MMFR0_EL1_RES0 |
ID_AA64MMFR0_EL1_ASIDBITS)), ID_AA64MMFR0_EL1_ASIDBITS)),
ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 | ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 |
ID_AA64MMFR1_EL1_HCX |
ID_AA64MMFR1_EL1_TWED |
ID_AA64MMFR1_EL1_XNX | ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_VH | ID_AA64MMFR1_EL1_VH |
ID_AA64MMFR1_EL1_VMIDBits)), ID_AA64MMFR1_EL1_VMIDBits)),
@ -3238,6 +3276,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access }, { SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
{ SYS_DESC(SYS_PMBPTR_EL1), undef_access }, { SYS_DESC(SYS_PMBPTR_EL1), undef_access },
{ SYS_DESC(SYS_PMBSR_EL1), undef_access }, { SYS_DESC(SYS_PMBSR_EL1), undef_access },
{ SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
/* PMBIDR_EL1 is not trapped */ /* PMBIDR_EL1 is not trapped */
{ PMU_SYS_REG(PMINTENSET_EL1), { PMU_SYS_REG(PMINTENSET_EL1),


@ -554,7 +554,6 @@ int vgic_lazy_init(struct kvm *kvm)
* Also map the virtual CPU interface into the VM. * Also map the virtual CPU interface into the VM.
* v2 calls vgic_init() if not already done. * v2 calls vgic_init() if not already done.
* v3 and derivatives return an error if the VGIC is not initialized. * v3 and derivatives return an error if the VGIC is not initialized.
* vgic_ready() returns true if this function has succeeded.
*/ */
int kvm_vgic_map_resources(struct kvm *kvm) int kvm_vgic_map_resources(struct kvm *kvm)
{ {
@ -563,12 +562,12 @@ int kvm_vgic_map_resources(struct kvm *kvm)
gpa_t dist_base; gpa_t dist_base;
int ret = 0; int ret = 0;
if (likely(vgic_ready(kvm))) if (likely(smp_load_acquire(&dist->ready)))
return 0; return 0;
mutex_lock(&kvm->slots_lock); mutex_lock(&kvm->slots_lock);
mutex_lock(&kvm->arch.config_lock); mutex_lock(&kvm->arch.config_lock);
if (vgic_ready(kvm)) if (dist->ready)
goto out; goto out;
if (!irqchip_in_kernel(kvm)) if (!irqchip_in_kernel(kvm))
@ -594,14 +593,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
goto out_slots; goto out_slots;
} }
/* smp_store_release(&dist->ready, true);
* kvm_io_bus_register_dev() guarantees all readers see the new MMIO
* registration before returning through synchronize_srcu(), which also
* implies a full memory barrier. As such, marking the distributor as
* 'ready' here is guaranteed to be ordered after all vCPUs having seen
* a completely configured distributor.
*/
dist->ready = true;
goto out_slots; goto out_slots;
out: out:
mutex_unlock(&kvm->arch.config_lock); mutex_unlock(&kvm->arch.config_lock);


@ -588,6 +588,7 @@ int vgic_v3_map_resources(struct kvm *kvm)
} }
DEFINE_STATIC_KEY_FALSE(vgic_v3_cpuif_trap); DEFINE_STATIC_KEY_FALSE(vgic_v3_cpuif_trap);
DEFINE_STATIC_KEY_FALSE(vgic_v3_has_v2_compat);
static int __init early_group0_trap_cfg(char *buf) static int __init early_group0_trap_cfg(char *buf)
{ {
@ -697,6 +698,13 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
if (kvm_vgic_global_state.vcpu_base == 0) if (kvm_vgic_global_state.vcpu_base == 0)
kvm_info("disabling GICv2 emulation\n"); kvm_info("disabling GICv2 emulation\n");
/*
* Flip the static branch if the HW supports v2, even if we're
* not using it (such as in protected mode).
*/
if (has_v2)
static_branch_enable(&vgic_v3_has_v2_compat);
if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_30115)) { if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_30115)) {
group0_trap = true; group0_trap = true;
group1_trap = true; group1_trap = true;


@ -15,7 +15,7 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
u64 ich_vtr_el2; u64 ich_vtr_el2;
int ret; int ret;
if (!info->has_gcie_v3_compat) if (!cpus_have_final_cap(ARM64_HAS_GICV5_LEGACY))
return -ENODEV; return -ENODEV;
kvm_vgic_global_state.type = VGIC_V5; kvm_vgic_global_state.type = VGIC_V5;


@ -37,6 +37,7 @@ HAS_GENERIC_AUTH_ARCH_QARMA5
HAS_GENERIC_AUTH_IMP_DEF HAS_GENERIC_AUTH_IMP_DEF
HAS_GICV3_CPUIF HAS_GICV3_CPUIF
HAS_GICV5_CPUIF HAS_GICV5_CPUIF
HAS_GICV5_LEGACY
HAS_GIC_PRIO_MASKING HAS_GIC_PRIO_MASKING
HAS_GIC_PRIO_RELAXED_SYNC HAS_GIC_PRIO_RELAXED_SYNC
HAS_HCR_NV1 HAS_HCR_NV1


@ -34,13 +34,26 @@
#define PCH_PIC_INT_ISR_END 0x3af #define PCH_PIC_INT_ISR_END 0x3af
#define PCH_PIC_POLARITY_START 0x3e0 #define PCH_PIC_POLARITY_START 0x3e0
#define PCH_PIC_POLARITY_END 0x3e7 #define PCH_PIC_POLARITY_END 0x3e7
#define PCH_PIC_INT_ID_VAL 0x7000000UL #define PCH_PIC_INT_ID_VAL 0x7UL
#define PCH_PIC_INT_ID_VER 0x1UL #define PCH_PIC_INT_ID_VER 0x1UL
union pch_pic_id {
struct {
uint8_t reserved_0[3];
uint8_t id;
uint8_t version;
uint8_t reserved_1;
uint8_t irq_num;
uint8_t reserved_2;
} desc;
uint64_t data;
};
struct loongarch_pch_pic { struct loongarch_pch_pic {
spinlock_t lock; spinlock_t lock;
struct kvm *kvm; struct kvm *kvm;
struct kvm_io_device device; struct kvm_io_device device;
union pch_pic_id id;
uint64_t mask; /* 1:disable irq, 0:enable irq */ uint64_t mask; /* 1:disable irq, 0:enable irq */
uint64_t htmsi_en; /* 1:msi */ uint64_t htmsi_en; /* 1:msi */
uint64_t edge; /* 1:edge triggered, 0:level triggered */ uint64_t edge; /* 1:edge triggered, 0:level triggered */
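
The reason PCH_PIC_INT_ID_VAL shrinks from a pre-shifted 0x7000000UL to plain 0x7UL is that the value now lives in a byte-wide field of the new union. A standalone check that the union layout reproduces the 64-bit image the old read handler composed by shifting (little-endian host assumed, irq_num = 31 as before):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

union pch_pic_id {
	struct {
		uint8_t reserved_0[3];
		uint8_t id;
		uint8_t version;
		uint8_t reserved_1;
		uint8_t irq_num;
		uint8_t reserved_2;
	} desc;
	uint64_t data;
};

int main(void)
{
	union pch_pic_id id = { .desc = { .id = 0x7, .version = 1, .irq_num = 31 } };
	uint64_t old_style = 0x7000000ULL | (1ULL << 32) | (31ULL << 48);

	assert(id.data == old_style);
	printf("id register image: %#llx\n", (unsigned long long)id.data);
	return 0;
}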


@ -103,6 +103,7 @@ struct kvm_fpu {
#define KVM_LOONGARCH_VM_FEAT_PMU 5 #define KVM_LOONGARCH_VM_FEAT_PMU 5
#define KVM_LOONGARCH_VM_FEAT_PV_IPI 6 #define KVM_LOONGARCH_VM_FEAT_PV_IPI 6
#define KVM_LOONGARCH_VM_FEAT_PV_STEALTIME 7 #define KVM_LOONGARCH_VM_FEAT_PV_STEALTIME 7
#define KVM_LOONGARCH_VM_FEAT_PTW 8
/* Device Control API on vcpu fd */ /* Device Control API on vcpu fd */
#define KVM_LOONGARCH_VCPU_CPUCFG 0 #define KVM_LOONGARCH_VCPU_CPUCFG 0


@ -218,16 +218,16 @@ int kvm_emu_iocsr(larch_inst inst, struct kvm_run *run, struct kvm_vcpu *vcpu)
} }
trace_kvm_iocsr(KVM_TRACE_IOCSR_WRITE, run->iocsr_io.len, addr, val); trace_kvm_iocsr(KVM_TRACE_IOCSR_WRITE, run->iocsr_io.len, addr, val);
} else { } else {
vcpu->arch.io_gpr = rd; /* Set register id for iocsr read completion */
idx = srcu_read_lock(&vcpu->kvm->srcu); idx = srcu_read_lock(&vcpu->kvm->srcu);
ret = kvm_io_bus_read(vcpu, KVM_IOCSR_BUS, addr, run->iocsr_io.len, val); ret = kvm_io_bus_read(vcpu, KVM_IOCSR_BUS, addr,
run->iocsr_io.len, run->iocsr_io.data);
srcu_read_unlock(&vcpu->kvm->srcu, idx); srcu_read_unlock(&vcpu->kvm->srcu, idx);
if (ret == 0) if (ret == 0) {
kvm_complete_iocsr_read(vcpu, run);
ret = EMULATE_DONE; ret = EMULATE_DONE;
else { } else
ret = EMULATE_DO_IOCSR; ret = EMULATE_DO_IOCSR;
/* Save register id for iocsr read completion */
vcpu->arch.io_gpr = rd;
}
trace_kvm_iocsr(KVM_TRACE_IOCSR_READ, run->iocsr_io.len, addr, NULL); trace_kvm_iocsr(KVM_TRACE_IOCSR_READ, run->iocsr_io.len, addr, NULL);
} }
@ -468,6 +468,8 @@ int kvm_emu_mmio_read(struct kvm_vcpu *vcpu, larch_inst inst)
if (ret == EMULATE_DO_MMIO) { if (ret == EMULATE_DO_MMIO) {
trace_kvm_mmio(KVM_TRACE_MMIO_READ, run->mmio.len, run->mmio.phys_addr, NULL); trace_kvm_mmio(KVM_TRACE_MMIO_READ, run->mmio.len, run->mmio.phys_addr, NULL);
vcpu->arch.io_gpr = rd; /* Set for kvm_complete_mmio_read() use */
/* /*
* If mmio device such as PCH-PIC is emulated in KVM, * If mmio device such as PCH-PIC is emulated in KVM,
* it need not return to user space to handle the mmio * it need not return to user space to handle the mmio
@ -475,16 +477,15 @@ int kvm_emu_mmio_read(struct kvm_vcpu *vcpu, larch_inst inst)
*/ */
idx = srcu_read_lock(&vcpu->kvm->srcu); idx = srcu_read_lock(&vcpu->kvm->srcu);
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, vcpu->arch.badv, ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, vcpu->arch.badv,
run->mmio.len, &vcpu->arch.gprs[rd]); run->mmio.len, run->mmio.data);
srcu_read_unlock(&vcpu->kvm->srcu, idx); srcu_read_unlock(&vcpu->kvm->srcu, idx);
if (!ret) { if (!ret) {
kvm_complete_mmio_read(vcpu, run);
update_pc(&vcpu->arch); update_pc(&vcpu->arch);
vcpu->mmio_needed = 0; vcpu->mmio_needed = 0;
return EMULATE_DONE; return EMULATE_DONE;
} }
/* Set for kvm_complete_mmio_read() use */
vcpu->arch.io_gpr = rd;
run->mmio.is_write = 0; run->mmio.is_write = 0;
vcpu->mmio_is_write = 0; vcpu->mmio_is_write = 0;
return EMULATE_DO_MMIO; return EMULATE_DO_MMIO;


@ -7,12 +7,25 @@
#include <asm/kvm_ipi.h> #include <asm/kvm_ipi.h>
#include <asm/kvm_vcpu.h> #include <asm/kvm_vcpu.h>
static void ipi_set(struct kvm_vcpu *vcpu, uint32_t data)
{
uint32_t status;
struct kvm_interrupt irq;
spin_lock(&vcpu->arch.ipi_state.lock);
status = vcpu->arch.ipi_state.status;
vcpu->arch.ipi_state.status |= data;
spin_unlock(&vcpu->arch.ipi_state.lock);
if ((status == 0) && data) {
irq.irq = LARCH_INT_IPI;
kvm_vcpu_ioctl_interrupt(vcpu, &irq);
}
}
static void ipi_send(struct kvm *kvm, uint64_t data) static void ipi_send(struct kvm *kvm, uint64_t data)
{ {
int cpu, action; int cpu;
uint32_t status;
struct kvm_vcpu *vcpu; struct kvm_vcpu *vcpu;
struct kvm_interrupt irq;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff; cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu); vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
@ -21,15 +34,7 @@ static void ipi_send(struct kvm *kvm, uint64_t data)
return; return;
} }
action = BIT(data & 0x1f); ipi_set(vcpu, BIT(data & 0x1f));
spin_lock(&vcpu->arch.ipi_state.lock);
status = vcpu->arch.ipi_state.status;
vcpu->arch.ipi_state.status |= action;
spin_unlock(&vcpu->arch.ipi_state.lock);
if (status == 0) {
irq.irq = LARCH_INT_IPI;
kvm_vcpu_ioctl_interrupt(vcpu, &irq);
}
} }
static void ipi_clear(struct kvm_vcpu *vcpu, uint64_t data) static void ipi_clear(struct kvm_vcpu *vcpu, uint64_t data)
@ -96,6 +101,34 @@ static void write_mailbox(struct kvm_vcpu *vcpu, int offset, uint64_t data, int
spin_unlock(&vcpu->arch.ipi_state.lock); spin_unlock(&vcpu->arch.ipi_state.lock);
} }
static int mail_send(struct kvm *kvm, uint64_t data)
{
int i, cpu, mailbox, offset;
uint32_t val = 0, mask = 0;
struct kvm_vcpu *vcpu;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
}
mailbox = ((data & 0xffffffff) >> 2) & 0x7;
offset = IOCSR_IPI_BUF_20 + mailbox * 4;
if ((data >> 27) & 0xf) {
val = read_mailbox(vcpu, offset, 4);
for (i = 0; i < 4; i++)
if (data & (BIT(27 + i)))
mask |= (0xff << (i * 8));
val &= mask;
}
val |= ((uint32_t)(data >> 32) & ~mask);
write_mailbox(vcpu, offset, val, 4);
return 0;
}
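
The byte-enable handling in the new mail_send() is easy to miss: bits 27-30 of the MAIL_SEND word mark mailbox bytes to preserve, and the payload in bits 63-32 overwrites the rest. A standalone model of that merge:

#include <stdint.h>
#include <stdio.h>

static uint32_t mail_merge(uint32_t old, uint64_t data)
{
	uint32_t mask = 0, val = 0;
	int i;

	if ((data >> 27) & 0xf) {
		for (i = 0; i < 4; i++)
			if (data & (1ULL << (27 + i)))
				mask |= 0xffu << (i * 8);
		val = old & mask;		/* bytes the sender asked to keep */
	}
	return val | ((uint32_t)(data >> 32) & ~mask);
}

int main(void)
{
	/* Protect byte 1 (bit 28 set), write 0xaabbccdd over the others. */
	uint64_t data = 0xaabbccddULL << 32 | 1ULL << 28;

	printf("%#x\n", mail_merge(0x11223344, data));	/* -> 0xaabb33dd */
	return 0;
}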
static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data) static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data)
{ {
int i, idx, ret; int i, idx, ret;
@ -132,23 +165,6 @@ static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data)
return ret; return ret;
} }
static int mail_send(struct kvm *kvm, uint64_t data)
{
int cpu, mailbox, offset;
struct kvm_vcpu *vcpu;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
}
mailbox = ((data & 0xffffffff) >> 2) & 0x7;
offset = IOCSR_IPI_BASE + IOCSR_IPI_BUF_20 + mailbox * 4;
return send_ipi_data(vcpu, offset, data);
}
static int any_send(struct kvm *kvm, uint64_t data) static int any_send(struct kvm *kvm, uint64_t data)
{ {
int cpu, offset; int cpu, offset;
@ -231,7 +247,7 @@ static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, cons
spin_unlock(&vcpu->arch.ipi_state.lock); spin_unlock(&vcpu->arch.ipi_state.lock);
break; break;
case IOCSR_IPI_SET: case IOCSR_IPI_SET:
ret = -EINVAL; ipi_set(vcpu, data);
break; break;
case IOCSR_IPI_CLEAR: case IOCSR_IPI_CLEAR:
/* Just clear the status of the current vcpu */ /* Just clear the status of the current vcpu */
@ -250,10 +266,10 @@ static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, cons
ipi_send(vcpu->kvm, data); ipi_send(vcpu->kvm, data);
break; break;
case IOCSR_MAIL_SEND: case IOCSR_MAIL_SEND:
ret = mail_send(vcpu->kvm, *(uint64_t *)val); ret = mail_send(vcpu->kvm, data);
break; break;
case IOCSR_ANY_SEND: case IOCSR_ANY_SEND:
ret = any_send(vcpu->kvm, *(uint64_t *)val); ret = any_send(vcpu->kvm, data);
break; break;
default: default:
kvm_err("%s: unknown addr: %llx\n", __func__, addr); kvm_err("%s: unknown addr: %llx\n", __func__, addr);


@ -35,16 +35,11 @@ static void pch_pic_update_irq(struct loongarch_pch_pic *s, int irq, int level)
/* update batch irqs, the irq_mask is a bitmap of irqs */ /* update batch irqs, the irq_mask is a bitmap of irqs */
static void pch_pic_update_batch_irqs(struct loongarch_pch_pic *s, u64 irq_mask, int level) static void pch_pic_update_batch_irqs(struct loongarch_pch_pic *s, u64 irq_mask, int level)
{ {
int irq, bits; unsigned int irq;
DECLARE_BITMAP(irqs, 64) = { BITMAP_FROM_U64(irq_mask) };
/* find each irq by irqs bitmap and update each irq */ for_each_set_bit(irq, irqs, 64)
bits = sizeof(irq_mask) * 8;
irq = find_first_bit((void *)&irq_mask, bits);
while (irq < bits) {
pch_pic_update_irq(s, irq, level); pch_pic_update_irq(s, irq, level);
bitmap_clear((void *)&irq_mask, irq, 1);
irq = find_first_bit((void *)&irq_mask, bits);
}
} }
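
for_each_set_bit() over a bitmap built with BITMAP_FROM_U64() replaces the manual find_first_bit()/bitmap_clear() loop above. Outside the kernel the same walk can be sketched with a count-trailing-zeros loop:

#include <stdint.h>
#include <stdio.h>

static void update_irq(unsigned int irq, int level)
{
	printf("irq %u -> %d\n", irq, level);
}

static void update_batch_irqs(uint64_t irq_mask, int level)
{
	while (irq_mask) {
		unsigned int irq = __builtin_ctzll(irq_mask);

		update_irq(irq, level);
		irq_mask &= irq_mask - 1;	/* clear the bit just handled */
	}
}

int main(void)
{
	update_batch_irqs(0x8000000000000005ULL, 1);	/* irqs 0, 2 and 63 */
	return 0;
}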
/* called when a irq is triggered in pch pic */ /* called when a irq is triggered in pch pic */
@ -77,109 +72,65 @@ void pch_msi_set_irq(struct kvm *kvm, int irq, int level)
eiointc_set_irq(kvm->arch.eiointc, irq, level); eiointc_set_irq(kvm->arch.eiointc, irq, level);
} }
/*
* pch pic register is 64-bit, but it is accessed by 32-bit,
* so we use high to get whether low or high 32 bits we want
* to read.
*/
static u32 pch_pic_read_reg(u64 *s, int high)
{
u64 val = *s;
/* read the high 32 bits when high is 1 */
return high ? (u32)(val >> 32) : (u32)val;
}
/*
* pch pic register is 64-bit, but it is accessed by 32-bit,
* so we use high to get whether low or high 32 bits we want
* to write.
*/
static u32 pch_pic_write_reg(u64 *s, int high, u32 v)
{
u64 val = *s, data = v;
if (high) {
/*
* Clear val high 32 bits
* Write the high 32 bits when the high is 1
*/
*s = (val << 32 >> 32) | (data << 32);
val >>= 32;
} else
/*
* Clear val low 32 bits
* Write the low 32 bits when the high is 0
*/
*s = (val >> 32 << 32) | v;
return (u32)val;
}
static int loongarch_pch_pic_read(struct loongarch_pch_pic *s, gpa_t addr, int len, void *val) static int loongarch_pch_pic_read(struct loongarch_pch_pic *s, gpa_t addr, int len, void *val)
{ {
int offset, index, ret = 0; int ret = 0, offset;
u32 data = 0; u64 data = 0;
u64 int_id = 0; void *ptemp;
offset = addr - s->pch_pic_base; offset = addr - s->pch_pic_base;
offset -= offset & 7;
spin_lock(&s->lock); spin_lock(&s->lock);
switch (offset) { switch (offset) {
case PCH_PIC_INT_ID_START ... PCH_PIC_INT_ID_END: case PCH_PIC_INT_ID_START ... PCH_PIC_INT_ID_END:
/* int id version */ data = s->id.data;
int_id |= (u64)PCH_PIC_INT_ID_VER << 32;
/* irq number */
int_id |= (u64)31 << (32 + 16);
/* int id value */
int_id |= PCH_PIC_INT_ID_VAL;
*(u64 *)val = int_id;
break; break;
case PCH_PIC_MASK_START ... PCH_PIC_MASK_END: case PCH_PIC_MASK_START ... PCH_PIC_MASK_END:
offset -= PCH_PIC_MASK_START; data = s->mask;
index = offset >> 2;
/* read mask reg */
data = pch_pic_read_reg(&s->mask, index);
*(u32 *)val = data;
break; break;
case PCH_PIC_HTMSI_EN_START ... PCH_PIC_HTMSI_EN_END: case PCH_PIC_HTMSI_EN_START ... PCH_PIC_HTMSI_EN_END:
offset -= PCH_PIC_HTMSI_EN_START;
index = offset >> 2;
/* read htmsi enable reg */ /* read htmsi enable reg */
data = pch_pic_read_reg(&s->htmsi_en, index); data = s->htmsi_en;
*(u32 *)val = data;
break; break;
case PCH_PIC_EDGE_START ... PCH_PIC_EDGE_END: case PCH_PIC_EDGE_START ... PCH_PIC_EDGE_END:
offset -= PCH_PIC_EDGE_START;
index = offset >> 2;
/* read edge enable reg */ /* read edge enable reg */
data = pch_pic_read_reg(&s->edge, index); data = s->edge;
*(u32 *)val = data;
break; break;
case PCH_PIC_AUTO_CTRL0_START ... PCH_PIC_AUTO_CTRL0_END: case PCH_PIC_AUTO_CTRL0_START ... PCH_PIC_AUTO_CTRL0_END:
case PCH_PIC_AUTO_CTRL1_START ... PCH_PIC_AUTO_CTRL1_END: case PCH_PIC_AUTO_CTRL1_START ... PCH_PIC_AUTO_CTRL1_END:
/* we only use default mode: fixed interrupt distribution mode */ /* we only use default mode: fixed interrupt distribution mode */
*(u32 *)val = 0;
break; break;
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END: case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
/* only route to int0: eiointc */ /* only route to int0: eiointc */
*(u8 *)val = 1; ptemp = s->route_entry + (offset - PCH_PIC_ROUTE_ENTRY_START);
data = *(u64 *)ptemp;
break; break;
case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END: case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END:
offset -= PCH_PIC_HTMSI_VEC_START;
/* read htmsi vector */ /* read htmsi vector */
data = s->htmsi_vector[offset]; ptemp = s->htmsi_vector + (offset - PCH_PIC_HTMSI_VEC_START);
*(u8 *)val = data; data = *(u64 *)ptemp;
break; break;
case PCH_PIC_POLARITY_START ... PCH_PIC_POLARITY_END: case PCH_PIC_POLARITY_START ... PCH_PIC_POLARITY_END:
/* we only use defalut value 0: high level triggered */ data = s->polarity;
*(u32 *)val = 0; break;
case PCH_PIC_INT_IRR_START:
data = s->irr;
break;
case PCH_PIC_INT_ISR_START:
data = s->isr;
break; break;
default: default:
ret = -EINVAL; ret = -EINVAL;
} }
spin_unlock(&s->lock); spin_unlock(&s->lock);
if (ret == 0) {
offset = (addr - s->pch_pic_base) & 7;
data = data >> (offset * 8);
memcpy(val, &data, len);
}
return ret; return ret;
} }
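
The rewritten read path no longer keeps per-width register helpers: it aligns the address down to the containing 64-bit register, shifts, and copies only the bytes that were accessed, with the write path mirroring this through a shifted byte mask. A standalone model of the read side (little-endian, as on LoongArch):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void reg_read(uint64_t reg, unsigned int addr, int len, void *val)
{
	unsigned int byte = addr & 7;		/* offset inside the register */
	uint64_t data = reg >> (byte * 8);

	memcpy(val, &data, len);		/* little-endian host assumed */
}

int main(void)
{
	uint64_t mask_reg = 0x1122334455667788ULL;
	uint32_t lo, hi;
	uint8_t b;

	reg_read(mask_reg, 0x20, 4, &lo);	/* low 32 bits */
	reg_read(mask_reg, 0x24, 4, &hi);	/* high 32 bits */
	reg_read(mask_reg, 0x26, 1, &b);	/* a single byte */
	assert(lo == 0x55667788 && hi == 0x11223344 && b == 0x22);
	printf("lo=%#x hi=%#x b=%#x\n", lo, hi, b);
	return 0;
}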
@ -210,81 +161,69 @@ static int kvm_pch_pic_read(struct kvm_vcpu *vcpu,
static int loongarch_pch_pic_write(struct loongarch_pch_pic *s, gpa_t addr, static int loongarch_pch_pic_write(struct loongarch_pch_pic *s, gpa_t addr,
int len, const void *val) int len, const void *val)
{ {
int ret; int ret = 0, offset;
u32 old, data, offset, index; u64 old, data, mask;
u64 irq; void *ptemp;
ret = 0; switch (len) {
data = *(u32 *)val; case 1:
offset = addr - s->pch_pic_base; data = *(u8 *)val;
mask = 0xFF;
break;
case 2:
data = *(u16 *)val;
mask = USHRT_MAX;
break;
case 4:
data = *(u32 *)val;
mask = UINT_MAX;
break;
case 8:
default:
data = *(u64 *)val;
mask = ULONG_MAX;
break;
}
offset = (addr - s->pch_pic_base) & 7;
mask = mask << (offset * 8);
data = data << (offset * 8);
offset = (addr - s->pch_pic_base) - offset;
spin_lock(&s->lock); spin_lock(&s->lock);
switch (offset) { switch (offset) {
case PCH_PIC_MASK_START ... PCH_PIC_MASK_END: case PCH_PIC_MASK_START:
offset -= PCH_PIC_MASK_START; old = s->mask;
/* get whether high or low 32 bits we want to write */ s->mask = (old & ~mask) | data;
index = offset >> 2; if (old & ~data)
old = pch_pic_write_reg(&s->mask, index, data); pch_pic_update_batch_irqs(s, old & ~data, 1);
/* enable irq when mask value change to 0 */ if (~old & data)
irq = (old & ~data) << (32 * index); pch_pic_update_batch_irqs(s, ~old & data, 0);
pch_pic_update_batch_irqs(s, irq, 1);
/* disable irq when mask value change to 1 */
irq = (~old & data) << (32 * index);
pch_pic_update_batch_irqs(s, irq, 0);
break; break;
case PCH_PIC_HTMSI_EN_START ... PCH_PIC_HTMSI_EN_END: case PCH_PIC_HTMSI_EN_START:
offset -= PCH_PIC_HTMSI_EN_START; s->htmsi_en = (s->htmsi_en & ~mask) | data;
index = offset >> 2;
pch_pic_write_reg(&s->htmsi_en, index, data);
break; break;
case PCH_PIC_EDGE_START ... PCH_PIC_EDGE_END: case PCH_PIC_EDGE_START:
offset -= PCH_PIC_EDGE_START; s->edge = (s->edge & ~mask) | data;
index = offset >> 2;
/* 1: edge triggered, 0: level triggered */
pch_pic_write_reg(&s->edge, index, data);
break; break;
case PCH_PIC_CLEAR_START ... PCH_PIC_CLEAR_END: case PCH_PIC_POLARITY_START:
offset -= PCH_PIC_CLEAR_START; s->polarity = (s->polarity & ~mask) | data;
index = offset >> 2;
/* write 1 to clear edge irq */
old = pch_pic_read_reg(&s->irr, index);
/*
* get the irq bitmap which is edge triggered and
* already set and to be cleared
*/
irq = old & pch_pic_read_reg(&s->edge, index) & data;
/* write irr to the new state where irqs have been cleared */
pch_pic_write_reg(&s->irr, index, old & ~irq);
/* update cleared irqs */
pch_pic_update_batch_irqs(s, irq, 0);
break; break;
case PCH_PIC_AUTO_CTRL0_START ... PCH_PIC_AUTO_CTRL0_END: case PCH_PIC_CLEAR_START:
offset -= PCH_PIC_AUTO_CTRL0_START; old = s->irr & s->edge & data;
index = offset >> 2; if (old) {
/* we only use default mode: fixed interrupt distribution mode */ s->irr &= ~old;
pch_pic_write_reg(&s->auto_ctrl0, index, 0); pch_pic_update_batch_irqs(s, old, 0);
break; }
case PCH_PIC_AUTO_CTRL1_START ... PCH_PIC_AUTO_CTRL1_END:
offset -= PCH_PIC_AUTO_CTRL1_START;
index = offset >> 2;
/* we only use default mode: fixed interrupt distribution mode */
pch_pic_write_reg(&s->auto_ctrl1, index, 0);
break;
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
offset -= PCH_PIC_ROUTE_ENTRY_START;
/* only route to int0: eiointc */
s->route_entry[offset] = 1;
break; break;
case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END: case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END:
/* route table to eiointc */ ptemp = s->htmsi_vector + (offset - PCH_PIC_HTMSI_VEC_START);
offset -= PCH_PIC_HTMSI_VEC_START; *(u64 *)ptemp = (*(u64 *)ptemp & ~mask) | data;
s->htmsi_vector[offset] = (u8)data;
break; break;
case PCH_PIC_POLARITY_START ... PCH_PIC_POLARITY_END: /* Not implemented */
offset -= PCH_PIC_POLARITY_START; case PCH_PIC_AUTO_CTRL0_START:
index = offset >> 2; case PCH_PIC_AUTO_CTRL1_START:
/* we only use defalut value 0: high level triggered */ case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
pch_pic_write_reg(&s->polarity, index, 0);
break; break;
default: default:
ret = -EINVAL; ret = -EINVAL;
@ -484,7 +423,7 @@ static int kvm_setup_default_irq_routing(struct kvm *kvm)
static int kvm_pch_pic_create(struct kvm_device *dev, u32 type) static int kvm_pch_pic_create(struct kvm_device *dev, u32 type)
{ {
int ret; int i, ret, irq_num;
struct kvm *kvm = dev->kvm; struct kvm *kvm = dev->kvm;
struct loongarch_pch_pic *s; struct loongarch_pch_pic *s;
@ -500,6 +439,22 @@ static int kvm_pch_pic_create(struct kvm_device *dev, u32 type)
if (!s) if (!s)
return -ENOMEM; return -ENOMEM;
/*
* Interrupt controller identification register 1
* Bit 24-31 Interrupt Controller ID
* Interrupt controller identification register 2
* Bit 0-7 Interrupt Controller version number
* Bit 16-23 The number of interrupt sources supported
*/
irq_num = 32;
s->mask = -1UL;
s->id.desc.id = PCH_PIC_INT_ID_VAL;
s->id.desc.version = PCH_PIC_INT_ID_VER;
s->id.desc.irq_num = irq_num - 1;
for (i = 0; i < irq_num; i++) {
s->route_entry[i] = 1;
s->htmsi_vector[i] = i;
}
spin_lock_init(&s->lock); spin_lock_init(&s->lock);
s->kvm = kvm; s->kvm = kvm;
kvm->arch.pch_pic = s; kvm->arch.pch_pic = s;


@ -161,6 +161,41 @@ TRACE_EVENT(kvm_aux,
__entry->pc) __entry->pc)
); );
#define KVM_TRACE_IOCSR_READ_UNSATISFIED 0
#define KVM_TRACE_IOCSR_READ 1
#define KVM_TRACE_IOCSR_WRITE 2
#define kvm_trace_symbol_iocsr \
{ KVM_TRACE_IOCSR_READ_UNSATISFIED, "unsatisfied-read" }, \
{ KVM_TRACE_IOCSR_READ, "read" }, \
{ KVM_TRACE_IOCSR_WRITE, "write" }
TRACE_EVENT(kvm_iocsr,
TP_PROTO(int type, int len, u64 gpa, void *val),
TP_ARGS(type, len, gpa, val),
TP_STRUCT__entry(
__field( u32, type )
__field( u32, len )
__field( u64, gpa )
__field( u64, val )
),
TP_fast_assign(
__entry->type = type;
__entry->len = len;
__entry->gpa = gpa;
__entry->val = 0;
if (val)
memcpy(&__entry->val, val,
min_t(u32, sizeof(__entry->val), len));
),
TP_printk("iocsr %s len %u gpa 0x%llx val 0x%llx",
__print_symbolic(__entry->type, kvm_trace_symbol_iocsr),
__entry->len, __entry->gpa, __entry->val)
);
TRACE_EVENT(kvm_vpid_change, TRACE_EVENT(kvm_vpid_change,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned long vpid), TP_PROTO(struct kvm_vcpu *vcpu, unsigned long vpid),
TP_ARGS(vcpu, vpid), TP_ARGS(vcpu, vpid),


@ -680,6 +680,8 @@ static int _kvm_get_cpucfg_mask(int id, u64 *v)
*v |= CPUCFG2_ARMBT; *v |= CPUCFG2_ARMBT;
if (cpu_has_lbt_mips) if (cpu_has_lbt_mips)
*v |= CPUCFG2_MIPSBT; *v |= CPUCFG2_MIPSBT;
if (cpu_has_ptw)
*v |= CPUCFG2_PTW;
return 0; return 0;
case LOONGARCH_CPUCFG3: case LOONGARCH_CPUCFG3:


@ -146,6 +146,10 @@ static int kvm_vm_feature_has_attr(struct kvm *kvm, struct kvm_device_attr *attr
if (kvm_pvtime_supported()) if (kvm_pvtime_supported())
return 0; return 0;
return -ENXIO; return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_PTW:
if (cpu_has_ptw)
return 0;
return -ENXIO;
default: default:
return -ENXIO; return -ENXIO;
} }


@ -21,6 +21,7 @@
#include <asm/kvm_vcpu_fp.h> #include <asm/kvm_vcpu_fp.h>
#include <asm/kvm_vcpu_insn.h> #include <asm/kvm_vcpu_insn.h>
#include <asm/kvm_vcpu_sbi.h> #include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_sbi_fwft.h>
#include <asm/kvm_vcpu_timer.h> #include <asm/kvm_vcpu_timer.h>
#include <asm/kvm_vcpu_pmu.h> #include <asm/kvm_vcpu_pmu.h>
@ -263,6 +264,9 @@ struct kvm_vcpu_arch {
/* Performance monitoring context */ /* Performance monitoring context */
struct kvm_pmu pmu_context; struct kvm_pmu pmu_context;
/* Firmware feature SBI extension context */
struct kvm_sbi_fwft fwft_context;
/* 'static' configurations which are set only once */ /* 'static' configurations which are set only once */
struct kvm_vcpu_config cfg; struct kvm_vcpu_config cfg;


@ -98,6 +98,9 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low, int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long flags, unsigned long saddr_high, unsigned long flags,
struct kvm_vcpu_sbi_return *retdata); struct kvm_vcpu_sbi_return *retdata);
int kvm_riscv_vcpu_pmu_event_info(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long num_events,
unsigned long flags, struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);


@ -11,7 +11,7 @@
#define KVM_SBI_IMPID 3 #define KVM_SBI_IMPID 3
#define KVM_SBI_VERSION_MAJOR 2 #define KVM_SBI_VERSION_MAJOR 3
#define KVM_SBI_VERSION_MINOR 0 #define KVM_SBI_VERSION_MINOR 0
enum kvm_riscv_sbi_ext_status { enum kvm_riscv_sbi_ext_status {
@ -59,6 +59,14 @@ struct kvm_vcpu_sbi_extension {
void (*deinit)(struct kvm_vcpu *vcpu); void (*deinit)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu); void (*reset)(struct kvm_vcpu *vcpu);
unsigned long state_reg_subtype;
unsigned long (*get_state_reg_count)(struct kvm_vcpu *vcpu);
int (*get_state_reg_id)(struct kvm_vcpu *vcpu, int index, u64 *reg_id);
int (*get_state_reg)(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_size, void *reg_val);
int (*set_state_reg)(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_size, const void *reg_val);
}; };
void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run); void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
@ -69,27 +77,21 @@ void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu,
unsigned long pc, unsigned long a1); unsigned long pc, unsigned long a1);
void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu);
int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_riscv_vcpu_reg_indices_sbi_ext(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg); const struct kvm_one_reg *reg);
int kvm_riscv_vcpu_get_reg_sbi_ext(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_get_reg_sbi_ext(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg); const struct kvm_one_reg *reg);
int kvm_riscv_vcpu_set_reg_sbi(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_reg_indices_sbi(struct kvm_vcpu *vcpu, u64 __user *uindices);
const struct kvm_one_reg *reg); int kvm_riscv_vcpu_set_reg_sbi(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
int kvm_riscv_vcpu_get_reg_sbi(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_get_reg_sbi(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
const struct kvm_one_reg *reg);
const struct kvm_vcpu_sbi_extension *kvm_vcpu_sbi_find_ext( const struct kvm_vcpu_sbi_extension *kvm_vcpu_sbi_find_ext(
struct kvm_vcpu *vcpu, unsigned long extid); struct kvm_vcpu *vcpu, unsigned long extid);
bool riscv_vcpu_supports_sbi_ext(struct kvm_vcpu *vcpu, int idx);
int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
void kvm_riscv_vcpu_sbi_init(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_sbi_init(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_sbi_deinit(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_sbi_deinit(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_sbi_reset(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_sbi_reset(struct kvm_vcpu *vcpu);
int kvm_riscv_vcpu_get_reg_sbi_sta(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long *reg_val);
int kvm_riscv_vcpu_set_reg_sbi_sta(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_val);
#ifdef CONFIG_RISCV_SBI_V01 #ifdef CONFIG_RISCV_SBI_V01
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01;
#endif #endif
@ -102,6 +104,7 @@ extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_dbcn; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_dbcn;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_susp; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_susp;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_sta; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_sta;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_fwft;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_experimental; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_experimental;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_vendor; extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_vendor;


@ -0,0 +1,34 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (c) 2025 Rivos Inc.
*
* Authors:
* Clément Léger <cleger@rivosinc.com>
*/
#ifndef __KVM_VCPU_RISCV_FWFT_H
#define __KVM_VCPU_RISCV_FWFT_H
#include <asm/sbi.h>
struct kvm_sbi_fwft_feature;
struct kvm_sbi_fwft_config {
const struct kvm_sbi_fwft_feature *feature;
bool supported;
bool enabled;
unsigned long flags;
};
/* FWFT data structure per vcpu */
struct kvm_sbi_fwft {
struct kvm_sbi_fwft_config *configs;
#ifndef CONFIG_32BIT
bool have_vs_pmlen_7;
bool have_vs_pmlen_16;
#endif
};
#define vcpu_to_fwft(vcpu) (&(vcpu)->arch.fwft_context)
#endif /* !__KVM_VCPU_RISCV_FWFT_H */


@ -136,6 +136,7 @@ enum sbi_ext_pmu_fid {
SBI_EXT_PMU_COUNTER_FW_READ, SBI_EXT_PMU_COUNTER_FW_READ,
SBI_EXT_PMU_COUNTER_FW_READ_HI, SBI_EXT_PMU_COUNTER_FW_READ_HI,
SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
SBI_EXT_PMU_EVENT_GET_INFO,
}; };
union sbi_pmu_ctr_info { union sbi_pmu_ctr_info {
@ -159,9 +160,20 @@ struct riscv_pmu_snapshot_data {
u64 reserved[447]; u64 reserved[447];
}; };
struct riscv_pmu_event_info {
u32 event_idx;
u32 output;
u64 event_data;
};
#define RISCV_PMU_EVENT_INFO_OUTPUT_MASK 0x01
#define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0) #define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
#define RISCV_PMU_PLAT_FW_EVENT_MASK GENMASK_ULL(61, 0) #define RISCV_PMU_PLAT_FW_EVENT_MASK GENMASK_ULL(61, 0)
/* SBI v3.0 allows extended hpmeventX width value */
#define RISCV_PMU_RAW_EVENT_V2_MASK GENMASK_ULL(55, 0)
#define RISCV_PMU_RAW_EVENT_IDX 0x20000 #define RISCV_PMU_RAW_EVENT_IDX 0x20000
#define RISCV_PMU_RAW_EVENT_V2_IDX 0x30000
#define RISCV_PLAT_FW_EVENT 0xFFFF #define RISCV_PLAT_FW_EVENT 0xFFFF
/** General pmu event codes specified in SBI PMU extension */ /** General pmu event codes specified in SBI PMU extension */
@ -219,6 +231,7 @@ enum sbi_pmu_event_type {
SBI_PMU_EVENT_TYPE_HW = 0x0, SBI_PMU_EVENT_TYPE_HW = 0x0,
SBI_PMU_EVENT_TYPE_CACHE = 0x1, SBI_PMU_EVENT_TYPE_CACHE = 0x1,
SBI_PMU_EVENT_TYPE_RAW = 0x2, SBI_PMU_EVENT_TYPE_RAW = 0x2,
SBI_PMU_EVENT_TYPE_RAW_V2 = 0x3,
SBI_PMU_EVENT_TYPE_FW = 0xf, SBI_PMU_EVENT_TYPE_FW = 0xf,
}; };


@ -56,6 +56,7 @@ struct kvm_riscv_config {
unsigned long mimpid; unsigned long mimpid;
unsigned long zicboz_block_size; unsigned long zicboz_block_size;
unsigned long satp_mode; unsigned long satp_mode;
unsigned long zicbop_block_size;
}; };
/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ /* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
@ -185,6 +186,10 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZICCRSE, KVM_RISCV_ISA_EXT_ZICCRSE,
KVM_RISCV_ISA_EXT_ZAAMO, KVM_RISCV_ISA_EXT_ZAAMO,
KVM_RISCV_ISA_EXT_ZALRSC, KVM_RISCV_ISA_EXT_ZALRSC,
KVM_RISCV_ISA_EXT_ZICBOP,
KVM_RISCV_ISA_EXT_ZFBFMIN,
KVM_RISCV_ISA_EXT_ZVFBFMIN,
KVM_RISCV_ISA_EXT_ZVFBFWMA,
KVM_RISCV_ISA_EXT_MAX, KVM_RISCV_ISA_EXT_MAX,
}; };
@ -205,6 +210,7 @@ enum KVM_RISCV_SBI_EXT_ID {
KVM_RISCV_SBI_EXT_DBCN, KVM_RISCV_SBI_EXT_DBCN,
KVM_RISCV_SBI_EXT_STA, KVM_RISCV_SBI_EXT_STA,
KVM_RISCV_SBI_EXT_SUSP, KVM_RISCV_SBI_EXT_SUSP,
KVM_RISCV_SBI_EXT_FWFT,
KVM_RISCV_SBI_EXT_MAX, KVM_RISCV_SBI_EXT_MAX,
}; };
@ -214,6 +220,18 @@ struct kvm_riscv_sbi_sta {
unsigned long shmem_hi; unsigned long shmem_hi;
}; };
struct kvm_riscv_sbi_fwft_feature {
unsigned long enable;
unsigned long flags;
unsigned long value;
};
/* SBI FWFT extension registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
struct kvm_riscv_sbi_fwft {
struct kvm_riscv_sbi_fwft_feature misaligned_deleg;
struct kvm_riscv_sbi_fwft_feature pointer_masking;
};
/* Possible states for kvm_riscv_timer */ /* Possible states for kvm_riscv_timer */
#define KVM_RISCV_TIMER_STATE_OFF 0 #define KVM_RISCV_TIMER_STATE_OFF 0
#define KVM_RISCV_TIMER_STATE_ON 1 #define KVM_RISCV_TIMER_STATE_ON 1
@ -297,6 +315,9 @@ struct kvm_riscv_sbi_sta {
#define KVM_REG_RISCV_SBI_STA (0x0 << KVM_REG_RISCV_SUBTYPE_SHIFT) #define KVM_REG_RISCV_SBI_STA (0x0 << KVM_REG_RISCV_SUBTYPE_SHIFT)
#define KVM_REG_RISCV_SBI_STA_REG(name) \ #define KVM_REG_RISCV_SBI_STA_REG(name) \
(offsetof(struct kvm_riscv_sbi_sta, name) / sizeof(unsigned long)) (offsetof(struct kvm_riscv_sbi_sta, name) / sizeof(unsigned long))
#define KVM_REG_RISCV_SBI_FWFT (0x1 << KVM_REG_RISCV_SUBTYPE_SHIFT)
#define KVM_REG_RISCV_SBI_FWFT_REG(name) \
(offsetof(struct kvm_riscv_sbi_fwft, name) / sizeof(unsigned long))
/* Device Control API: RISC-V AIA */ /* Device Control API: RISC-V AIA */
#define KVM_DEV_RISCV_APLIC_ALIGN 0x1000 #define KVM_DEV_RISCV_APLIC_ALIGN 0x1000

View File

@ -27,6 +27,7 @@ kvm-y += vcpu_onereg.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
kvm-y += vcpu_sbi.o kvm-y += vcpu_sbi.o
kvm-y += vcpu_sbi_base.o kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_fwft.o
kvm-y += vcpu_sbi_hsm.o kvm-y += vcpu_sbi_hsm.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_sbi_pmu.o kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_sbi_pmu.o
kvm-y += vcpu_sbi_replace.o kvm-y += vcpu_sbi_replace.o

View File

@ -321,7 +321,7 @@ void __init kvm_riscv_gstage_mode_detect(void)
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) { if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
kvm_riscv_gstage_mode = HGATP_MODE_SV57X4; kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
kvm_riscv_gstage_pgd_levels = 5; kvm_riscv_gstage_pgd_levels = 5;
goto skip_sv48x4_test; goto done;
} }
/* Try Sv48x4 G-stage mode */ /* Try Sv48x4 G-stage mode */
@ -329,10 +329,31 @@ void __init kvm_riscv_gstage_mode_detect(void)
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) { if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
kvm_riscv_gstage_mode = HGATP_MODE_SV48X4; kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
kvm_riscv_gstage_pgd_levels = 4; kvm_riscv_gstage_pgd_levels = 4;
goto done;
} }
skip_sv48x4_test:
/* Try Sv39x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
kvm_riscv_gstage_pgd_levels = 3;
goto done;
}
#else /* CONFIG_32BIT */
/* Try Sv32x4 G-stage mode */
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
kvm_riscv_gstage_pgd_levels = 2;
goto done;
}
#endif
/* KVM depends on !HGATP_MODE_OFF */
kvm_riscv_gstage_mode = HGATP_MODE_OFF;
kvm_riscv_gstage_pgd_levels = 0;
done:
csr_write(CSR_HGATP, 0); csr_write(CSR_HGATP, 0);
kvm_riscv_local_hfence_gvma_all(); kvm_riscv_local_hfence_gvma_all();
#endif
} }

View File

@ -93,6 +93,23 @@ static int __init riscv_kvm_init(void)
return rc; return rc;
kvm_riscv_gstage_mode_detect(); kvm_riscv_gstage_mode_detect();
switch (kvm_riscv_gstage_mode) {
case HGATP_MODE_SV32X4:
str = "Sv32x4";
break;
case HGATP_MODE_SV39X4:
str = "Sv39x4";
break;
case HGATP_MODE_SV48X4:
str = "Sv48x4";
break;
case HGATP_MODE_SV57X4:
str = "Sv57x4";
break;
default:
kvm_riscv_nacl_exit();
return -ENODEV;
}
kvm_riscv_gstage_vmid_detect(); kvm_riscv_gstage_vmid_detect();
@ -135,22 +152,6 @@ static int __init riscv_kvm_init(void)
(rc) ? slist : "no features"); (rc) ? slist : "no features");
} }
switch (kvm_riscv_gstage_mode) {
case HGATP_MODE_SV32X4:
str = "Sv32x4";
break;
case HGATP_MODE_SV39X4:
str = "Sv39x4";
break;
case HGATP_MODE_SV48X4:
str = "Sv48x4";
break;
case HGATP_MODE_SV57X4:
str = "Sv57x4";
break;
default:
return -ENODEV;
}
kvm_info("using %s G-stage page table format\n", str); kvm_info("using %s G-stage page table format\n", str);
kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits()); kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());

View File

@ -133,6 +133,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Mark this VCPU never ran */ /* Mark this VCPU never ran */
vcpu->arch.ran_atleast_once = false; vcpu->arch.ran_atleast_once = false;
vcpu->arch.cfg.hedeleg = KVM_HEDELEG_DEFAULT;
vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO; vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX); bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX);
@ -570,7 +572,6 @@ static void kvm_riscv_vcpu_setup_config(struct kvm_vcpu *vcpu)
cfg->hstateen0 |= SMSTATEEN0_SSTATEEN0; cfg->hstateen0 |= SMSTATEEN0_SSTATEEN0;
} }
cfg->hedeleg = KVM_HEDELEG_DEFAULT;
if (vcpu->guest_debug) if (vcpu->guest_debug)
cfg->hedeleg &= ~BIT(EXC_BREAKPOINT); cfg->hedeleg &= ~BIT(EXC_BREAKPOINT);
} }

View File

@ -65,9 +65,11 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(ZCF), KVM_ISA_EXT_ARR(ZCF),
KVM_ISA_EXT_ARR(ZCMOP), KVM_ISA_EXT_ARR(ZCMOP),
KVM_ISA_EXT_ARR(ZFA), KVM_ISA_EXT_ARR(ZFA),
KVM_ISA_EXT_ARR(ZFBFMIN),
KVM_ISA_EXT_ARR(ZFH), KVM_ISA_EXT_ARR(ZFH),
KVM_ISA_EXT_ARR(ZFHMIN), KVM_ISA_EXT_ARR(ZFHMIN),
KVM_ISA_EXT_ARR(ZICBOM), KVM_ISA_EXT_ARR(ZICBOM),
KVM_ISA_EXT_ARR(ZICBOP),
KVM_ISA_EXT_ARR(ZICBOZ), KVM_ISA_EXT_ARR(ZICBOZ),
KVM_ISA_EXT_ARR(ZICCRSE), KVM_ISA_EXT_ARR(ZICCRSE),
KVM_ISA_EXT_ARR(ZICNTR), KVM_ISA_EXT_ARR(ZICNTR),
@ -88,6 +90,8 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(ZTSO), KVM_ISA_EXT_ARR(ZTSO),
KVM_ISA_EXT_ARR(ZVBB), KVM_ISA_EXT_ARR(ZVBB),
KVM_ISA_EXT_ARR(ZVBC), KVM_ISA_EXT_ARR(ZVBC),
KVM_ISA_EXT_ARR(ZVFBFMIN),
KVM_ISA_EXT_ARR(ZVFBFWMA),
KVM_ISA_EXT_ARR(ZVFH), KVM_ISA_EXT_ARR(ZVFH),
KVM_ISA_EXT_ARR(ZVFHMIN), KVM_ISA_EXT_ARR(ZVFHMIN),
KVM_ISA_EXT_ARR(ZVKB), KVM_ISA_EXT_ARR(ZVKB),
@ -173,7 +177,6 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_C: case KVM_RISCV_ISA_EXT_C:
case KVM_RISCV_ISA_EXT_I: case KVM_RISCV_ISA_EXT_I:
case KVM_RISCV_ISA_EXT_M: case KVM_RISCV_ISA_EXT_M:
case KVM_RISCV_ISA_EXT_SMNPM:
/* There is not architectural config bit to disable sscofpmf completely */ /* There is not architectural config bit to disable sscofpmf completely */
case KVM_RISCV_ISA_EXT_SSCOFPMF: case KVM_RISCV_ISA_EXT_SSCOFPMF:
case KVM_RISCV_ISA_EXT_SSNPM: case KVM_RISCV_ISA_EXT_SSNPM:
@ -199,8 +202,10 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_ZCF: case KVM_RISCV_ISA_EXT_ZCF:
case KVM_RISCV_ISA_EXT_ZCMOP: case KVM_RISCV_ISA_EXT_ZCMOP:
case KVM_RISCV_ISA_EXT_ZFA: case KVM_RISCV_ISA_EXT_ZFA:
case KVM_RISCV_ISA_EXT_ZFBFMIN:
case KVM_RISCV_ISA_EXT_ZFH: case KVM_RISCV_ISA_EXT_ZFH:
case KVM_RISCV_ISA_EXT_ZFHMIN: case KVM_RISCV_ISA_EXT_ZFHMIN:
case KVM_RISCV_ISA_EXT_ZICBOP:
case KVM_RISCV_ISA_EXT_ZICCRSE: case KVM_RISCV_ISA_EXT_ZICCRSE:
case KVM_RISCV_ISA_EXT_ZICNTR: case KVM_RISCV_ISA_EXT_ZICNTR:
case KVM_RISCV_ISA_EXT_ZICOND: case KVM_RISCV_ISA_EXT_ZICOND:
@ -220,6 +225,8 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_ZTSO: case KVM_RISCV_ISA_EXT_ZTSO:
case KVM_RISCV_ISA_EXT_ZVBB: case KVM_RISCV_ISA_EXT_ZVBB:
case KVM_RISCV_ISA_EXT_ZVBC: case KVM_RISCV_ISA_EXT_ZVBC:
case KVM_RISCV_ISA_EXT_ZVFBFMIN:
case KVM_RISCV_ISA_EXT_ZVFBFWMA:
case KVM_RISCV_ISA_EXT_ZVFH: case KVM_RISCV_ISA_EXT_ZVFH:
case KVM_RISCV_ISA_EXT_ZVFHMIN: case KVM_RISCV_ISA_EXT_ZVFHMIN:
case KVM_RISCV_ISA_EXT_ZVKB: case KVM_RISCV_ISA_EXT_ZVKB:
@ -277,15 +284,20 @@ static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
reg_val = vcpu->arch.isa[0] & KVM_RISCV_BASE_ISA_MASK; reg_val = vcpu->arch.isa[0] & KVM_RISCV_BASE_ISA_MASK;
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size): case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size):
if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOM)) if (!riscv_isa_extension_available(NULL, ZICBOM))
return -ENOENT; return -ENOENT;
reg_val = riscv_cbom_block_size; reg_val = riscv_cbom_block_size;
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size): case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOZ)) if (!riscv_isa_extension_available(NULL, ZICBOZ))
return -ENOENT; return -ENOENT;
reg_val = riscv_cboz_block_size; reg_val = riscv_cboz_block_size;
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicbop_block_size):
if (!riscv_isa_extension_available(NULL, ZICBOP))
return -ENOENT;
reg_val = riscv_cbop_block_size;
break;
case KVM_REG_RISCV_CONFIG_REG(mvendorid): case KVM_REG_RISCV_CONFIG_REG(mvendorid):
reg_val = vcpu->arch.mvendorid; reg_val = vcpu->arch.mvendorid;
break; break;
@ -366,17 +378,23 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
} }
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size): case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size):
if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOM)) if (!riscv_isa_extension_available(NULL, ZICBOM))
return -ENOENT; return -ENOENT;
if (reg_val != riscv_cbom_block_size) if (reg_val != riscv_cbom_block_size)
return -EINVAL; return -EINVAL;
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size): case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOZ)) if (!riscv_isa_extension_available(NULL, ZICBOZ))
return -ENOENT; return -ENOENT;
if (reg_val != riscv_cboz_block_size) if (reg_val != riscv_cboz_block_size)
return -EINVAL; return -EINVAL;
break; break;
case KVM_REG_RISCV_CONFIG_REG(zicbop_block_size):
if (!riscv_isa_extension_available(NULL, ZICBOP))
return -ENOENT;
if (reg_val != riscv_cbop_block_size)
return -EINVAL;
break;
case KVM_REG_RISCV_CONFIG_REG(mvendorid): case KVM_REG_RISCV_CONFIG_REG(mvendorid):
if (reg_val == vcpu->arch.mvendorid) if (reg_val == vcpu->arch.mvendorid)
break; break;
@ -817,10 +835,13 @@ static int copy_config_reg_indices(const struct kvm_vcpu *vcpu,
* was not available. * was not available.
*/ */
if (i == KVM_REG_RISCV_CONFIG_REG(zicbom_block_size) && if (i == KVM_REG_RISCV_CONFIG_REG(zicbom_block_size) &&
!riscv_isa_extension_available(vcpu->arch.isa, ZICBOM)) !riscv_isa_extension_available(NULL, ZICBOM))
continue; continue;
else if (i == KVM_REG_RISCV_CONFIG_REG(zicboz_block_size) && else if (i == KVM_REG_RISCV_CONFIG_REG(zicboz_block_size) &&
!riscv_isa_extension_available(vcpu->arch.isa, ZICBOZ)) !riscv_isa_extension_available(NULL, ZICBOZ))
continue;
else if (i == KVM_REG_RISCV_CONFIG_REG(zicbop_block_size) &&
!riscv_isa_extension_available(NULL, ZICBOP))
continue; continue;
size = IS_ENABLED(CONFIG_32BIT) ? KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64; size = IS_ENABLED(CONFIG_32BIT) ? KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64;
@ -1061,66 +1082,14 @@ static inline unsigned long num_isa_ext_regs(const struct kvm_vcpu *vcpu)
return copy_isa_ext_reg_indices(vcpu, NULL); return copy_isa_ext_reg_indices(vcpu, NULL);
} }
static int copy_sbi_ext_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
unsigned int n = 0;
for (int i = 0; i < KVM_RISCV_SBI_EXT_MAX; i++) {
u64 size = IS_ENABLED(CONFIG_32BIT) ?
KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64;
u64 reg = KVM_REG_RISCV | size | KVM_REG_RISCV_SBI_EXT |
KVM_REG_RISCV_SBI_SINGLE | i;
if (!riscv_vcpu_supports_sbi_ext(vcpu, i))
continue;
if (uindices) {
if (put_user(reg, uindices))
return -EFAULT;
uindices++;
}
n++;
}
return n;
}
static unsigned long num_sbi_ext_regs(struct kvm_vcpu *vcpu) static unsigned long num_sbi_ext_regs(struct kvm_vcpu *vcpu)
{ {
return copy_sbi_ext_reg_indices(vcpu, NULL); return kvm_riscv_vcpu_reg_indices_sbi_ext(vcpu, NULL);
}
static int copy_sbi_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context;
int total = 0;
if (scontext->ext_status[KVM_RISCV_SBI_EXT_STA] == KVM_RISCV_SBI_EXT_STATUS_ENABLED) {
u64 size = IS_ENABLED(CONFIG_32BIT) ? KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64;
int n = sizeof(struct kvm_riscv_sbi_sta) / sizeof(unsigned long);
for (int i = 0; i < n; i++) {
u64 reg = KVM_REG_RISCV | size |
KVM_REG_RISCV_SBI_STATE |
KVM_REG_RISCV_SBI_STA | i;
if (uindices) {
if (put_user(reg, uindices))
return -EFAULT;
uindices++;
}
}
total += n;
}
return total;
} }
static inline unsigned long num_sbi_regs(struct kvm_vcpu *vcpu) static inline unsigned long num_sbi_regs(struct kvm_vcpu *vcpu)
{ {
return copy_sbi_reg_indices(vcpu, NULL); return kvm_riscv_vcpu_reg_indices_sbi(vcpu, NULL);
} }
static inline unsigned long num_vector_regs(const struct kvm_vcpu *vcpu) static inline unsigned long num_vector_regs(const struct kvm_vcpu *vcpu)
@ -1243,12 +1212,12 @@ int kvm_riscv_vcpu_copy_reg_indices(struct kvm_vcpu *vcpu,
return ret; return ret;
uindices += ret; uindices += ret;
ret = copy_sbi_ext_reg_indices(vcpu, uindices); ret = kvm_riscv_vcpu_reg_indices_sbi_ext(vcpu, uindices);
if (ret < 0) if (ret < 0)
return ret; return ret;
uindices += ret; uindices += ret;
ret = copy_sbi_reg_indices(vcpu, uindices); ret = kvm_riscv_vcpu_reg_indices_sbi(vcpu, uindices);
if (ret < 0) if (ret < 0)
return ret; return ret;
uindices += ret; uindices += ret;

View File

@ -60,6 +60,7 @@ static u32 kvm_pmu_get_perf_event_type(unsigned long eidx)
type = PERF_TYPE_HW_CACHE; type = PERF_TYPE_HW_CACHE;
break; break;
case SBI_PMU_EVENT_TYPE_RAW: case SBI_PMU_EVENT_TYPE_RAW:
case SBI_PMU_EVENT_TYPE_RAW_V2:
case SBI_PMU_EVENT_TYPE_FW: case SBI_PMU_EVENT_TYPE_FW:
type = PERF_TYPE_RAW; type = PERF_TYPE_RAW;
break; break;
@ -128,6 +129,9 @@ static u64 kvm_pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
case SBI_PMU_EVENT_TYPE_RAW: case SBI_PMU_EVENT_TYPE_RAW:
config = evt_data & RISCV_PMU_RAW_EVENT_MASK; config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
break; break;
case SBI_PMU_EVENT_TYPE_RAW_V2:
config = evt_data & RISCV_PMU_RAW_EVENT_V2_MASK;
break;
case SBI_PMU_EVENT_TYPE_FW: case SBI_PMU_EVENT_TYPE_FW:
if (ecode < SBI_PMU_FW_MAX) if (ecode < SBI_PMU_FW_MAX)
config = (1ULL << 63) | ecode; config = (1ULL << 63) | ecode;
@ -405,8 +409,6 @@ int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long s
int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data); int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
int sbiret = 0; int sbiret = 0;
gpa_t saddr; gpa_t saddr;
unsigned long hva;
bool writable;
if (!kvpmu || flags) { if (!kvpmu || flags) {
sbiret = SBI_ERR_INVALID_PARAM; sbiret = SBI_ERR_INVALID_PARAM;
@ -428,19 +430,14 @@ int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long s
goto out; goto out;
} }
hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
if (kvm_is_error_hva(hva) || !writable) {
sbiret = SBI_ERR_INVALID_ADDRESS;
goto out;
}
kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC); kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
if (!kvpmu->sdata) if (!kvpmu->sdata)
return -ENOMEM; return -ENOMEM;
/* No need to check writable slot explicitly as kvm_vcpu_write_guest does it internally */
if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) { if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
kfree(kvpmu->sdata); kfree(kvpmu->sdata);
sbiret = SBI_ERR_FAILURE; sbiret = SBI_ERR_INVALID_ADDRESS;
goto out; goto out;
} }
@ -452,6 +449,65 @@ out:
return 0; return 0;
} }
int kvm_riscv_vcpu_pmu_event_info(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long num_events,
unsigned long flags, struct kvm_vcpu_sbi_return *retdata)
{
struct riscv_pmu_event_info *einfo = NULL;
int shmem_size = num_events * sizeof(*einfo);
gpa_t shmem;
u32 eidx, etype;
u64 econfig;
int ret;
if (flags != 0 || (saddr_low & (SZ_16 - 1) || num_events == 0)) {
ret = SBI_ERR_INVALID_PARAM;
goto out;
}
shmem = saddr_low;
if (saddr_high != 0) {
if (IS_ENABLED(CONFIG_32BIT)) {
shmem |= ((gpa_t)saddr_high << 32);
} else {
ret = SBI_ERR_INVALID_ADDRESS;
goto out;
}
}
einfo = kzalloc(shmem_size, GFP_KERNEL);
if (!einfo)
return -ENOMEM;
ret = kvm_vcpu_read_guest(vcpu, shmem, einfo, shmem_size);
if (ret) {
ret = SBI_ERR_FAILURE;
goto free_mem;
}
for (int i = 0; i < num_events; i++) {
eidx = einfo[i].event_idx;
etype = kvm_pmu_get_perf_event_type(eidx);
econfig = kvm_pmu_get_perf_event_config(eidx, einfo[i].event_data);
ret = riscv_pmu_get_event_info(etype, econfig, NULL);
einfo[i].output = (ret > 0) ? 1 : 0;
}
ret = kvm_vcpu_write_guest(vcpu, shmem, einfo, shmem_size);
if (ret) {
ret = SBI_ERR_INVALID_ADDRESS;
goto free_mem;
}
ret = 0;
free_mem:
kfree(einfo);
out:
retdata->err_val = ret;
return 0;
}
int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
struct kvm_vcpu_sbi_return *retdata) struct kvm_vcpu_sbi_return *retdata)
{ {

View File

@ -78,6 +78,10 @@ static const struct kvm_riscv_sbi_extension_entry sbi_ext[] = {
.ext_idx = KVM_RISCV_SBI_EXT_STA, .ext_idx = KVM_RISCV_SBI_EXT_STA,
.ext_ptr = &vcpu_sbi_ext_sta, .ext_ptr = &vcpu_sbi_ext_sta,
}, },
{
.ext_idx = KVM_RISCV_SBI_EXT_FWFT,
.ext_ptr = &vcpu_sbi_ext_fwft,
},
{ {
.ext_idx = KVM_RISCV_SBI_EXT_EXPERIMENTAL, .ext_idx = KVM_RISCV_SBI_EXT_EXPERIMENTAL,
.ext_ptr = &vcpu_sbi_ext_experimental, .ext_ptr = &vcpu_sbi_ext_experimental,
@ -106,7 +110,7 @@ riscv_vcpu_get_sbi_ext(struct kvm_vcpu *vcpu, unsigned long idx)
return sext; return sext;
} }
bool riscv_vcpu_supports_sbi_ext(struct kvm_vcpu *vcpu, int idx) static bool riscv_vcpu_supports_sbi_ext(struct kvm_vcpu *vcpu, int idx)
{ {
struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context; struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context;
const struct kvm_riscv_sbi_extension_entry *sext; const struct kvm_riscv_sbi_extension_entry *sext;
@ -284,6 +288,31 @@ static int riscv_vcpu_get_sbi_ext_multi(struct kvm_vcpu *vcpu,
return 0; return 0;
} }
int kvm_riscv_vcpu_reg_indices_sbi_ext(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
unsigned int n = 0;
for (int i = 0; i < KVM_RISCV_SBI_EXT_MAX; i++) {
u64 size = IS_ENABLED(CONFIG_32BIT) ?
KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64;
u64 reg = KVM_REG_RISCV | size | KVM_REG_RISCV_SBI_EXT |
KVM_REG_RISCV_SBI_SINGLE | i;
if (!riscv_vcpu_supports_sbi_ext(vcpu, i))
continue;
if (uindices) {
if (put_user(reg, uindices))
return -EFAULT;
uindices++;
}
n++;
}
return n;
}
int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg) const struct kvm_one_reg *reg)
{ {
@ -360,64 +389,163 @@ int kvm_riscv_vcpu_get_reg_sbi_ext(struct kvm_vcpu *vcpu,
return 0; return 0;
} }
int kvm_riscv_vcpu_set_reg_sbi(struct kvm_vcpu *vcpu, int kvm_riscv_vcpu_reg_indices_sbi(struct kvm_vcpu *vcpu, u64 __user *uindices)
const struct kvm_one_reg *reg)
{ {
unsigned long __user *uaddr = struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context;
(unsigned long __user *)(unsigned long)reg->addr; const struct kvm_riscv_sbi_extension_entry *entry;
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | const struct kvm_vcpu_sbi_extension *ext;
KVM_REG_SIZE_MASK | unsigned long state_reg_count;
KVM_REG_RISCV_SBI_STATE); int i, j, rc, count = 0;
unsigned long reg_subtype, reg_val; u64 reg;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) for (i = 0; i < ARRAY_SIZE(sbi_ext); i++) {
return -EINVAL; entry = &sbi_ext[i];
ext = entry->ext_ptr;
if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id))) if (!ext->get_state_reg_count ||
return -EFAULT; scontext->ext_status[entry->ext_idx] != KVM_RISCV_SBI_EXT_STATUS_ENABLED)
continue;
reg_subtype = reg_num & KVM_REG_RISCV_SUBTYPE_MASK; state_reg_count = ext->get_state_reg_count(vcpu);
reg_num &= ~KVM_REG_RISCV_SUBTYPE_MASK; if (!uindices)
goto skip_put_user;
switch (reg_subtype) { for (j = 0; j < state_reg_count; j++) {
case KVM_REG_RISCV_SBI_STA: if (ext->get_state_reg_id) {
return kvm_riscv_vcpu_set_reg_sbi_sta(vcpu, reg_num, reg_val); rc = ext->get_state_reg_id(vcpu, j, &reg);
default: if (rc)
return -EINVAL; return rc;
} else {
reg = KVM_REG_RISCV |
(IS_ENABLED(CONFIG_32BIT) ?
KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64) |
KVM_REG_RISCV_SBI_STATE |
ext->state_reg_subtype | j;
}
if (put_user(reg, uindices))
return -EFAULT;
uindices++;
}
skip_put_user:
count += state_reg_count;
} }
return 0; return count;
} }
int kvm_riscv_vcpu_get_reg_sbi(struct kvm_vcpu *vcpu, static const struct kvm_vcpu_sbi_extension *kvm_vcpu_sbi_find_ext_withstate(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg) unsigned long subtype)
{
struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context;
const struct kvm_riscv_sbi_extension_entry *entry;
const struct kvm_vcpu_sbi_extension *ext;
int i;
for (i = 0; i < ARRAY_SIZE(sbi_ext); i++) {
entry = &sbi_ext[i];
ext = entry->ext_ptr;
if (ext->get_state_reg_count &&
ext->state_reg_subtype == subtype &&
scontext->ext_status[entry->ext_idx] == KVM_RISCV_SBI_EXT_STATUS_ENABLED)
return ext;
}
return NULL;
}
int kvm_riscv_vcpu_set_reg_sbi(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{ {
unsigned long __user *uaddr = unsigned long __user *uaddr =
(unsigned long __user *)(unsigned long)reg->addr; (unsigned long __user *)(unsigned long)reg->addr;
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK | KVM_REG_SIZE_MASK |
KVM_REG_RISCV_SBI_STATE); KVM_REG_RISCV_SBI_STATE);
unsigned long reg_subtype, reg_val; const struct kvm_vcpu_sbi_extension *ext;
int ret; unsigned long reg_subtype;
void *reg_val;
u64 data64;
u32 data32;
u16 data16;
u8 data8;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) switch (KVM_REG_SIZE(reg->id)) {
return -EINVAL; case 1:
reg_val = &data8;
reg_subtype = reg_num & KVM_REG_RISCV_SUBTYPE_MASK; break;
reg_num &= ~KVM_REG_RISCV_SUBTYPE_MASK; case 2:
reg_val = &data16;
switch (reg_subtype) { break;
case KVM_REG_RISCV_SBI_STA: case 4:
ret = kvm_riscv_vcpu_get_reg_sbi_sta(vcpu, reg_num, &reg_val); reg_val = &data32;
break;
case 8:
reg_val = &data64;
break; break;
default: default:
return -EINVAL; return -EINVAL;
} }
if (copy_from_user(reg_val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
reg_subtype = reg_num & KVM_REG_RISCV_SUBTYPE_MASK;
reg_num &= ~KVM_REG_RISCV_SUBTYPE_MASK;
ext = kvm_vcpu_sbi_find_ext_withstate(vcpu, reg_subtype);
if (!ext || !ext->set_state_reg)
return -EINVAL;
return ext->set_state_reg(vcpu, reg_num, KVM_REG_SIZE(reg->id), reg_val);
}
int kvm_riscv_vcpu_get_reg_sbi(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
unsigned long __user *uaddr =
(unsigned long __user *)(unsigned long)reg->addr;
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK |
KVM_REG_RISCV_SBI_STATE);
const struct kvm_vcpu_sbi_extension *ext;
unsigned long reg_subtype;
void *reg_val;
u64 data64;
u32 data32;
u16 data16;
u8 data8;
int ret;
switch (KVM_REG_SIZE(reg->id)) {
case 1:
reg_val = &data8;
break;
case 2:
reg_val = &data16;
break;
case 4:
reg_val = &data32;
break;
case 8:
reg_val = &data64;
break;
default:
return -EINVAL;
}
reg_subtype = reg_num & KVM_REG_RISCV_SUBTYPE_MASK;
reg_num &= ~KVM_REG_RISCV_SUBTYPE_MASK;
ext = kvm_vcpu_sbi_find_ext_withstate(vcpu, reg_subtype);
if (!ext || !ext->get_state_reg)
return -EINVAL;
ret = ext->get_state_reg(vcpu, reg_num, KVM_REG_SIZE(reg->id), reg_val);
if (ret) if (ret)
return ret; return ret;
if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id))) if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
return -EFAULT; return -EFAULT;
return 0; return 0;

View File

@ -0,0 +1,544 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2025 Rivos Inc.
*
* Authors:
* Clément Léger <cleger@rivosinc.com>
*/
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <asm/cpufeature.h>
#include <asm/sbi.h>
#include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_sbi_fwft.h>
#define MIS_DELEG (BIT_ULL(EXC_LOAD_MISALIGNED) | BIT_ULL(EXC_STORE_MISALIGNED))
struct kvm_sbi_fwft_feature {
/**
* @id: Feature ID
*/
enum sbi_fwft_feature_t id;
/**
* @first_reg_num: ONE_REG index of the first ONE_REG register
*/
unsigned long first_reg_num;
/**
* @supported: Check if the feature is supported on the vcpu
*
* This callback is optional, if not provided the feature is assumed to
* be supported
*/
bool (*supported)(struct kvm_vcpu *vcpu);
/**
* @reset: Reset the feature value irrespective whether feature is supported or not
*
* This callback is mandatory
*/
void (*reset)(struct kvm_vcpu *vcpu);
/**
* @set: Set the feature value
*
* Return SBI_SUCCESS on success or an SBI error (SBI_ERR_*)
*
* This callback is mandatory
*/
long (*set)(struct kvm_vcpu *vcpu, struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long value);
/**
* @get: Get the feature current value
*
* Return SBI_SUCCESS on success or an SBI error (SBI_ERR_*)
*
* This callback is mandatory
*/
long (*get)(struct kvm_vcpu *vcpu, struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long *value);
};
static const enum sbi_fwft_feature_t kvm_fwft_defined_features[] = {
SBI_FWFT_MISALIGNED_EXC_DELEG,
SBI_FWFT_LANDING_PAD,
SBI_FWFT_SHADOW_STACK,
SBI_FWFT_DOUBLE_TRAP,
SBI_FWFT_PTE_AD_HW_UPDATING,
SBI_FWFT_POINTER_MASKING_PMLEN,
};
static bool kvm_fwft_is_defined_feature(enum sbi_fwft_feature_t feature)
{
int i;
for (i = 0; i < ARRAY_SIZE(kvm_fwft_defined_features); i++) {
if (kvm_fwft_defined_features[i] == feature)
return true;
}
return false;
}
static bool kvm_sbi_fwft_misaligned_delegation_supported(struct kvm_vcpu *vcpu)
{
return misaligned_traps_can_delegate();
}
static void kvm_sbi_fwft_reset_misaligned_delegation(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
cfg->hedeleg &= ~MIS_DELEG;
}
static long kvm_sbi_fwft_set_misaligned_delegation(struct kvm_vcpu *vcpu,
struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long value)
{
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
if (value == 1) {
cfg->hedeleg |= MIS_DELEG;
if (!one_reg_access)
csr_set(CSR_HEDELEG, MIS_DELEG);
} else if (value == 0) {
cfg->hedeleg &= ~MIS_DELEG;
if (!one_reg_access)
csr_clear(CSR_HEDELEG, MIS_DELEG);
} else {
return SBI_ERR_INVALID_PARAM;
}
return SBI_SUCCESS;
}
static long kvm_sbi_fwft_get_misaligned_delegation(struct kvm_vcpu *vcpu,
struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long *value)
{
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
*value = (cfg->hedeleg & MIS_DELEG) == MIS_DELEG;
return SBI_SUCCESS;
}
#ifndef CONFIG_32BIT
static bool try_to_set_pmm(unsigned long value)
{
csr_set(CSR_HENVCFG, value);
return (csr_read_clear(CSR_HENVCFG, ENVCFG_PMM) & ENVCFG_PMM) == value;
}
static bool kvm_sbi_fwft_pointer_masking_pmlen_supported(struct kvm_vcpu *vcpu)
{
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
if (!riscv_isa_extension_available(vcpu->arch.isa, SMNPM))
return false;
fwft->have_vs_pmlen_7 = try_to_set_pmm(ENVCFG_PMM_PMLEN_7);
fwft->have_vs_pmlen_16 = try_to_set_pmm(ENVCFG_PMM_PMLEN_16);
return fwft->have_vs_pmlen_7 || fwft->have_vs_pmlen_16;
}
static void kvm_sbi_fwft_reset_pointer_masking_pmlen(struct kvm_vcpu *vcpu)
{
vcpu->arch.cfg.henvcfg &= ~ENVCFG_PMM;
}
static long kvm_sbi_fwft_set_pointer_masking_pmlen(struct kvm_vcpu *vcpu,
struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long value)
{
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
unsigned long pmm;
switch (value) {
case 0:
pmm = ENVCFG_PMM_PMLEN_0;
break;
case 7:
if (!fwft->have_vs_pmlen_7)
return SBI_ERR_INVALID_PARAM;
pmm = ENVCFG_PMM_PMLEN_7;
break;
case 16:
if (!fwft->have_vs_pmlen_16)
return SBI_ERR_INVALID_PARAM;
pmm = ENVCFG_PMM_PMLEN_16;
break;
default:
return SBI_ERR_INVALID_PARAM;
}
vcpu->arch.cfg.henvcfg &= ~ENVCFG_PMM;
vcpu->arch.cfg.henvcfg |= pmm;
/*
* Instead of waiting for vcpu_load/put() to update HENVCFG CSR,
* update here so that VCPU see's pointer masking mode change
* immediately.
*/
if (!one_reg_access)
csr_write(CSR_HENVCFG, vcpu->arch.cfg.henvcfg);
return SBI_SUCCESS;
}
static long kvm_sbi_fwft_get_pointer_masking_pmlen(struct kvm_vcpu *vcpu,
struct kvm_sbi_fwft_config *conf,
bool one_reg_access, unsigned long *value)
{
switch (vcpu->arch.cfg.henvcfg & ENVCFG_PMM) {
case ENVCFG_PMM_PMLEN_0:
*value = 0;
break;
case ENVCFG_PMM_PMLEN_7:
*value = 7;
break;
case ENVCFG_PMM_PMLEN_16:
*value = 16;
break;
default:
return SBI_ERR_FAILURE;
}
return SBI_SUCCESS;
}
#endif
static const struct kvm_sbi_fwft_feature features[] = {
{
.id = SBI_FWFT_MISALIGNED_EXC_DELEG,
.first_reg_num = offsetof(struct kvm_riscv_sbi_fwft, misaligned_deleg.enable) /
sizeof(unsigned long),
.supported = kvm_sbi_fwft_misaligned_delegation_supported,
.reset = kvm_sbi_fwft_reset_misaligned_delegation,
.set = kvm_sbi_fwft_set_misaligned_delegation,
.get = kvm_sbi_fwft_get_misaligned_delegation,
},
#ifndef CONFIG_32BIT
{
.id = SBI_FWFT_POINTER_MASKING_PMLEN,
.first_reg_num = offsetof(struct kvm_riscv_sbi_fwft, pointer_masking.enable) /
sizeof(unsigned long),
.supported = kvm_sbi_fwft_pointer_masking_pmlen_supported,
.reset = kvm_sbi_fwft_reset_pointer_masking_pmlen,
.set = kvm_sbi_fwft_set_pointer_masking_pmlen,
.get = kvm_sbi_fwft_get_pointer_masking_pmlen,
},
#endif
};
static const struct kvm_sbi_fwft_feature *kvm_sbi_fwft_regnum_to_feature(unsigned long reg_num)
{
const struct kvm_sbi_fwft_feature *feature;
int i;
for (i = 0; i < ARRAY_SIZE(features); i++) {
feature = &features[i];
if (feature->first_reg_num <= reg_num && reg_num < (feature->first_reg_num + 3))
return feature;
}
return NULL;
}
static struct kvm_sbi_fwft_config *
kvm_sbi_fwft_get_config(struct kvm_vcpu *vcpu, enum sbi_fwft_feature_t feature)
{
int i;
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
for (i = 0; i < ARRAY_SIZE(features); i++) {
if (fwft->configs[i].feature->id == feature)
return &fwft->configs[i];
}
return NULL;
}
static int kvm_fwft_get_feature(struct kvm_vcpu *vcpu, u32 feature,
struct kvm_sbi_fwft_config **conf)
{
struct kvm_sbi_fwft_config *tconf;
tconf = kvm_sbi_fwft_get_config(vcpu, feature);
if (!tconf) {
if (kvm_fwft_is_defined_feature(feature))
return SBI_ERR_NOT_SUPPORTED;
return SBI_ERR_DENIED;
}
if (!tconf->supported || !tconf->enabled)
return SBI_ERR_NOT_SUPPORTED;
*conf = tconf;
return SBI_SUCCESS;
}
static int kvm_sbi_fwft_set(struct kvm_vcpu *vcpu, u32 feature,
unsigned long value, unsigned long flags)
{
int ret;
struct kvm_sbi_fwft_config *conf;
ret = kvm_fwft_get_feature(vcpu, feature, &conf);
if (ret)
return ret;
if ((flags & ~SBI_FWFT_SET_FLAG_LOCK) != 0)
return SBI_ERR_INVALID_PARAM;
if (conf->flags & SBI_FWFT_SET_FLAG_LOCK)
return SBI_ERR_DENIED_LOCKED;
conf->flags = flags;
return conf->feature->set(vcpu, conf, false, value);
}
static int kvm_sbi_fwft_get(struct kvm_vcpu *vcpu, unsigned long feature,
unsigned long *value)
{
int ret;
struct kvm_sbi_fwft_config *conf;
ret = kvm_fwft_get_feature(vcpu, feature, &conf);
if (ret)
return ret;
return conf->feature->get(vcpu, conf, false, value);
}
static int kvm_sbi_ext_fwft_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_vcpu_sbi_return *retdata)
{
int ret;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
unsigned long funcid = cp->a6;
switch (funcid) {
case SBI_EXT_FWFT_SET:
ret = kvm_sbi_fwft_set(vcpu, cp->a0, cp->a1, cp->a2);
break;
case SBI_EXT_FWFT_GET:
ret = kvm_sbi_fwft_get(vcpu, cp->a0, &retdata->out_val);
break;
default:
ret = SBI_ERR_NOT_SUPPORTED;
break;
}
retdata->err_val = ret;
return 0;
}
static int kvm_sbi_ext_fwft_init(struct kvm_vcpu *vcpu)
{
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
const struct kvm_sbi_fwft_feature *feature;
struct kvm_sbi_fwft_config *conf;
int i;
fwft->configs = kcalloc(ARRAY_SIZE(features), sizeof(struct kvm_sbi_fwft_config),
GFP_KERNEL);
if (!fwft->configs)
return -ENOMEM;
for (i = 0; i < ARRAY_SIZE(features); i++) {
feature = &features[i];
conf = &fwft->configs[i];
if (feature->supported)
conf->supported = feature->supported(vcpu);
else
conf->supported = true;
conf->enabled = conf->supported;
conf->feature = feature;
}
return 0;
}
static void kvm_sbi_ext_fwft_deinit(struct kvm_vcpu *vcpu)
{
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
kfree(fwft->configs);
}
static void kvm_sbi_ext_fwft_reset(struct kvm_vcpu *vcpu)
{
struct kvm_sbi_fwft *fwft = vcpu_to_fwft(vcpu);
const struct kvm_sbi_fwft_feature *feature;
int i;
for (i = 0; i < ARRAY_SIZE(features); i++) {
fwft->configs[i].flags = 0;
feature = &features[i];
if (feature->reset)
feature->reset(vcpu);
}
}
static unsigned long kvm_sbi_ext_fwft_get_reg_count(struct kvm_vcpu *vcpu)
{
unsigned long max_reg_count = sizeof(struct kvm_riscv_sbi_fwft) / sizeof(unsigned long);
const struct kvm_sbi_fwft_feature *feature;
struct kvm_sbi_fwft_config *conf;
unsigned long reg, ret = 0;
for (reg = 0; reg < max_reg_count; reg++) {
feature = kvm_sbi_fwft_regnum_to_feature(reg);
if (!feature)
continue;
conf = kvm_sbi_fwft_get_config(vcpu, feature->id);
if (!conf || !conf->supported)
continue;
ret++;
}
return ret;
}
static int kvm_sbi_ext_fwft_get_reg_id(struct kvm_vcpu *vcpu, int index, u64 *reg_id)
{
int reg, max_reg_count = sizeof(struct kvm_riscv_sbi_fwft) / sizeof(unsigned long);
const struct kvm_sbi_fwft_feature *feature;
struct kvm_sbi_fwft_config *conf;
int idx = 0;
for (reg = 0; reg < max_reg_count; reg++) {
feature = kvm_sbi_fwft_regnum_to_feature(reg);
if (!feature)
continue;
conf = kvm_sbi_fwft_get_config(vcpu, feature->id);
if (!conf || !conf->supported)
continue;
if (index == idx) {
*reg_id = KVM_REG_RISCV |
(IS_ENABLED(CONFIG_32BIT) ?
KVM_REG_SIZE_U32 : KVM_REG_SIZE_U64) |
KVM_REG_RISCV_SBI_STATE |
KVM_REG_RISCV_SBI_FWFT | reg;
return 0;
}
idx++;
}
return -ENOENT;
}
static int kvm_sbi_ext_fwft_get_reg(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_size, void *reg_val)
{
const struct kvm_sbi_fwft_feature *feature;
struct kvm_sbi_fwft_config *conf;
unsigned long *value;
int ret = 0;
if (reg_size != sizeof(unsigned long))
return -EINVAL;
value = reg_val;
feature = kvm_sbi_fwft_regnum_to_feature(reg_num);
if (!feature)
return -ENOENT;
conf = kvm_sbi_fwft_get_config(vcpu, feature->id);
if (!conf || !conf->supported)
return -ENOENT;
switch (reg_num - feature->first_reg_num) {
case 0:
*value = conf->enabled;
break;
case 1:
*value = conf->flags;
break;
case 2:
ret = conf->feature->get(vcpu, conf, true, value);
break;
default:
return -ENOENT;
}
return sbi_err_map_linux_errno(ret);
}
static int kvm_sbi_ext_fwft_set_reg(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_size, const void *reg_val)
{
const struct kvm_sbi_fwft_feature *feature;
struct kvm_sbi_fwft_config *conf;
unsigned long value;
int ret = 0;
if (reg_size != sizeof(unsigned long))
return -EINVAL;
value = *(const unsigned long *)reg_val;
feature = kvm_sbi_fwft_regnum_to_feature(reg_num);
if (!feature)
return -ENOENT;
conf = kvm_sbi_fwft_get_config(vcpu, feature->id);
if (!conf || !conf->supported)
return -ENOENT;
switch (reg_num - feature->first_reg_num) {
case 0:
switch (value) {
case 0:
conf->enabled = false;
break;
case 1:
conf->enabled = true;
break;
default:
return -EINVAL;
}
break;
case 1:
conf->flags = value & SBI_FWFT_SET_FLAG_LOCK;
break;
case 2:
ret = conf->feature->set(vcpu, conf, true, value);
break;
default:
return -ENOENT;
}
return sbi_err_map_linux_errno(ret);
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_fwft = {
.extid_start = SBI_EXT_FWFT,
.extid_end = SBI_EXT_FWFT,
.handler = kvm_sbi_ext_fwft_handler,
.init = kvm_sbi_ext_fwft_init,
.deinit = kvm_sbi_ext_fwft_deinit,
.reset = kvm_sbi_ext_fwft_reset,
.state_reg_subtype = KVM_REG_RISCV_SBI_FWFT,
.get_state_reg_count = kvm_sbi_ext_fwft_get_reg_count,
.get_state_reg_id = kvm_sbi_ext_fwft_get_reg_id,
.get_state_reg = kvm_sbi_ext_fwft_get_reg,
.set_state_reg = kvm_sbi_ext_fwft_set_reg,
};

View File

@ -73,6 +73,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM: case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
ret = kvm_riscv_vcpu_pmu_snapshot_set_shmem(vcpu, cp->a0, cp->a1, cp->a2, retdata); ret = kvm_riscv_vcpu_pmu_snapshot_set_shmem(vcpu, cp->a0, cp->a1, cp->a2, retdata);
break; break;
case SBI_EXT_PMU_EVENT_GET_INFO:
ret = kvm_riscv_vcpu_pmu_event_info(vcpu, cp->a0, cp->a1, cp->a2, cp->a3, retdata);
break;
default: default:
retdata->err_val = SBI_ERR_NOT_SUPPORTED; retdata->err_val = SBI_ERR_NOT_SUPPORTED;
} }

View File

@ -85,8 +85,6 @@ static int kvm_sbi_sta_steal_time_set_shmem(struct kvm_vcpu *vcpu)
unsigned long shmem_phys_hi = cp->a1; unsigned long shmem_phys_hi = cp->a1;
u32 flags = cp->a2; u32 flags = cp->a2;
struct sbi_sta_struct zero_sta = {0}; struct sbi_sta_struct zero_sta = {0};
unsigned long hva;
bool writable;
gpa_t shmem; gpa_t shmem;
int ret; int ret;
@ -111,13 +109,10 @@ static int kvm_sbi_sta_steal_time_set_shmem(struct kvm_vcpu *vcpu)
return SBI_ERR_INVALID_ADDRESS; return SBI_ERR_INVALID_ADDRESS;
} }
hva = kvm_vcpu_gfn_to_hva_prot(vcpu, shmem >> PAGE_SHIFT, &writable); /* No need to check writable slot explicitly as kvm_vcpu_write_guest does it internally */
if (kvm_is_error_hva(hva) || !writable)
return SBI_ERR_INVALID_ADDRESS;
ret = kvm_vcpu_write_guest(vcpu, shmem, &zero_sta, sizeof(zero_sta)); ret = kvm_vcpu_write_guest(vcpu, shmem, &zero_sta, sizeof(zero_sta));
if (ret) if (ret)
return SBI_ERR_FAILURE; return SBI_ERR_INVALID_ADDRESS;
vcpu->arch.sta.shmem = shmem; vcpu->arch.sta.shmem = shmem;
vcpu->arch.sta.last_steal = current->sched_info.run_delay; vcpu->arch.sta.last_steal = current->sched_info.run_delay;
@ -151,63 +146,82 @@ static unsigned long kvm_sbi_ext_sta_probe(struct kvm_vcpu *vcpu)
return !!sched_info_on(); return !!sched_info_on();
} }
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_sta = { static unsigned long kvm_sbi_ext_sta_get_state_reg_count(struct kvm_vcpu *vcpu)
.extid_start = SBI_EXT_STA,
.extid_end = SBI_EXT_STA,
.handler = kvm_sbi_ext_sta_handler,
.probe = kvm_sbi_ext_sta_probe,
.reset = kvm_riscv_vcpu_sbi_sta_reset,
};
int kvm_riscv_vcpu_get_reg_sbi_sta(struct kvm_vcpu *vcpu,
unsigned long reg_num,
unsigned long *reg_val)
{ {
return sizeof(struct kvm_riscv_sbi_sta) / sizeof(unsigned long);
}
static int kvm_sbi_ext_sta_get_reg(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_size, void *reg_val)
{
unsigned long *value;
if (reg_size != sizeof(unsigned long))
return -EINVAL;
value = reg_val;
switch (reg_num) { switch (reg_num) {
case KVM_REG_RISCV_SBI_STA_REG(shmem_lo): case KVM_REG_RISCV_SBI_STA_REG(shmem_lo):
*reg_val = (unsigned long)vcpu->arch.sta.shmem; *value = (unsigned long)vcpu->arch.sta.shmem;
break; break;
case KVM_REG_RISCV_SBI_STA_REG(shmem_hi): case KVM_REG_RISCV_SBI_STA_REG(shmem_hi):
if (IS_ENABLED(CONFIG_32BIT)) if (IS_ENABLED(CONFIG_32BIT))
*reg_val = upper_32_bits(vcpu->arch.sta.shmem); *value = upper_32_bits(vcpu->arch.sta.shmem);
else else
*reg_val = 0; *value = 0;
break; break;
default: default:
return -EINVAL; return -ENOENT;
} }
return 0; return 0;
} }
int kvm_riscv_vcpu_set_reg_sbi_sta(struct kvm_vcpu *vcpu, static int kvm_sbi_ext_sta_set_reg(struct kvm_vcpu *vcpu, unsigned long reg_num,
unsigned long reg_num, unsigned long reg_size, const void *reg_val)
unsigned long reg_val)
{ {
unsigned long value;
if (reg_size != sizeof(unsigned long))
return -EINVAL;
value = *(const unsigned long *)reg_val;
switch (reg_num) { switch (reg_num) {
case KVM_REG_RISCV_SBI_STA_REG(shmem_lo): case KVM_REG_RISCV_SBI_STA_REG(shmem_lo):
if (IS_ENABLED(CONFIG_32BIT)) { if (IS_ENABLED(CONFIG_32BIT)) {
gpa_t hi = upper_32_bits(vcpu->arch.sta.shmem); gpa_t hi = upper_32_bits(vcpu->arch.sta.shmem);
vcpu->arch.sta.shmem = reg_val; vcpu->arch.sta.shmem = value;
vcpu->arch.sta.shmem |= hi << 32; vcpu->arch.sta.shmem |= hi << 32;
} else { } else {
vcpu->arch.sta.shmem = reg_val; vcpu->arch.sta.shmem = value;
} }
break; break;
case KVM_REG_RISCV_SBI_STA_REG(shmem_hi): case KVM_REG_RISCV_SBI_STA_REG(shmem_hi):
if (IS_ENABLED(CONFIG_32BIT)) { if (IS_ENABLED(CONFIG_32BIT)) {
gpa_t lo = lower_32_bits(vcpu->arch.sta.shmem); gpa_t lo = lower_32_bits(vcpu->arch.sta.shmem);
vcpu->arch.sta.shmem = ((gpa_t)reg_val << 32); vcpu->arch.sta.shmem = ((gpa_t)value << 32);
vcpu->arch.sta.shmem |= lo; vcpu->arch.sta.shmem |= lo;
} else if (reg_val != 0) { } else if (value != 0) {
return -EINVAL; return -EINVAL;
} }
break; break;
default: default:
return -EINVAL; return -ENOENT;
} }
return 0; return 0;
} }
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_sta = {
.extid_start = SBI_EXT_STA,
.extid_end = SBI_EXT_STA,
.handler = kvm_sbi_ext_sta_handler,
.probe = kvm_sbi_ext_sta_probe,
.reset = kvm_riscv_vcpu_sbi_sta_reset,
.state_reg_subtype = KVM_REG_RISCV_SBI_STA,
.get_state_reg_count = kvm_sbi_ext_sta_get_state_reg_count,
.get_state_reg = kvm_sbi_ext_sta_get_reg,
.set_state_reg = kvm_sbi_ext_sta_set_reg,
};

View File

@ -14,6 +14,7 @@
#include <linux/smp.h> #include <linux/smp.h>
#include <linux/kvm_host.h> #include <linux/kvm_host.h>
#include <asm/csr.h> #include <asm/csr.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_tlb.h> #include <asm/kvm_tlb.h>
#include <asm/kvm_vmid.h> #include <asm/kvm_vmid.h>
@ -24,15 +25,12 @@ static DEFINE_SPINLOCK(vmid_lock);
void __init kvm_riscv_gstage_vmid_detect(void) void __init kvm_riscv_gstage_vmid_detect(void)
{ {
unsigned long old;
/* Figure-out number of VMID bits in HW */ /* Figure-out number of VMID bits in HW */
old = csr_read(CSR_HGATP); csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
csr_write(CSR_HGATP, old | HGATP_VMID);
vmid_bits = csr_read(CSR_HGATP); vmid_bits = csr_read(CSR_HGATP);
vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT; vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
vmid_bits = fls_long(vmid_bits); vmid_bits = fls_long(vmid_bits);
csr_write(CSR_HGATP, old); csr_write(CSR_HGATP, 0);
/* We polluted local TLB so flush all guest TLB */ /* We polluted local TLB so flush all guest TLB */
kvm_riscv_local_hfence_gvma_all(); kvm_riscv_local_hfence_gvma_all();

View File

@ -356,7 +356,7 @@ struct kvm_s390_float_interrupt {
int counters[FIRQ_MAX_COUNT]; int counters[FIRQ_MAX_COUNT];
struct kvm_s390_mchk_info mchk; struct kvm_s390_mchk_info mchk;
struct kvm_s390_ext_info srv_signal; struct kvm_s390_ext_info srv_signal;
int next_rr_cpu; int last_sleep_cpu;
struct mutex ais_lock; struct mutex ais_lock;
u8 simm; u8 simm;
u8 nimm; u8 nimm;

View File

@ -2055,4 +2055,26 @@ static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
return res; return res;
} }
static inline pgste_t pgste_get_lock(pte_t *ptep)
{
unsigned long value = 0;
#ifdef CONFIG_PGSTE
unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
do {
value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
} while (value & PGSTE_PCL_BIT);
value |= PGSTE_PCL_BIT;
#endif
return __pgste(value);
}
static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
barrier();
WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
#endif
}
#endif /* _S390_PAGE_H */ #endif /* _S390_PAGE_H */

View File

@ -1323,6 +1323,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
VCPU_EVENT(vcpu, 4, "enabled wait: %llu ns", sltime); VCPU_EVENT(vcpu, 4, "enabled wait: %llu ns", sltime);
no_timer: no_timer:
kvm_vcpu_srcu_read_unlock(vcpu); kvm_vcpu_srcu_read_unlock(vcpu);
vcpu->kvm->arch.float_int.last_sleep_cpu = vcpu->vcpu_idx;
kvm_vcpu_halt(vcpu); kvm_vcpu_halt(vcpu);
vcpu->valid_wakeup = false; vcpu->valid_wakeup = false;
__unset_cpu_idle(vcpu); __unset_cpu_idle(vcpu);
@ -1949,18 +1950,15 @@ static void __floating_irq_kick(struct kvm *kvm, u64 type)
if (!online_vcpus) if (!online_vcpus)
return; return;
/* find idle VCPUs first, then round robin */ for (sigcpu = kvm->arch.float_int.last_sleep_cpu; ; sigcpu++) {
sigcpu = find_first_bit(kvm->arch.idle_mask, online_vcpus); sigcpu %= online_vcpus;
if (sigcpu == online_vcpus) { dst_vcpu = kvm_get_vcpu(kvm, sigcpu);
do { if (!is_vcpu_stopped(dst_vcpu))
sigcpu = kvm->arch.float_int.next_rr_cpu++; break;
kvm->arch.float_int.next_rr_cpu %= online_vcpus; /* avoid endless loops if all vcpus are stopped */
/* avoid endless loops if all vcpus are stopped */ if (nr_tries++ >= online_vcpus)
if (nr_tries++ >= online_vcpus) return;
return;
} while (is_vcpu_stopped(kvm_get_vcpu(kvm, sigcpu)));
} }
dst_vcpu = kvm_get_vcpu(kvm, sigcpu);
/* make the VCPU drop out of the SIE, or wake it up if sleeping */ /* make the VCPU drop out of the SIE, or wake it up if sleeping */
switch (type) { switch (type) {

View File

@ -15,6 +15,7 @@
#include <linux/pagewalk.h> #include <linux/pagewalk.h>
#include <linux/ksm.h> #include <linux/ksm.h>
#include <asm/gmap_helpers.h> #include <asm/gmap_helpers.h>
#include <asm/pgtable.h>
/** /**
* ptep_zap_swap_entry() - discard a swap entry. * ptep_zap_swap_entry() - discard a swap entry.
@ -47,6 +48,7 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
spinlock_t *ptl; spinlock_t *ptl;
pgste_t pgste;
pte_t *ptep; pte_t *ptep;
mmap_assert_locked(mm); mmap_assert_locked(mm);
@ -60,8 +62,16 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
ptep = get_locked_pte(mm, vmaddr, &ptl); ptep = get_locked_pte(mm, vmaddr, &ptl);
if (unlikely(!ptep)) if (unlikely(!ptep))
return; return;
if (pte_swap(*ptep)) if (pte_swap(*ptep)) {
preempt_disable();
pgste = pgste_get_lock(ptep);
ptep_zap_swap_entry(mm, pte_to_swp_entry(*ptep)); ptep_zap_swap_entry(mm, pte_to_swp_entry(*ptep));
pte_clear(mm, vmaddr, ptep);
pgste_set_unlock(ptep, pgste);
preempt_enable();
}
pte_unmap_unlock(ptep, ptl); pte_unmap_unlock(ptep, ptl);
} }
EXPORT_SYMBOL_GPL(gmap_helper_zap_one_page); EXPORT_SYMBOL_GPL(gmap_helper_zap_one_page);

View File

@ -24,6 +24,7 @@
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/mmu_context.h> #include <asm/mmu_context.h>
#include <asm/page-states.h> #include <asm/page-states.h>
#include <asm/pgtable.h>
#include <asm/machine.h> #include <asm/machine.h>
pgprot_t pgprot_writecombine(pgprot_t prot) pgprot_t pgprot_writecombine(pgprot_t prot)
@ -115,28 +116,6 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
return old; return old;
} }
static inline pgste_t pgste_get_lock(pte_t *ptep)
{
unsigned long value = 0;
#ifdef CONFIG_PGSTE
unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
do {
value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
} while (value & PGSTE_PCL_BIT);
value |= PGSTE_PCL_BIT;
#endif
return __pgste(value);
}
static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
barrier();
WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
#endif
}
static inline pgste_t pgste_get(pte_t *ptep) static inline pgste_t pgste_get(pte_t *ptep)
{ {
unsigned long pgste = 0; unsigned long pgste = 0;

View File

@ -145,7 +145,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL(get_untagged_addr) KVM_X86_OP_OPTIONAL(get_untagged_addr)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page) KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare) KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level) KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)
KVM_X86_OP_OPTIONAL(gmem_invalidate) KVM_X86_OP_OPTIONAL(gmem_invalidate)
#undef KVM_X86_OP #undef KVM_X86_OP

View File

@ -1922,7 +1922,7 @@ struct kvm_x86_ops {
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu); void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order); int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end); void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn); int (*gmem_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
}; };
struct kvm_x86_nested_ops { struct kvm_x86_nested_ops {
@ -2276,10 +2276,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
int tdp_max_root_level, int tdp_huge_page_level); int tdp_max_root_level, int tdp_huge_page_level);
#ifdef CONFIG_KVM_PRIVATE_MEM #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem) #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#else
#define kvm_arch_has_private_mem(kvm) false
#endif #endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state) #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)

View File

@ -124,7 +124,6 @@ bool kvm_para_available(void);
unsigned int kvm_arch_para_features(void); unsigned int kvm_arch_para_features(void);
unsigned int kvm_arch_para_hints(void); unsigned int kvm_arch_para_hints(void);
void kvm_async_pf_task_wait_schedule(u32 token); void kvm_async_pf_task_wait_schedule(u32 token);
void kvm_async_pf_task_wake(u32 token);
u32 kvm_read_and_reset_apf_flags(void); u32 kvm_read_and_reset_apf_flags(void);
bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token); bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token);
@ -148,7 +147,6 @@ static inline void kvm_spinlock_init(void)
#else /* CONFIG_KVM_GUEST */ #else /* CONFIG_KVM_GUEST */
#define kvm_async_pf_task_wait_schedule(T) do {} while(0) #define kvm_async_pf_task_wait_schedule(T) do {} while(0)
#define kvm_async_pf_task_wake(T) do {} while(0)
static inline bool kvm_para_available(void) static inline bool kvm_para_available(void)
{ {

View File

@ -190,7 +190,7 @@ static void apf_task_wake_all(void)
} }
} }
void kvm_async_pf_task_wake(u32 token) static void kvm_async_pf_task_wake(u32 token)
{ {
u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS); u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
struct kvm_task_sleep_head *b = &async_pf_sleepers[key]; struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@ -241,7 +241,6 @@ again:
/* A dummy token might be allocated and ultimately not used. */ /* A dummy token might be allocated and ultimately not used. */
kfree(dummy); kfree(dummy);
} }
EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
noinstr u32 kvm_read_and_reset_apf_flags(void) noinstr u32 kvm_read_and_reset_apf_flags(void)
{ {
@ -933,6 +932,19 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
static void __init kvm_init_platform(void) static void __init kvm_init_platform(void)
{ {
u64 tolud = PFN_PHYS(e820__end_of_low_ram_pfn());
/*
* Note, hardware requires variable MTRR ranges to be power-of-2 sized
* and naturally aligned. But when forcing guest MTRR state, Linux
* doesn't program the forced ranges into hardware. Don't bother doing
* the math to generate a technically-legal range.
*/
struct mtrr_var_range pci_hole = {
.base_lo = tolud | X86_MEMTYPE_UC,
.mask_lo = (u32)(~(SZ_4G - tolud - 1)) | MTRR_PHYSMASK_V,
.mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
};
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) { kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
unsigned long nr_pages; unsigned long nr_pages;
@ -982,8 +994,12 @@ static void __init kvm_init_platform(void)
kvmclock_init(); kvmclock_init();
x86_platform.apic_post_init = kvm_apic_init; x86_platform.apic_post_init = kvm_apic_init;
/* Set WB as the default cache mode for SEV-SNP and TDX */ /*
guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK); * Set WB as the default cache mode for SEV-SNP and TDX, with a single
* UC range for the legacy PCI hole, e.g. so that devices that expect
* to get UC/WC mappings don't get surprised with WB.
*/
guest_force_mtrr_state(&pci_hole, 1, MTRR_TYPE_WRBACK);
} }
#if defined(CONFIG_AMD_MEM_ENCRYPT) #if defined(CONFIG_AMD_MEM_ENCRYPT)
@ -1072,16 +1088,6 @@ static void kvm_wait(u8 *ptr, u8 val)
*/ */
void __init kvm_spinlock_init(void) void __init kvm_spinlock_init(void)
{ {
/*
* In case host doesn't support KVM_FEATURE_PV_UNHALT there is still an
* advantage of keeping virt_spin_lock_key enabled: virt_spin_lock() is
* preferred over native qspinlock when vCPU is preempted.
*/
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
pr_info("PV spinlocks disabled, no host support\n");
return;
}
/* /*
* Disable PV spinlocks and use native qspinlock when dedicated pCPUs * Disable PV spinlocks and use native qspinlock when dedicated pCPUs
* are available. * are available.
@ -1101,6 +1107,16 @@ void __init kvm_spinlock_init(void)
goto out; goto out;
} }
/*
* In case host doesn't support KVM_FEATURE_PV_UNHALT there is still an
* advantage of keeping virt_spin_lock_key enabled: virt_spin_lock() is
* preferred over native qspinlock when vCPU is preempted.
*/
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
pr_info("PV spinlocks disabled, no host support\n");
return;
}
pr_info("PV spinlocks enabled\n"); pr_info("PV spinlocks enabled\n");
__pv_init_lock_hash(); __pv_init_lock_hash();

View File

@ -46,8 +46,8 @@ config KVM_X86
select HAVE_KVM_PM_NOTIFIER if PM select HAVE_KVM_PM_NOTIFIER if PM
select KVM_GENERIC_HARDWARE_ENABLING select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_PRE_FAULT_MEMORY select KVM_GENERIC_PRE_FAULT_MEMORY
select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
select KVM_WERROR if WERROR select KVM_WERROR if WERROR
select KVM_GUEST_MEMFD if X86_64
config KVM config KVM
tristate "Kernel-based Virtual Machine (KVM) support" tristate "Kernel-based Virtual Machine (KVM) support"
@ -74,7 +74,7 @@ config KVM_WERROR
# FRAME_WARN, i.e. KVM_WERROR=y with KASAN=y requires special tuning. # FRAME_WARN, i.e. KVM_WERROR=y with KASAN=y requires special tuning.
# Building KVM with -Werror and KASAN is still doable via enabling # Building KVM with -Werror and KASAN is still doable via enabling
# the kernel-wide WERROR=y. # the kernel-wide WERROR=y.
depends on KVM && ((EXPERT && !KASAN) || WERROR) depends on KVM_X86 && ((EXPERT && !KASAN) || WERROR)
help help
Add -Werror to the build flags for KVM. Add -Werror to the build flags for KVM.
@ -83,7 +83,8 @@ config KVM_WERROR
 config KVM_SW_PROTECTED_VM
     bool "Enable support for KVM software-protected VMs"
     depends on EXPERT
-    depends on KVM && X86_64
+    depends on KVM_X86 && X86_64
+    select KVM_GENERIC_MEMORY_ATTRIBUTES
     help
       Enable support for KVM software-protected VMs. Currently, software-
       protected VMs are purely a development and testing vehicle for
@@ -95,8 +96,6 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
     tristate "KVM for Intel (and compatible) processors support"
     depends on KVM && IA32_FEAT_CTL
-    select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
-    select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
     help
       Provides support for KVM on processors equipped with Intel's VT
       extensions, a.k.a. Virtual Machine Extensions (VMX).
@@ -135,6 +134,8 @@ config KVM_INTEL_TDX
     bool "Intel Trust Domain Extensions (TDX) support"
     default y
     depends on INTEL_TDX_HOST
+    select KVM_GENERIC_MEMORY_ATTRIBUTES
+    select HAVE_KVM_ARCH_GMEM_POPULATE
     help
       Provides support for launching Intel Trust Domain Extensions (TDX)
       confidential VMs on Intel processors.
@@ -157,9 +158,10 @@ config KVM_AMD_SEV
     depends on KVM_AMD && X86_64
     depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
     select ARCH_HAS_CC_PLATFORM
-    select KVM_GENERIC_PRIVATE_MEM
+    select KVM_GENERIC_MEMORY_ATTRIBUTES
     select HAVE_KVM_ARCH_GMEM_PREPARE
     select HAVE_KVM_ARCH_GMEM_INVALIDATE
+    select HAVE_KVM_ARCH_GMEM_POPULATE
     help
       Provides support for launching encrypted VMs which use Secure
       Encrypted Virtualization (SEV), Secure Encrypted Virtualization with
@@ -169,7 +171,7 @@ config KVM_AMD_SEV
 config KVM_IOAPIC
     bool "I/O APIC, PIC, and PIT emulation"
     default y
-    depends on KVM
+    depends on KVM_X86
     help
       Provides support for KVM to emulate an I/O APIC, PIC, and PIT, i.e.
       for full in-kernel APIC emulation.
@@ -179,7 +181,7 @@ config KVM_IOAPIC
 config KVM_SMM
     bool "System Management Mode emulation"
     default y
-    depends on KVM
+    depends on KVM_X86
     help
       Provides support for KVM to emulate System Management Mode (SMM)
       in virtual machines. This can be used by the virtual machine
@@ -189,7 +191,7 @@ config KVM_SMM
 config KVM_HYPERV
     bool "Support for Microsoft Hyper-V emulation"
-    depends on KVM
+    depends on KVM_X86
     default y
     help
       Provides KVM support for emulating Microsoft Hyper-V. This allows KVM
@@ -203,7 +205,7 @@ config KVM_HYPERV
 config KVM_XEN
     bool "Support for Xen hypercall interface"
-    depends on KVM
+    depends on KVM_X86
     help
       Provides KVM support for the hosting Xen HVM guests and
       passing Xen hypercalls to userspace.
@@ -213,7 +215,7 @@ config KVM_XEN
 config KVM_PROVE_MMU
     bool "Prove KVM MMU correctness"
     depends on DEBUG_KERNEL
-    depends on KVM
+    depends on KVM_X86
     depends on EXPERT
     help
       Enables runtime assertions in KVM's MMU that are too costly to enable
@@ -228,7 +230,7 @@ config KVM_EXTERNAL_WRITE_TRACKING
 config KVM_MAX_NR_VCPUS
     int "Maximum number of vCPUs per KVM guest"
-    depends on KVM
+    depends on KVM_X86
     range 1024 4096
     default 4096 if MAXSMP
     default 1024


@@ -3285,12 +3285,72 @@ out:
     return level;
 }
 
-static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
-                                       const struct kvm_memory_slot *slot,
-                                       gfn_t gfn, int max_level, bool is_private)
+static u8 kvm_max_level_for_order(int order)
+{
+    BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+    KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+                    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+                    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+    if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+        return PG_LEVEL_1G;
+
+    if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+        return PG_LEVEL_2M;
+
+    return PG_LEVEL_4K;
+}
+
+static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
+                                     const struct kvm_memory_slot *slot, gfn_t gfn,
+                                     bool is_private)
+{
+    u8 max_level, coco_level;
+    kvm_pfn_t pfn;
+
+    /* For faults, use the gmem information that was resolved earlier. */
+    if (fault) {
+        pfn = fault->pfn;
+        max_level = fault->max_level;
+    } else {
+        /* TODO: Call into guest_memfd once hugepages are supported. */
+        WARN_ONCE(1, "Get pfn+order from guest_memfd");
+        pfn = KVM_PFN_ERR_FAULT;
+        max_level = PG_LEVEL_4K;
+    }
+
+    if (max_level == PG_LEVEL_4K)
+        return max_level;
+
+    /*
+     * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
+     * restrictions. A return of '0' means "no additional restrictions", to
+     * allow for using an optional "ret0" static call.
+     */
+    coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
+    if (coco_level)
+        max_level = min(max_level, coco_level);
+
+    return max_level;
+}
+
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
+                              const struct kvm_memory_slot *slot, gfn_t gfn)
 {
     struct kvm_lpage_info *linfo;
-    int host_level;
+    int host_level, max_level;
+    bool is_private;
+
+    lockdep_assert_held(&kvm->mmu_lock);
+
+    if (fault) {
+        max_level = fault->max_level;
+        is_private = fault->is_private;
+    } else {
+        max_level = PG_LEVEL_NUM;
+        is_private = kvm_mem_is_private(kvm, gfn);
+    }
 
     max_level = min(max_level, max_huge_page_level);
     for ( ; max_level > PG_LEVEL_4K; max_level--) {
@@ -3299,25 +3359,17 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
             break;
     }
 
-    if (is_private)
-        return max_level;
-
     if (max_level == PG_LEVEL_4K)
         return PG_LEVEL_4K;
 
-    host_level = host_pfn_mapping_level(kvm, gfn, slot);
+    if (is_private || kvm_memslot_is_gmem_only(slot))
+        host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
+                                                is_private);
+    else
+        host_level = host_pfn_mapping_level(kvm, gfn, slot);
     return min(host_level, max_level);
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-                              const struct kvm_memory_slot *slot, gfn_t gfn)
-{
-    bool is_private = kvm_slot_can_be_private(slot) &&
-                      kvm_mem_is_private(kvm, gfn);
-
-    return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
-}
-
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
     struct kvm_memory_slot *slot = fault->slot;
@@ -3338,9 +3390,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
      * Enforce the iTLB multihit workaround after capturing the requested
      * level, which will be used to do precise, accurate accounting.
      */
-    fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-                                                   fault->gfn, fault->max_level,
-                                                   fault->is_private);
+    fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, fault,
+                                                 fault->slot, fault->gfn);
     if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
         return;
 
@@ -4503,42 +4554,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
     vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-    BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-    KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-                    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-                    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-    if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-        return PG_LEVEL_1G;
-
-    if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-        return PG_LEVEL_2M;
-
-    return PG_LEVEL_4K;
-}
-
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
-                                        u8 max_level, int gmem_order)
-{
-    u8 req_max_level;
-
-    if (max_level == PG_LEVEL_4K)
-        return PG_LEVEL_4K;
-
-    max_level = min(kvm_max_level_for_order(gmem_order), max_level);
-    if (max_level == PG_LEVEL_4K)
-        return PG_LEVEL_4K;
-
-    req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
-    if (req_max_level)
-        max_level = min(max_level, req_max_level);
-
-    return max_level;
-}
-
 static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
                                       struct kvm_page_fault *fault, int r)
 {
@@ -4546,12 +4561,12 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
                              r == RET_PF_RETRY, fault->map_writable);
 }
 
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
-                                       struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
+                                    struct kvm_page_fault *fault)
 {
     int max_order, r;
 
-    if (!kvm_slot_can_be_private(fault->slot)) {
+    if (!kvm_slot_has_gmem(fault->slot)) {
         kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
         return -EFAULT;
     }
@@ -4564,8 +4579,7 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
     }
 
     fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
-    fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
-                                                     fault->max_level, max_order);
+    fault->max_level = kvm_max_level_for_order(max_order);
 
     return RET_PF_CONTINUE;
 }
@@ -4575,8 +4589,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 {
     unsigned int foll = fault->write ? FOLL_WRITE : 0;
 
-    if (fault->is_private)
-        return kvm_mmu_faultin_pfn_private(vcpu, fault);
+    if (fault->is_private || kvm_memslot_is_gmem_only(fault->slot))
+        return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
 
     foll |= FOLL_NOWAIT;
     fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
@@ -7165,7 +7179,7 @@ restart:
          * mapping if the indirect sp has level = 1.
          */
         if (sp->role.direct &&
-            sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn)) {
+            sp->role.level < kvm_mmu_max_mapping_level(kvm, NULL, slot, sp->gfn)) {
             kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
 
             if (kvm_available_flush_remote_tlbs_range())
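
The order-to-level conversion introduced above is easy to check by hand: with 4KiB base pages, KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) is 9 and KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) is 18, so a guest_memfd folio only becomes eligible for a 2MiB or 1GiB mapping once its order reaches those thresholds. A minimal illustration of the same arithmetic, written as standalone C rather than kernel code (the helper name and the printed level numbers are mine; PG_LEVEL_4K/2M/1G are 1/2/3 in KVM):

/* Illustrative sketch only, not kernel code. Assumes x86 4KiB base pages. */
#include <stdio.h>

static int max_level_for_order(int order)
{
	if (order >= 18)	/* 1GiB = 2^18 x 4KiB base pages */
		return 3;
	if (order >= 9)		/* 2MiB = 2^9 x 4KiB base pages */
		return 2;
	return 1;		/* anything smaller stays at 4KiB */
}

int main(void)
{
	int order;

	/* guest_memfd currently hands out order-0 folios, i.e. 4KiB mappings. */
	for (order = 0; order <= 18; order += 9)
		printf("folio order %2d -> mapping level %d\n",
		       order, max_level_for_order(order));
	return 0;
}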


@@ -411,7 +411,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
     return r;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
                               const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);


@@ -1813,7 +1813,7 @@ retry:
         if (iter.gfn < start || iter.gfn >= end)
             continue;
 
-        max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot, iter.gfn);
+        max_mapping_level = kvm_mmu_max_mapping_level(kvm, NULL, slot, iter.gfn);
         if (max_mapping_level < iter.level)
             continue;


@@ -2361,7 +2361,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
     mutex_lock(&kvm->slots_lock);
 
     memslot = gfn_to_memslot(kvm, params.gfn_start);
-    if (!kvm_slot_can_be_private(memslot)) {
+    if (!kvm_slot_has_gmem(memslot)) {
         ret = -EINVAL;
         goto out;
     }
@@ -4715,7 +4715,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
     }
 
     slot = gfn_to_memslot(kvm, gfn);
-    if (!kvm_slot_can_be_private(slot)) {
+    if (!kvm_slot_has_gmem(slot)) {
         pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
                             gpa);
         return;
@@ -4943,7 +4943,7 @@ next_pfn:
     }
 }
 
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
     int level, rc;
     bool assigned;


@@ -5179,7 +5179,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
     .gmem_prepare = sev_gmem_prepare,
     .gmem_invalidate = sev_gmem_invalidate,
-    .private_max_mapping_level = sev_private_max_mapping_level,
+    .gmem_max_mapping_level = sev_gmem_max_mapping_level,
 };
 
 /*


@@ -866,7 +866,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
-int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu);
 void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa);
 #else
@@ -895,7 +895,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
     return 0;
 }
 static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
-static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static inline int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
     return 0;
 }


@@ -831,10 +831,11 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
     return tdx_vcpu_ioctl(vcpu, argp);
 }
 
-static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+static int vt_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
+                                     bool is_private)
 {
     if (is_td(kvm))
-        return tdx_gmem_private_max_mapping_level(kvm, pfn);
+        return tdx_gmem_max_mapping_level(kvm, pfn, is_private);
 
     return 0;
 }
@@ -1005,7 +1006,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
     .mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
     .vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
 
-    .private_max_mapping_level = vt_op_tdx_only(gmem_private_max_mapping_level)
+    .gmem_max_mapping_level = vt_op_tdx_only(gmem_max_mapping_level)
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {


@@ -3318,8 +3318,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
     return ret;
 }
 
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
+    if (!is_private)
+        return 0;
+
     return PG_LEVEL_4K;
 }


@@ -5785,6 +5785,13 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
         if (kvm_test_request(KVM_REQ_EVENT, vcpu))
             return 1;
 
+        /*
+         * Ensure that any updates to kvm->buses[] observed by the
+         * previous instruction (emulated or otherwise) are also
+         * visible to the instruction KVM is about to emulate.
+         */
+        smp_rmb();
+
         if (!kvm_emulate_instruction(vcpu, 0))
             return 0;


@@ -153,7 +153,7 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
-int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */


@@ -13530,6 +13530,16 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+/*
+ * KVM doesn't yet support mmap() on guest_memfd for VMs with private memory
+ * (the private vs. shared tracking needs to be moved into guest_memfd).
+ */
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+    return !kvm_arch_has_private_mem(kvm);
+}
+
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
 {
@@ -13543,6 +13553,7 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
     kvm_x86_call(gmem_invalidate)(start, end);
 }
 #endif
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {


@@ -1062,16 +1062,9 @@ static void gicv5_set_cpuif_idbits(void)
 #ifdef CONFIG_KVM
 static struct gic_kvm_info gic_v5_kvm_info __initdata;
 
-static bool __init gicv5_cpuif_has_gcie_legacy(void)
-{
-    u64 idr0 = read_sysreg_s(SYS_ICC_IDR0_EL1);
-
-    return !!FIELD_GET(ICC_IDR0_EL1_GCIE_LEGACY, idr0);
-}
-
 static void __init gic_of_setup_kvm_info(struct device_node *node)
 {
     gic_v5_kvm_info.type = GIC_V5;
-    gic_v5_kvm_info.has_gcie_v3_compat = gicv5_cpuif_has_gcie_legacy();
 
     /* GIC Virtual CPU interface maintenance interrupt */
     gic_v5_kvm_info.no_maint_irq_mask = false;


@@ -59,10 +59,11 @@ asm volatile(ALTERNATIVE( \
 #define PERF_EVENT_FLAG_USER_ACCESS    BIT(SYSCTL_USER_ACCESS)
 #define PERF_EVENT_FLAG_LEGACY         BIT(SYSCTL_LEGACY)
 
-PMU_FORMAT_ATTR(event, "config:0-47");
+PMU_FORMAT_ATTR(event, "config:0-55");
 PMU_FORMAT_ATTR(firmware, "config:62-63");
 
 static bool sbi_v2_available;
+static bool sbi_v3_available;
 static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
 #define sbi_pmu_snapshot_available() \
     static_branch_unlikely(&sbi_pmu_snapshot_available)
@@ -99,6 +100,7 @@ static unsigned int riscv_pmu_irq;
 /* Cache the available counters in a bitmask */
 static unsigned long cmask;
 
+static int pmu_event_find_cache(u64 config);
 struct sbi_pmu_event_data {
     union {
         union {
@@ -298,6 +300,66 @@ static struct sbi_pmu_event_data pmu_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
     },
 };
 
+static int pmu_sbi_check_event_info(void)
+{
+    int num_events = ARRAY_SIZE(pmu_hw_event_map) + PERF_COUNT_HW_CACHE_MAX *
+                     PERF_COUNT_HW_CACHE_OP_MAX * PERF_COUNT_HW_CACHE_RESULT_MAX;
+    struct riscv_pmu_event_info *event_info_shmem;
+    phys_addr_t base_addr;
+    int i, j, k, result = 0, count = 0;
+    struct sbiret ret;
+
+    event_info_shmem = kcalloc(num_events, sizeof(*event_info_shmem), GFP_KERNEL);
+    if (!event_info_shmem)
+        return -ENOMEM;
+
+    for (i = 0; i < ARRAY_SIZE(pmu_hw_event_map); i++)
+        event_info_shmem[count++].event_idx = pmu_hw_event_map[i].event_idx;
+
+    for (i = 0; i < ARRAY_SIZE(pmu_cache_event_map); i++) {
+        for (j = 0; j < ARRAY_SIZE(pmu_cache_event_map[i]); j++) {
+            for (k = 0; k < ARRAY_SIZE(pmu_cache_event_map[i][j]); k++)
+                event_info_shmem[count++].event_idx =
+                    pmu_cache_event_map[i][j][k].event_idx;
+        }
+    }
+
+    base_addr = __pa(event_info_shmem);
+    if (IS_ENABLED(CONFIG_32BIT))
+        ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_EVENT_GET_INFO, lower_32_bits(base_addr),
+                        upper_32_bits(base_addr), count, 0, 0, 0);
+    else
+        ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_EVENT_GET_INFO, base_addr, 0,
+                        count, 0, 0, 0);
+    if (ret.error) {
+        result = -EOPNOTSUPP;
+        goto free_mem;
+    }
+
+    for (i = 0; i < ARRAY_SIZE(pmu_hw_event_map); i++) {
+        if (!(event_info_shmem[i].output & RISCV_PMU_EVENT_INFO_OUTPUT_MASK))
+            pmu_hw_event_map[i].event_idx = -ENOENT;
+    }
+
+    count = ARRAY_SIZE(pmu_hw_event_map);
+
+    for (i = 0; i < ARRAY_SIZE(pmu_cache_event_map); i++) {
+        for (j = 0; j < ARRAY_SIZE(pmu_cache_event_map[i]); j++) {
+            for (k = 0; k < ARRAY_SIZE(pmu_cache_event_map[i][j]); k++) {
+                if (!(event_info_shmem[count].output &
+                      RISCV_PMU_EVENT_INFO_OUTPUT_MASK))
+                    pmu_cache_event_map[i][j][k].event_idx = -ENOENT;
+                count++;
+            }
+        }
+    }
+
+free_mem:
+    kfree(event_info_shmem);
+
+    return result;
+}
+
 static void pmu_sbi_check_event(struct sbi_pmu_event_data *edata)
 {
     struct sbiret ret;
@@ -315,6 +377,15 @@ static void pmu_sbi_check_event(struct sbi_pmu_event_data *edata)
 
 static void pmu_sbi_check_std_events(struct work_struct *work)
 {
+    int ret;
+
+    if (sbi_v3_available) {
+        ret = pmu_sbi_check_event_info();
+        if (ret)
+            pr_err("pmu_sbi_check_event_info failed with error %d\n", ret);
+        return;
+    }
+
     for (int i = 0; i < ARRAY_SIZE(pmu_hw_event_map); i++)
         pmu_sbi_check_event(&pmu_hw_event_map[i]);
 
@@ -342,6 +413,71 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
     return info->type == SBI_PMU_CTR_TYPE_FW;
 }
 
+int riscv_pmu_get_event_info(u32 type, u64 config, u64 *econfig)
+{
+    int ret = -ENOENT;
+
+    switch (type) {
+    case PERF_TYPE_HARDWARE:
+        if (config >= PERF_COUNT_HW_MAX)
+            return -EINVAL;
+        ret = pmu_hw_event_map[config].event_idx;
+        break;
+    case PERF_TYPE_HW_CACHE:
+        ret = pmu_event_find_cache(config);
+        break;
+    case PERF_TYPE_RAW:
+        /*
+         * As per SBI v0.3 specification,
+         * -- the upper 16 bits must be unused for a hardware raw event.
+         * As per SBI v2.0 specification,
+         * -- the upper 8 bits must be unused for a hardware raw event.
+         * Bits 63:62 are used to distinguish between raw events
+         * 00 - Hardware raw event
+         * 10 - SBI firmware events
+         * 11 - Risc-V platform specific firmware event
+         */
+        switch (config >> 62) {
+        case 0:
+            if (sbi_v3_available) {
+                /* Return error any bits [56-63] is set as it is not allowed by the spec */
+                if (!(config & ~RISCV_PMU_RAW_EVENT_V2_MASK)) {
+                    if (econfig)
+                        *econfig = config & RISCV_PMU_RAW_EVENT_V2_MASK;
+                    ret = RISCV_PMU_RAW_EVENT_V2_IDX;
+                }
+            /* Return error any bits [48-63] is set as it is not allowed by the spec */
+            } else if (!(config & ~RISCV_PMU_RAW_EVENT_MASK)) {
+                if (econfig)
+                    *econfig = config & RISCV_PMU_RAW_EVENT_MASK;
+                ret = RISCV_PMU_RAW_EVENT_IDX;
+            }
+            break;
+        case 2:
+            ret = (config & 0xFFFF) | (SBI_PMU_EVENT_TYPE_FW << 16);
+            break;
+        case 3:
+            /*
+             * For Risc-V platform specific firmware events
+             * Event code - 0xFFFF
+             * Event data - raw event encoding
+             */
+            ret = SBI_PMU_EVENT_TYPE_FW << 16 | RISCV_PLAT_FW_EVENT;
+            if (econfig)
+                *econfig = config & RISCV_PMU_PLAT_FW_EVENT_MASK;
+            break;
+        default:
+            break;
+        }
+        break;
+    default:
+        break;
+    }
+
+    return ret;
+}
+EXPORT_SYMBOL_GPL(riscv_pmu_get_event_info);
+
 /*
  * Returns the counter width of a programmable counter and number of hardware
  * counters. As we don't support heterogeneous CPUs yet, it is okay to just
@@ -507,7 +643,6 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
 {
     u32 type = event->attr.type;
     u64 config = event->attr.config;
-    int ret = -ENOENT;
 
     /*
      * Ensure we are finished checking standard hardware events for
@@ -515,54 +650,7 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
      */
     flush_work(&check_std_events_work);
 
-    switch (type) {
-    case PERF_TYPE_HARDWARE:
-        if (config >= PERF_COUNT_HW_MAX)
-            return -EINVAL;
-        ret = pmu_hw_event_map[event->attr.config].event_idx;
-        break;
-    case PERF_TYPE_HW_CACHE:
-        ret = pmu_event_find_cache(config);
-        break;
-    case PERF_TYPE_RAW:
-        /*
-         * As per SBI specification, the upper 16 bits must be unused
-         * for a hardware raw event.
-         * Bits 63:62 are used to distinguish between raw events
-         * 00 - Hardware raw event
-         * 10 - SBI firmware events
-         * 11 - Risc-V platform specific firmware event
-         */
-        switch (config >> 62) {
-        case 0:
-            /* Return error any bits [48-63] is set as it is not allowed by the spec */
-            if (!(config & ~RISCV_PMU_RAW_EVENT_MASK)) {
-                *econfig = config & RISCV_PMU_RAW_EVENT_MASK;
-                ret = RISCV_PMU_RAW_EVENT_IDX;
-            }
-            break;
-        case 2:
-            ret = (config & 0xFFFF) | (SBI_PMU_EVENT_TYPE_FW << 16);
-            break;
-        case 3:
-            /*
-             * For Risc-V platform specific firmware events
-             * Event code - 0xFFFF
-             * Event data - raw event encoding
-             */
-            ret = SBI_PMU_EVENT_TYPE_FW << 16 | RISCV_PLAT_FW_EVENT;
-            *econfig = config & RISCV_PMU_PLAT_FW_EVENT_MASK;
-            break;
-        default:
-            break;
-        }
-        break;
-    default:
-        break;
-    }
-
-    return ret;
+    return riscv_pmu_get_event_info(type, config, econfig);
 }
 
 static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
@@ -1454,6 +1542,9 @@ static int __init pmu_sbi_devinit(void)
     if (sbi_spec_version >= sbi_mk_version(2, 0))
         sbi_v2_available = true;
 
+    if (sbi_spec_version >= sbi_mk_version(3, 0))
+        sbi_v3_available = true;
+
     ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_RISCV_STARTING,
                                   "perf/riscv/pmu:starting",
                                   pmu_sbi_starting_cpu, pmu_sbi_dying_cpu);
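
One practical consequence of the PMU_FORMAT_ATTR change above: with SBI v3.0 a hardware raw event may occupy config bits [55:0] instead of [47:0], while bits 63:62 still select hardware raw (00), SBI firmware (10) or platform firmware (11) events. Below is a hedged, userspace-style sketch of the acceptance check the new riscv_pmu_get_event_info() performs for the hardware-raw case; the mask values mirror my reading of RISCV_PMU_RAW_EVENT_MASK and RISCV_PMU_RAW_EVENT_V2_MASK and are illustrative, not the authoritative definitions.

/* Illustrative sketch only; mask widths assumed from the hunk above. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RAW_EVENT_MASK		((1ULL << 48) - 1)	/* bits 47:0, pre-SBI v3.0 */
#define RAW_EVENT_V2_MASK	((1ULL << 56) - 1)	/* bits 55:0, SBI v3.0+ */

static bool raw_event_ok(uint64_t config, bool sbi_v3)
{
	uint64_t mask = sbi_v3 ? RAW_EVENT_V2_MASK : RAW_EVENT_MASK;

	/* Bits 63:62 must be 00 for a hardware raw event... */
	if (config >> 62)
		return false;

	/* ...and nothing above the event field may be set. */
	return !(config & ~mask);
}

int main(void)
{
	uint64_t config = 0x00550000deadbeefULL;	/* uses bits above 47 */

	/* Rejected on SBI < 3.0, accepted once the wider field is available. */
	printf("pre-v3.0: %d, v3.0+: %d\n",
	       raw_event_ok(config, false), raw_event_ok(config, true));
	return 0;
}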


@@ -378,6 +378,7 @@ struct vgic_cpu {
 
 extern struct static_key_false vgic_v2_cpuif_trap;
 extern struct static_key_false vgic_v3_cpuif_trap;
+extern struct static_key_false vgic_v3_has_v2_compat;
 
 int kvm_set_legacy_vgic_v2_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev_addr);
 void kvm_vgic_early_init(struct kvm *kvm);
@@ -409,7 +410,6 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
 
 #define irqchip_in_kernel(k)   (!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)    ((k)->arch.vgic.initialized)
-#define vgic_ready(k)          ((k)->arch.vgic.ready)
 #define vgic_valid_spi(k, i)   (((i) >= VGIC_NR_PRIVATE_IRQS) && \
                                 ((i) < (k)->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS))


@@ -128,6 +128,7 @@
 #define FFA_FEAT_RXTX_MIN_SZ_4K    0
 #define FFA_FEAT_RXTX_MIN_SZ_64K   1
 #define FFA_FEAT_RXTX_MIN_SZ_16K   2
+#define FFA_FEAT_RXTX_MIN_SZ_MASK  GENMASK(1, 0)
 
 /* FFA Bus/Device/Driver related */
 struct ffa_device {


@@ -36,8 +36,6 @@ struct gic_kvm_info {
     bool has_v4_1;
     /* Deactivation impared, subpar stuff */
     bool no_hw_deactivation;
-    /* v3 compat support (GICv5 hosts, only) */
-    bool has_gcie_v3_compat;
 };
 
 #ifdef CONFIG_KVM


@@ -52,9 +52,10 @@
 /*
  * The bit 16 ~ bit 31 of kvm_userspace_memory_region::flags are internally
  * used in kvm, other bits are visible for userspace which are defined in
- * include/linux/kvm_h.
+ * include/uapi/linux/kvm.h.
  */
 #define KVM_MEMSLOT_INVALID    (1UL << 16)
+#define KVM_MEMSLOT_GMEM_ONLY  (1UL << 17)
 
 /*
  * Bit 63 of the memslot generation number is an "update in-progress flag",
@@ -206,6 +207,7 @@ struct kvm_io_range {
 struct kvm_io_bus {
     int dev_count;
     int ioeventfd_count;
+    struct rcu_head rcu;
     struct kvm_io_range range[];
 };
 
@@ -602,7 +604,7 @@ struct kvm_memory_slot {
     short id;
     u16 as_id;
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GUEST_MEMFD
     struct {
         /*
          * Writes protected by kvm->slots_lock. Acquiring a
@@ -615,7 +617,7 @@ struct kvm_memory_slot {
 #endif
 };
 
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
 {
     return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
 }
@@ -719,17 +721,17 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 }
 #endif
 
-/*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
- * is enabled.
- */
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#ifndef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
     return false;
 }
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
@@ -860,7 +862,7 @@ struct kvm {
     struct notifier_block pm_notifier;
 #endif
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-    /* Protected by slots_locks (for writes) and RCU (for reads) */
+    /* Protected by slots_lock (for writes) and RCU (for reads) */
     struct xarray mem_attr_array;
 #endif
     char stats_id[KVM_STATS_NAME_SIZE];
@@ -966,11 +968,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
     return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
 }
 
+/*
+ * Get a bus reference under the update-side lock. No long-term SRCU reader
+ * references are permitted, to avoid stale reads vs concurrent IO
+ * registrations.
+ */
 static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
 {
-    return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
-                                  lockdep_is_held(&kvm->slots_lock) ||
-                                  !refcount_read(&kvm->users_count));
+    return rcu_dereference_protected(kvm->buses[idx],
+                                     lockdep_is_held(&kvm->slots_lock));
 }
 
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
@@ -2490,6 +2496,14 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
         vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
+static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
+{
+    if (!IS_ENABLED(CONFIG_KVM_GUEST_MEMFD))
+        return false;
+
+    return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
+}
+
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 {
@@ -2505,8 +2519,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-    return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
-           kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+    return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
 #else
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
@@ -2515,7 +2528,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GUEST_MEMFD
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                      gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
                      int *max_order);
@@ -2528,13 +2541,13 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
     KVM_BUG_ON(1, kvm);
     return -EIO;
 }
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GUEST_MEMFD */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
 /**
  * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
  *


@@ -89,6 +89,7 @@ static inline void riscv_pmu_legacy_skip_init(void) {};
 struct riscv_pmu *riscv_pmu_alloc(void);
 #ifdef CONFIG_RISCV_PMU_SBI
 int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
+int riscv_pmu_get_event_info(u32 type, u64 config, u64 *econfig);
 #endif
 
 #endif /* CONFIG_RISCV_PMU */


@@ -156,41 +156,6 @@ TRACE_EVENT(kvm_mmio,
               __entry->len, __entry->gpa, __entry->val)
 );
 
-#define KVM_TRACE_IOCSR_READ_UNSATISFIED 0
-#define KVM_TRACE_IOCSR_READ 1
-#define KVM_TRACE_IOCSR_WRITE 2
-
-#define kvm_trace_symbol_iocsr \
-    { KVM_TRACE_IOCSR_READ_UNSATISFIED, "unsatisfied-read" }, \
-    { KVM_TRACE_IOCSR_READ, "read" }, \
-    { KVM_TRACE_IOCSR_WRITE, "write" }
-
-TRACE_EVENT(kvm_iocsr,
-        TP_PROTO(int type, int len, u64 gpa, void *val),
-        TP_ARGS(type, len, gpa, val),
-
-        TP_STRUCT__entry(
-            __field( u32, type )
-            __field( u32, len )
-            __field( u64, gpa )
-            __field( u64, val )
-        ),
-
-        TP_fast_assign(
-            __entry->type = type;
-            __entry->len = len;
-            __entry->gpa = gpa;
-            __entry->val = 0;
-            if (val)
-                memcpy(&__entry->val, val,
-                       min_t(u32, sizeof(__entry->val), len));
-        ),
-
-        TP_printk("iocsr %s len %u gpa 0x%llx val 0x%llx",
-              __print_symbolic(__entry->type, kvm_trace_symbol_iocsr),
-              __entry->len, __entry->gpa, __entry->val)
-);
-
 #define kvm_fpu_load_symbol \
     {0, "unload"}, \
     {1, "load"}


@@ -962,6 +962,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_EL2_E2H0 241
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_GUEST_MEMFD_MMAP 244
 
 struct kvm_irq_routing_irqchip {
     __u32 irqchip;
@@ -1598,6 +1599,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE       (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
 
 struct kvm_create_guest_memfd {
     __u64 size;
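
Together with KVM_CAP_GUEST_MEMFD_MMAP above, GUEST_MEMFD_FLAG_MMAP is what lets userspace map guest_memfd-backed memory directly. Below is a minimal sketch of the expected flow, assuming headers new enough to define both symbols and an already-created VM fd; the map_guest_memfd() helper name is illustrative and not part of the patch.

/* Hedged userspace sketch, not taken from the series itself. */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static void *map_guest_memfd(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd args = {
		.size = size,
		.flags = GUEST_MEMFD_FLAG_MMAP,
	};
	void *mem;
	int gmem_fd;

	/* Per the x86 hunk earlier, only VM types without private memory report this. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MMAP) <= 0)
		return NULL;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	if (gmem_fd < 0)
		return NULL;

	/* With GUEST_MEMFD_FLAG_MMAP set, the returned fd can be mapped like a memfd. */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
	return mem == MAP_FAILED ? NULL : mem;
}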


@@ -156,6 +156,7 @@ TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
 TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
 TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
+TEST_GEN_PROGS_arm64 += arm64/hello_el2
 TEST_GEN_PROGS_arm64 += arm64/host_sve
 TEST_GEN_PROGS_arm64 += arm64/hypercalls
 TEST_GEN_PROGS_arm64 += arm64/external_aborts
@@ -175,6 +176,7 @@ TEST_GEN_PROGS_arm64 += arch_timer
 TEST_GEN_PROGS_arm64 += coalesced_io_test
 TEST_GEN_PROGS_arm64 += dirty_log_perf_test
 TEST_GEN_PROGS_arm64 += get-reg-list
+TEST_GEN_PROGS_arm64 += guest_memfd_test
 TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
 TEST_GEN_PROGS_arm64 += memslot_perf_test
 TEST_GEN_PROGS_arm64 += mmu_stress_test
@@ -196,9 +198,15 @@ TEST_GEN_PROGS_s390 += rseq_test
 TEST_GEN_PROGS_riscv = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_riscv += riscv/sbi_pmu_test
 TEST_GEN_PROGS_riscv += riscv/ebreak_test
+TEST_GEN_PROGS_riscv += access_tracking_perf_test
 TEST_GEN_PROGS_riscv += arch_timer
 TEST_GEN_PROGS_riscv += coalesced_io_test
+TEST_GEN_PROGS_riscv += dirty_log_perf_test
 TEST_GEN_PROGS_riscv += get-reg-list
+TEST_GEN_PROGS_riscv += memslot_modification_stress_test
+TEST_GEN_PROGS_riscv += memslot_perf_test
+TEST_GEN_PROGS_riscv += mmu_stress_test
+TEST_GEN_PROGS_riscv += rseq_test
 TEST_GEN_PROGS_riscv += steal_time
 
 TEST_GEN_PROGS_loongarch += coalesced_io_test


@@ -50,6 +50,7 @@
 #include "memstress.h"
 #include "guest_modes.h"
 #include "processor.h"
+#include "ucall_common.h"
 #include "cgroup_util.h"
 #include "lru_gen_util.h"


@@ -165,10 +165,8 @@ static void guest_code(void)
 static void test_init_timer_irq(struct kvm_vm *vm)
 {
     /* Timer initid should be same for all the vCPUs, so query only vCPU-0 */
-    vcpu_device_attr_get(vcpus[0], KVM_ARM_VCPU_TIMER_CTRL,
-                         KVM_ARM_VCPU_TIMER_IRQ_PTIMER, &ptimer_irq);
-    vcpu_device_attr_get(vcpus[0], KVM_ARM_VCPU_TIMER_CTRL,
-                         KVM_ARM_VCPU_TIMER_IRQ_VTIMER, &vtimer_irq);
+    ptimer_irq = vcpu_get_ptimer_irq(vcpus[0]);
+    vtimer_irq = vcpu_get_vtimer_irq(vcpus[0]);
 
     sync_global_to_guest(vm, ptimer_irq);
     sync_global_to_guest(vm, vtimer_irq);
@@ -176,14 +174,14 @@ static void test_init_timer_irq(struct kvm_vm *vm)
     pr_debug("ptimer_irq: %d; vtimer_irq: %d\n", ptimer_irq, vtimer_irq);
 }
 
-static int gic_fd;
-
 struct kvm_vm *test_vm_create(void)
 {
     struct kvm_vm *vm;
     unsigned int i;
     int nr_vcpus = test_args.nr_vcpus;
 
+    TEST_REQUIRE(kvm_supports_vgic_v3());
+
     vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
 
     vm_init_descriptor_tables(vm);
@@ -204,8 +202,6 @@ struct kvm_vm *test_vm_create(void)
         vcpu_init_descriptor_tables(vcpus[i]);
 
     test_init_timer_irq(vm);
-    gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
-    __TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3");
 
     /* Make all the test's cmdline args visible to the guest */
     sync_global_to_guest(vm, test_args);
@@ -215,6 +211,5 @@ struct kvm_vm *test_vm_create(void)
 
 void test_vm_cleanup(struct kvm_vm *vm)
 {
-    close(gic_fd);
     kvm_vm_free(vm);
 }


@@ -924,10 +924,8 @@ static void test_run(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 
 static void test_init_timer_irq(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 {
-    vcpu_device_attr_get(vcpu, KVM_ARM_VCPU_TIMER_CTRL,
-                         KVM_ARM_VCPU_TIMER_IRQ_PTIMER, &ptimer_irq);
-    vcpu_device_attr_get(vcpu, KVM_ARM_VCPU_TIMER_CTRL,
-                         KVM_ARM_VCPU_TIMER_IRQ_VTIMER, &vtimer_irq);
+    ptimer_irq = vcpu_get_ptimer_irq(vcpu);
+    vtimer_irq = vcpu_get_vtimer_irq(vcpu);
 
     sync_global_to_guest(vm, ptimer_irq);
     sync_global_to_guest(vm, vtimer_irq);
@@ -935,8 +933,6 @@ static void test_init_timer_irq(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
     pr_debug("ptimer_irq: %d; vtimer_irq: %d\n", ptimer_irq, vtimer_irq);
 }
 
-static int gic_fd;
-
 static void test_vm_create(struct kvm_vm **vm, struct kvm_vcpu **vcpu,
                            enum arch_timer timer)
 {
@@ -951,8 +947,6 @@ static void test_vm_create(struct kvm_vm **vm, struct kvm_vcpu **vcpu,
     vcpu_args_set(*vcpu, 1, timer);
 
     test_init_timer_irq(*vm, *vcpu);
-    gic_fd = vgic_v3_setup(*vm, 1, 64);
-    __TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3");
 
     sync_global_to_guest(*vm, test_args);
     sync_global_to_guest(*vm, CVAL_MAX);
@@ -961,7 +955,6 @@ static void test_vm_create(struct kvm_vm **vm, struct kvm_vcpu **vcpu,
 
 static void test_vm_cleanup(struct kvm_vm *vm)
 {
-    close(gic_fd);
     kvm_vm_free(vm);
 }
 
@@ -1042,6 +1035,8 @@ int main(int argc, char *argv[])
     /* Tell stdout not to buffer its content */
     setbuf(stdout, NULL);
 
+    TEST_REQUIRE(kvm_supports_vgic_v3());
+
     if (!parse_args(argc, argv))
         exit(KSFT_SKIP);


@@ -250,6 +250,47 @@ static void test_serror(void)
     kvm_vm_free(vm);
 }
 
+static void expect_sea_s1ptw_handler(struct ex_regs *regs)
+{
+    u64 esr = read_sysreg(esr_el1);
+
+    GUEST_ASSERT_EQ(regs->pc, expected_abort_pc);
+    GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_DABT_CUR);
+    GUEST_ASSERT_EQ((esr & ESR_ELx_FSC), ESR_ELx_FSC_SEA_TTW(3));
+
+    GUEST_DONE();
+}
+
+static noinline void test_s1ptw_abort_guest(void)
+{
+    extern char test_s1ptw_abort_insn;
+
+    WRITE_ONCE(expected_abort_pc, (u64)&test_s1ptw_abort_insn);
+
+    asm volatile("test_s1ptw_abort_insn:\n\t"
+                 "ldr x0, [%0]\n\t"
+                 : : "r" (MMIO_ADDR) : "x0", "memory");
+
+    GUEST_FAIL("Load on S1PTW abort should not retire");
+}
+
+static void test_s1ptw_abort(void)
+{
+    struct kvm_vcpu *vcpu;
+    u64 *ptep, bad_pa;
+    struct kvm_vm *vm = vm_create_with_dabt_handler(&vcpu, test_s1ptw_abort_guest,
+                                                    expect_sea_s1ptw_handler);
+
+    ptep = virt_get_pte_hva_at_level(vm, MMIO_ADDR, 2);
+    bad_pa = BIT(vm->pa_bits) - vm->page_size;
+
+    *ptep &= ~GENMASK(47, 12);
+    *ptep |= bad_pa;
+
+    vcpu_run_expect_done(vcpu);
+    kvm_vm_free(vm);
+}
+
 static void test_serror_emulated_guest(void)
 {
     GUEST_ASSERT(!(read_sysreg(isr_el1) & ISR_EL1_A));
@@ -327,4 +368,5 @@ int main(void)
     test_serror_masked();
     test_serror_emulated();
     test_mmio_ease();
+    test_s1ptw_abort();
 }


@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * hello_el2 - Basic KVM selftest for VM running at EL2 with E2H=RES1
+ *
+ * Copyright 2025 Google LLC
+ */
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "ucall.h"
+
+#include <asm/sysreg.h>
+
+static void guest_code(void)
+{
+    u64 mmfr0 = read_sysreg_s(SYS_ID_AA64MMFR0_EL1);
+    u64 mmfr1 = read_sysreg_s(SYS_ID_AA64MMFR1_EL1);
+    u64 mmfr4 = read_sysreg_s(SYS_ID_AA64MMFR4_EL1);
+    u8 e2h0 = SYS_FIELD_GET(ID_AA64MMFR4_EL1, E2H0, mmfr4);
+
+    GUEST_ASSERT_EQ(get_current_el(), 2);
+    GUEST_ASSERT(read_sysreg(hcr_el2) & HCR_EL2_E2H);
+    GUEST_ASSERT_EQ(SYS_FIELD_GET(ID_AA64MMFR1_EL1, VH, mmfr1),
+                    ID_AA64MMFR1_EL1_VH_IMP);
+
+    /*
+     * Traps of the complete ID register space are IMPDEF without FEAT_FGT,
+     * which is really annoying to deal with in KVM describing E2H as RES1.
+     *
+     * If the implementation doesn't honor the trap then expect the register
+     * to return all zeros.
+     */
+    if (e2h0 == ID_AA64MMFR4_EL1_E2H0_IMP)
+        GUEST_ASSERT_EQ(SYS_FIELD_GET(ID_AA64MMFR0_EL1, FGT, mmfr0),
+                        ID_AA64MMFR0_EL1_FGT_NI);
+    else
+        GUEST_ASSERT_EQ(e2h0, ID_AA64MMFR4_EL1_E2H0_NI_NV1);
+
+    GUEST_DONE();
+}
+
+int main(void)
+{
+    struct kvm_vcpu_init init;
+    struct kvm_vcpu *vcpu;
+    struct kvm_vm *vm;
+    struct ucall uc;
+
+    TEST_REQUIRE(kvm_check_cap(KVM_CAP_ARM_EL2));
+
+    vm = vm_create(1);
+
+    kvm_get_default_vcpu_target(vm, &init);
+    init.features[0] |= BIT(KVM_ARM_VCPU_HAS_EL2);
+    vcpu = aarch64_vcpu_add(vm, 0, &init, guest_code);
+    kvm_arch_vm_finalize_vcpus(vm);
+
+    vcpu_run(vcpu);
+
+    switch (get_ucall(vcpu, &uc)) {
+    case UCALL_DONE:
+        break;
+    case UCALL_ABORT:
+        REPORT_GUEST_ASSERT(uc);
+        break;
+    default:
+        TEST_FAIL("Unhandled ucall: %ld\n", uc.cmd);
+    }
+
+    kvm_vm_free(vm);
+    return 0;
+}


@@ -108,7 +108,7 @@ static void guest_test_hvc(const struct test_hvc_info *hc_info)
     for (i = 0; i < hvc_info_arr_sz; i++, hc_info++) {
         memset(&res, 0, sizeof(res));
 
-        smccc_hvc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
+        do_smccc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
 
         switch (stage) {
         case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:

(Some files were not shown because too many files have changed in this diff.)