Commit ba1f9c8f authored by Linus Torvalds
Pull arm64 updates from Catalin Marinas:

 - Support for running Linux in a protected VM under the Arm
   Confidential Compute Architecture (CCA)

 - Guarded Control Stack user-space support. Current patches follow the
   x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
   patches (already on the list) will add support for clone3() allowing
   finer-grained control of the shadow stack size and placement from
   libc

 - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
   getting close with the upcoming dpISA support)

 - Other arch features:

     - In-kernel use of the memcpy instructions, FEAT_MOPS (previously
       only exposed to user; uaccess support not merged yet)

     - MTE: hugetlbfs support and the corresponding kselftests

     - Optimise CRC32 using the PMULL instructions

     - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG

     - Optimise the kernel TLB flushing to use the range operations

     - POE/pkey (permission overlays): further cleanups after bringing
       the signal handler in line with the x86 behaviour for 6.12

 - arm64 perf updates:

     - Support for the NXP i.MX91 PMU in the existing IMX driver

     - Support for Ampere SoCs in the Designware PCIe PMU driver

     - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC

     - Support for Samsung's 'Mongoose' CPU PMU

     - Support for PMUv3.9 finer-grained userspace counter access
       control

     - Switch back to platform_driver::remove() now that it returns
       'void'

     - Add some missing events for the CXL PMU driver

 - Miscellaneous arm64 fixes/cleanups:

     - Page table accessors cleanup: type updates, drop unused macros,
       reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
       check addresses before runtime P4D/PUD folding

     - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
       FEAT_ECV for the generic timers) allowing Linux to boot with
       firmware deployments that don't set SCTLR_EL3.ECVEn

     - ACPI/arm64: tighten the check for the array of platform timer
       structures and adjust the error handling procedure in
       gtdt_parse_timer_block()

     - Optimise the cache flush for the uprobes xol slot (skip if no
       change) and other uprobes/kprobes cleanups

     - Fix the context switching of tpidrro_el0 when kpti is enabled

     - Dynamic shadow call stack fixes

     - Sysreg updates

     - Various arm64 kselftest improvements

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits)
  arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
  kselftest/arm64: Try harder to generate different keys during PAC tests
  kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
  arm64/ptrace: Clarify documentation of VL configuration via ptrace
  kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
  acpi/arm64: remove unnecessary cast
  arm64/mm: Change protval as 'pteval_t' in map_range()
  kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
  kselftest/arm64: Add FPMR coverage to fp-ptrace
  kselftest/arm64: Expand the set of ZA writes fp-ptrace does
  kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
  kselftest/arm64: Enable build of PAC tests with LLVM=1
  kselftest/arm64: Check that SVCR is 0 in signal handlers
  selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
  kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
  kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
  kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
  kselftest/arm64: Fix build with stricter assemblers
  arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
  arm64/scs: Deal with 64-bit relative offsets in FDE frames
  ...
parents 9aa4c37f 83ef4a37
+3 −0
@@ -446,6 +446,9 @@
	arm64.nobti	[ARM64] Unconditionally disable Branch Target
			Identification support

	arm64.nogcs	[ARM64] Unconditionally disable Guarded Control Stack
			support

	arm64.nomops	[ARM64] Unconditionally disable Memory Copy and Memory
			Set instructions support

+1 −0
@@ -26,3 +26,4 @@ Performance monitor support
   meson-ddr-pmu
   cxl
   ampere_cspmu
   mrvl-pem-pmu
+56 −0
=================================================================
Marvell Odyssey PEM Performance Monitoring Unit (PMU UNCORE)
=================================================================

The PCI Express Interface Units (PEM) each have an associated
monitoring unit. This includes performance counters to track various
characteristics of the data that is transmitted over the PCIe link.

The counters track inbound and outbound transactions, with separate
counters for posted, non-posted and completion TLPs. Inbound and
outbound memory read requests can also be monitored, along with their
latencies. Address Translation Services (ATS) events such as ATS
Translation, ATS Page Request and ATS Invalidation are tracked too,
along with their corresponding latencies.

There are separate 64-bit counters to measure posted, non-posted and
completion TLPs in inbound and outbound transactions. ATS events are
measured by different counters.

The PMU driver exposes the available events and format options under sysfs:
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/events/
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/format/

Examples::

  # perf list | grep mrvl_pcie_rc_pmu
  mrvl_pcie_rc_pmu_<>/ats_inv/             [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ats_inv_latency/     [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ats_pri/             [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ats_pri_latency/     [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ats_trans/           [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ats_trans_latency/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_inflight/         [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_reads/            [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ebus/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ncb/    [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_cpl_partid/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_cpl_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_npr/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_pr/    [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_npr/          [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ib_tlp_pr/           [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_inflight_partid/  [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_merges_cpl_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_merges_npr_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_merges_pr_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_reads_partid/     [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_cpl_partid/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_cpl_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_npr_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_pr_partid/ [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_npr_partid/   [Kernel PMU event]
  mrvl_pcie_rc_pmu_<>/ob_tlp_pr_partid/    [Kernel PMU event]


  # perf stat -e ib_inflight,ib_reads,ib_req_no_ro_ebus,ib_req_no_ro_ncb <workload>
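
The sysfs layout described above can also be walked directly. The following
is a minimal sketch, not part of the driver: ``list_pem_pmu_events`` and its
``sysfs_root`` parameter are hypothetical names introduced for illustration,
and the only assumption about the layout is the
``mrvl_pcie_rc_pmu_<>/events/`` directory documented above.

```python
from pathlib import Path


def list_pem_pmu_events(sysfs_root="/sys/bus/event_source/devices"):
    """Map each mrvl_pcie_rc_pmu_* instance to its sorted event names.

    sysfs_root is a parameter only so that the function can be pointed
    at a test directory; it is an illustrative choice, not a driver API.
    """
    events = {}
    for pmu_dir in sorted(Path(sysfs_root).glob("mrvl_pcie_rc_pmu_*")):
        events_dir = pmu_dir / "events"
        if events_dir.is_dir():
            events[pmu_dir.name] = sorted(p.name for p in events_dir.iterdir())
    return events


if __name__ == "__main__":
    for pmu, names in list_pem_pmu_events().items():
        for name in names:
            # Same "pmu/event/" spelling that perf list prints.
            print(f"{pmu}/{name}/")
```

On a system without the PMU the function simply returns an empty mapping.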
+69 −0
.. SPDX-License-Identifier: GPL-2.0

=====================================
Arm Confidential Compute Architecture
=====================================

Arm systems that support the Realm Management Extension (RME) contain
hardware to allow a VM guest to be run in a way which protects the code
and data of the guest from the hypervisor. It extends the older "two
world" model (Normal and Secure World) into four worlds: Normal, Secure,
Root and Realm. Linux can then also be run as a guest to a monitor
running in the Realm world.

The monitor running in the Realm world is known as the Realm Management
Monitor (RMM) and implements the Realm Management Monitor
specification[1]. The monitor acts a bit like a hypervisor (e.g. it runs
in EL2 and manages the stage 2 page tables etc. of the guests running in
the Realm world), but much of the control is handled by a hypervisor
running in the Normal World. The Normal World hypervisor uses the Realm
Management Interface (RMI) defined by the RMM specification to request
the RMM to perform operations (e.g. mapping memory or executing a vCPU).

The RMM defines an environment for guests where the address space (IPA)
is split into two. The lower half is protected - any memory that is
mapped in this half cannot be seen by the Normal World and the RMM
restricts what operations the Normal World can perform on this memory
(e.g. the Normal World cannot replace pages in this region without the
guest's cooperation). The upper half is shared: the Normal World is free
to make changes to the pages in this region, and is able to emulate MMIO
devices in this region too.
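
As an illustration of the split, the half an address falls in is decided by
its top valid IPA bit. This is a minimal sketch under that reading of the
text above; ``ipa_is_shared`` and ``ipa_bits`` are hypothetical names, not
kernel or RMM interfaces.

```python
def ipa_is_shared(ipa, ipa_bits):
    """Return True if ipa lies in the upper (shared) half of the IPA space.

    ipa_bits is the guest's IPA size in bits; the lower half is protected
    and the upper half is shared, as described above.
    """
    if not 0 <= ipa < (1 << ipa_bits):
        raise ValueError("IPA outside the guest address space")
    top_bit = 1 << (ipa_bits - 1)
    return bool(ipa & top_bit)
```

For example, with a 33-bit IPA space, ``ipa_is_shared(0x1000000, 33)`` is
false (protected half) and ``ipa_is_shared(0x101000000, 33)`` is true.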

A guest running in a Realm may also communicate with the RMM using the
Realm Services Interface (RSI) to request changes in its environment or
to perform attestation about its environment. In particular it may
request that areas of the protected address space are transitioned
between 'RAM' and 'EMPTY' (in either direction). This allows a Realm
guest to give up memory to be returned to the Normal World, or to
request new memory from the Normal World.  Without an explicit request
from the Realm guest the RMM will otherwise prevent the Normal World
from making these changes.

Linux as a Realm Guest
----------------------

To run Linux as a guest within a Realm, the following must be provided
either by the VMM or by a `boot loader` run in the Realm before Linux:

 * All protected RAM described to Linux (by DT or ACPI) must be marked
   RIPAS RAM before handing control over to Linux.

 * MMIO devices must be either unprotected (e.g. emulated by the Normal
   World) or marked RIPAS DEV.

 * MMIO devices emulated by the Normal World and used very early in boot
   (specifically earlycon) must be specified in the upper half of IPA.
   For earlycon this can be done by specifying the address on the
   command line, e.g. with an IPA size of 33 bits and the base address
   of the emulated UART at 0x1000000: ``earlycon=uart,mmio,0x101000000``

 * Linux will use bounce buffers for communicating with unprotected
   devices. It will transition some protected memory to RIPAS EMPTY and
   expect to be able to access unprotected pages at the same IPA but
   with the highest valid IPA bit set. The expectation is that the
   VMM will remove the physical pages from the protected mapping and
   provide those pages as unprotected pages.
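
The address arithmetic behind the earlycon example and the bounce buffer
aliasing can be sketched as follows. This is an illustration only;
``unprotected_alias`` is a hypothetical helper, not a kernel function, and it
assumes nothing beyond "set the highest valid IPA bit" as stated above.

```python
def unprotected_alias(ipa, ipa_bits):
    """Return the shared alias of a protected IPA.

    The alias is the same address with the highest valid IPA bit set,
    which places it in the upper (unprotected) half of the IPA space.
    """
    top_bit = 1 << (ipa_bits - 1)
    if not 0 <= ipa < top_bit:
        raise ValueError("expected an address in the protected (lower) half")
    return ipa | top_bit


# IPA size of 33 bits, emulated UART at 0x1000000, as in the example above.
print(f"earlycon=uart,mmio,{unprotected_alias(0x1000000, 33):#x}")
# prints "earlycon=uart,mmio,0x101000000"
```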

References
----------
[1] https://developer.arm.com/documentation/den0137/
+38 −0
@@ -41,6 +41,9 @@ to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)

For Arm Confidential Compute Realms this includes ensuring that all
protected RAM has a Realm IPA state (RIPAS) of "RAM".


2. Setup the device tree
-------------------------
@@ -385,6 +388,9 @@ Before jumping into the kernel, the following conditions must be met:

    - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.

    - HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1 and the hypervisor
      must handle MOPS exceptions as described in :ref:`arm64_mops_hyp`.

  For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):

  - If EL3 is present:
@@ -411,6 +417,38 @@ Before jumping into the kernel, the following conditions must be met:

    - HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.

  For CPUs with Guarded Control Stacks (FEAT_GCS):

  - GCSCR_EL1 must be initialised to 0.

  - GCSCRE0_EL1 must be initialised to 0.

  - If EL3 is present:

    - SCR_EL3.GCSEn (bit 39) must be initialised to 0b1.

  - If EL2 is present:

    - GCSCR_EL2 must be initialised to 0.

  - If the kernel is entered at EL1 and EL2 is present:

    - HCRX_EL2.GCSEn must be initialised to 0b1.

    - HFGITR_EL2.nGCSEPP (bit 59) must be initialised to 0b1.

    - HFGITR_EL2.nGCSSTR_EL1 (bit 58) must be initialised to 0b1.

    - HFGITR_EL2.nGCSPUSHM_EL1 (bit 57) must be initialised to 0b1.

    - HFGRTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.

    - HFGRTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.

    - HFGWTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.

    - HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
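
The fine-grained trap requirements for GCS listed above can be expressed as
bit masks. This is a hedged sketch for checking a set of register values, not
boot loader or hypervisor code: ``bits``, ``GCS_EL1_ENTRY_BITS`` and
``missing_gcs_bits`` are hypothetical names. HCRX_EL2.GCSEn is left out
because no bit position is given for it above.

```python
def bits(*positions):
    """Build a mask with the given bit positions set."""
    mask = 0
    for position in positions:
        mask |= 1 << position
    return mask


# Bits that must be 0b1 when the kernel is entered at EL1 and EL2 is
# present, taken from the list above.
GCS_EL1_ENTRY_BITS = {
    "HFGITR_EL2": bits(59, 58, 57),  # nGCSEPP, nGCSSTR_EL1, nGCSPUSHM_EL1
    "HFGRTR_EL2": bits(53, 52),      # nGCS_EL1, nGCS_EL0
    "HFGWTR_EL2": bits(53, 52),      # nGCS_EL1, nGCS_EL0
}


def missing_gcs_bits(reg_values):
    """Return {register: mask of required bits that are still clear}.

    reg_values maps register names to their 64-bit values; a register
    that is absent from the mapping is treated as all zeroes.
    """
    missing = {}
    for reg, required in GCS_EL1_ENTRY_BITS.items():
        clear = required & ~reg_values.get(reg, 0)
        if clear:
            missing[reg] = clear
    return missing
```

An empty result from ``missing_gcs_bits`` means every listed bit is set; any
entry in the result names a register together with the bits still to be set.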

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs.  All CPUs must
enter the kernel in the same exception level.  Where the values documented