Commit 55b29050 authored by Arnaldo Carvalho de Melo's avatar Arnaldo Carvalho de Melo
Browse files

Merge remote-tracking branch 'torvalds/master' into perf-tools-next



To pick up some more fixes that went upstream via the perf-tools fixes
branch.

Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
parents 8c49c6e1 374a7f47
Loading
Loading
Loading
Loading
+7 −6
Original line number Diff line number Diff line
@@ -513,17 +513,18 @@ Description: information about CPUs heterogeneity.
		cpu_capacity: capacity of cpuX.

What:		/sys/devices/system/cpu/vulnerabilities
		/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
		/sys/devices/system/cpu/vulnerabilities/l1tf
		/sys/devices/system/cpu/vulnerabilities/mds
		/sys/devices/system/cpu/vulnerabilities/meltdown
		/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
		/sys/devices/system/cpu/vulnerabilities/retbleed
		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
		/sys/devices/system/cpu/vulnerabilities/spectre_v1
		/sys/devices/system/cpu/vulnerabilities/spectre_v2
		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
		/sys/devices/system/cpu/vulnerabilities/l1tf
		/sys/devices/system/cpu/vulnerabilities/mds
		/sys/devices/system/cpu/vulnerabilities/srbds
		/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
		/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
		/sys/devices/system/cpu/vulnerabilities/retbleed
Date:		January 2018
Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description:	Information about CPU vulnerabilities
+109 −0
Original line number Diff line number Diff line
.. SPDX-License-Identifier: GPL-2.0

GDS - Gather Data Sampling
==========================

Gather Data Sampling is a hardware vulnerability which allows unprivileged
speculative access to data which was previously stored in vector registers.

Problem
-------
When a gather instruction performs loads from memory, different data elements
are merged into the destination vector register. However, when a gather
instruction that is transiently executed encounters a fault, stale data from
architectural or internal vector registers may get transiently forwarded to the
destination vector register instead. This will allow a malicious attacker to
infer stale data using typical side channel techniques like cache timing
attacks. GDS is a purely sampling-based attack.

The attacker uses gather instructions to infer the stale vector register data.
The victim does not need to do anything special other than use the vector
registers. The victim does not need to use gather instructions to be
vulnerable.

Because the buffers are shared between Hyper-Threads cross Hyper-Thread attacks
are possible.

Attack scenarios
----------------
Without mitigation, GDS can infer stale data across virtually all
permission boundaries:

	Non-enclaves can infer SGX enclave data
	Userspace can infer kernel data
	Guests can infer data from hosts
	Guest can infer guest from other guests
	Users can infer data from other users

Because of this, it is important to ensure that the mitigation stays enabled in
lower-privilege contexts like guests and when running outside SGX enclaves.

The hardware enforces the mitigation for SGX. Likewise, VMMs should  ensure
that guests are not allowed to disable the GDS mitigation. If a host erred and
allowed this, a guest could theoretically disable GDS mitigation, mount an
attack, and re-enable it.

Mitigation mechanism
--------------------
This issue is mitigated in microcode. The microcode defines the following new
bits:

 ================================   ===   ============================
 IA32_ARCH_CAPABILITIES[GDS_CTRL]   R/O   Enumerates GDS vulnerability
                                          and mitigation support.
 IA32_ARCH_CAPABILITIES[GDS_NO]     R/O   Processor is not vulnerable.
 IA32_MCU_OPT_CTRL[GDS_MITG_DIS]    R/W   Disables the mitigation
                                          0 by default.
 IA32_MCU_OPT_CTRL[GDS_MITG_LOCK]   R/W   Locks GDS_MITG_DIS=0. Writes
                                          to GDS_MITG_DIS are ignored
                                          Can't be cleared once set.
 ================================   ===   ============================

GDS can also be mitigated on systems that don't have updated microcode by
disabling AVX. This can be done by setting gather_data_sampling="force" or
"clearcpuid=avx" on the kernel command-line.

If used, these options will disable AVX use by turning off XSAVE YMM support.
However, the processor will still enumerate AVX support.  Userspace that
does not follow proper AVX enumeration to check both AVX *and* XSAVE YMM
support will break.

Mitigation control on the kernel command line
---------------------------------------------
The mitigation can be disabled by setting "gather_data_sampling=off" or
"mitigations=off" on the kernel command line. Not specifying either will default
to the mitigation being enabled. Specifying "gather_data_sampling=force" will
use the microcode mitigation when available or disable AVX on affected systems
where the microcode hasn't been updated to include the mitigation.

GDS System Information
------------------------
The kernel provides vulnerability status information through sysfs. For
GDS this can be accessed by the following sysfs file:

/sys/devices/system/cpu/vulnerabilities/gather_data_sampling

The possible values contained in this file are:

 ============================== =============================================
 Not affected                   Processor not vulnerable.
 Vulnerable                     Processor vulnerable and mitigation disabled.
 Vulnerable: No microcode       Processor vulnerable and microcode is missing
                                mitigation.
 Mitigation: AVX disabled,
 no microcode                   Processor is vulnerable and microcode is missing
                                mitigation. AVX disabled as mitigation.
 Mitigation: Microcode          Processor is vulnerable and mitigation is in
                                effect.
 Mitigation: Microcode (locked) Processor is vulnerable and mitigation is in
                                effect and cannot be disabled.
 Unknown: Dependent on
 hypervisor status              Running on a virtual guest processor that is
                                affected but with no way to know if host
                                processor is mitigated or vulnerable.
 ============================== =============================================

GDS Default mitigation
----------------------
The updated microcode will enable the mitigation by default. The kernel's
default action is to leave the mitigation enabled.
+2 −0
Original line number Diff line number Diff line
@@ -19,3 +19,5 @@ are configurable at compile, boot or run time.
   l1d_flush.rst
   processor_mmio_stale_data.rst
   cross-thread-rsb.rst
   srso
   gather_data_sampling.rst
+133 −0
Original line number Diff line number Diff line
.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is by now the well
known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL but is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this varies by microarchitecture
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.

System information and options
------------------------------

First of all, it is required that the latest microcode be loaded for
mitigations to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 - 'Not affected'               The processor is not vulnerable

 - 'Vulnerable: no microcode'   The processor is vulnerable, no
                                microcode extending IBPB functionality
                                to address the vulnerability has been
                                applied.

 - 'Mitigation: microcode'      Extended IBPB functionality microcode
                                patch has been applied. It does not
                                address User->Kernel and Guest->Host
                                transitions protection but it does
                                address User->User and VM->VM attack
                                vectors.

                                (spec_rstack_overflow=microcode)

 - 'Mitigation: safe RET'       Software-only mitigation. It complements
                                the extended IBPB microcode patch
                                functionality by addressing User->Kernel 
                                and Guest->Host transitions protection.

                                Selected by default or by
                                spec_rstack_overflow=safe-ret

 - 'Mitigation: IBPB'           Similar protection as "safe RET" above
                                but employs an IBPB barrier on privilege
                                domain crossings (User->Kernel,
                                Guest->Host).

                                (spec_rstack_overflow=ibpb)

 - 'Mitigation: IBPB on VMEXIT' Mitigation addressing the cloud provider
                                scenario - the Guest->Host transitions
                                only.

                                (spec_rstack_overflow=ibpb-vmexit)

In order to exploit vulnerability, an attacker needs to:

 - gain local access on the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on fam 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: safe RET' which should take care of most
attack vectors, including the local User->Kernel one.

As always, the user is advised to keep her/his system up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: safe RET' does come at the cost of some
performance depending on the workload. If one trusts her/his userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indrect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation comes also at
a performance cost.

Mitigation: safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
+6 −0
Original line number Diff line number Diff line
@@ -624,3 +624,9 @@ Used to get the correct ranges:
  * VMALLOC_START ~ VMALLOC_END : vmalloc() / ioremap() space.
  * VMEMMAP_START ~ VMEMMAP_END : vmemmap space, used for struct page array.
  * KERNEL_LINK_ADDR : start address of Kernel link and BPF

va_kernel_pa_offset
-------------------

Indicates the offset between the kernel virtual and physical mappings.
Used to translate virtual to physical addresses.
Loading