Commit ca7e9177 authored by Linus Torvalds

Merge tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:
 "Rework of APIC enumeration and topology evaluation.

  The current implementation has a couple of shortcomings:

   - It fails to handle hybrid systems correctly.

   - The APIC registration code which handles CPU number assignments is
     in the middle of the APIC code and detached from the topology
     evaluation.

   - The various mechanisms which enumerate APICs (ACPI, MPPARSE and
     guest specific ones) tweak global variables as they see fit or, in
     the case of XENPV, just hack around the generic mechanisms
     completely.

   - The CPUID topology evaluation code is sprinkled all over the vendor
     code and reevaluates global variables on every hotplug operation.

   - There is no way to analyze topology on the boot CPU before bringing
     up the APs. This causes problems for infrastructure like PERF which
     needs to size certain aspects upfront or could be simplified if
     that were possible.

   - The APIC admission and CPU number association logic is
     incomprehensible and overly complex and needs to be kept around
     after boot instead of completing this right after the APIC
     enumeration.

  This update addresses these shortcomings with the following changes:

   - Rework the CPUID evaluation code so it is common for all vendors
     and provides information about the APIC ID segments in a uniform
     way independent of the number of segments (Thread, Core, Module,
     ..., Die, Package) so that this information can be computed instead
     of rewriting global variables of dubious value over and over.

   - A few cleanups and simplifications of the APIC, IO/APIC and related
     interfaces to prepare for the topology evaluation changes.

   - Separation of the parser stages so the early evaluation which tries
     to find the APIC address can be separately overridden from the late
     evaluation which enumerates and registers the local APIC as further
     preparation for sanitizing the topology evaluation.

   - A new registration and admission logic which

       - encapsulates the inner workings so that parsers and guest logic
         can no longer fiddle with it

       - uses the APIC ID segments to build topology bitmaps at
         registration time

       - provides a sane admission logic

       - allows detecting the crash kernel case, where CPU0 does not run
         on the real BSP, automatically. This is required to prevent
         sending INIT/SIPI sequences to the real BSP which would reset
         the whole machine. This was so far handled by a tedious command
         line parameter, which does not even work in nested crash
         scenarios.

       - associates CPU numbers after the enumeration has completed and
         prevents the late registration of APICs, which was somehow
         tolerated before.

   - Converting all parsers and guest enumeration mechanisms over to the
     new interfaces.

     This allows getting rid of all global variable tweaking from the
     parsers and enumeration mechanisms and sanitizes the XEN[PV]
     handling so it can use CPUID evaluation for the first time.

   - Mopping up existing sins by taking the information from the APIC ID
     segment bitmaps.

     This evaluates hybrid systems correctly on the boot CPU and allows
     for cleanups and fixes in the related drivers, e.g. PERF.

  The series has been extensively tested and the minimal late fallout
  due to a broken ACPI/MADT table has been addressed by tightening the
  admission logic further"
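
The APIC ID segment idea described in the message above can be illustrated with a small standalone sketch. This is not the kernel's implementation: every name here (topo_shifts, topo_segment_id(), topo_register(), MAX_APIC_IDS and so on) is hypothetical, the shift values would normally come from CPUID, and the bitmaps are a simplified stand-in for what the series builds at registration time. It assumes each segment is narrower than 32 bits.

```c
#include <stdint.h>

/* Hypothetical domain levels, ordered from innermost to outermost. */
enum topo_level { TOPO_THREAD, TOPO_CORE, TOPO_MODULE, TOPO_DIE, TOPO_LEVELS };

/*
 * shift[l] is the number of low APIC ID bits covering levels 0..l.
 * All bits above shift[TOPO_LEVELS - 1] identify the package. A level
 * which is not enumerated repeats the previous shift, so its segment
 * is zero bits wide.
 */
struct topo_shifts {
	unsigned int shift[TOPO_LEVELS];
};

/* ID of the segment at 'level' relative to its parent domain. */
static uint32_t topo_segment_id(const struct topo_shifts *ts,
				uint32_t apicid, enum topo_level level)
{
	unsigned int lo = level ? ts->shift[level - 1] : 0;
	unsigned int width = ts->shift[level] - lo;

	return (apicid >> lo) & ((1u << width) - 1);
}

static uint32_t topo_package_id(const struct topo_shifts *ts, uint32_t apicid)
{
	return apicid >> ts->shift[TOPO_LEVELS - 1];
}

/* Assumed small APIC ID space to keep the sketch short. */
#define MAX_APIC_IDS 256

struct topo_bitmap {
	uint64_t bits[MAX_APIC_IDS / 64];
};

struct topo_maps {
	struct topo_bitmap cores;
	struct topo_bitmap packages;
};

static void topo_set(struct topo_bitmap *map, uint32_t bit)
{
	map->bits[bit / 64] |= (uint64_t)1 << (bit % 64);
}

/* Number of set bits, i.e. number of distinct units seen so far. */
static unsigned int topo_weight(const struct topo_bitmap *map)
{
	unsigned int i, n = 0;

	for (i = 0; i < MAX_APIC_IDS / 64; i++) {
		uint64_t w = map->bits[i];

		for (; w; w &= w - 1)
			n++;
	}
	return n;
}

/*
 * Register one APIC ID: mark the first APIC ID of its core and of its
 * package. Threads of the same core (or package) hit the same bit, so
 * the bitmap weight is the number of cores (or packages).
 */
static int topo_register(struct topo_maps *maps, const struct topo_shifts *ts,
			 uint32_t apicid)
{
	if (apicid >= MAX_APIC_IDS)
		return -1;

	topo_set(&maps->cores, apicid & ~((1u << ts->shift[TOPO_THREAD]) - 1));
	topo_set(&maps->packages,
		 apicid & ~((1u << ts->shift[TOPO_LEVELS - 1]) - 1));
	return 0;
}
```

With the assumed shifts { 1, 4, 4, 4 } (one thread bit, three core bits, no module or die bits), APIC ID 0x1b decomposes into thread 1, core 5, package 1. Registering the IDs 0x00, 0x01, 0x02, 0x10 and 0x11 leaves three bits set in the core map and two in the package map, so the counts fall out of the bitmaps without any per-vendor global variables.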

* tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
  x86/topology: Ignore non-present APIC IDs in a present package
  x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
  smp: Provide 'setup_max_cpus' definition on UP too
  smp: Avoid 'setup_max_cpus' namespace collision/shadowing
  x86/bugs: Use fixed addressing for VERW operand
  x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
  x86/cpu/topology: Provide __num_[cores|threads]_per_package
  x86/cpu/topology: Rename topology_max_die_per_package()
  x86/cpu/topology: Rename smp_num_siblings
  x86/cpu/topology: Retrieve cores per package from topology bitmaps
  x86/cpu/topology: Use topology logical mapping mechanism
  x86/cpu/topology: Provide logical pkg/die mapping
  x86/cpu/topology: Simplify cpu_mark_primary_thread()
  x86/cpu/topology: Mop up primary thread mask handling
  x86/cpu/topology: Use topology bitmaps for sizing
  x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
  x86/xen/smp_pv: Count number of vCPUs early
  x86/cpu/topology: Assign hotpluggable CPUIDs during init
  x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
  x86/topology: Add a mechanism to track topology via APIC IDs
  ...
parents d08c407f f0551af0
+2 −5
@@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
    CPU is enough for kdump kernel to dump vmcore on most of systems.

    However, you can also specify nr_cpus=X to enable multiple processors
-   in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
-   tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
-   admin-guide/kernel-parameters.txt for more details.
+   in kdump kernel.

    With CONFIG_SMP=n, the above things are not related.

@@ -454,8 +452,7 @@ Notes on loading the dump-capture kernel:
   to use multi-thread programs with it, such as parallel dump feature of
   makedumpfile. Otherwise, the multi-thread program may have a great
   performance degradation. To enable multi-cpu support, you should bring up an
-  SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
-  options while loading it.
+  SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.

 * For s390x there are two kdump modes: If a ELF header is specified with
   the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
+0 −9
@@ -1095,15 +1095,6 @@
 			Disable TLBIE instruction. Currently does not work
 			with KVM, with HASH MMU, or with coherent accelerators.

-	disable_cpu_apicid= [X86,APIC,SMP]
-			Format: <int>
-			The number of initial APIC ID for the
-			corresponding CPU to be disabled at boot,
-			mostly used for the kdump 2nd kernel to
-			disable BSP to wake up multiple CPUs without
-			causing system reset or hang due to sending
-			INIT from AP to BSP.
-
 	disable_ddw	[PPC/PSERIES,EARLY]
 			Disable Dynamic DMA Window support. Use this
 			to workaround buggy firmware.
+9 −15
@@ -47,17 +47,21 @@ AMD nomenclature for package is 'Node'.

 Package-related topology information in the kernel:

-  - cpuinfo_x86.x86_max_cores:
+  - topology_num_threads_per_package()

-    The number of cores in a package. This information is retrieved via CPUID.
+    The number of threads in a package.

-  - cpuinfo_x86.x86_max_dies:
+  - topology_num_cores_per_package()

-    The number of dies in a package. This information is retrieved via CPUID.
+    The number of cores in a package.
+
+  - topology_max_dies_per_package()
+
+    The maximum number of dies in a package.

   - cpuinfo_x86.topo.die_id:

-    The physical ID of the die. This information is retrieved via CPUID.
+    The physical ID of the die.

   - cpuinfo_x86.topo.pkg_id:

@@ -96,16 +100,6 @@ are SMT- or CMT-type threads.
 AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses
 "core".

-Core-related topology information in the kernel:
-
-  - smp_num_siblings:
-
-    The number of threads in a core. The number of threads in a package can be
-    calculated by::
-
-	threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
-
-
 Threads
 =======
 A thread is a single scheduling unit. It's the equivalent to a logical Linux
+1 −1
@@ -579,7 +579,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	if (!x86_pmu.amd_nb_constraints)
 		return;

-	nb_id = topology_die_id(cpu);
+	nb_id = topology_amd_node_id(cpu);
 	WARN_ON_ONCE(nb_id == BAD_APICID);

 	for_each_online_cpu(i) {
+1 −1
@@ -834,7 +834,7 @@ static int __init cstate_init(void)
 	}

 	if (has_cstate_pkg) {
-		if (topology_max_die_per_package() > 1) {
+		if (topology_max_dies_per_package() > 1) {
 			err = perf_pmu_register(&cstate_pkg_pmu,
 						"cstate_die", -1);
 		} else {