Commit 49219bba authored by Linus Torvalds

Merge tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - imh_edac: Add a new EDAC driver for Intel Diamond Rapids and future
   incarnations of this memory controller architecture

 - amd64_edac: Remove the legacy csrow sysfs interface which has been
   deprecated and unused (we assume) for at least a decade

 - Add the capability to fall back to BIOS-provided address translation
   functionality (ACPI PRM), which can be used on systems unsupported by
   the current AMD address translation library

 - The usual fixes, fixlets, cleanups and improvements all over the
   place
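
With the csrowX directories removed, tools that read per-csrow counters such as ce_count have to switch to the per-module dimmX (or rankX) nodes. The following is a minimal userspace sketch of that, assuming the EDAC tree lives under /sys/devices/system/edac/mc and that the per-module counter files are named dimm_ce_count and dimm_ue_count; verify both against the sysfs-devices-edac ABI document for the running kernel. The helper names edac_dimm_attr_path() and edac_read_counter() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed root of the EDAC memory-controller tree in sysfs. */
#define EDAC_MC_ROOT "/sys/devices/system/edac/mc"

/*
 * Hypothetical helper: build ".../mcM/<node>N/<attr>", where <node> is
 * "dimm" or "rank" depending on what the driver exposes.
 * Returns 0 on success, -1 for an unknown node name or a short buffer.
 */
int edac_dimm_attr_path(char *buf, size_t len, int mc, const char *node,
			int idx, const char *attr)
{
	int n;

	assert(buf && node && attr);

	/* Only the per-module directory names; csrowX is gone. */
	if (strcmp(node, "dimm") != 0 && strcmp(node, "rank") != 0)
		return -1;

	n = snprintf(buf, len, EDAC_MC_ROOT "/mc%d/%s%d/%s",
		     mc, node, idx, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: read one non-negative decimal counter file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
int edac_read_counter(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, building the path for mc0, dimm3 and dimm_ce_count and passing it to edac_read_counter() would return that module's correctable-error count, or -1 if the node does not exist (for instance, when the driver exposes rankX rather than dimmX).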

* tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16()
  EDAC/igen6: Fix error handling in igen6_edac driver
  EDAC/imh: Setup 'imh_test' debugfs testing node
  EDAC/{skx_comm,imh}: Detect 2-level memory configuration
  EDAC/skx_common: Extend the maximum number of DRAM chip row bits
  EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers
  EDAC/skx_common: Prepare for skx_set_hi_lo()
  EDAC/skx_common: Prepare for skx_get_edac_list()
  EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev
  EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error()
  EDAC/ie31200: Fix error handling in ie31200_register_mci
  RAS/CEC: Replace use of system_wq with system_percpu_wq
  EDAC: Remove the legacy EDAC sysfs interface
  EDAC/amd64: Remove NUM_CONTROLLERS macro
  EDAC/amd64: Generate ctl_name string at runtime
  RAS/AMD/ATL: Require PRM support for future systems
  ACPI: PRM: Add acpi_prm_handler_available()
  RAS/AMD/ATL: Return error codes from helper functions
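
The change that replaces bitwise_xor_bits() with hweight16() relies on a standard identity: XOR-ing all bits of a value yields 1 exactly when an odd number of bits are set, so the parity equals the population count modulo 2. The sketch below assumes the old helper XOR-ed the 16 bits of its argument together and that the replacement keeps only the low bit of the count. It then checks the identity exhaustively in userspace. popcount16() is a portable stand-in for the kernel's hweight16(), and all function names here are illustrative, not the ATL source.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: XOR every bit of a 16-bit value into one parity bit. */
uint8_t xor_bits_loop(uint16_t val)
{
	uint8_t tmp = 0;

	for (unsigned int i = 0; i < 16; i++)
		tmp ^= (val >> i) & 0x1;

	return tmp;
}

/* Userspace stand-in for hweight16(): count the set bits. */
unsigned int popcount16(uint16_t val)
{
	unsigned int count = 0;

	while (val) {
		val &= val - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/* Popcount form: parity is the low bit of the set-bit count. */
uint8_t xor_bits_popcount(uint16_t val)
{
	return popcount16(val) & 0x1;
}

/* Confirm both forms agree for all 65536 possible inputs. */
void check_parity_equivalence(void)
{
	for (uint32_t v = 0; v <= 0xFFFF; v++)
		assert(xor_bits_loop((uint16_t)v) ==
		       xor_bits_popcount((uint16_t)v));
}
```

Beyond being shorter, the popcount form lets the kernel use an architecture-optimized bit count instead of a 16-iteration loop.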
parents 7f8d5f70 e2349c58
+3 −139
@@ -406,24 +406,8 @@ index of the MC::
		   |->mc2
		   ....

-Under each ``mcX`` directory each ``csrowX`` is again represented by a
-``csrowX``, where ``X`` is the csrow index::
-
-	.../mc/mc0/
-		|
-		|->csrow0
-		|->csrow2
-		|->csrow3
-		....
-
-Notice that there is no csrow1, which indicates that csrow0 is composed
-of a single ranked DIMMs. This should also apply in both Channels, in
-order to have dual-channel mode be operational. Since both csrow2 and
-csrow3 are populated, this indicates a dual ranked set of DIMMs for
-channels 0 and 1.
-
-Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
-control and attribute files.
+Within each of the ``mcX`` directory are several EDAC control and
+attribute files.

``mcX`` directories
-------------------
@@ -569,7 +553,7 @@ this ``X`` memory module:
		- Unbuffered-DDR

.. [#f5] On some systems, the memory controller doesn't have any logic
-  to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.
+  to identify the memory module. On such systems, the directory is called ``rankX``.
  On modern Intel memory controllers, the memory controller identifies the
  memory modules directly. On such systems, the directory is called ``dimmX``.

@@ -577,126 +561,6 @@ this ``X`` memory module:
  symlinks inside the sysfs mapping that are automatically created by
  the sysfs subsystem. Currently, they serve no purpose.

-``csrowX`` directories
-----------------------
-
-When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``
-directories. As this API doesn't work properly for Rambus, FB-DIMMs and
-modern Intel Memory Controllers, this is being deprecated in favor of
-``dimmX`` directories.
-
-In the ``csrowX`` directories are EDAC control and attribute files for
-this ``X`` instance of csrow:
-
-
-- ``ue_count`` - Total Uncorrectable Errors count attribute file
-
-	This attribute file displays the total count of uncorrectable
-	errors that have occurred on this csrow. If panic_on_ue is set
-	this counter will not have a chance to increment, since EDAC
-	will panic the system.
-
-
-- ``ce_count`` - Total Correctable Errors count attribute file
-
-	This attribute file displays the total count of correctable
-	errors that have occurred on this csrow. This count is very
-	important to examine. CEs provide early indications that a
-	DIMM is beginning to fail. This count field should be
-	monitored for non-zero values and report such information
-	to the system administrator.
-
-
-- ``size_mb`` - Total memory managed by this csrow attribute file
-
-	This attribute file displays, in count of megabytes, the memory
-	that this csrow contains.
-
-
-- ``mem_type`` - Memory Type attribute file
-
-	This attribute file will display what type of memory is currently
-	on this csrow. Normally, either buffered or unbuffered memory.
-	Examples:
-
-		- Registered-DDR
-		- Unbuffered-DDR
-
-
-- ``edac_mode`` - EDAC Mode of operation attribute file
-
-	This attribute file will display what type of Error detection
-	and correction is being utilized.
-
-
-- ``dev_type`` - Device type attribute file
-
-	This attribute file will display what type of DRAM device is
-	being utilized on this DIMM.
-	Examples:
-
-		- x1
-		- x2
-		- x4
-		- x8
-
-
-- ``ch0_ce_count`` - Channel 0 CE Count attribute file
-
-	This attribute file will display the count of CEs on this
-	DIMM located in channel 0.
-
-
-- ``ch0_ue_count`` - Channel 0 UE Count attribute file
-
-	This attribute file will display the count of UEs on this
-	DIMM located in channel 0.
-
-
-- ``ch0_dimm_label`` - Channel 0 DIMM Label control file
-
-
-	This control file allows this DIMM to have a label assigned
-	to it. With this label in the module, when errors occur
-	the output can provide the DIMM label in the system log.
-	This becomes vital for panic events to isolate the
-	cause of the UE event.
-
-	DIMM Labels must be assigned after booting, with information
-	that correctly identifies the physical slot with its
-	silk screen label. This information is currently very
-	motherboard specific and determination of this information
-	must occur in userland at this time.
-
-
-- ``ch1_ce_count`` - Channel 1 CE Count attribute file
-
-
-	This attribute file will display the count of CEs on this
-	DIMM located in channel 1.
-
-
-- ``ch1_ue_count`` - Channel 1 UE Count attribute file
-
-
-	This attribute file will display the count of UEs on this
-	DIMM located in channel 0.
-
-
-- ``ch1_dimm_label`` - Channel 1 DIMM Label control file
-
-	This control file allows this DIMM to have a label assigned
-	to it. With this label in the module, when errors occur
-	the output can provide the DIMM label in the system log.
-	This becomes vital for panic events to isolate the
-	cause of the UE event.
-
-	DIMM Labels must be assigned after booting, with information
-	that correctly identifies the physical slot with its
-	silk screen label. This information is currently very
-	motherboard specific and determination of this information
-	must occur in userland at this time.
-

System Logging
--------------
+0 −1
@@ -917,7 +917,6 @@ CONFIG_MMC=y
CONFIG_MMC_LOONGSON2=m
CONFIG_INFINIBAND=m
CONFIG_EDAC=y
-# CONFIG_EDAC_LEGACY_SYSFS is not set
CONFIG_EDAC_LOONGSON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_EFI=y
+6 −0
@@ -244,6 +244,12 @@ static struct prm_handler_info *find_prm_handler(const guid_t *guid)
	return (struct prm_handler_info *) find_guid_info(guid, GET_HANDLER);
}

+bool acpi_prm_handler_available(const guid_t *guid)
+{
+	return find_prm_handler(guid) && find_prm_module(guid);
+}
+EXPORT_SYMBOL_GPL(acpi_prm_handler_available);
+
/* In-coming PRM commands */

#define PRM_CMD_RUN_SERVICE		0
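
acpi_prm_handler_available() reports a PRM service as usable only when both the handler and the module that owns it are found for the given GUID. The sketch below models how such a check could gate support, paraphrasing the merge message: systems the native AMD address translation library supports keep working, and unsupported (future) systems are accepted only if a PRM handler is available. Every name here is hypothetical, the two GUID lookups are reduced to booleans, and the sketch does not model whether PRM or the native path is tried first when an address is actually translated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of probing address-translation support. */
enum atl_mode {
	ATL_MODE_NATIVE,	/* native library supports this system */
	ATL_MODE_PRM_ONLY,	/* not supported natively, PRM handler present */
	ATL_MODE_UNSUPPORTED,	/* neither is available: refuse to load */
};

/*
 * Same shape as acpi_prm_handler_available(): the service counts as
 * available only if both the handler and its module were found.
 */
bool prm_available(bool handler_found, bool module_found)
{
	return handler_found && module_found;
}

/* Assumed gating policy; PRM use alongside native support is not modeled. */
enum atl_mode atl_pick_mode(bool native_supported, bool handler_found,
			    bool module_found)
{
	if (native_supported)
		return ATL_MODE_NATIVE;
	if (prm_available(handler_found, module_found))
		return ATL_MODE_PRM_ONLY;
	return ATL_MODE_UNSUPPORTED;
}

/* Walk all 8 input combinations and check the mode invariants. */
void check_atl_modes(void)
{
	for (int i = 0; i < 8; i++) {
		bool native = i & 1, handler = i & 2, module = i & 4;
		enum atl_mode mode = atl_pick_mode(native, handler, module);

		if (mode == ATL_MODE_PRM_ONLY)
			assert(!native && handler && module);
		if (mode == ATL_MODE_UNSUPPORTED)
			assert(!native && !(handler && module));
	}
}
```

Requiring both lookups to succeed avoids treating a system as PRM-capable when a handler GUID is listed but the module that implements it is missing.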
+12 −8
@@ -23,14 +23,6 @@ menuconfig EDAC

if EDAC

-config EDAC_LEGACY_SYSFS
-	bool "EDAC legacy sysfs"
-	default y
-	help
-	  Enable the compatibility sysfs nodes.
-	  Use 'Y' if your edac utilities aren't ported to work with the newer
-	  structures.
-
config EDAC_DEBUG
	bool "Debugging"
	select DEBUG_FS
@@ -291,6 +283,18 @@ config EDAC_I10NM
	  system has non-volatile DIMMs you should also manually
	  select CONFIG_ACPI_NFIT.

+config EDAC_IMH
+	tristate "Intel Integrated Memory/IO Hub MC"
+	depends on X86_64 && X86_MCE_INTEL && ACPI
+	depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_IMH can't be y
+	select DMI
+	select ACPI_ADXL
+	help
+	  Support for error detection and correction the Intel
+	  Integrated Memory/IO Hub Memory Controller. This MC IP is
+	  first used on the Diamond Rapids servers but may appear on
+	  others in the future.
+
config EDAC_PND2
	tristate "Intel Pondicherry2"
	depends on PCI && X86_64 && X86_MCE_INTEL
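
The comment on the ACPI_NFIT dependency follows from how Kconfig evaluates tristate expressions: with n=0, m=1 and y=2, "!A" evaluates to 2-A, "A || B" to max(A, B) and "A && B" to min(A, B), and multiple "depends on" lines are joined with &&. The result is an upper bound on the value the symbol may take. The sketch below encodes those rules in C and evaluates the two dependency lines of EDAC_IMH; the function names are illustrative and not part of Kconfig, and select statements are not modeled.

```c
#include <assert.h>

/* Kconfig tristate values, ordered as in expression evaluation. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "!a" evaluates to 2 - a. */
enum tristate kc_not(enum tristate a)
{
	assert(a <= TRI_Y);
	return (enum tristate)(TRI_Y - a);
}

/* "a || b" evaluates to max(a, b). */
enum tristate kc_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "a && b" evaluates to min(a, b). */
enum tristate kc_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/*
 * Upper bound on EDAC_IMH from its two "depends on" lines, joined with &&:
 *   depends on X86_64 && X86_MCE_INTEL && ACPI
 *   depends on ACPI_NFIT || !ACPI_NFIT
 */
enum tristate edac_imh_limit(enum tristate x86_64, enum tristate mce_intel,
			     enum tristate acpi, enum tristate acpi_nfit)
{
	enum tristate platform = kc_and(kc_and(x86_64, mce_intel), acpi);
	enum tristate nfit = kc_or(acpi_nfit, kc_not(acpi_nfit));

	return kc_and(platform, nfit);
}
```

With ACPI_NFIT=m, "ACPI_NFIT || !ACPI_NFIT" evaluates to max(1, 1) = m, so EDAC_IMH can only be built as a module, which keeps a built-in EDAC_IMH from depending on a modular ACPI_NFIT. With ACPI_NFIT=n or y the expression is y and does not restrict EDAC_IMH.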
+3 −0
@@ -65,6 +65,9 @@ obj-$(CONFIG_EDAC_SKX) += skx_edac.o skx_edac_common.o
i10nm_edac-y				:= i10nm_base.o
obj-$(CONFIG_EDAC_I10NM)		+= i10nm_edac.o skx_edac_common.o

+imh_edac-y				:= imh_base.o
+obj-$(CONFIG_EDAC_IMH)			+= imh_edac.o skx_edac_common.o
+
obj-$(CONFIG_EDAC_HIGHBANK_MC)		+= highbank_mc_edac.o
obj-$(CONFIG_EDAC_HIGHBANK_L2)		+= highbank_l2_edac.o
