Commit e2349c58 authored by Borislav Petkov (AMD)

Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+3 −139
@@ -406,24 +406,8 @@ index of the MC::
		   |->mc2
		   ....

Under each ``mcX`` directory each ``csrowX`` is again represented by a
``csrowX``, where ``X`` is the csrow index::

	.../mc/mc0/
		|
		|->csrow0
		|->csrow2
		|->csrow3
		....

Notice that there is no csrow1, which indicates that csrow0 is composed
of a single ranked DIMMs. This should also apply in both Channels, in
order to have dual-channel mode be operational. Since both csrow2 and
csrow3 are populated, this indicates a dual ranked set of DIMMs for
channels 0 and 1.

Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
control and attribute files.
Within each of the ``mcX`` directory are several EDAC control and
attribute files.

``mcX`` directories
-------------------
@@ -569,7 +553,7 @@ this ``X`` memory module:
		- Unbuffered-DDR

.. [#f5] On some systems, the memory controller doesn't have any logic
  to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.
  to identify the memory module. On such systems, the directory is called ``rankX``.
  On modern Intel memory controllers, the memory controller identifies the
  memory modules directly. On such systems, the directory is called ``dimmX``.

@@ -577,126 +561,6 @@ this ``X`` memory module:
  symlinks inside the sysfs mapping that are automatically created by
  the sysfs subsystem. Currently, they serve no purpose.

``csrowX`` directories
----------------------

When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``
directories. As this API doesn't work properly for Rambus, FB-DIMMs and
modern Intel Memory Controllers, this is being deprecated in favor of
``dimmX`` directories.

In the ``csrowX`` directories are EDAC control and attribute files for
this ``X`` instance of csrow:


- ``ue_count`` - Total Uncorrectable Errors count attribute file

	This attribute file displays the total count of uncorrectable
	errors that have occurred on this csrow. If panic_on_ue is set
	this counter will not have a chance to increment, since EDAC
	will panic the system.


- ``ce_count`` - Total Correctable Errors count attribute file

	This attribute file displays the total count of correctable
	errors that have occurred on this csrow. This count is very
	important to examine. CEs provide early indications that a
	DIMM is beginning to fail. This count field should be
	monitored for non-zero values and report such information
	to the system administrator.


- ``size_mb`` - Total memory managed by this csrow attribute file

	This attribute file displays, in count of megabytes, the memory
	that this csrow contains.


- ``mem_type`` - Memory Type attribute file

	This attribute file will display what type of memory is currently
	on this csrow. Normally, either buffered or unbuffered memory.
	Examples:

		- Registered-DDR
		- Unbuffered-DDR


- ``edac_mode`` - EDAC Mode of operation attribute file

	This attribute file will display what type of Error detection
	and correction is being utilized.


- ``dev_type`` - Device type attribute file

	This attribute file will display what type of DRAM device is
	being utilized on this DIMM.
	Examples:

		- x1
		- x2
		- x4
		- x8


- ``ch0_ce_count`` - Channel 0 CE Count attribute file

	This attribute file will display the count of CEs on this
	DIMM located in channel 0.


- ``ch0_ue_count`` - Channel 0 UE Count attribute file

	This attribute file will display the count of UEs on this
	DIMM located in channel 0.


- ``ch0_dimm_label`` - Channel 0 DIMM Label control file


	This control file allows this DIMM to have a label assigned
	to it. With this label in the module, when errors occur
	the output can provide the DIMM label in the system log.
	This becomes vital for panic events to isolate the
	cause of the UE event.

	DIMM Labels must be assigned after booting, with information
	that correctly identifies the physical slot with its
	silk screen label. This information is currently very
	motherboard specific and determination of this information
	must occur in userland at this time.


- ``ch1_ce_count`` - Channel 1 CE Count attribute file


	This attribute file will display the count of CEs on this
	DIMM located in channel 1.


- ``ch1_ue_count`` - Channel 1 UE Count attribute file


	This attribute file will display the count of UEs on this
	DIMM located in channel 0.


- ``ch1_dimm_label`` - Channel 1 DIMM Label control file

	This control file allows this DIMM to have a label assigned
	to it. With this label in the module, when errors occur
	the output can provide the DIMM label in the system log.
	This becomes vital for panic events to isolate the
	cause of the UE event.

	DIMM Labels must be assigned after booting, with information
	that correctly identifies the physical slot with its
	silk screen label. This information is currently very
	motherboard specific and determination of this information
	must occur in userland at this time.


System Logging
--------------
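
Note on the documentation change above: with the ``csrowX`` directories gone, userspace tools read counters from the remaining ``mcX`` attribute files and the per-module directories instead. A minimal sketch of such a reader follows. The helper name `read_counter` is hypothetical, and the attribute path is passed in by the caller because this diff does not show the full sysfs path or the per-module attribute names (something like a `dimm_ce_count` file under `mcX/dimmX` is an assumption here, not taken from this diff).

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned decimal counter from a
 * sysfs-style attribute file (one number followed by a newline).
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_counter(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	/* The attribute is expected to hold a single decimal value. */
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

A monitoring tool could call this on each counter file at intervals and report any non-zero correctable-error count, which matches the advice in the removed ``ce_count`` text about watching for early signs of a failing DIMM.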
+0 −1
@@ -917,7 +917,6 @@ CONFIG_MMC=y
CONFIG_MMC_LOONGSON2=m
CONFIG_INFINIBAND=m
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
CONFIG_EDAC_LOONGSON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_EFI=y
+12 −8
@@ -23,14 +23,6 @@ menuconfig EDAC

if EDAC

config EDAC_LEGACY_SYSFS
	bool "EDAC legacy sysfs"
	default y
	help
	  Enable the compatibility sysfs nodes.
	  Use 'Y' if your edac utilities aren't ported to work with the newer
	  structures.

config EDAC_DEBUG
	bool "Debugging"
	select DEBUG_FS
@@ -291,6 +283,18 @@ config EDAC_I10NM
	  system has non-volatile DIMMs you should also manually
	  select CONFIG_ACPI_NFIT.

config EDAC_IMH
	tristate "Intel Integrated Memory/IO Hub MC"
	depends on X86_64 && X86_MCE_INTEL && ACPI
	depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_IMH can't be y
	select DMI
	select ACPI_ADXL
	help
	  Support for error detection and correction the Intel
	  Integrated Memory/IO Hub Memory Controller. This MC IP is
	  first used on the Diamond Rapids servers but may appear on
	  others in the future.

config EDAC_PND2
	tristate "Intel Pondicherry2"
	depends on PCI && X86_64 && X86_MCE_INTEL
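
Note on the `depends on ACPI_NFIT || !ACPI_NFIT` line in the new EDAC_IMH entry: the expression looks like a tautology, but Kconfig evaluates it in tristate logic (n < m < y), where `||` takes the maximum and `!` maps y to n, n to y, and m to m. The sketch below models that evaluation in plain C to show why the dependency caps EDAC_IMH at `m` when ACPI_NFIT is built as a module. The enum and function names are illustrative only and are not part of the kernel build system.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "!": y becomes n, n becomes y, m stays m (2 - value). */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(2 - a);
}

/* Kconfig "||": the larger of the two operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Upper bound that "ACPI_NFIT || !ACPI_NFIT" places on EDAC_IMH. */
static enum tristate imh_limit(enum tristate acpi_nfit)
{
	return tri_or(acpi_nfit, tri_not(acpi_nfit));
}
```

For ACPI_NFIT set to `n` or `y` the limit is `y`, so EDAC_IMH may be built in or modular. For ACPI_NFIT set to `m` the limit is `m`, which is what the inline comment in the Kconfig entry states.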
+3 −0
@@ -65,6 +65,9 @@ obj-$(CONFIG_EDAC_SKX) += skx_edac.o skx_edac_common.o
i10nm_edac-y				:= i10nm_base.o
obj-$(CONFIG_EDAC_I10NM)		+= i10nm_edac.o skx_edac_common.o

imh_edac-y				:= imh_base.o
obj-$(CONFIG_EDAC_IMH)			+= imh_edac.o skx_edac_common.o

obj-$(CONFIG_EDAC_HIGHBANK_MC)		+= highbank_mc_edac.o
obj-$(CONFIG_EDAC_HIGHBANK_L2)		+= highbank_l2_edac.o

+15 −46
@@ -3732,6 +3732,7 @@ static void hw_info_put(struct amd64_pvt *pvt)
	pci_dev_put(pvt->F1);
	pci_dev_put(pvt->F2);
	kfree(pvt->umc);
	kfree(pvt->csels);
}

static struct low_ops umc_ops = {
@@ -3766,6 +3767,7 @@ static int per_family_init(struct amd64_pvt *pvt)
	pvt->stepping	= boot_cpu_data.x86_stepping;
	pvt->model	= boot_cpu_data.x86_model;
	pvt->fam	= boot_cpu_data.x86;
	char *tmp_name = NULL;
	pvt->max_mcs	= 2;

	/*
@@ -3779,7 +3781,7 @@ static int per_family_init(struct amd64_pvt *pvt)

	switch (pvt->fam) {
	case 0xf:
		pvt->ctl_name				= (pvt->ext_model >= K8_REV_F) ?
		tmp_name				= (pvt->ext_model >= K8_REV_F) ?
							  "K8 revF or later" : "K8 revE or earlier";
		pvt->f1_id				= PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
		pvt->f2_id				= PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
@@ -3788,7 +3790,6 @@ static int per_family_init(struct amd64_pvt *pvt)
		break;

	case 0x10:
		pvt->ctl_name				= "F10h";
		pvt->f1_id				= PCI_DEVICE_ID_AMD_10H_NB_MAP;
		pvt->f2_id				= PCI_DEVICE_ID_AMD_10H_NB_DRAM;
		pvt->ops->dbam_to_cs			= f10_dbam_to_chip_select;
@@ -3797,12 +3798,10 @@ static int per_family_init(struct amd64_pvt *pvt)
	case 0x15:
		switch (pvt->model) {
		case 0x30:
			pvt->ctl_name			= "F15h_M30h";
			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_M30H_NB_F1;
			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_M30H_NB_F2;
			break;
		case 0x60:
			pvt->ctl_name			= "F15h_M60h";
			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_M60H_NB_F1;
			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_M60H_NB_F2;
			pvt->ops->dbam_to_cs		= f15_m60h_dbam_to_chip_select;
@@ -3811,7 +3810,6 @@ static int per_family_init(struct amd64_pvt *pvt)
			/* Richland is only client */
			return -ENODEV;
		default:
			pvt->ctl_name			= "F15h";
			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_NB_F1;
			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_NB_F2;
			pvt->ops->dbam_to_cs		= f15_dbam_to_chip_select;
@@ -3822,12 +3820,10 @@ static int per_family_init(struct amd64_pvt *pvt)
	case 0x16:
		switch (pvt->model) {
		case 0x30:
			pvt->ctl_name			= "F16h_M30h";
			pvt->f1_id			= PCI_DEVICE_ID_AMD_16H_M30H_NB_F1;
			pvt->f2_id			= PCI_DEVICE_ID_AMD_16H_M30H_NB_F2;
			break;
		default:
			pvt->ctl_name			= "F16h";
			pvt->f1_id			= PCI_DEVICE_ID_AMD_16H_NB_F1;
			pvt->f2_id			= PCI_DEVICE_ID_AMD_16H_NB_F2;
			break;
@@ -3836,76 +3832,51 @@ static int per_family_init(struct amd64_pvt *pvt)

	case 0x17:
		switch (pvt->model) {
		case 0x10 ... 0x2f:
			pvt->ctl_name			= "F17h_M10h";
			break;
		case 0x30 ... 0x3f:
			pvt->ctl_name			= "F17h_M30h";
			pvt->max_mcs			= 8;
			break;
		case 0x60 ... 0x6f:
			pvt->ctl_name			= "F17h_M60h";
			break;
		case 0x70 ... 0x7f:
			pvt->ctl_name			= "F17h_M70h";
			break;
		default:
			pvt->ctl_name			= "F17h";
			break;
		}
		break;

	case 0x18:
		pvt->ctl_name				= "F18h";
		break;

	case 0x19:
		switch (pvt->model) {
		case 0x00 ... 0x0f:
			pvt->ctl_name			= "F19h";
			pvt->max_mcs			= 8;
			break;
		case 0x10 ... 0x1f:
			pvt->ctl_name			= "F19h_M10h";
			pvt->max_mcs			= 12;
			pvt->flags.zn_regs_v2		= 1;
			break;
		case 0x20 ... 0x2f:
			pvt->ctl_name			= "F19h_M20h";
			break;
		case 0x30 ... 0x3f:
			if (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) {
				pvt->ctl_name		= "MI200";
				tmp_name			= "MI200";
				pvt->max_mcs		= 4;
				pvt->dram_type		= MEM_HBM2;
				pvt->gpu_umc_base	= 0x50000;
				pvt->ops		= &gpu_ops;
			} else {
				pvt->ctl_name		= "F19h_M30h";
				pvt->max_mcs		= 8;
			}
			break;
		case 0x50 ... 0x5f:
			pvt->ctl_name			= "F19h_M50h";
			break;
		case 0x60 ... 0x6f:
			pvt->ctl_name			= "F19h_M60h";
			pvt->flags.zn_regs_v2		= 1;
			break;
		case 0x70 ... 0x7f:
			pvt->ctl_name			= "F19h_M70h";
			pvt->max_mcs			= 4;
			pvt->flags.zn_regs_v2		= 1;
			break;
		case 0x90 ... 0x9f:
			pvt->ctl_name			= "F19h_M90h";
			pvt->max_mcs			= 4;
			pvt->dram_type			= MEM_HBM3;
			pvt->gpu_umc_base		= 0x90000;
			pvt->ops			= &gpu_ops;
			break;
		case 0xa0 ... 0xaf:
			pvt->ctl_name			= "F19h_MA0h";
			pvt->max_mcs			= 12;
			pvt->flags.zn_regs_v2		= 1;
			break;
@@ -3915,34 +3886,22 @@ static int per_family_init(struct amd64_pvt *pvt)
	case 0x1A:
		switch (pvt->model) {
		case 0x00 ... 0x1f:
			pvt->ctl_name           = "F1Ah";
			pvt->max_mcs            = 12;
			pvt->flags.zn_regs_v2   = 1;
			break;
		case 0x40 ... 0x4f:
			pvt->ctl_name           = "F1Ah_M40h";
			pvt->flags.zn_regs_v2   = 1;
			break;
		case 0x50 ... 0x57:
			pvt->ctl_name           = "F1Ah_M50h";
		case 0xc0 ... 0xc7:
			pvt->max_mcs            = 16;
			pvt->flags.zn_regs_v2   = 1;
			break;
		case 0x90 ... 0x9f:
			pvt->ctl_name           = "F1Ah_M90h";
			pvt->max_mcs            = 8;
			pvt->flags.zn_regs_v2   = 1;
			break;
		case 0xa0 ... 0xaf:
			pvt->ctl_name           = "F1Ah_MA0h";
			pvt->max_mcs            = 8;
			pvt->flags.zn_regs_v2   = 1;
			break;
		case 0xc0 ... 0xc7:
			pvt->ctl_name           = "F1Ah_MC0h";
			pvt->max_mcs            = 16;
			pvt->flags.zn_regs_v2   = 1;
			break;
		}
		break;

@@ -3951,6 +3910,16 @@ static int per_family_init(struct amd64_pvt *pvt)
		return -ENODEV;
	}

	if (tmp_name)
		scnprintf(pvt->ctl_name, sizeof(pvt->ctl_name), tmp_name);
	else
		scnprintf(pvt->ctl_name, sizeof(pvt->ctl_name), "F%02Xh_M%02Xh",
			  pvt->fam, pvt->model);

	pvt->csels = kcalloc(pvt->max_mcs, sizeof(*pvt->csels), GFP_KERNEL);
	if (!pvt->csels)
		return -ENOMEM;

	return 0;
}

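Note on the naming change above: the per-model string literals are replaced by a single formatted fallback, with `tmp_name` kept only for the special cases (the K8 revisions and "MI200"). The userspace sketch below mimics that step to show which names the `"F%02Xh_M%02Xh"` format produces. It uses `snprintf` as a stand-in for the kernel's `scnprintf`, and both the function name `format_ctl_name` and the buffer size `CTL_NAME_LEN` are assumptions, since the declaration of `pvt->ctl_name` is not part of this diff.

```c
#include <stdio.h>

/* Assumed size; the real size of pvt->ctl_name is not shown in this diff. */
#define CTL_NAME_LEN 32

/*
 * Userspace stand-in for the naming step at the end of per_family_init().
 * "override" plays the role of tmp_name: non-NULL only for special cases.
 */
static void format_ctl_name(char *buf, size_t len, const char *override,
			    unsigned int fam, unsigned int model)
{
	if (override)
		/* "%s" keeps the name from being parsed as a format string. */
		snprintf(buf, len, "%s", override);
	else
		/* Two-digit uppercase hex for both family and model. */
		snprintf(buf, len, "F%02Xh_M%02Xh", fam, model);
}
```

Under this reading of the diff, family 0x19 model 0x11 yields "F19h_M11h", while a model that previously fell into a `default:` arm now carries its model number too (family 0x17 model 0x01 gives "F17h_M01h" rather than "F17h"). The sketch passes the override through `"%s"` instead of using it directly as the format argument, which is a deliberate difference from the diff: it behaves identically for the literal names used here and avoids format-string compiler warnings.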