Commit 2e6fe1bb authored by Qiuxu Zhuo, committed by Tony Luck

EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller



When loading the i10nm_edac driver on some Intel Granite Rapids servers,
a call trace may appear as follows:

  UBSAN: shift-out-of-bounds in drivers/edac/skx_common.c:453:16
  shift exponent -66 is negative
  ...
  __ubsan_handle_shift_out_of_bounds+0x1e3/0x390
  skx_get_dimm_info.cold+0x47/0xd40 [skx_edac_common]
  i10nm_get_dimm_config+0x23e/0x390 [i10nm_edac]
  skx_register_mci+0x159/0x220 [skx_edac_common]
  i10nm_init+0xcb0/0x1ff0 [i10nm_edac]
  ...

This occurs because some BIOS may disable a memory controller if there
aren't any memory DIMMs populated on this memory controller. The DIMMMTR
register of this disabled memory controller contains the invalid value
~0, resulting in the call trace above.
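
The exponent -66 in the trace is consistent with three field decoders each returning -EINVAL (-22) for an all-ones register, with the results then summed into a shift count. The commit does not state this; it is an inference from the trace. The following userspace sketch illustrates the arithmetic only. The helper names (get_bitfield, decode_attr, size_shift_exponent) are hypothetical, and the bit positions and valid ranges are assumptions, not taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for a bitfield accessor: extract bits lo..hi of v. */
static uint32_t get_bitfield(uint32_t v, unsigned lo, unsigned hi)
{
	assert(lo <= hi && hi < 32);
	uint64_t mask = ((1ULL << (hi - lo + 1)) - 1) << lo;

	return (uint32_t)((v & mask) >> lo);
}

/*
 * Hypothetical attribute decoder: return the raw field plus an offset,
 * or -EINVAL when the raw field lies outside [minval, maxval].
 */
static int decode_attr(uint32_t reg, unsigned lo, unsigned hi, int add,
		       uint32_t minval, uint32_t maxval)
{
	uint32_t val = get_bitfield(reg, lo, hi);

	if (val < minval || val > maxval)
		return -EINVAL;
	return (int)val + add;
}

/*
 * Sum of the decoded rank, row and column bit counts. A caller that uses
 * this sum as a shift exponent without checking for errors would shift
 * by a negative amount when every decoder fails.
 */
static int size_shift_exponent(uint32_t mtr)
{
	int ranks = decode_attr(mtr, 12, 13, 0, 0, 2);  /* assumed layout */
	int rows  = decode_attr(mtr, 2, 4, 12, 1, 6);   /* assumed layout */
	int cols  = decode_attr(mtr, 0, 1, 10, 0, 2);   /* assumed layout */

	return rows + cols + ranks;
}
```

Under these assumptions, size_shift_exponent(~0u) yields 3 * -EINVAL, which is -66 where EINVAL is 22 (as on Linux).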

Fix this call trace by skipping DIMM enumeration on a disabled memory
controller.
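
As a minimal sketch of the check the patch adds, the following userspace code treats a channel as disabled when its MCMTR value reads back as all ones or has bit 18 set, and skips such channels in an enumeration loop. The names get_bitfield, channel_disabled and count_enabled are hypothetical stand-ins for illustration, not the driver's own helpers, and the register values are passed in as plain integers instead of being read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a bitfield accessor: extract bits lo..hi of v. */
static uint32_t get_bitfield(uint32_t v, unsigned lo, unsigned hi)
{
	assert(lo <= hi && hi < 32);
	uint64_t mask = ((1ULL << (hi - lo + 1)) - 1) << lo;

	return (uint32_t)((v & mask) >> lo);
}

/* Same condition as the patch: all-ones readback, or bit 18 set. */
static bool channel_disabled(uint32_t mcmtr)
{
	return mcmtr == UINT32_MAX || get_bitfield(mcmtr, 18, 18) != 0;
}

/*
 * Count the channels an enumeration loop would still visit after
 * skipping the disabled ones.
 */
static int count_enabled(const uint32_t *mcmtr, int nchan)
{
	int enabled = 0;

	for (int i = 0; i < nchan; i++) {
		if (channel_disabled(mcmtr[i]))
			continue;	/* skip, as the patch does */
		enabled++;
	}
	return enabled;
}
```

Comparing against UINT32_MAX makes the all-ones test explicit for a 32-bit value; the patch's comparison of a u32 with ~0 evaluates the same way after the usual integer conversions.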

Fixes: ba987eaa ("EDAC/i10nm: Add Intel Granite Rapids server support")
Reported-by: Jose Jesus Ambriz Meza <jose.jesus.ambriz.meza@intel.com>
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/all/20250730063155.2612379-1-acelan.kao@canonical.com/


Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20250806065707.3533345-1-qiuxu.zhuo@intel.com
parent 71b69f81
+14 −0
@@ -1057,6 +1057,15 @@ static bool i10nm_check_ecc(struct skx_imc *imc, int chan)
	return !!GET_BITFIELD(mcmtr, 2, 2);
}

+static bool i10nm_channel_disabled(struct skx_imc *imc, int chan)
+{
+	u32 mcmtr = I10NM_GET_MCMTR(imc, chan);
+
+	edac_dbg(1, "mc%d ch%d mcmtr reg %x\n", imc->mc, chan, mcmtr);
+
+	return (mcmtr == ~0 || GET_BITFIELD(mcmtr, 18, 18));
+}
+
static int i10nm_get_dimm_config(struct mem_ctl_info *mci,
				 struct res_config *cfg)
{
@@ -1070,6 +1079,11 @@ static int i10nm_get_dimm_config(struct mem_ctl_info *mci,
		if (!imc->mbase)
			continue;

+		if (i10nm_channel_disabled(imc, i)) {
+			edac_dbg(1, "mc%d ch%d is disabled.\n", imc->mc, i);
+			continue;
+		}
+
		ndimms = 0;

		if (res_cfg->type != GNR)