Documentation/edac/memory_repair.rst +31 −0

@@ -119,3 +119,34 @@ sysfs

Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`.

Examples
--------

The memory repair usage takes the form shown in this example:

1. CXL memory sparing

Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation, cacheline/row/bank/rank sparing, vary in terms of the
scope of the sparing being performed.

Memory sparing maintenance operations may be supported by CXL devices that
implement the CXL.mem protocol. A sparing maintenance operation requests the
CXL device to perform a repair operation on its media. For example, a CXL
device with DRAM components that support memory sparing features may
implement sparing maintenance operations.

2. CXL memory Soft Post Package Repair (sPPR)

Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement the CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR maintenance operations.

Soft PPR (sPPR) is a temporary row repair. Soft PPR may be faster, but the
repair is lost with a power cycle.

Sysfs files for memory repair are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`
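
The sparing subclasses named in the text above differ only in how much memory one repair replaces. The following is a minimal sketch of that ordering, not code from the patch: the enum `sparing_scope` and both helper functions are hypothetical names introduced here for illustration, and the only assumption is the usual DRAM hierarchy in which a rank contains banks, a bank contains rows, and a row contains cachelines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; ordered from the narrowest scope to the widest. */
enum sparing_scope {
	SPARING_CACHELINE,
	SPARING_ROW,
	SPARING_BANK,
	SPARING_RANK,
	SPARING_SCOPE_COUNT
};

/* Return a printable name for a scope, or NULL if it is out of range. */
static const char *sparing_scope_name(enum sparing_scope scope)
{
	static const char *const names[SPARING_SCOPE_COUNT] = {
		"cacheline", "row", "bank", "rank",
	};

	if ((int)scope < 0 || scope >= SPARING_SCOPE_COUNT)
		return NULL;
	return names[scope];
}

/*
 * True when a repair at scope 'wide' replaces at least as much memory as
 * a repair at scope 'narrow' located inside it.
 */
static bool sparing_scope_covers(enum sparing_scope wide,
				 enum sparing_scope narrow)
{
	assert(wide < SPARING_SCOPE_COUNT && narrow < SPARING_SCOPE_COUNT);
	return wide >= narrow;
}
```

Under this model a rank sparing request covers any row inside that rank, while a cacheline sparing request does not cover a whole bank. Which scopes a given device actually supports is reported by the device and is not modelled here.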
Documentation/edac/scrub.rst +76 −0

@@ -264,3 +264,79 @@

Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`

`Documentation/ABI/testing/sysfs-edac-ecs`

Examples
--------

The usage takes the form shown in these examples:

1. CXL memory Patrol Scrub

The following use cases have been identified for increasing the scrub rate.

- Scrubbing is needed at device granularity because a device is showing
  unexpectedly high errors.

- Scrubbing may apply to memory that isn't online at all yet. Likely this
  is a system wide default setting on boot.

- Scrubbing at a higher rate because the monitor software has determined that
  more reliability is necessary for a particular data set. This is called
  Differentiated Reliability.

1.1. Device based scrubbing

CXL memory is exposed to memory management subsystem and ultimately userspace
via CXL devices. Device-based scrubbing is used for the first use case
described in "Section 1 CXL Memory Patrol Scrub".

When combining control via the device interfaces and region interfaces,
see "Section 1.2 Region based scrubbing".

Sysfs files for scrubbing are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`

1.2. Region based scrubbing

CXL memory is exposed to memory management subsystem and ultimately userspace
via CXL regions. CXL Regions represent mapped memory capacity in system
physical address space. These can incorporate one or more parts of multiple CXL
memory devices with traffic interleaved across them. The user may want to control
the scrub rate via this more abstract region instead of having to figure out the
constituent devices and program them separately. The scrub rate for each device
covers the whole device. Thus if multiple regions use parts of that device then
requests for scrubbing of other regions may result in a higher scrub rate than
requested for this specific region.

Region-based scrubbing is used for the third use case described in
"Section 1 CXL Memory Patrol Scrub".

Userspace must follow the set of rules below on how to set the scrub rates for
any mixture of requirements.

1. Take each region in turn, from lowest desired scrub rate to highest, and set
   its scrub rate. Later regions may override the scrub rate on individual
   devices (and hence potentially whole regions).

2. Take each device for which enhanced scrubbing is required (higher rate) and
   set those scrub rates. This will override the scrub rates of individual devices,
   setting them to the maximum rate required for any of the regions they help back,
   unless a specific rate is already defined.

Sysfs files for scrubbing are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`

2. CXL memory Error Check Scrub (ECS)

The Error Check Scrub (ECS) feature enables a memory device to perform error
checking and correction (ECC) and count single-bit errors. The associated
memory controller sets the ECS mode with a trigger sent to the memory
device. CXL ECS control allows the host, and thus userspace, to change the
attributes for error count mode, threshold number of errors per segment
(indicating how many segments have at least that number of errors) for
reporting errors, and reset the ECS counter. Thus the responsibility for
initiating Error Check Scrub on a memory device may lie with the memory
controller or platform when unexpectedly high error rates are detected.

Sysfs files for scrubbing are documented in
`Documentation/ABI/testing/sysfs-edac-ecs`
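
The two ordering rules in "1.2. Region based scrubbing" can be sketched as a small userspace model. This is an illustration under stated assumptions, not code from the patch: `struct region_req`, `apply_scrub_rates()` and the bitmask encoding are hypothetical, a "rate" is an abstract number where larger means more frequent scrubbing, and a device override of 0 means "no device-specific rate". The model performs no sysfs access; it only computes the rate each device would end up with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEVICES 8

/* Hypothetical model of one region and the devices that back it. */
struct region_req {
	unsigned int rate;        /* desired scrub rate, larger = more frequent */
	unsigned int device_mask; /* bit i set: device i backs this region */
};

/* qsort() comparator: ascending by desired rate. */
static int cmp_region_rate(const void *a, const void *b)
{
	const struct region_req *ra = a, *rb = b;

	return (ra->rate > rb->rate) - (ra->rate < rb->rate);
}

/*
 * Compute the resulting per-device rate in dev_rate[0..n_devices-1].
 * dev_override[i] is a device-specific rate, or 0 for none.
 * The regions array is sorted in place.
 */
static void apply_scrub_rates(struct region_req *regions, size_t n_regions,
			      const unsigned int *dev_override,
			      unsigned int *dev_rate, size_t n_devices)
{
	size_t r, d;

	assert(n_devices <= MAX_DEVICES);

	for (d = 0; d < n_devices; d++)
		dev_rate[d] = 0;

	/* Rule 1: regions from lowest to highest rate; later writes win. */
	qsort(regions, n_regions, sizeof(*regions), cmp_region_rate);
	for (r = 0; r < n_regions; r++)
		for (d = 0; d < n_devices; d++)
			if (regions[r].device_mask & (1u << d))
				dev_rate[d] = regions[r].rate;

	/* Rule 2: device-specific rates are applied last and override. */
	for (d = 0; d < n_devices; d++)
		if (dev_override[d])
			dev_rate[d] = dev_override[d];
}
```

For example, with one region asking for rate 30 on devices 0 and 1, a second region asking for rate 10 on devices 1 and 2, and a device-specific rate of 50 on device 2, the model yields 30, 30 and 50. Device 1 is scrubbed at 30 although the second region asked for only 10, which is the "higher scrub rate than requested" effect the text describes.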
drivers/cxl/Kconfig +71 −0

@@ -114,6 +114,77 @@ config CXL_FEATURES

	  If unsure say 'n'

config CXL_EDAC_MEM_FEATURES
	bool "CXL: EDAC Memory Features"
	depends on EXPERT
	depends on CXL_MEM
	depends on CXL_FEATURES
	depends on EDAC >= CXL_BUS
	help
	  The CXL EDAC memory feature is optional and allows host to
	  control the EDAC memory features configurations of CXL memory
	  expander devices.

	  Say 'y' if you have an expert need to change default settings
	  of a memory RAS feature established by the platform/device.
	  Otherwise say 'n'.

config CXL_EDAC_SCRUB
	bool "Enable CXL Patrol Scrub Control (Patrol Read)"
	depends on CXL_EDAC_MEM_FEATURES
	depends on EDAC_SCRUB
	help
	  The CXL EDAC scrub control is optional and allows host to
	  control the scrub feature configurations of CXL memory expander
	  devices.

	  When enabled 'cxl_mem' and 'cxl_region' EDAC devices are
	  published with memory scrub control attributes as described by
	  Documentation/ABI/testing/sysfs-edac-scrub.

	  Say 'y' if you have an expert need to change default settings
	  of a memory scrub feature established by the platform/device
	  (e.g. scrub rates for the patrol scrub feature).
	  Otherwise say 'n'.

config CXL_EDAC_ECS
	bool "Enable CXL Error Check Scrub (Repair)"
	depends on CXL_EDAC_MEM_FEATURES
	depends on EDAC_ECS
	help
	  The CXL EDAC ECS control is optional and allows host to
	  control the ECS feature configurations of CXL memory expander
	  devices.

	  When enabled 'cxl_mem' EDAC devices are published with memory
	  ECS control attributes as described by
	  Documentation/ABI/testing/sysfs-edac-ecs.

	  Say 'y' if you have an expert need to change default settings
	  of a memory ECS feature established by the platform/device.
	  Otherwise say 'n'.

config CXL_EDAC_MEM_REPAIR
	bool "Enable CXL Memory Repair"
	depends on CXL_EDAC_MEM_FEATURES
	depends on EDAC_MEM_REPAIR
	help
	  The CXL EDAC memory repair control is optional and allows host
	  to control the memory repair features (e.g. sparing, PPR)
	  configurations of CXL memory expander devices.

	  When enabled, the memory repair feature requires an additional
	  memory of approximately 43KB to store CXL DRAM and CXL general
	  media event records.

	  When enabled 'cxl_mem' EDAC devices are published with memory
	  repair control attributes as described by
	  Documentation/ABI/testing/sysfs-edac-memory-repair.

	  Say 'y' if you have an expert need to change default settings
	  of a memory repair feature established by the platform/device.
	  Otherwise say 'n'.

config CXL_PORT
	default CXL_BUS
	tristate
drivers/cxl/core/Makefile +1 −0

@@ -20,3 +20,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o

cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
drivers/cxl/core/core.h +2 −0

@@ -124,6 +124,8 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
					    int nid, resource_size_t *size);

#ifdef CONFIG_CXL_FEATURES
struct cxl_feat_entry *
cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid);
size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
		       enum cxl_get_feat_selection selection,
		       void *feat_out, size_t feat_out_size, u16 offset,