Use ipmctl to Debug Intel® Optane™ DC Persistent Memory Modules

ID 660418
Updated 6/26/2019

Introduction

This article describes how to debug or further configure your Intel® persistent memory devices with ipmctl. ipmctl is an open source tool maintained by Intel and is available for download on GitHub*. With ipmctl, you can select operating modes, create goals, provision capacities, create regions, and much more. The most common ipmctl calls are described in our Quick Start Guide.

This article assumes you have basic knowledge of ipmctl and persistent memory programming concepts. If you’re just getting started, check out the Quick Start Guide first, and come back to this article for debugging assistance.

Discover Configuration

Show Topology

To see available resources, use the show -topology command. It displays both the Intel® Optane™ DC persistent memory modules and the DDR4 dual in-line memory modules (DIMMs) discovered in the system by enumerating the SMBIOS Type 17 tables. For more information, refer to the ACPI Specification v6.0, or see the Advanced Configuration and Power Interface Tables section of this article for NFIT table information.

# ipmctl show -topology

Platform Configuration Details

You can learn many details about your configuration from looking at the platform configuration details (PCD) with the following command:

# ipmctl show -dimm 0x0001 -pcd

This command shows the following tables:

  • Configuration Header
  • Current Config
  • Interleave Information
  • Identification Information x6
  • Conf Input
  • Conf Output
  • Partition Size Change
  • Interleave Information
  • Identification Information x6
  • Label Storage Area—Current Index
  • Label Storage Area—Labels

Advanced Configuration and Power Interface Tables

The following Advanced Configuration and Power Interface (ACPI) tables are available:

  • NFIT: The nonvolatile dual in-line memory module (NVDIMM) Firmware Interface Table
  • PCAT: The Platform Capabilities Table
  • PMTT: The Platform Memory Topology Table

Shortened output from each command is shown below:

NFIT

# ipmctl show -system NFIT
---NVDIMM Firmware Interface Table---
Signature: NFIT
Length: 3296 bytes
Revision: 0x1
Checksum: 0x32
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x2
CreatorID: INTL
CreatorRevision: 0x20091013
BwRegionTablesNum: 0
ControlRegionTablesNum: 12
FlushHintTablesNum: 12
InterleaveTablesNum: 24
NVDIMMRegionTablesNum: 24
SmbiosTablesNum: 0
SpaRangeTablesNum: 3
PlatformCapabilitiesTablesNum: 1

Type: 0x4
Length: 32 bytes
TypeEquals: ControlRegion
ControlRegionDescriptorTableIndex: 0x1
VendorId: 0x8980
DeviceId: 0x4151
Rid: 0x0
SubsystemVendorId: 0x8980
SubsystemDeviceId: 0x97a
SubsystemRid: 0x18
ValidFields: 0x1
ManufacturingLocation: 0xa2
ManufacturingDate: 0x3718
SerialNumber: 0x63110000
RegionFormatInterfaceCode: 0x301
NumberOfBlockControlWindows: 0x0
...

Type: 0x2
Length: 80 bytes
TypeEquals: Interleave
InterleaveStructureIndex: 0x9
NumberOfLinesDescribed: 0x10
LineSize: 0x100
LineOffset 0: 0x0
LineOffset 1: 0x3
LineOffset 2: 0x6
LineOffset 3: 0x9
LineOffset 4: 0xc
LineOffset 5: 0x3f
LineOffset 6: 0x42
LineOffset 7: 0x45
LineOffset 8: 0x48
LineOffset 9: 0x4b
LineOffset 10: 0x7e
LineOffset 11: 0x81
LineOffset 12: 0x84
LineOffset 13: 0x87
LineOffset 14: 0x8a
LineOffset 15: 0x8d
...

Type: 0x1
Length: 48 bytes
TypeEquals: NvDimmRegion
NfitDeviceHandle: 0x0001
NfitDeviceHandle.DimmNumber: 0x1
NfitDeviceHandle.MemChannel: 0x0
NfitDeviceHandle.MemControllerId: 0x0
NfitDeviceHandle.SocketId: 0x0
NfitDeviceHandle.NodeControllerId: 0x0
NvDimmPhysicalId: 0x28
NvDimmRegionalId: 0x0
SpaRangeDescriptionTableIndex: 0x1
NvdimmControlRegionDescriptorTableIndex: 0x1
NvDimmRegionSize: 0x3f00000000
RegionOffset: 0x0
NvDimmPhysicalAddressRegionBase: 0x10000000
InterleaveStructureIndex: 0x1
InterleaveWays: 0x6
NvDimmStateFlags: 0x34
...

Type: 0x0
Length: 56 bytes
TypeEquals: SpaRange
AddressRangeType: 66f0d379-b4f3-4074-ac43-0d3318b78cdb
SpaRangeDescriptionTableIndex: 0x1
Flags: 0x2
ProximityDomain: 0x2
SystemPhysicalAddressRangeBase: 0x3060000000
SystemPhysicalAddressRangeLength: 0x17a00000000
MemoryMappingAttribute: 0x8008
...
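The NfitDeviceHandle fields in the NvDimmRegion entry above are bit fields of a single value, and the same layout appears to explain DimmIDs such as 0x1121 used elsewhere in this article. The following Python sketch decodes a handle under the assumption that it follows the ACPI NFIT device handle layout (DIMM number in bits 3:0, memory channel in bits 7:4, memory controller in bits 11:8, socket in bits 15:12, node controller in bits 27:16). The function name is hypothetical and is not part of ipmctl. The last lines check an observation about this sample only: the region size multiplied by the interleave ways equals the SPA range length.

```python
def decode_nfit_handle(handle: int) -> dict:
    """Split an NFIT device handle into its bit fields (assumed ACPI layout)."""
    return {
        "DimmNumber": handle & 0xF,                  # bits 3:0
        "MemChannel": (handle >> 4) & 0xF,           # bits 7:4
        "MemControllerId": (handle >> 8) & 0xF,      # bits 11:8
        "SocketId": (handle >> 12) & 0xF,            # bits 15:12
        "NodeControllerId": (handle >> 16) & 0xFFF,  # bits 27:16
    }


# The handle from the sample NFIT output, then a DimmID from later examples.
print(decode_nfit_handle(0x0001))
print(decode_nfit_handle(0x1121))

# Values copied from the sample NvDimmRegion and SpaRange entries.
GIB = 2 ** 30
region_size = 0x3F00000000
interleave_ways = 0x6
spa_length = 0x17A00000000

print(region_size // GIB, "GiB per module in this region")
print(region_size * interleave_ways == spa_length)
```

Reading a DimmID this way can help map an ID in a sensor or error listing back to a physical socket and channel, although the platform documentation remains the authority on slot naming.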

PCAT

# ipmctl show -system PCAT
---Platform Configurations Attributes Table---
Signature: PCAT
Length: 136 bytes
Revision: 0x2
Checksum: 0xae
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x2
CreatorID: INTL
CreatorRevision: 0x20091013

Type: 0x0
Length: 16 bytes
TypeEquals: PlatformCapabilityInfoTable
IntelNVDIMMManagementSWConfigInputSupport: 0x1
MemoryModeCapabilities: 0x27
CurrentMemoryMode: 0x14
PersistentMemoryRASCapability: 0x0

Type: 0x1
Length: 16 bytes
TypeEquals: MemoryInterleaveCapabilityTable
MemoryMode: 0x3
InterleaveAlignmentSize: 0x1e
NumberOfInterleaveFormatsSupported: 0x1
InterleaveFormatSupported(0): 0x801f4040

Type: 0x6
Length: 32 bytes
SocketSkuInfoTable
SocketID: 0x0
MappedMemorySizeLimit: 4947802324992
TotalMemorySizeMappedToSpa: 1828582326272
CachingMemorySize: 0
...
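The SocketSkuInfoTable sizes are printed as large decimal numbers that are hard to read at a glance. The short Python sketch below converts them to GiB. It assumes the values are byte counts, which the output does not state explicitly.

```python
GIB = 2 ** 30

# Values copied from the sample SocketSkuInfoTable; assumed to be in bytes.
sizes = {
    "MappedMemorySizeLimit": 4947802324992,
    "TotalMemorySizeMappedToSpa": 1828582326272,
}

in_gib = {name: size / GIB for name, size in sizes.items()}
for name, gib in in_gib.items():
    print(f"{name}: {gib:.1f} GiB")
```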

PMTT

# ipmctl show -system PMTT
---Platform Memory Topology Table---
Signature: PMTT
Length: 1336 bytes
Revision: 0x1
Checksum: 0x9f
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x1
CreatorID: INTL
CreatorRevision: 0x20091013
--------------------------Socket--------------------------
Type: 0
Reserved1: 0
Length: 324
Flags:3
Reserved2:0
SocketId: 0
Reserved3: 0
-------------------iMC-------------------
Type: 1
Reserved1: 0
Length: 156
Flags:2
Reserved2:0
ReadLatency: 0
WriteLatency: 0
ReadBW: 0
WriteBW:0
OptimalAccessUnit:0
OptimalAccessAlignment:0
Reserved3:0
NoOfProximityDomains:0
ProximityDomainArray:1
----MODULE----
Type: 2
Reserved1: 0
Length: 20
Flags:2
Reserved2:0
PhysicalComponentId: 0
Reserved3: 0
SizeOfDimm: 32768
----MODULE----
...

Health Monitoring

Show DIMM Information

The show -dimm command displays the Intel Optane DC persistent memory modules discovered in the system and verifies that software can communicate with them. Among other information, this command outputs each DIMM’s ID, capacity, health state, and firmware version:

# ipmctl show -dimm

Sensor Health States

ipmctl can report the health state of the sensors on each persistent memory module. The available sensors are:

  • Health
  • MediaTemperature
  • ControllerTemperature
  • PercentageRemaining
  • LatchedDirtyShutdownCount
  • PowerOnTime
  • UpTime
  • PowerCycles
  • FwErrorCount
  • UnlatchedDirtyShutdownCount

Use the following command to see the sensor health of a specific module. To see the values for all modules, omit the DimmID.

# ipmctl show -sensor -dimm 0x0001
DimmID | Type                        | CurrentValue | CurrentState
====================================================================
0x0001 | Health                      | Healthy      | Normal
0x0001 | MediaTemperature            | 33C          | Normal
0x0001 | ControllerTemperature       | 35C          | Normal
0x0001 | PercentageRemaining         | 100%         | Normal
0x0001 | LatchedDirtyShutdownCount   | 2            | Normal
0x0001 | PowerOnTime                 | 12944539s    | Normal
0x0001 | UpTime                      | 2728s        | Normal
0x0001 | PowerCycles                 | 80           | Normal
0x0001 | FwErrorCount                | 8            | Normal
0x0001 | UnlatchedDirtyShutdownCount | 34           | Normal
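If you want to process this table in a script, one option is to split each row on the pipe character. The Python sketch below parses a shortened copy of the sample output into a list of dictionaries. The parse_table helper is hypothetical, and the sketch assumes the output arrives with one row per line and a separator line made only of equals signs.

```python
SAMPLE = """\
DimmID | Type | CurrentValue | CurrentState
====================================================================
0x0001 | Health | Healthy | Normal
0x0001 | MediaTemperature | 33C | Normal
0x0001 | ControllerTemperature | 35C | Normal
0x0001 | PercentageRemaining | 100% | Normal
"""


def parse_table(text):
    """Turn a pipe-delimited table into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ===== separator under the header.
        if not line or set(line) == {"="}:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(SAMPLE)
for row in rows:
    print(row["DimmID"], row["Type"], row["CurrentValue"], row["CurrentState"])
```

In practice the text would come from running ipmctl yourself, for example through Python's subprocess module, rather than from an embedded string.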

Percentage Life Remaining

The remaining life of a persistent memory module is based on the number of reads/writes left in its lifetime. Use the following command to see the percentage of life remaining on each module. In the example below, you can see that DIMM 0x0101 has 45 percent life remaining, and the rest have 100 percent.

# ipmctl show -sensor PercentageRemaining
DimmID | Type                | CurrentValue | CurrentState
============================================================
0x0001 | PercentageRemaining | 100%         | Normal
0x0011 | PercentageRemaining | 100%         | Normal
0x0021 | PercentageRemaining | 100%         | Normal
0x0101 | PercentageRemaining | 45%          | Normal
0x0111 | PercentageRemaining | 100%         | Normal
0x0121 | PercentageRemaining | 100%         | Normal
0x1001 | PercentageRemaining | 100%         | Normal
0x1011 | PercentageRemaining | 100%         | Normal
0x1021 | PercentageRemaining | 100%         | Normal
0x1101 | PercentageRemaining | 100%         | Normal
0x1111 | PercentageRemaining | 100%         | Normal
0x1121 | PercentageRemaining | 100%         | Normal

You can replace PercentageRemaining in this command with any other sensor type to see that sensor's value for every DIMM.
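To pick out modules with little life remaining, you could filter the PercentageRemaining rows in a script. The Python sketch below does this for a shortened copy of the sample output. The low_life helper is hypothetical, and the default cutoff of 50 percent is an assumption that mirrors the recommended threshold reported in the diagnostic output later in this article.

```python
import re

SAMPLE = """\
DimmID | Type | CurrentValue | CurrentState
============================================================
0x0001 | PercentageRemaining | 100% | Normal
0x0011 | PercentageRemaining | 100% | Normal
0x0101 | PercentageRemaining | 45% | Normal
0x0111 | PercentageRemaining | 100% | Normal
"""

# Matches rows such as "0x0101 | PercentageRemaining | 45% | Normal".
ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s*\|\s*PercentageRemaining\s*\|\s*(\d+)%")


def low_life(text, threshold=50):
    """Return the DimmIDs whose PercentageRemaining is below the threshold."""
    found = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and int(match.group(2)) < threshold:
            found.append(match.group(1))
    return found


low = low_life(SAMPLE)
print(low)
```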

The 45 percent value on DIMM 0x0101 was produced by injecting an error. You can read more about error injection in the Debugging section.

Change Sensor Thresholds

Each sensor has a preset threshold that defines its Normal range. You can also set your own threshold on a module, called the NonCriticalThreshold. For example, if you set the MediaTemperature NonCriticalThreshold below the upper limit of the Normal range, you get a warning when the temperature rises above the value you specified. Set a sensor's threshold with the following command:

# ipmctl set -sensor MediaTemperature -dimm 0x0001 NonCriticalThreshold=51 EnabledState=1
Modifying settings on DIMM (0x0001). Do you want to continue? [y/n] y
Modify media temperature settings on DIMM 0x0001: Success

Performance

Show Sensor Performance Per DIMM

Performance indicators can be shown per DIMM, per indicator, or all at once. To see all the performance indicators of a single DIMM, use this command:

# ipmctl show -dimm 0x0001 -performance
---DimmID=0x0001---
MediaReads=0x0000000000000000000000011dd1d084
MediaWrites=0x0000000000000000000000001e877cc0
ReadRequests=0x000000000000000000000000000959b7
WriteRequests=0x0000000000000000000000000000974f
TotalMediaReads=0x00000000000000000000008c4c411278
TotalMediaWrites=0x0000000000000000000000523e0292f8
TotalReadRequests=0x000000000000000000000006b0fd3128
TotalWriteRequests=0x000000000000000000000007dd265020

Here is the full list of performance indicators:

  • DimmID: The Intel Optane DC persistent memory module identifier.
  • MediaReads: Number of 64-byte reads from media on the Intel Optane DC persistent memory module since the last alternating current (AC) cycle.
  • MediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module since the last AC cycle.
  • ReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
  • WriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
  • TotalMediaReads: Number of 64-byte reads from the media on the Intel Optane DC persistent memory module over its lifetime.
  • TotalMediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module over its lifetime.
  • TotalReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
  • TotalWriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
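The counters are printed as long hexadecimal strings. The Python sketch below converts a shortened copy of the sample output to integers and, because the list above describes each media read or write as 64 bytes, estimates the bytes moved to and from the media. The parse_counters helper is hypothetical, and the sketch assumes one Key=Value pair per line.

```python
SAMPLE = """\
---DimmID=0x0001---
MediaReads=0x0000000000000000000000011dd1d084
MediaWrites=0x0000000000000000000000001e877cc0
ReadRequests=0x000000000000000000000000000959b7
WriteRequests=0x0000000000000000000000000000974f
"""

MEDIA_ACCESS_BYTES = 64  # size of one media read or write, per the list above


def parse_counters(text):
    """Return {indicator: integer value} for each Key=0x... line."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ---DimmID=...--- banner.
        if not line or line.startswith("---"):
            continue
        key, sep, value = line.partition("=")
        if sep and value.startswith("0x"):
            counters[key] = int(value, 16)
    return counters


counters = parse_counters(SAMPLE)
for name in ("MediaReads", "MediaWrites"):
    print(name, counters[name] * MEDIA_ACCESS_BYTES, "bytes")
for name in ("ReadRequests", "WriteRequests"):
    print(name, counters[name], "transactions")
```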

Debugging

Discover Errors

The following commands help you debug errors on your modules. To view the error log, use the show -dimm command with the -error option:

# ipmctl show -dimm 0x1111 -error Thermal Level=High
No errors found on DIMM 0x1111
Show error executed successfully

If an error is present, the output will be similar to:

# ipmctl show -dimm 0x0001 -error Media Level=High
Media Error occurred on DIMM 0x0001:
System Timestamp : Thu Jan 01 00:45:32 UTC 1998
DPA : 0x00012880
PDA : 0x00000001
Range : 4B
Error Type : 4 - Locked/Illegal Access
Error Flags : DPA Valid
Transaction Type : 10 - CSR Read
Sequence Number : 20

The -error option can be either Thermal or Media, with severity levels of either High or Low.
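Each field of an error entry is a name and a value separated by a colon with spaces on both sides. The Python sketch below turns the sample entry into a dictionary, which may be convenient when collecting entries from several DIMMs. The parse_error_entry helper is hypothetical, and the sketch assumes one field per line.

```python
SAMPLE = """\
System Timestamp : Thu Jan 01 00:45:32 UTC 1998
DPA : 0x00012880
PDA : 0x00000001
Range : 4B
Error Type : 4 - Locked/Illegal Access
Error Flags : DPA Valid
Transaction Type : 10 - CSR Read
Sequence Number : 20
"""


def parse_error_entry(text):
    """Return {field name: value} for each 'Name : Value' line."""
    entry = {}
    for line in text.splitlines():
        # Split on the first " : " only, so the colons in the timestamp survive.
        key, sep, value = line.partition(" : ")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


entry = parse_error_entry(SAMPLE)
dpa = int(entry["DPA"], 16)  # DPA as an integer
print(entry["Error Type"], "at DPA", hex(dpa))
```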

Inject an Error

For testing purposes, you may want to inject a mock error into your persistent memory modules. Injectable errors include: Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, and DirtyShutdown. Note that this command is available only when error injection is enabled for the Intel Optane DC persistent memory modules in the BIOS. Examples of each injection can be found in the ipmctl-inject-error man page.

To change the PercentageRemaining:

# ipmctl set -dimm 0x1001 PercentageRemaining=84
Trigger a percentage remaining on DIMM 0x1001: Success

To change the Temperature (Celsius) variable:

# ipmctl set -dimm 0x1111 Temperature=12
Set temperature on DIMM 0x1111: Success

To clear an injected error, specify the injection property (Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, or DirtyShutdown) and add Clear=1. For example, the following call clears injected Temperature changes on all DIMMs:

# ipmctl set -dimm Clear=1 Temperature=1

This call clears only DIMM 0x1001 of the injected PercentageRemaining change:

# ipmctl set -dimm 0x1001 PercentageRemaining=10 Clear=1

Diagnose Further Problems

Use the start -diagnostic command to see a quick health overview of your persistent memory modules. After the -diagnostic option, you can name any of the following tests. If you name none, all of them run.

  • Quick - This test verifies that the Intel Optane DC persistent memory module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.
  • Config - This test verifies that the BIOS platform configuration matches the installed hardware, and the platform configuration conforms to best-known practices.
  • Security - This test verifies that all Intel Optane DC persistent memory modules have a consistent security state. It is a best practice to enable security on all Intel Optane DC persistent memory modules, rather than just some.
  • FW - This test verifies that all Intel Optane DC persistent memory modules of a given model have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.

Note that the test does not have a means of verifying that the installed FW is the optimal version for a given Intel Optane DC persistent memory module model, just that it has been consistently applied across the system.

For example, the following command runs all the diagnostic tests for DIMM 0x0001:

# ipmctl start -diagnostic -dimm 0x0001
---Diagnostic=Quick---
State=Ok
Message=The quick health check detected that the firmware on DIMM 0x0001 experienced a dirty shutdown before its latest restart.
The quick health check succeeded.
---Diagnostic=Config---
State=Ok
Message=The platform configuration check succeeded.
---Diagnostic=Security---
State=Ok
Message=The security check succeeded.
---Diagnostic=FW---
State=Warning
Message=The firmware consistency and settings check detected that DIMM 0x0001 is greater than system time by 21 seconds.
The firmware consistency and settings check detected that DIMM 0x0011 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0021 is greater than system time by 23 seconds.
The firmware consistency and settings check detected that DIMM 0x0101 is reporting a percentage remaining of 45% which is below the recommended threshold 50%
The firmware consistency and settings check detected that DIMM 0x0101 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0111 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0121 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1001 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1011 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1021 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1101 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1111 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1121 is greater than system time by 23 seconds.
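When a run produces many messages, it can help to group them by test and look first at any test whose state is not Ok. The Python sketch below does this for a shortened copy of the sample output. The parse_diagnostics helper is hypothetical, and the sketch assumes each banner, State line, and message sits on its own line.

```python
import re

SAMPLE = """\
---Diagnostic=Quick---
State=Ok
Message=The quick health check succeeded.
---Diagnostic=Config---
State=Ok
Message=The platform configuration check succeeded.
---Diagnostic=Security---
State=Ok
Message=The security check succeeded.
---Diagnostic=FW---
State=Warning
Message=The firmware consistency and settings check detected that DIMM 0x0101 is reporting a percentage remaining of 45% which is below the recommended threshold 50%
"""

SECTION = re.compile(r"^---Diagnostic=(\w+)---$")


def parse_diagnostics(text):
    """Return {test name: {"State": str, "Messages": [str, ...]}}."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SECTION.match(line)
        if match:
            current = {"State": None, "Messages": []}
            results[match.group(1)] = current
        elif current is None:
            continue  # ignore anything before the first banner
        elif line.startswith("State="):
            current["State"] = line[len("State="):]
        elif line.startswith("Message="):
            current["Messages"].append(line[len("Message="):])
        else:
            current["Messages"].append(line)  # continuation of the message
    return results


results = parse_diagnostics(SAMPLE)
not_ok = [name for name, result in results.items() if result["State"] != "Ok"]
print(not_ok)
```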

Security

Firmware Version

Show information about the firmware on one or more DIMMs:

# ipmctl show -firmware
DimmID | ActiveFWVersion | StagedFWVersion
============================================
0x0001 | 01.02.00.5310   | N/A
0x0011 | 01.02.00.5310   | N/A
0x0021 | 01.02.00.5310   | N/A
0x0101 | 01.02.00.5310   | N/A
0x0111 | 01.02.00.5310   | N/A
0x0121 | 01.02.00.5310   | N/A
0x1001 | 01.02.00.5310   | N/A
0x1011 | 01.02.00.5310   | N/A
0x1021 | 01.02.00.5310   | N/A
0x1101 | 01.02.00.5310   | N/A
0x1111 | 01.02.00.5310   | N/A
0x1121 | 01.02.00.5310   | N/A
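Since the FW diagnostic checks that firmware is consistent across modules, a quick script over this table can give the same answer at a glance. The Python sketch below collects the active version per DIMM from a shortened copy of the sample output and reports whether they all match. The active_versions helper is hypothetical, and the sketch assumes one row per line.

```python
SAMPLE = """\
DimmID | ActiveFWVersion | StagedFWVersion
============================================
0x0001 | 01.02.00.5310 | N/A
0x0011 | 01.02.00.5310 | N/A
0x0021 | 01.02.00.5310 | N/A
"""


def active_versions(text):
    """Return {DimmID: ActiveFWVersion} for each data row."""
    versions = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have three cells and start with a hexadecimal DimmID.
        if len(cells) == 3 and cells[0].startswith("0x"):
            versions[cells[0]] = cells[1]
    return versions


versions = active_versions(SAMPLE)
distinct = set(versions.values())
print("consistent" if len(distinct) == 1 else f"mixed: {sorted(distinct)}")
```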

Update Firmware

Update firmware on one or more DIMMs with the following command. To update all DIMMs, omit the -dimm option so that no DIMM is specified.

# ipmctl load -source (path) -dimm 0x0101

Firmware Debug Log

Dump the firmware debug log to a specified file destination using the following command:

# ipmctl dump -destination (file) -debug -dimm 0x0001

Display CLI Version

Display the ipmctl command line version with the following command:

# ipmctl version
Intel(R) Optane(TM) DC Persistent Memory Command Line Interface Version 01.00.00.3402

Conclusion

ipmctl is a powerful tool for configuring and managing Intel Optane DC persistent memory modules. This article outlines some of the most common ipmctl debugging and configuration commands for learning more about your modules. The full command reference is available in the man pages or by running ipmctl help at any time.

Resources

Man pages

Quick Start Guide

ipmctl GitHub