Introduction
This article describes how to debug or further configure your Intel® persistent memory devices with ipmctl. ipmctl is an open source tool maintained by Intel and is available for download on GitHub*. With ipmctl, you can select operating modes, create goals, provision capacities, create regions, and much more. The most common ipmctl calls are described in our Quick Start Guide.
This article assumes you have basic knowledge of ipmctl and persistent memory programming concepts. If you’re just getting started, check out the Quick Start Guide first, and come back to this article for debugging assistance.
Discover Configuration
Show Topology
To see available resources, use the show topology command. It displays both the Intel® Optane™ DC persistent memory modules and the DDR4 dual in-line memory modules (DIMMs) discovered in the system by enumerating the SMBIOS Type 17 tables. For more information, refer to the ACPI Specification v6.0, or see the Advanced Configuration and Power Interface Tables section of this article for NFIT table information.
# ipmctl show -topology
Platform Configuration Details
You can learn many details about your configuration by looking at the platform configuration data (PCD) with the following command:
# ipmctl show -dimm 0x0001 -pcd
The tables that are shown when this command is run are:
- Configuration Header
- Current Config
- Interleave Information
- Identification Information x6
- Conf Input
- Conf Output
- Partition Size Change
- Interleave Information
- Identification Information x6
- Label Storage Area—Current Index
- Label Storage Area—Labels
Advanced Configuration and Power Interface Tables
The following Advanced Configuration and Power Interface (ACPI) tables are available:
- NFIT: The nonvolatile dual in-line memory module (NVDIMM) Firmware Interface Table
- PCAT: The Platform Configuration Attributes Table
- PMTT: The Platform Memory Topology Table
Shortened versions of the output of each command can be seen below:
NFIT
# ipmctl show -system NFIT
---NVDIMM Firmware Interface Table---
Signature: NFIT
Length: 3296 bytes
Revision: 0x1
Checksum: 0x32
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x2
CreatorID: INTL
CreatorRevision: 0x20091013
BwRegionTablesNum: 0
ControlRegionTablesNum: 12
FlushHintTablesNum: 12
InterleaveTablesNum: 24
NVDIMMRegionTablesNum: 24
SmbiosTablesNum: 0
SpaRangeTablesNum: 3
PlatformCapabilitiesTablesNum: 1
Type: 0x4
Length: 32 bytes
TypeEquals: ControlRegion
ControlRegionDescriptorTableIndex: 0x1
VendorId: 0x8980
DeviceId: 0x4151
Rid: 0x0
SubsystemVendorId: 0x8980
SubsystemDeviceId: 0x97a
SubsystemRid: 0x18
ValidFields: 0x1
ManufacturingLocation: 0xa2
ManufacturingDate: 0x3718
SerialNumber: 0x63110000
RegionFormatInterfaceCode: 0x301
NumberOfBlockControlWindows: 0x0
...
Type: 0x2
Length: 80 bytes
TypeEquals: Interleave
InterleaveStructureIndex: 0x9
NumberOfLinesDescribed: 0x10
LineSize: 0x100
LineOffset 0: 0x0
LineOffset 1: 0x3
LineOffset 2: 0x6
LineOffset 3: 0x9
LineOffset 4: 0xc
LineOffset 5: 0x3f
LineOffset 6: 0x42
LineOffset 7: 0x45
LineOffset 8: 0x48
LineOffset 9: 0x4b
LineOffset 10: 0x7e
LineOffset 11: 0x81
LineOffset 12: 0x84
LineOffset 13: 0x87
LineOffset 14: 0x8a
LineOffset 15: 0x8d
...
Type: 0x1
Length: 48 bytes
TypeEquals: NvDimmRegion
NfitDeviceHandle: 0x0001
NfitDeviceHandle.DimmNumber: 0x1
NfitDeviceHandle.MemChannel: 0x0
NfitDeviceHandle.MemControllerId: 0x0
NfitDeviceHandle.SocketId: 0x0
NfitDeviceHandle.NodeControllerId: 0x0
NvDimmPhysicalId: 0x28
NvDimmRegionalId: 0x0
SpaRangeDescriptionTableIndex: 0x1
NvdimmControlRegionDescriptorTableIndex: 0x1
NvDimmRegionSize: 0x3f00000000
RegionOffset: 0x0
NvDimmPhysicalAddressRegionBase: 0x10000000
InterleaveStructureIndex: 0x1
InterleaveWays: 0x6
NvDimmStateFlags: 0x34
...
Type: 0x0
Length: 56 bytes
TypeEquals: SpaRange
AddressRangeType: 66f0d379-b4f3-4074-ac43-0d3318b78cdb
SpaRangeDescriptionTableIndex: 0x1
Flags: 0x2
ProximityDomain: 0x2
SystemPhysicalAddressRangeBase: 0x3060000000
SystemPhysicalAddressRangeLength: 0x17a00000000
MemoryMappingAttribute: 0x8008
...
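The NfitDeviceHandle sub-fields and the size values in the output above can be reproduced with a little bit arithmetic. The sketch below is illustrative only: the function name is hypothetical, the bit layout is the one the ACPI specification defines for the NFIT device handle, and the constants are copied from the sample output. It also checks that, for this sample, the per-module region size multiplied by InterleaveWays equals the length of the SPA range with the same SpaRangeDescriptionTableIndex.

```python
GIB = 1 << 30  # bytes per GiB


def decode_nfit_device_handle(handle: int) -> dict:
    """Split an NFIT device handle into its bit fields (hypothetical helper)."""
    return {
        "DimmNumber": handle & 0xF,                   # bits 3:0
        "MemChannel": (handle >> 4) & 0xF,            # bits 7:4
        "MemControllerId": (handle >> 8) & 0xF,       # bits 11:8
        "SocketId": (handle >> 12) & 0xF,             # bits 15:12
        "NodeControllerId": (handle >> 16) & 0xFFF,   # bits 27:16
    }


# Values copied from the sample NFIT output above.
fields = decode_nfit_device_handle(0x0001)
region_size = 0x3F00000000      # NvDimmRegionSize
interleave_ways = 0x6           # InterleaveWays
spa_length = 0x17A00000000      # SystemPhysicalAddressRangeLength

print(fields)
print(region_size // GIB, "GiB contributed by this module")   # 252
print(spa_length // GIB, "GiB in the interleaved SPA range")  # 1512
print(region_size * interleave_ways == spa_length)            # True
```

The same decoding applied to a handle such as 0x1121 yields socket 1, memory controller 1, channel 2, DIMM 1, which is a quick way to map an identifier back to a physical slot.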
PCAT
# ipmctl show -system PCAT
---Platform Configurations Attributes Table---
Signature: PCAT
Length: 136 bytes
Revision: 0x2
Checksum: 0xae
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x2
CreatorID: INTL
CreatorRevision: 0x20091013
Type: 0x0
Length: 16 bytes
TypeEquals: PlatformCapabilityInfoTable
IntelNVDIMMManagementSWConfigInputSupport: 0x1
MemoryModeCapabilities: 0x27
CurrentMemoryMode: 0x14
PersistentMemoryRASCapability: 0x0
Type: 0x1
Length: 16 bytes
TypeEquals: MemoryInterleaveCapabilityTable
MemoryMode: 0x3
InterleaveAlignmentSize: 0x1e
NumberOfInterleaveFormatsSupported: 0x1
InterleaveFormatSupported(0): 0x801f4040
Type: 0x6
Length: 32 bytes
SocketSkuInfoTable
SocketID: 0x0
MappedMemorySizeLimit: 4947802324992
TotalMemorySizeMappedToSpa: 1828582326272
CachingMemorySize: 0
...
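The SocketSkuInfoTable sizes are easier to read once converted to GiB. This is a minimal sketch that assumes the three values are byte counts (the output does not state the unit) and copies them from the sample above; the dictionary and variable names are illustrative.

```python
GIB = 1 << 30  # bytes per GiB

# Values copied from the SocketSkuInfoTable in the sample PCAT output;
# treating them as bytes is an assumption.
socket_sku = {
    "MappedMemorySizeLimit": 4947802324992,
    "TotalMemorySizeMappedToSpa": 1828582326272,
    "CachingMemorySize": 0,
}

for name, size_bytes in socket_sku.items():
    print(f"{name}: {size_bytes // GIB} GiB")

# How much more memory could be mapped before reaching the socket limit.
headroom = (
    socket_sku["MappedMemorySizeLimit"]
    - socket_sku["TotalMemorySizeMappedToSpa"]
)
print(f"Headroom below the socket limit: {headroom // GIB} GiB")
```

Under that assumption, the sample socket has 1703 GiB mapped against a limit of 4608 GiB.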
PMTT
# ipmctl show -system PMTT
---Platform Memory Topology Table---
Signature: PMTT
Length: 1336 bytes
Revision: 0x1
Checksum: 0x9f
OEMID: INTEL
OEMTableID: S2600WF
OEMRevision: 0x1
CreatorID: INTL
CreatorRevision: 0x20091013
--------------------------Socket--------------------------
Type: 0
Reserved1: 0
Length: 324
Flags:3
Reserved2:0
SocketId: 0
Reserved3: 0
-------------------iMC-------------------
Type: 1
Reserved1: 0
Length: 156
Flags:2
Reserved2:0
ReadLatency: 0
WriteLatency: 0
ReadBW: 0
WriteBW:0
OptimalAccessUnit:0
OptimalAccessAlignment:0
Reserved3:0
NoOfProximityDomains:0
ProximityDomainArray:1
----MODULE----
Type: 2
Reserved1: 0
Length: 20
Flags:2
Reserved2:0
PhysicalComponentId: 0
Reserved3: 0
SizeOfDimm: 32768
----MODULE----
...
Health Monitoring
Show DIMM Information
The show -dimm command displays the Intel Optane DC persistent memory modules discovered in the system and verifies that software can communicate with them. Among other information, this command outputs each DIMM’s ID, capacity, health state, and firmware version:
# ipmctl show -dimm
Sensor Health States
ipmctl can report the health state of the sensors located on each persistent memory module. The available sensor types appear in the Type column of the example output below.
Use the following command to see sensor health for a specific module. To see the values for all modules, omit the DimmID.
# ipmctl show -sensor -dimm 0x0001
DimmID | Type | CurrentValue | CurrentState
====================================================================
0x0001 | Health | Healthy | Normal
0x0001 | MediaTemperature | 33C | Normal
0x0001 | ControllerTemperature | 35C | Normal
0x0001 | PercentageRemaining | 100% | Normal
0x0001 | LatchedDirtyShutdownCount | 2 | Normal
0x0001 | PowerOnTime | 12944539s | Normal
0x0001 | UpTime | 2728s | Normal
0x0001 | PowerCycles | 80 | Normal
0x0001 | FwErrorCount | 8 | Normal
0x0001 | UnlatchedDirtyShutdownCount | 34 | Normal
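If you capture this output in a script, the pipe-delimited table is straightforward to turn into records. The following is a minimal sketch, not part of ipmctl: the function name is hypothetical, the sample text is pasted from the output above, and the parser assumes the column layout shown there.

```python
SENSOR_OUTPUT = """\
DimmID | Type | CurrentValue | CurrentState
====================================================================
0x0001 | Health | Healthy | Normal
0x0001 | MediaTemperature | 33C | Normal
0x0001 | ControllerTemperature | 35C | Normal
0x0001 | PercentageRemaining | 100% | Normal
0x0001 | LatchedDirtyShutdownCount | 2 | Normal
0x0001 | PowerOnTime | 12944539s | Normal
0x0001 | UpTime | 2728s | Normal
0x0001 | PowerCycles | 80 | Normal
0x0001 | FwErrorCount | 8 | Normal
0x0001 | UnlatchedDirtyShutdownCount | 34 | Normal
"""


def parse_table(text: str) -> list:
    """Parse a pipe-delimited table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line) <= {"="}:  # skip the ==== separator row
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(SENSOR_OUTPUT)
# Collect every sensor whose state is anything other than Normal.
abnormal = [r for r in rows if r["CurrentState"] != "Normal"]
print(f"{len(rows)} sensors read, {len(abnormal)} outside the Normal state")
```

Keeping the values as strings avoids guessing at units; a caller that needs numbers can strip the trailing C, %, or s itself.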
Percentage Life Remaining
The remaining life of a persistent memory module is based on the number of reads/writes left in its lifetime. Use the following command to see the percentage of life remaining on each module. In the example below, you can see that DIMM 0x0101 has 45 percent life remaining, and the rest have 100 percent.
# ipmctl show -sensor PercentageRemaining
DimmID | Type | CurrentValue | CurrentState
============================================================
0x0001 | PercentageRemaining | 100% | Normal
0x0011 | PercentageRemaining | 100% | Normal
0x0021 | PercentageRemaining | 100% | Normal
0x0101 | PercentageRemaining | 45% | Normal
0x0111 | PercentageRemaining | 100% | Normal
0x0121 | PercentageRemaining | 100% | Normal
0x1001 | PercentageRemaining | 100% | Normal
0x1011 | PercentageRemaining | 100% | Normal
0x1021 | PercentageRemaining | 100% | Normal
0x1101 | PercentageRemaining | 100% | Normal
0x1111 | PercentageRemaining | 100% | Normal
0x1121 | PercentageRemaining | 100% | Normal
This call shows the PercentageRemaining value for every DIMM. To see another sensor's values across all DIMMs, replace PercentageRemaining with that sensor type.
The 45 percent value on DIMM 0x0101 was set through error injection for this example. You can read more about error injection in the Debugging section.
Change Sensor Thresholds
Each sensor has a preset threshold that defines its Normal range. You can also set your own threshold on a module, called the NonCriticalThreshold. For example, if you set the MediaTemperature NonCriticalThreshold below the preset limit, you get a warning when the temperature rises above the value you specified. Set a sensor's threshold with the following command:
# ipmctl set -sensor MediaTemperature -dimm 0x0001 NonCriticalThreshold=51 EnabledState=1
Modifying settings on DIMM (0x0001).
Do you want to continue? [y/n] y
Modify media temperature settings on DIMM 0x0001: Success
Performance
Show Sensor Performance Per DIMM
Performance indicators can be viewed per DIMM, per indicator, or for all DIMMs and indicators at once. To see all the performance indicators of a single DIMM, use this command:
# ipmctl show -dimm 0x0001 -performance
---DimmID=0x0001---
MediaReads=0x0000000000000000000000011dd1d084
MediaWrites=0x0000000000000000000000001e877cc0
ReadRequests=0x000000000000000000000000000959b7
WriteRequests=0x0000000000000000000000000000974f
TotalMediaReads=0x00000000000000000000008c4c411278
TotalMediaWrites=0x0000000000000000000000523e0292f8
TotalReadRequests=0x000000000000000000000006b0fd3128
TotalWriteRequests=0x000000000000000000000007dd265020
Here is the full list of performance indicators:
- DimmID: The Intel Optane DC persistent memory module identifier.
- MediaReads: Number of 64-byte reads from media on the Intel Optane DC persistent memory module since the last alternating current (AC) cycle.
- MediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module since the last AC cycle.
- ReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
- WriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
- TotalMediaReads: Number of 64-byte reads from the media on the Intel Optane DC persistent memory module over its lifetime.
- TotalMediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module over its lifetime.
- TotalReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
- TotalWriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
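Because the counters are printed as zero-padded hexadecimal, they are hard to read at a glance. The sketch below converts them to integers and, using the 64-byte unit from the list above, estimates the data volume moved to and from the media. The function name is hypothetical and the sample text is pasted from the output shown earlier.

```python
PERF_OUTPUT = """\
---DimmID=0x0001---
MediaReads=0x0000000000000000000000011dd1d084
MediaWrites=0x0000000000000000000000001e877cc0
ReadRequests=0x000000000000000000000000000959b7
WriteRequests=0x0000000000000000000000000000974f
TotalMediaReads=0x00000000000000000000008c4c411278
TotalMediaWrites=0x0000000000000000000000523e0292f8
TotalReadRequests=0x000000000000000000000006b0fd3128
TotalWriteRequests=0x000000000000000000000007dd265020
"""

MEDIA_ACCESS_BYTES = 64  # MediaReads and MediaWrites count 64-byte accesses


def parse_performance(text: str) -> dict:
    """Return {indicator name: integer value} for one DIMM's output."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("---"):  # skip the DimmID banner
            continue
        key, _, value = line.partition("=")
        counters[key] = int(value, 16)  # int() accepts the 0x prefix in base 16
    return counters


counters = parse_performance(PERF_OUTPUT)
media_read_bytes = counters["MediaReads"] * MEDIA_ACCESS_BYTES
media_write_bytes = counters["MediaWrites"] * MEDIA_ACCESS_BYTES
print(f"Read from media since last AC cycle: {media_read_bytes / 2**30:.1f} GiB")
print(f"Written to media since last AC cycle: {media_write_bytes / 2**30:.1f} GiB")
```

Python integers have arbitrary precision, so the full 128-bit counter width is handled without overflow.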
Debugging
Discover Errors
To debug errors on your modules, start with the error log, which you can view with the show error command:
# ipmctl show -dimm 0x1111 -error Thermal Level=High
No errors found on DIMM 0x1111
Show error executed successfully
If an error is present, the output will be similar to:
# ipmctl show -dimm 0x0001 -error Media Level=High
Media Error occurred on DIMM 0x0001:
System Timestamp : Thu Jan 01 00:45:32 UTC 1998
DPA : 0x00012880
PDA : 0x00000001
Range : 4B
Error Type : 4 - Locked/Illegal Access
Error Flags : DPA Valid
Transaction Type : 10 - CSR Read
Sequence Number : 20
The -error option accepts either Thermal or Media, and the Level property accepts either High or Low.
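When triaging many entries, it can help to turn each error record into a dictionary. The following sketch assumes the "Name : Value" layout of the sample media error above; the function name and the "Summary" key are hypothetical, and real output may differ in spacing.

```python
ERROR_ENTRY = """\
Media Error occurred on DIMM 0x0001:
System Timestamp : Thu Jan 01 00:45:32 UTC 1998
DPA : 0x00012880
PDA : 0x00000001
Range : 4B
Error Type : 4 - Locked/Illegal Access
Error Flags : DPA Valid
Transaction Type : 10 - CSR Read
Sequence Number : 20
"""


def parse_error_entry(text: str) -> dict:
    """Parse one error log entry into a dict of field name to string value."""
    lines = text.strip().splitlines()
    entry = {"Summary": lines[0].rstrip(":")}
    for line in lines[1:]:
        # Split on the first " : " only, so the colons inside the
        # timestamp value are left untouched.
        key, sep, value = line.partition(" : ")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


entry = parse_error_entry(ERROR_ENTRY)
dpa = int(entry["DPA"], 16)  # device physical address as an integer
type_code, _, type_name = entry["Error Type"].partition(" - ")
print(entry["Summary"])
print(f"DPA={dpa:#x}, error type {type_code} ({type_name})")
```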
Inject an Error
For testing purposes, you may want to inject a mock error onto your persistent memory modules. Injectable errors include: Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, and DirtyShutdown. It is important to note that this command is only available when error injection is enabled on the Intel Optane DC persistent memory module in the BIOS. Examples of each of these can be seen in the ipmctl-inject-error man pages.
To change the PercentageRemaining:
# ipmctl set -dimm 0x1001 PercentageRemaining=84
Trigger a percentage remaining on DIMM 0x1001: Success
To change the Temperature (Celsius) variable:
# ipmctl set -dimm 0x1111 Temperature=12
Set temperature on DIMM 0x1111: Success
To clear injected errors, specify which injection property (Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, or DirtyShutdown), and add Clear=1. For example, the first call clears all DIMMs of any injected Temperature changes:
# ipmctl set -dimm Clear=1 Temperature=1
This call clears only DIMM 0x1001 of the injected PercentageRemaining change:
# ipmctl set -dimm 0x1001 PercentageRemaining=10 Clear=1
Diagnose Further Problems
Use the start diagnostic command to see a quick health overview of your persistent memory modules. After the -diagnostic flag, you can specify any of the following tests. If you specify none, all of them run.
- Quick - This test verifies that the Intel Optane DC persistent memory module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.
- Config - This test verifies that the BIOS platform configuration matches the installed hardware, and the platform configuration conforms to best-known practices.
- Security - This test verifies that all Intel Optane DC persistent memory modules have a consistent security state. It is a best practice to enable security on all Intel Optane DC persistent memory modules, rather than just some.
- FW - This test verifies that all Intel Optane DC persistent memory modules of a given model have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.
Note that the test does not have a means of verifying that the installed FW is the optimal version for a given Intel Optane DC persistent memory module model, just that it has been consistently applied across the system.
For example, the following command runs all the diagnostic tests for DIMM 0x0001:
# ipmctl start -diagnostic -dimm 0x0001
---Diagnostic=Quick---
State=Ok
Message=The quick health check detected that the firmware on DIMM 0x0001 experienced a dirty shutdown before its latest restart.
The quick health check succeeded.
---Diagnostic=Config---
State=Ok
Message=The platform configuration check succeeded.
---Diagnostic=Security---
State=Ok
Message=The security check succeeded.
---Diagnostic=FW---
State=Warning
Message=The firmware consistency and settings check detected that DIMM 0x0001 is greater than system time by 21 seconds.
The firmware consistency and settings check detected that DIMM 0x0011 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0021 is greater than system time by 23 seconds.
The firmware consistency and settings check detected that DIMM 0x0101 is reporting a percentage remaining of 45% which is below the recommended threshold 50%
The firmware consistency and settings check detected that DIMM 0x0101 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0111 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x0121 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1001 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1011 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1021 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1101 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1111 is greater than system time by 22 seconds.
The firmware consistency and settings check detected that DIMM 0x1121 is greater than system time by 23 seconds.
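With many modules, the diagnostic report gets long, so a script may only want to know which tests did not return Ok. The sketch below groups the report by test; it assumes the `---Diagnostic=...---`, `State=`, and `Message=` layout of the sample above, uses an abbreviated copy of that sample, and the function name is hypothetical.

```python
import re

DIAG_OUTPUT = """\
---Diagnostic=Quick---
State=Ok
Message=The quick health check detected that the firmware on DIMM 0x0001 experienced a dirty shutdown before its latest restart.
The quick health check succeeded.
---Diagnostic=Config---
State=Ok
Message=The platform configuration check succeeded.
---Diagnostic=Security---
State=Ok
Message=The security check succeeded.
---Diagnostic=FW---
State=Warning
Message=The firmware consistency and settings check detected that DIMM 0x0001 is greater than system time by 21 seconds.
The firmware consistency and settings check detected that DIMM 0x0101 is reporting a percentage remaining of 45% which is below the recommended threshold 50%
"""


def parse_diagnostics(text: str) -> dict:
    """Return {test name: {"State": str, "Messages": [str, ...]}}."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = re.fullmatch(r"---Diagnostic=(\w+)---", line)
        if banner:
            current = banner.group(1)
            results[current] = {"State": None, "Messages": []}
        elif current is None or not line:
            continue
        elif line.startswith("State="):
            results[current]["State"] = line[len("State="):]
        elif line.startswith("Message="):
            results[current]["Messages"].append(line[len("Message="):])
        else:
            # Continuation lines carry additional messages for the same test.
            results[current]["Messages"].append(line)
    return results


results = parse_diagnostics(DIAG_OUTPUT)
not_ok = [name for name, r in results.items() if r["State"] != "Ok"]
print("Tests needing attention:", not_ok)
```

For the sample, only the FW test is flagged, and its messages include the 45 percent remaining-life warning for DIMM 0x0101.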
Security
Firmware Version
Show information about the firmware on one or more DIMMs:
# ipmctl show -firmware
DimmID | ActiveFWVersion | StagedFWVersion
============================================
0x0001 | 01.02.00.5310 | N/A
0x0011 | 01.02.00.5310 | N/A
0x0021 | 01.02.00.5310 | N/A
0x0101 | 01.02.00.5310 | N/A
0x0111 | 01.02.00.5310 | N/A
0x0121 | 01.02.00.5310 | N/A
0x1001 | 01.02.00.5310 | N/A
0x1011 | 01.02.00.5310 | N/A
0x1021 | 01.02.00.5310 | N/A
0x1101 | 01.02.00.5310 | N/A
0x1111 | 01.02.00.5310 | N/A
0x1121 | 01.02.00.5310 | N/A
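A quick way to spot a module that missed an update is to group DIMMs by ActiveFWVersion. The sketch below is a simplified stand-in for that idea, not the FW diagnostic itself: it ignores module model, uses a hypothetical function name, and parses a shortened copy of the table above.

```python
from collections import defaultdict

FIRMWARE_OUTPUT = """\
DimmID | ActiveFWVersion | StagedFWVersion
============================================
0x0001 | 01.02.00.5310 | N/A
0x0011 | 01.02.00.5310 | N/A
0x0021 | 01.02.00.5310 | N/A
0x0101 | 01.02.00.5310 | N/A
"""


def group_by_active_version(text: str) -> dict:
    """Return {active firmware version: [DimmID, ...]}."""
    groups = defaultdict(list)
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header row and the ==== separator (which has one cell).
        if len(cells) != 3 or cells[0] == "DimmID":
            continue
        dimm_id, active_version, _staged_version = cells
        groups[active_version].append(dimm_id)
    return dict(groups)


groups = group_by_active_version(FIRMWARE_OUTPUT)
if len(groups) == 1:
    print("All listed DIMMs run firmware", next(iter(groups)))
else:
    for version, dimms in groups.items():
        print(version, "->", ", ".join(dimms))
```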
Update Firmware
Update firmware on one or more DIMMs with the following command. To update all DIMMs, omit the -dimm option so that no DIMM is specified.
# ipmctl load -source (path) -dimm 0x0101
Firmware Debug Log
Dump the firmware debug log to a specified file destination using the following command:
# ipmctl dump -destination (file) -debug -dimm 0x0001
Display CLI Version
The ipmctl command line version can easily be seen with the following command:
# ipmctl version
Intel(R) Optane(TM) DC Persistent Memory Command Line Interface Version 01.00.00.3402
Conclusion
ipmctl is a powerful tool for configuring and managing Intel Optane DC persistent memory modules. This article outlines some of the most common ipmctl debugging and configuration commands for learning more about your modules. The full ipmctl command reference can be found in the man pages or by typing ipmctl help at any time.