When Processor Counter Monitor (PCM) writes CSV output, it uses short names as column headers. This keeps the table width manageable when the data is loaded into a spreadsheet program, but it makes it hard to guess what the abbreviations mean. Since I get a lot of questions on how to interpret these column names, I've put together a decoder ring:
The following metrics are available on all levels:
| Field | Explanation | Example |
|---|---|---|
| Date | Day-Month-Year | 5/2/2014 |
| Time | Time of day | 13:38:04 |
| EXEC | Instructions per nominal CPU cycle, i.e. relative to the nominal CPU frequency, ignoring turbo and power-saving states | 0.182 |
| IPC | Instructions per cycle. This measures how effectively you are using the core. | 0.159 |
| FREQ | Frequency relative to nominal CPU frequency ("clockticks"/"invariant timer ticks") | 1.143 |
| AFREQ | Frequency relative to nominal CPU frequency excluding the time when the CPU is sleeping | 1.143 |
| CFREQ | Core frequency in GHz | 3.87 |
| L3MISS | L3 cache line misses in millions | 182.879 |
| L2MISS | L2 cache line misses in millions | 356.3 |
| L3HIT | L3 Cache hit ratio (hits/reference) | 0.487 |
| L2HIT | L2 Cache hit ratio (hits/reference) | 0.233 |
| L3MPI | Average number of L3 cache misses per instruction | 0.0044 |
| L2MPI | Average number of L2 cache misses per instruction | 0.008 |
| Frontend_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles stalled due to frontend resources (fetch latency and fetch bandwidth) | 33 |
| Bad_Speculation(%) | Top-down Microarchitecture Analysis: percentage of cycles wasted due to incorrect speculations (branch misprediction and machine clears) | 2 |
| Backend_Bound(%) | Top-down Microarchitecture Analysis: percentage of cycles stalled due to backend resources (memory bound and core bound) | 58 |
| Retiring(%) | Top-down Microarchitecture Analysis: percentage of cycles with instructions retired (light and heavy) | 4 |
| Fetch_latency_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles with instruction fetch starvation, e.g. icache misses or i-TLB misses | 31 |
| Fetch_bandwidth_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles with inefficiency in the instruction decoders | 2 |
| Branch_misprediction_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles stalled due to mispredicted branches | 2 |
| Machine_clears_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles stalled due to machine clears, e.g. due to memory ordering or self-modifying code | 0 |
| Buffer_Cache_Memory_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles stalled by memory buffer, cache, or memory access | 29 |
| Core_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles with pressure on execution units | 29 |
| Heavy_operations_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles retiring heavy instructions (from microcode sequencer) | 1 |
| Light_operations_bound(%) | Top-down Microarchitecture Analysis: percentage of cycles retiring light instructions | 2 |
| Core Residency C0res% C1res% C3res% C6res% C7res% | Core C-state residency for C-states 0, 1, 3, 6, and 7 | 16.02 |
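
The example values in the table above can be cross-checked against each other. The sketch below assumes two relationships that follow from the explanations but are not stated by PCM itself here: EXEC should roughly equal IPC multiplied by FREQ, and each top-level Top-down category should roughly equal the sum of the two sub-categories named in its parentheses. The function names, the `EXAMPLE_ROW` dictionary, and the tolerance are hypothetical choices for illustration.

```python
# Assumed parent -> children mapping, read off the parentheses in the
# explanations above; not taken from PCM source code.
TOPDOWN_CHILDREN = {
    "Frontend_bound(%)": ("Fetch_latency_bound(%)", "Fetch_bandwidth_bound(%)"),
    "Bad_Speculation(%)": ("Branch_misprediction_bound(%)", "Machine_clears_bound(%)"),
    "Backend_Bound(%)": ("Buffer_Cache_Memory_bound(%)", "Core_bound(%)"),
    "Retiring(%)": ("Heavy_operations_bound(%)", "Light_operations_bound(%)"),
}

# Example values copied from the "Example" column of the table.
EXAMPLE_ROW = {
    "EXEC": 0.182, "IPC": 0.159, "FREQ": 1.143,
    "Frontend_bound(%)": 33, "Fetch_latency_bound(%)": 31, "Fetch_bandwidth_bound(%)": 2,
    "Bad_Speculation(%)": 2, "Branch_misprediction_bound(%)": 2, "Machine_clears_bound(%)": 0,
    "Backend_Bound(%)": 58, "Buffer_Cache_Memory_bound(%)": 29, "Core_bound(%)": 29,
    "Retiring(%)": 4, "Heavy_operations_bound(%)": 1, "Light_operations_bound(%)": 2,
}


def exec_from_ipc(ipc: float, freq: float) -> float:
    """Instructions per nominal cycle, derived from IPC and relative frequency."""
    return ipc * freq


def topdown_mismatches(row: dict, tolerance: float = 2.0) -> dict:
    """Return {parent: difference} where parent and children differ by more than tolerance."""
    mismatches = {}
    for parent, children in TOPDOWN_CHILDREN.items():
        diff = float(row[parent]) - sum(float(row[child]) for child in children)
        # Percentages are rounded to integers, so allow a small tolerance.
        if abs(diff) > tolerance:
            mismatches[parent] = diff
    return mismatches


if __name__ == "__main__":
    derived = exec_from_ipc(EXAMPLE_ROW["IPC"], EXAMPLE_ROW["FREQ"])
    print(f"EXEC reported {EXAMPLE_ROW['EXEC']}, derived {derived:.3f}")
    print("Top-down mismatches:", topdown_mismatches(EXAMPLE_ROW))
```

A check like this is mainly useful to confirm that the columns have been mapped correctly after import; small differences are expected because the reported values are rounded.
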
The following metrics are only available on socket and system level:
| Field | Explanation | Example |
|---|---|---|
| READ | DRAM Memory read traffic on this socket in GB | 23.108 |
| WRITE | DRAM Memory write traffic on this socket in GB | 10.782 |
| LOCAL | Ratio of local memory requests to memory controller in % | 53 |
| PMM_RD | Optane PMem Memory read traffic on this socket in GB | 2.839 |
| PMM_WR | Optane PMem Memory write traffic on this socket in GB | 1.332 |
| HBM_READ | HBM read traffic in GB | 3.21 |
| HBM_WRITE | HBM write traffic in GB | 2.31 |
| Package Residency C0res% C2res% C6res% | Package C-state residency for C-states 0, 2, and 6 | 51.28 |
| LLCRDMISSLAT | Average latency of last level cache misses for reads and prefetches in ns | 117 |
| UncFREQ | Uncore frequency in GHz | 1.82 |
| Proc Energy (Joules) | The energy consumed by the processor in Joules. Divide by the time interval in seconds to get the power consumption in watts | 122.457 |
| DRAM Energy (Joules) | The energy consumed by the DRAM attached to this socket in Joules. Divide by the time interval in seconds to get the power consumption in watts | 115.747 |
The following metrics are only available on a socket level:
| Field | Explanation | Example |
|---|---|---|
| TEMP | Thermal headroom in Kelvin (max design temperature – current temperature) | 32 |
| IO | Memory traffic due to IO requests to memory controller in GBytes | 1.3 |
| IA | Memory traffic due to IA requests to memory controller in GBytes | 2.1 |
| GT | Memory traffic due to GT requests to memory controller in GBytes | 2.4 |
The following metrics are only available on a system level:
| Field | Explanation | Example |
|---|---|---|
| INST | Number of instructions retired | 119706 |
| ACYC | Number of clockticks. This takes turbo and power saving modes into account. | 750640.8 |
| TIME(ticks) | Number of invariant clockticks. This is invariant to turbo and power saving modes. | 2817.883 |
| PhysIPC | Instructions per cycle (IPC) multiplied by number of threads per core. See section "Core Cycles-per-Instruction (CPI) and Thread CPI" in Performance Insights to Intel® Hyper-Threading Technology for some background information. | 0.319 |
| PhysIPC% | Instructions per cycle (IPC) multiplied by number of threads per core relative to maximum IPC | 7.974 |
| INSTnom | Instructions per nominal cycle multiplied by number of threads per core | 0.365 |
| INSTnom% | Instructions per nominal cycle multiplied by number of threads per core relative to maximum IPC. The maximum IPC is 2 for Atom and 4 for all other supported processors. | 9.113 |
| TotalUPIin | UPI data traffic estimation (data traffic coming to CPU/socket through UPI links) in MB (1024*1024) | 21937.96 |
| UPItoMC | Ratio of UPI traffic to memory traffic | 0.632 |
| TotalUPIout | UPI traffic estimation (data and non-data traffic outgoing from CPU/socket through UPI links) in MB (1024*1024) | 38443.3 |
| System Energy (Joules) | The energy consumed by the system in Joules. Divide by the time interval in seconds to get the power consumption in watts | 626.287 |
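
The relationship between the IPC-style columns above can be written down in a few lines. This is a minimal sketch based only on the explanations in the table: PhysIPC is taken as IPC multiplied by the number of threads per core, and the percentage columns as the value relative to a maximum IPC of 4 (or 2 for Atom). The function names and the choice of two threads per core are assumptions for illustration.

```python
def phys_ipc(ipc: float, threads_per_core: int) -> float:
    """IPC scaled by the number of hardware threads per core."""
    return ipc * threads_per_core


def percent_of_max_ipc(value: float, max_ipc: float = 4.0) -> float:
    """Express an IPC-style value as a percentage of the maximum IPC.

    max_ipc defaults to 4; per the table it would be 2 for Atom processors.
    """
    if max_ipc <= 0:
        raise ValueError("max_ipc must be positive")
    return 100.0 * value / max_ipc


if __name__ == "__main__":
    # Assumption: two threads per core (Hyper-Threading enabled).
    print(f"PhysIPC  from IPC 0.159: {phys_ipc(0.159, 2):.3f}")
    print(f"PhysIPC% from PhysIPC 0.319: {percent_of_max_ipc(0.319):.3f}")
    print(f"INSTnom% from INSTnom 0.365: {percent_of_max_ipc(0.365):.3f}")
```

The results land close to the example values in the table; exact agreement is not expected because the table shows rounded numbers.
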
The following metrics are only available on a core and socket level:
Please also note that PCM reports absolute values for the measured time interval. For example, with a time interval of 5 seconds, memory traffic and instructions retired are reported for the whole 5 seconds. Only when PCM runs with a 1-second interval do the memory traffic values equal GB/s.
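
To compare runs that used different intervals, the absolute values can be divided by the interval length. The sketch below assumes a row has already been read into a dictionary keyed by the column names from the tables above, and that the interval in seconds is known from how PCM was started. The `per_second` helper and the `CUMULATIVE_FIELDS` selection are hypothetical and cover only a few columns.

```python
# Columns that accumulate over the interval, with the unit of the derived rate.
# The selection is illustrative, not a complete list.
CUMULATIVE_FIELDS = {
    "READ": "GB/s",
    "WRITE": "GB/s",
    "Proc Energy (Joules)": "W",
    "DRAM Energy (Joules)": "W",
}


def per_second(row: dict, interval_s: float) -> dict:
    """Convert per-interval totals into per-second rates for known columns."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name, unit in CUMULATIVE_FIELDS.items():
        if name in row:  # skip columns that this PCM run did not report
            rates[f"{name} [{unit}]"] = float(row[name]) / interval_s
    return rates


if __name__ == "__main__":
    # Example values from the tables, treated here as a 5-second sample.
    sample = {"READ": "23.108", "WRITE": "10.782", "Proc Energy (Joules)": "122.457"}
    for column, rate in per_second(sample, interval_s=5.0).items():
        print(f"{column}: {rate:.3f}")
```

Ratio columns such as IPC, L3HIT, or LOCAL are already normalized and should not be divided by the interval.
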