Article ID: 000094111 Content Type: Troubleshooting Last Reviewed: 05/30/2023

Troubleshooting Threshold Sensors Exceeding Upper Threshold on Intel® Server

Summary

Steps to isolate or resolve threshold sensors that exceed their upper threshold on Intel® Server Systems.

Description

One or more threshold sensors have exceeded their warning or critical high threshold.
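
To make the terminology concrete, here is a minimal Python sketch of how a single reading might be compared against a sensor's warning and critical high thresholds. The function name, the example threshold values, and the rule that a reading strictly above a threshold counts as exceeding it are assumptions for illustration. This is not the BMC firmware's actual logic.

```python
from typing import Optional


def classify_reading(
    value: float,
    warning_high: Optional[float],
    critical_high: Optional[float],
) -> str:
    """Return 'critical', 'warning', or 'normal' for an upper-threshold check.

    Hypothetical helper: a reading strictly above a threshold is treated as
    exceeding it, and a threshold of None is treated as not configured.
    """
    if critical_high is not None and value > critical_high:
        return "critical"
    if warning_high is not None and value > warning_high:
        return "warning"
    return "normal"


# Made-up thresholds for a nominal 12 V rail, for illustration only.
print(classify_reading(12.1, warning_high=12.6, critical_high=13.2))  # normal
print(classify_reading(12.9, warning_high=12.6, critical_high=13.2))  # warning
print(classify_reading(13.5, warning_high=12.6, critical_high=13.2))  # critical
```

The critical threshold is checked first so that a reading above both thresholds is reported at the more severe level.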

Resolution

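Find the sensor that reported the event in the groups below, then follow the resolution steps for its group. Each group lists the sensor names, followed by a description of the event and the resolution steps.
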
Sensor names:
  • P12V AUX
  • P12V PSU
  • P3V3
  • P5V

System voltage has exceeded its normal operating range. Typically, this issue is caused by a failed power supply, but it may also be caused by a short circuit on the baseboard, SATA drives, fans, or PCIe cards.

Try these steps to isolate the source of the voltage excursion event:

  • Ensure all cables are connected correctly.
  • Check the connections on the fans and SATA drives.
  • Remove and inspect all potentially impacted components, then reinstall them one at a time to isolate the failing component.
  • Inspect connectors (DIMMs, PCIe*) for contamination.
  • If the issue remains, reseat the power supplies. If reseating does not solve the problem, replace the power supplies.
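
Reinstalling components one at a time is effectively a linear search: add one component, recheck the sensor, and stop at the first component whose addition brings the fault back. The Python sketch below models that procedure under stated assumptions: `sensor_ok` is a hypothetical callback standing in for "power on with this set of components installed and check the sensor", and the component names are made up.

```python
from typing import Callable, Iterable, List, Optional


def isolate_failing_component(
    components: Iterable[str],
    sensor_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Reinstall components one at a time and return the first one whose
    addition makes the sensor report a fault.

    Returns None if the fault never reappears with every component reinstalled.
    """
    installed: List[str] = []
    for component in components:
        installed.append(component)
        # Hypothetical check: power on with `installed` and read the sensor.
        if not sensor_ok(list(installed)):
            return component
    return None


# Simulated example: the fault appears once the second PCIe card is installed.
def fake_sensor_ok(installed: List[str]) -> bool:
    return "PCIe card 2" not in installed


parts = ["fan 1", "SATA drive 0", "PCIe card 1", "PCIe card 2", "fan 2"]
print(isolate_failing_component(parts, fake_sensor_ok))  # PCIe card 2
```

The search assumes the sensor reads normally before any of the listed components are added; if it faults even in the minimal configuration, the cause lies in that configuration itself.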

Sensor names:
  • P1V8 PCH
  • P105 PCH AUX
  • PVNN PCH AUX

One or more of the voltages on the baseboard is outside its normal operating range. This is typically caused by a baseboard failure.

Try these steps in order to isolate the source of the voltage excursion event:

  • Remove all components except the minimum required for operation, then check the sensor again.
  • If the error remains, replace the baseboard.

Sensor names:
  • P3VBAT

The system has experienced a voltage excursion on the baseboard CMOS battery (BB +3.3V Vbat), typically caused by contamination on the surface of the battery.

Try these steps in order:

  • Remove the battery, wipe it with alcohol to remove potential contamination, reinstall it, and recheck that the battery voltage is 2.95 V or higher.
  • Replace the CMOS battery. Any CR2032 battery can be used.
  • If the error remains, replace the baseboard.

Sensor names:
  • PSU Input Power
  • PSU Out Current

The indicated power supply is drawing too much power from its source. Possible causes:

  • The redundant power supplies may not be powered.
  • The power supplies may be underrated for the actual system load.
  • The power supplies may be operating on low line (90-120 V) instead of high line (200-240 V).
  • A short circuit may be present in the server.

Perform the following steps:

  • If the power supply LEDs are solid green, the power supply is functioning but may be underrated for the system load. Use the Power Calculator to verify that the power budget is within the specified range.
  • If the power supply LEDs are flashing green, reseat the power supply and verify the AC power cord connection.
  • If the power supply LEDs are solid amber, the power supply has a fault. Replace the power supply.
  • If the power supply LEDs are flashing amber, verify the AC power cord connection.
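
The LED checks above map each power supply LED state to one action, which a lookup table captures directly. The Python sketch below encodes only the four states described in this article; the state labels and the function name are hypothetical, and any other state falls through to a generic message.

```python
# Actions per LED state, taken from the steps above; keys are hypothetical labels.
PSU_LED_ACTIONS = {
    "solid green": "functioning but may be underrated for the load; "
                   "verify the power budget with the Power Calculator",
    "flashing green": "reseat the power supply and verify the AC power cord connection",
    "solid amber": "power supply fault; replace the power supply",
    "flashing amber": "verify the AC power cord connection",
}


def psu_led_action(led_state: str) -> str:
    """Return the suggested action for an observed PSU LED state."""
    key = led_state.strip().lower()  # tolerate case and stray whitespace
    return PSU_LED_ACTIONS.get(key, "LED state not covered by this article")


print(psu_led_action("Solid Amber"))  # power supply fault; replace the power supply
```

A dictionary keeps the state-to-action mapping in one place, so it can be extended if other LED states are documented for a given power supply.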

Sensor names:
  • PVCCD HV CPU
  • PVCCFA EHV CPU
  • PVCCINFAON CPU
  • PVCCIN CPU

Processor voltage has exceeded its normal operating range. Typically, this issue is caused by a failed power supply, but it may also be caused by a short circuit on the baseboard, SATA drives, fans, or PCIe cards.

If the fault is asserted and then deasserted immediately afterward, take no action, but continue to monitor the sensor.

Otherwise:

  • Ensure the processor is seated properly and secured with the correct torque value.
  • Cross-test the processors. If the issue stays with the processor socket, replace the baseboard; if it moves with the processor, replace the processor.
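
Two decisions in this group can be sketched in code: whether an assert/deassert pair counts as transient, and how to read a cross-test result. The Python sketch below assumes a two-socket system in which the processors are swapped between sockets; the 5-second window standing in for "immediately" is an assumption, not an Intel-specified value, and the socket names are hypothetical.

```python
from typing import Optional

# Assumed meaning of "immediately" for this sketch; not an Intel-specified value.
TRANSIENT_WINDOW_S = 5.0


def is_transient(asserted_at: float, deasserted_at: Optional[float]) -> bool:
    """Return True if the fault cleared within the assumed window.

    Timestamps are in seconds. A fault that has not deasserted yet
    (deasserted_at is None) is never treated as transient.
    """
    if deasserted_at is None:
        return False
    return 0.0 <= deasserted_at - asserted_at <= TRANSIENT_WINDOW_S


def cross_test_verdict(failing_socket_before: str, failing_socket_after: str) -> str:
    """Interpret a cross-test in which two processors were swapped between sockets.

    If the fault is reported on the same socket after the swap, it stayed
    with the socket; otherwise it moved with the processor.
    """
    if failing_socket_after == failing_socket_before:
        return "replace the baseboard"
    return "replace the processor"


print(is_transient(100.0, 101.5))  # True: monitor only
print(cross_test_verdict("CPU0", "CPU1"))  # replace the processor
```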

Sensor names:
  • TCC CPU
  • CPU VR Temp
  • DTS CPU
  • Margin CPU
  • DIMM Mgn
  • Riser Temp
  • Exit Air Temp
  • Front Panel Temp
  • BB M2 Temp
  • BB OCP Temp
  • BB P0/P1 VR Temp
  • BB Riser2 Temp
  • HSBP Temp
  • PSU Temperature

The temperature sensor has exceeded its normal operating range.

  • Check for clean and unobstructed airflow into and out of the chassis.
  • Ensure there are no fan failures.
  • Ensure the temperature of the air or liquid used to cool the system is within the system's thermal specification.
  • Ensure that cables are not obstructing the airflow and are routed correctly inside the chassis. For cable routing details, see the Configuration Guide.

Sensor names:
  • NVMe Aggr

The NVMe temperature sensor has exceeded its normal operating range.

  • Check for clean and unobstructed airflow into and out of the chassis.
  • Ensure there are no fan failures.
  • Check whether neighboring NVMe drives also report high temperatures; this helps distinguish a workload-related cause from a mechanical one.
  • If the system supports them, ensure that air ducts are installed and that drive blanks fill the empty drive slots.
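
The neighbor comparison can be expressed as a short check over drive temperatures. In this Python sketch, the slot numbering, the readings, the "high" limit, and the rule for which pattern points to which cause are all assumptions for illustration: if an adjacent drive is also hot, a shared airflow (mechanical) cause is treated as more likely; if only one drive is hot, its own workload is treated as the more likely source. Adjacent slot numbers are also assumed to be physically adjacent, which may not hold on every backplane.

```python
from typing import Dict


def likely_nvme_cause(temps_c: Dict[int, float], hot_slot: int, high_c: float) -> str:
    """Compare a hot drive with its immediate neighbors (slot - 1 and slot + 1).

    Assumed heuristic: any neighbor above `high_c` suggests a shared
    airflow/mechanical cause; an isolated hot drive suggests its own workload.
    """
    neighbors = [temps_c[s] for s in (hot_slot - 1, hot_slot + 1) if s in temps_c]
    if any(t > high_c for t in neighbors):
        return "mechanical (check airflow, fans, air ducts, drive blanks)"
    return "workload on this drive"


# Hypothetical readings in degrees Celsius, keyed by drive slot.
readings = {0: 41.0, 1: 72.0, 2: 70.0, 3: 43.0}
print(likely_nvme_cause(readings, hot_slot=1, high_c=65.0))
```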