Environment
The sample system is an S2600ST server board in a dual-processor Intel® Xeon® Processor configuration, and the two processors report very different temperatures.
CPU0 operates within the normal range at around 70°C, while CPU1 consistently exceeds the threshold temperature and reaches as high as 105°C.
Both processors receive the same workload, so load does not explain the difference: CPU1 is overheating.
At initial installation, one of the processors ran at 120 degrees. Both processors use the same heatsink, which suggests that the cooling mechanism may not be the cause.
The Sysinfo logs also record multiple errors, which may be related to the temperature issue.
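The gap between the two sockets can be quantified with a short script. The following is a minimal sketch, assuming the readings have already been collected by whatever monitoring tool is in use. The names `compare_sockets` and `SocketReport` are hypothetical, and the 100.0°C threshold is a placeholder for illustration, not an Intel specification; use the limit documented for your processor.

```python
from dataclasses import dataclass


@dataclass
class SocketReport:
    label: str
    temp_c: float
    over_threshold: bool
    delta_from_coolest_c: float


def compare_sockets(readings, threshold_c):
    """Flag sockets above threshold_c and report the gap to the coolest socket.

    readings: dict mapping a socket label to its temperature in °C.
    threshold_c: caller-supplied limit (an assumption here, not a spec value).
    """
    coolest = min(readings.values())
    return [
        SocketReport(label, temp, temp > threshold_c, temp - coolest)
        for label, temp in sorted(readings.items())
    ]


# Temperatures taken from the scenario described above.
reports = compare_sockets({"CPU0": 70.0, "CPU1": 105.0}, threshold_c=100.0)
for r in reports:
    status = "OVER THRESHOLD" if r.over_threshold else "ok"
    print(f"{r.label}: {r.temp_c:.0f}°C (+{r.delta_from_coolest_c:.0f}°C vs coolest) {status}")
```

A large delta under an identical workload, as in this case, points to a socket-specific cause rather than ambient conditions.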
Summary:
Error messages in the logs:
Critical Interrupt PCIe Fatal Error (#0x04)
CRITICAL event: PCIe Fatal Error reports a fatal PCI Express Surprise Link Down error. (BUS:0xAE, DEV:0x00, FUN:0x00)
CRITICAL event: P1 Status reports CATERR has occurred.(Unspecified)
CRITICAL event: P2 Status reports CATERR has occurred.(Unspecified)
CRITICAL event: SPS FW Health reports SPS Health event type FW status. PECI over DMI interface error. Recovery via CPU Host reset or platform reset. DMI timeout of PECI request
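When many such events are logged, it can help to condense them before investigating. The sketch below is one possible approach, assuming the log lines follow the exact wording shown above; the function name `summarize` and the regular expressions are illustrative and are not part of any Intel tool.

```python
import re

# Patterns derived only from the sample messages above (an assumption:
# other firmware versions may word these events differently).
BDF_PATTERN = re.compile(
    r"BUS:0x([0-9A-Fa-f]+),\s*DEV:0x([0-9A-Fa-f]+),\s*FUN:0x([0-9A-Fa-f]+)"
)
CATERR_PATTERN = re.compile(r"\b(P\d+) Status reports CATERR")


def summarize(lines):
    """Collect PCIe bus/device/function triples, CATERR sockets, and PECI error counts."""
    summary = {"pcie_devices": [], "caterr_sockets": [], "peci_errors": 0}
    for line in lines:
        bdf = BDF_PATTERN.search(line)
        if bdf:
            # Convert the hexadecimal fields to integers.
            summary["pcie_devices"].append(tuple(int(x, 16) for x in bdf.groups()))
        caterr = CATERR_PATTERN.search(line)
        if caterr:
            summary["caterr_sockets"].append(caterr.group(1))
        if "PECI" in line:
            summary["peci_errors"] += 1
    return summary


SAMPLE = [
    "CRITICAL event: PCIe Fatal Error reports a fatal PCI Express Surprise Link Down error. (BUS:0xAE, DEV:0x00, FUN:0x00)",
    "CRITICAL event: P1 Status reports CATERR has occurred.(Unspecified)",
    "CRITICAL event: P2 Status reports CATERR has occurred.(Unspecified)",
    "CRITICAL event: SPS FW Health reports SPS Health event type FW status. PECI over DMI interface error. Recovery via CPU Host reset or platform reset. DMI timeout of PECI request",
]

result = summarize(SAMPLE)
for bus, dev, fun in result["pcie_devices"]:
    print(f"PCIe device {bus:02x}:{dev:02x}.{fun:x}")
print("CATERR on:", ", ".join(result["caterr_sockets"]))
print("PECI-related events:", result["peci_errors"])
```

Grouping the events this way shows at a glance that both sockets reported CATERR and that a single PCIe device is involved.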
If your Intel® Xeon® Processor reports a high temperature, act promptly to prevent potential damage or thermal throttling, which reduces system performance.
The following areas can help you identify and resolve the issue:
Monitoring and Initial Steps
Maintenance and Adjustments
Environmental and Software Considerations
Updates and Professional Help
Intel Recommendations and Specifications
Following these steps and Intel's recommendations helps you manage high temperatures in your Intel Xeon processor, which protects both performance and longevity. Regular monitoring and preventive measures can help avoid overheating issues.
If the issue persists or you need further assistance, consult Intel's technical support or refer to the processor's thermal and mechanical design guide for detailed recommendations, available at the Resource & Documentation Center.