Runtime Performance Optimization Blueprint: Intel® Architecture Optimization with Large Code Pages

Published: 11/01/2019  

Last Updated: 03/19/2020

By Deborah Taylor, Suresh Srinivas, Uttam C Pawar, Ayodunni Aribuki, and Gabriel Schulhof

Abstract

This document is a Runtime Optimization Blueprint illustrating how the performance of runtimes can be improved by using large code pages. The intended audience is runtime implementers, customers, and providers deploying runtimes at scale. In the Overview section, we introduce the problem of high Instruction Translation Lookaside Buffer (ITLB) miss stalls in runtimes: across seven commonly used runtimes, an average of 7% of CPU cycles are stalled on ITLB misses. In the Diagnosis section, we illustrate how to diagnose this problem using Performance Monitoring Unit (PMU) counters on Intel® architecture processors and sample tools. In the Solution section, we provide an Intel reference implementation as well as other approaches to solving this problem. The Solution Integration section describes how to integrate the reference implementation into runtimes. The Case Studies section details how this optimization improves performance and reduces ITLB misses (by up to 50%) for three applications in three environments. The last section summarizes the blueprint and provides a call to action for runtime developers and implementers.
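To see why large code pages can reduce ITLB misses, it helps to count how many address translations a runtime's code needs. The sketch below is plain arithmetic under assumed values: a hypothetical 32 MiB code region, the standard x86-64 4 KiB base page, and a 2 MiB large page. It does not model the actual number of ITLB entries on any particular processor or the reference implementation's remapping logic.

```python
# Illustrative arithmetic only. The 32 MiB code size is a hypothetical
# example, not a measurement from this blueprint.
KIB = 1024
MIB = 1024 * KIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return how many pages (one translation each) are needed to map a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


text_size = 32 * MIB  # hypothetical hot code region of a runtime binary
small = pages_needed(text_size, 4 * KIB)  # translations with 4 KiB pages
large = pages_needed(text_size, 2 * MIB)  # translations with 2 MiB pages
print(f"4 KiB pages: {small}, 2 MiB pages: {large}, ratio: {small // large}x")
```

With these assumed sizes, the same code needs 8192 translations with 4 KiB pages but only 16 with 2 MiB pages, so far more of the hot code can be covered by the limited number of ITLB entries.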


Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.