• 2019 Update 4
  • 03/20/2019

Optimizing Utilization of Execution Units

When you tune your programs for better performance on the Intel® Graphics device, be aware of how your kernels execute on the hardware:
  • Optimize the number of work-groups
  • Optimize the work-group size
  • Use barriers in kernels wisely
  • Optimize thread utilization
The primary goal of every throughput computing machine is to keep a sufficient number of work-groups active, so that if one is stalled, another can run on its hardware resource.
The primary things to consider:
  • Launch enough work-items to keep the EU threads busy. Keep in mind that the compiler may pack up to 32 work-items per thread (with SIMD-32).
  • In short, lightweight kernels, use short vector data types and compute multiple pixels per work-item to better amortize the thread launch cost.
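
The arithmetic behind these two points can be sketched in a few lines of plain C. This is a rough illustration, not an Intel API: the helper names are hypothetical, and `total_thread_slots` (the number of EUs multiplied by the hardware threads per EU) is an assumed, device-specific input that you would look up for your own hardware.

```c
#include <stddef.h>

/* Hardware (EU) threads needed to cover a launch. Each thread carries
   simd_width work-items (8, 16, or 32), so the count is rounded up. */
static size_t hw_threads_needed(size_t work_items, size_t simd_width)
{
    return (work_items + simd_width - 1) / simd_width;
}

/* Work-items to launch when each one processes pixels_per_item pixels,
   for example 4 when the kernel works on a 4-wide vector type. */
static size_t work_items_for_pixels(size_t pixels, size_t pixels_per_item)
{
    return (pixels + pixels_per_item - 1) / pixels_per_item;
}

/* Nonzero when the launch produces at least one hardware thread for
   every thread slot on the device. total_thread_slots is a hypothetical
   input: number of EUs times hardware threads per EU. */
static int fills_device(size_t work_items, size_t simd_width,
                        size_t total_thread_slots)
{
    return hw_threads_needed(work_items, simd_width) >= total_thread_slots;
}
```

For example, a single 1920-pixel row processed four pixels per work-item needs 480 work-items, which at SIMD-32 amounts to only 15 hardware threads. On a device with many more thread slots than that, such a launch would leave most EU threads idle, so processing more pixels per work-item pays off only when the total launch remains large enough.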
