• 2019 Update 4
  • 03/20/2019

Optimizing Utilization of Execution Units

When you tune programs for better performance on the Intel® Graphics device, take into account how your kernels are executed on the hardware:
  • Optimize the number of work-groups
  • Optimize the work-group size
  • Use barriers in kernels wisely
  • Optimize thread utilization
The primary goal of every throughput computing machine is to keep a sufficient number of work-groups active, so that if one is stalled, another can run on its hardware resource.
The primary things to consider:
  • Launch enough work-items to keep the EU threads busy. Keep in mind that the compiler may pack up to 32 work-items per thread (with SIMD-32).
  • In short, lightweight kernels, use short vector data types and compute multiple pixels per work-item to better amortize the thread launch cost.
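
The arithmetic behind these two points can be sketched in a few lines of host-side C. This is a minimal illustration, not part of any Intel or OpenCL API: the helper names (`ceil_div`, `work_items_for`, `eu_threads_for`, `fills_device`) are hypothetical, and the EU count and threads-per-EU values are assumptions that the caller must supply for the actual device.

```c
#include <assert.h>
#include <stddef.h>

/* Round-up integer division: smallest q such that q * d >= n. */
static size_t ceil_div(size_t n, size_t d)
{
    assert(d > 0);
    return n / d + (n % d != 0);
}

/* Work-items needed when each work-item computes pixels_per_item pixels
 * (for example, 4 when the kernel operates on a 4-wide vector type). */
static size_t work_items_for(size_t pixels, size_t pixels_per_item)
{
    return ceil_div(pixels, pixels_per_item);
}

/* EU threads occupied by a launch when the compiler packs simd_width
 * work-items into each thread (up to 32 with SIMD-32). */
static size_t eu_threads_for(size_t work_items, size_t simd_width)
{
    return ceil_div(work_items, simd_width);
}

/* Returns 1 if the launch provides at least one thread for every hardware
 * thread slot. eu_count and threads_per_eu are device-specific assumptions
 * supplied by the caller, not values queried from the driver. */
static int fills_device(size_t eu_threads, size_t eu_count,
                        size_t threads_per_eu)
{
    return eu_threads >= eu_count * threads_per_eu;
}
```

For example, a 1920x1080 image processed at 4 pixels per work-item needs 518400 work-items, which SIMD-32 packs into 16200 EU threads. A launch of only 100 work-items yields 4 threads at SIMD-32, which would leave most thread slots idle on a device with, say, 24 EUs and 7 threads per EU (illustrative values only).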
