Use Row-Wise Data Accesses

OpenCL™ Developer Guide for Intel® Processor Graphics

ID 773088

Date 3/20/2019

Version 2019.4

OpenCL™ enables you to submit kernels over a one-, two-, or three-dimensional index space. Consider using one-dimensional ranges to improve cache locality and to save index computations.
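As a rough illustration of the saved index computation, the sketch below emulates a one-dimensional range in plain C on the host. It is not code from the guide: brighten_item, brighten_1d, and gain are hypothetical names, and the loop only stands in for the runtime launching one work-item per value of get_global_id(0).

```c
#include <stddef.h>

/* Hypothetical per-pixel operation. gid stands in for get_global_id(0)
   in a one-dimensional range. */
void brighten_item(size_t gid, const float *input, float *output, float gain)
{
    /* The global ID is already the flat pixel index, so no
       y * image_width + x computation is needed. */
    output[gid] = input[gid] * gain;
}

/* Host-side stand-in for a 1D range of image_width * image_height work-items. */
void brighten_1d(const float *input, float *output,
                 size_t image_width, size_t image_height, float gain)
{
    size_t total = image_width * image_height;
    for (size_t gid = 0; gid < total; ++gid)
        brighten_item(gid, input, output, gain);
}
```

This only fits operations that do not need the x and y coordinates separately; a filter that reads neighboring rows would still have to recover them from the flat index.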

If a two- or three-dimensional range naturally fits your data dimensions, try to keep work-items scanning along rows, not columns. For example:

__kernel void smooth(const __global float* input, 
                     uint image_width, uint image_height,
                     __global float* output)
{
  int myX = get_global_id(0);
  int myY = get_global_id(1);
  int myPixel = myY * image_width + myX;
  float data = input[myPixel];
  …
}

In the example above, the first dimension is the image width and the second is the image height, so work-items that are adjacent in the first dimension read adjacent pixels of the same row. The following code is less efficient:

__kernel void smooth(const __global float* input, 
                     uint image_width, uint image_height,
                     __global float* output)
{
  int myY = get_global_id(0);
  int myX = get_global_id(1);
  int myPixel = myY * image_width + myX;
  float data = input[myPixel];
  …
}

In the second code example, the image height is the first dimension and the image width is the second dimension. The resulting column-wise data access is inefficient, because the CPU OpenCL™ framework iterates over the first dimension first: work-items that run back to back then read addresses image_width elements apart instead of adjacent ones.
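The difference between the two mappings can be checked with a few lines of host-side C. This is a minimal sketch, not part of the guide: pixel_index mirrors the myY * image_width + myX expression from the kernels, while stride_row_wise and stride_column_wise are hypothetical helpers that return the distance, in elements, between the pixels read by two work-items that are neighbors in dimension 0.

```c
#include <stddef.h>

/* Flat index into a row-major image, as computed in both kernels. */
size_t pixel_index(size_t x, size_t y, size_t image_width)
{
    return y * image_width + x;
}

/* Dimension 0 mapped to x (first kernel): neighbors in dimension 0
   differ by one in x. */
size_t stride_row_wise(size_t image_width)
{
    return pixel_index(1, 0, image_width) - pixel_index(0, 0, image_width);
}

/* Dimension 0 mapped to y (second kernel): neighbors in dimension 0
   differ by one in y. */
size_t stride_column_wise(size_t image_width)
{
    return pixel_index(0, 1, image_width) - pixel_index(0, 0, image_width);
}
```

For a 1920-pixel-wide image, stride_row_wise(1920) is 1 element and stride_column_wise(1920) is 1920 elements, so in the column-wise case each consecutive work-item starts in a different row of the image.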

The same rule applies if each work-item calculates several elements. To optimize performance, make sure work-items read from consecutive memory addresses.
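One way to read this rule is to give each work-item a contiguous run of pixels within a row. The sketch below shows that layout in plain C under stated assumptions: PIXELS_PER_ITEM and sum_item_pixels are hypothetical, gid_x and gid_y stand in for get_global_id(0) and get_global_id(1), and image_width is assumed to be a multiple of PIXELS_PER_ITEM.

```c
#include <stddef.h>

#define PIXELS_PER_ITEM 4  /* hypothetical number of elements per work-item */

/* Sum the PIXELS_PER_ITEM pixels owned by work-item (gid_x, gid_y).
   Assumes image_width is a multiple of PIXELS_PER_ITEM. */
float sum_item_pixels(size_t gid_x, size_t gid_y,
                      const float *input, size_t image_width)
{
    /* First pixel of this work-item's run within row gid_y. */
    size_t first = gid_y * image_width + gid_x * PIXELS_PER_ITEM;
    float sum = 0.0f;

    /* The loop walks consecutive addresses: first, first + 1, ... */
    for (size_t k = 0; k < PIXELS_PER_ITEM; ++k)
        sum += input[first + k];
    return sum;
}
```

With this mapping the first dimension of the range would cover image_width / PIXELS_PER_ITEM work-items, and the work-item that follows along the row starts exactly where the previous one stopped.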
