Avoid Extracting Vector Components

OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors

ID 773005

Date 10/30/2018

Version 2018

Public

Consider the following kernel:

__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);
__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];
  output[tid].w = input[tid].w;
  output[tid] = sqrt(output[tid]);
}

In this explicitly vectorized code, writing the w component separately is very costly, because the next vector operation forces the same vector to be reloaded from memory. Instead, load a vector once and make all changes with full-vector operations, even when only a single component changes.

In this specific case, two changes are required:

Modify oneVec so that its w component is zero. The subtraction then only flips the sign of the w component of the input vector.
Use the bit representation of the float values to manually flip the sign bit of the w component back.
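
To make the two steps concrete for a single w value, here is a minimal scalar sketch in plain C. It is not part of the guide: the helper names float_bits, bits_to_float, and restore_w are hypothetical, and memcpy stands in for the OpenCL as_int and as_float reinterpretations.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer
   (hypothetical scalar stand-in for OpenCL's as_int). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float (stand-in for as_float). */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Emulates what happens to the w lane: subtracting from 0.0f negates
   the value, and XOR with the sign-bit mask flips the sign back. */
static float restore_w(float w)
{
    float negated = 0.0f - w;
    return bits_to_float(float_bits(negated) ^ 0x80000000u);
}
```

For a finite nonzero w, restore_w(w) returns w with an identical bit pattern. One detail worth knowing: under the default rounding mode, an input of +0.0f comes back as -0.0f, which still compares equal to zero.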

As a result, the kernel appears as follows:

__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);
__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];
  output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);
  output[tid] = sqrt(output[tid]);
}

At the cost of one more constant vector, this implementation performs all required operations on full vectors only. The same computations can also be performed in float8.
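
The guide does not show a float8 variant. As a rough sketch of what one might involve, the plain C emulation below assumes that one float8 holds two consecutive float4 elements, so lanes 3 and 7 are the w components. The names one_vec8, sign_changer8, and invert8 are hypothetical, and the final sqrt is left out because it treats every lane alike.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8  /* assumption: one float8 packs two consecutive float4 elements */

/* Hypothetical 8-lane versions of oneVec and signChanger:
   lanes 3 and 7 are the w components of the two packed float4 values. */
static const float    one_vec8[LANES]      = {1.0f, 1.0f, 1.0f, 0.0f,
                                              1.0f, 1.0f, 1.0f, 0.0f};
static const uint32_t sign_changer8[LANES] = {0, 0, 0, 0x80000000u,
                                              0, 0, 0, 0x80000000u};

/* Emulates the subtract and XOR steps of inverter3, lane by lane. */
static void invert8(const float in[LANES], float out[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        float diff = one_vec8[i] - in[i];     /* 1 - x, or 0 - w in lanes 3 and 7 */
        uint32_t bits;
        memcpy(&bits, &diff, sizeof bits);    /* reinterpret as integer bits */
        bits ^= sign_changer8[i];             /* flip the sign back in w lanes only */
        memcpy(&out[i], &bits, sizeof bits);  /* reinterpret as float again */
    }
}
```

In this sketch the only change from the float4 version is that both constants repeat their four-lane pattern twice. Whether the wider vector pays off depends on the target device.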
