Visible to Intel only — GUID: GUID-997DAB88-2278-49AC-B75A-E2BA8C5418B3
Avoid Extracting Vector Components
Consider the following kernel:
```c
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];   // 1 - x on all four lanes
    output[tid].w = input[tid].w;        // single-component write of w
    output[tid] = sqrt(output[tid]);     // this vector op must reload output[tid]
}
```
In this explicitly vectorized code, extracting the w component is very costly: after the single-component write, the next vector operation has to reload the same vector from memory. Instead, load a vector once and make every change, even a change to a single component, with full-vector operations.
In this specific case, two changes are required:
- Set the w component of oneVec to zero, so the subtraction only negates the w component of the input vector.
- Reinterpret the result as an integer vector and flip the sign bit of the w component back.
As a result, the kernel appears as follows:
```c
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];                            // w lane becomes -input.w
    output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);  // flip the w sign bit back
    output[tid] = sqrt(output[tid]);
}
```
At the cost of one more constant vector, this implementation performs every required operation on full vectors only. Because no single component is ever addressed, the same computations can also be performed on float8 vectors.