Using Floating Point for Calculations
Intel® Graphics devices execute floating-point operations such as add, sub, and mul much faster than the same operations on the int type. For example, consider the following code, which performs its calculations in the integer type uint4:

    __kernel void amp (__constant uchar4* src, __global uchar4* dst)
    …
      uint4 tempSrc = convert_uint4(src[offset]); // Load one RGBA8 pixel
      // some processing
      uint4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
      uint4 tempDst = value + (tempSrc - value) * nSaturation;
      // store
      dst[offset] = convert_uchar4(tempDst);
    }
Below is its float4 equivalent:

    __kernel void amp (__constant uchar4* src, __global uchar4* dst)
    …
      float4 tempSrc = convert_float4(src[offset]); // Load one RGBA8 pixel
      // some processing
      float4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
      float4 tempDst = mad(tempSrc - value, fSaturation, value);
      // store
      dst[offset] = convert_uchar4(tempDst);
    }
Where available, Intel® Advanced Vector Extensions (Intel® AVX) support accelerates floating-point calculations on modern CPUs, so floating-point data types are preferable for the CPU OpenCL device as well.
Note
The compiler can automatically fuse multiplies and additions. Use the compiler flag -cl-mad-enable to enable this optimization when compiling for both Intel® Graphics and CPU devices. However, explicit use of the mad built-in ensures that the operation is mapped directly to the efficient instruction.