Considering native_ and half_ Versions of Math Built-Ins
The OpenCL™ API offers two basic ways to trade precision for speed:
- native_* and half_* math built-ins, which have lower precision but are faster than their un-prefixed variants.
- Compiler optimization options that enable floating-point optimizations for the whole OpenCL program (for example, the -cl-fast-relaxed-math flag).
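As a rough illustration of the two approaches listed above, the following host-side C sketch holds a kernel source string with a full-precision and a native_ variant of the same computation, the build option string for the program-wide alternative, and a small helper for comparing results. The kernel names (wave_precise, wave_native), the helper max_relative_error, and its floor parameter are hypothetical names chosen for this example; only sin, exp, native_sin, native_exp, get_global_id, and the -cl-fast-relaxed-math flag come from OpenCL itself.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source that a host program could hand to clCreateProgramWithSource.
   Both kernels compute the same expression; only the built-ins differ.
   The kernel names are hypothetical. */
static const char *kernel_src =
    "__kernel void wave_precise(__global const float *x, __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = sin(x[i]) * exp(-x[i]);\n"
    "}\n"
    "__kernel void wave_native(__global const float *x, __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = native_sin(x[i]) * native_exp(-x[i]);\n"
    "}\n";

/* Program-wide alternative: a string like this can be passed as the
   options argument of clBuildProgram. It affects every kernel in the program. */
static const char *fast_math_options = "-cl-fast-relaxed-math";

/* Hypothetical helper: largest relative error of got[] against ref[].
   floor bounds the denominator so that reference values near zero
   do not blow up the ratio. */
static double max_relative_error(const float *ref, const float *got,
                                 size_t n, double floor)
{
    assert(ref != NULL && got != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double denom = (double)ref[i];
        if (denom < 0.0) denom = -denom;
        if (denom < floor) denom = floor;

        double diff = (double)got[i] - (double)ref[i];
        if (diff < 0.0) diff = -diff;

        double err = diff / denom;
        if (err > worst) worst = err;
    }
    return worst;
}
```

One possible workflow, under these assumptions: run wave_precise and wave_native on the same input, read both output buffers back to the host, and pass them to max_relative_error as ref and got. If the reported error is acceptable for the application, the native_ variant can replace the precise one for that kernel alone, without changing the build options for the rest of the program.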
For the list of other compiler options and their descriptions, refer to the Intel® Code Builder for OpenCL™ API - User Manual. In general, while the -cl-fast-relaxed-math flag is a quick way to get potentially large performance gains for kernels with many math operations, it does not permit fine control of numeric accuracy. Consider experimenting with native_* equivalents separately for each specific case, keeping track of the resulting accuracy.

The native_ versions of math built-ins are generally supported in hardware and run substantially faster, while offering lower accuracy. Use the native versions of trigonometric and transcendental functions, such as sin, cos, exp, or log, when performance is more important than precision.

The list of functions that have optimized versions is provided in the "Working with cl-fast-relaxed-math Flag" section of the OpenCL Code Builder - User’s Guide.

See Also
OpenCL™ Build and Linking Options chapter of the Intel® Code Builder for OpenCL™ API - User Manual