OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors
ID: 773005 | Date: 10/30/2018 | Public
Basic Concepts
The list below explains the basic OpenCL™ concepts used in this document. The concepts are based on the definitions in the Khronos* OpenCL specification.
- Intel® CPU Runtime for OpenCL™ Applications enables OpenCL software technology support on Intel® Core™ and Intel® Xeon® Processors.
NOTE: Intel® CPU Runtime for OpenCL™ Applications was previously known as Intel® SDK for OpenCL™ - CPU only Runtime Package.
- Intel® SDK for OpenCL™ Applications is a software development tool that enables developing, debugging, and analyzing OpenCL applications targeting the Intel® Architecture processors with the Intel® Processor Graphics.
- The OpenCL execution model describes how a kernel executes on a target OpenCL device under the control of the host.
- The OpenCL standard is a standard for parallel programming of modern processors.
- A compute unit is composed of one or more processing elements and local memory. It may also include dedicated texture filter units that can be accessed by its processing elements. An OpenCL device has one or more compute units. A work-group executes on a single compute unit.
- A device is a collection of compute units.
- A command-queue is used to queue commands to a device. Examples of commands include executing kernels, or reading and writing memory objects.
- A kernel is a function declared in a program and executed on an OpenCL device. A kernel is identified by the __kernel or kernel qualifier applied to any function defined in a program.
- A work-item is one of a collection of parallel executions of a kernel invoked on a device by a command. A work-item is executed by one or more processing elements as a part of a work-group executing on a compute unit. A work-item is distinguished from other executed work-items within the collection by its global ID and local ID.
- A work-group is a collection of related work-items that execute on a single compute unit. The work-items in the group execute the same kernel and share local memory and work-group barriers. Each work-group has the following properties:
- Data sharing between work-items by use of local memory
- Synchronization between work-items by use of barriers and memory fences
- Special work-group level built-in functions, such as async_work_group_copy
A multi-core CPU or multiple CPUs (in a multi-socket machine) constitute a single CPU OpenCL device. Separate cores are compute units. For information on controlling the affinity by compute units using the device fission feature, refer to the OpenCL™ Device Fission for CPU Performance article.
- A task-parallel programming model is the OpenCL programming model that runs a single work-group with a single work-item.
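The relationship between work-groups, work-items, global IDs, and local IDs can be illustrated with a minimal sketch in plain C. It needs no OpenCL runtime and only mirrors the index arithmetic for a one-dimensional range with a zero global offset. The type `work_item_ids` and the functions `num_work_groups`, `split_global_id`, and `make_global_id` are hypothetical names introduced for illustration; they are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type for illustration; not part of the OpenCL API. */
typedef struct {
    size_t group_id; /* work-group index, analogous to get_group_id(0) */
    size_t local_id; /* index inside the work-group, analogous to get_local_id(0) */
} work_item_ids;

/* Number of work-groups in a 1D range.
 * Assumes OpenCL 1.2 rules: global_size must be a multiple of local_size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Split a global ID into (work-group ID, local ID).
 * Assumes a zero global offset. */
work_item_ids split_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids = { global_id / local_size, global_id % local_size };
    return ids;
}

/* Recombine: global ID = group ID * local size + local ID (zero offset). */
size_t make_global_id(work_item_ids ids, size_t local_size)
{
    return ids.group_id * local_size + ids.local_id;
}
```

With a local size of 16, for instance, the work-item with global ID 37 belongs to work-group 2 and has local ID 5. The task-parallel model corresponds to the degenerate case in which both the global size and the local size equal 1, which gives exactly one work-group containing one work-item.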
See Also
OpenCL™ 1.2 Specification at https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
Developer Guide for Intel® SDK for OpenCL™ Applications