Developer Guide for Intel® SDK for OpenCL™ Applications 2017
ID
773042
Date
10/22/2018
Public
Kernel Overview
The Kernel Overview page provides data that can help you optimize your kernel code.
This section includes the API Calls report, which shows every OpenCL kernel that was launched during program execution.
Kernels that differ in name, global work size, or local work size are treated as different kernels and are shown in separate rows.
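The row-splitting rule can be illustrated with a minimal sketch. It assumes each launch is available as a record with a name, a global work size, and a local work size; the record layout and the `group_launches` helper are hypothetical and are not part of the SDK.

```python
from collections import defaultdict


def group_launches(launches):
    """Group launch records into report rows.

    Two launches share a row only if the kernel name, the global work
    size, and the local work size all match.
    """
    rows = defaultdict(list)
    for launch in launches:
        key = (
            launch["name"],
            tuple(launch["global_size"]),
            tuple(launch["local_size"]),
        )
        rows[key].append(launch)
    return dict(rows)


# Hypothetical launch records; the field names are illustrative only.
launches = [
    {"name": "blur", "global_size": (1024, 1024), "local_size": (16, 16)},
    {"name": "blur", "global_size": (1024, 1024), "local_size": (16, 16)},
    {"name": "blur", "global_size": (1024, 1024), "local_size": (8, 8)},
    {"name": "sobel", "global_size": (1024, 1024), "local_size": (16, 16)},
]

rows = group_launches(launches)
for (name, global_size, local_size), group in rows.items():
    print(name, global_size, local_size, "launches:", len(group))
```

In this sketch the two `blur` launches with identical sizes share one row, while the `blur` launch with a different local work size gets a row of its own.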
Each row shows:
- The total, minimum, maximum and average kernel execution time.
- EU Active - The normalized sum of all cycles on all cores spent actively executing instructions.
- EU Stalled - The normalized sum of all cycles on all cores spent stalled. At least one thread is loaded, but the core is stalled for some reason.
- GPU Memory Reads/Writes - GPU reads from and writes to the chip uncore (LLC) and main memory. These are all memory accesses that miss in the internal GPU L3 cache and are serviced either from the uncore or from main memory.
- L3 Cache Misses - All read and write misses in GPU L3 cache.
- Untyped Memory Reads/Writes - Memory accesses to buffers created with clCreateBuffer.
- Typed Memory Reads/Writes - Memory accesses to typed buffers, for example, writes to image objects created with clCreateImage. Reads from images, however, are counted under Sampler accesses and Texture Read.
- SLM Reads/Writes - Memory accesses to Shared Local Memory.
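The timing columns in each row (total, minimum, maximum, and average execution time) are plain aggregates over all executions that fall into that row. The following sketch shows the arithmetic, assuming the per-execution durations are already available in nanoseconds; the `summarize_times` helper and its result keys are hypothetical names chosen for illustration.

```python
def summarize_times(durations_ns):
    """Return total, minimum, maximum, and average of the given durations."""
    if not durations_ns:
        raise ValueError("at least one execution is required")
    total = sum(durations_ns)
    return {
        "total": total,
        "min": min(durations_ns),
        "max": max(durations_ns),
        "average": total / len(durations_ns),
    }


# Three hypothetical executions of the same kernel row, in nanoseconds.
summary = summarize_times([100, 300, 200])
print(summary)
```

A large gap between the minimum and the maximum in one row is usually a hint to expand the row and inspect individual executions.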
Click the + button to the left of any kernel name to expand its row. The expanded area presents additional information for each execution of this kernel during the program run, including the latency, return value, command queue, context, and timing data.
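Per-execution timing data of this kind can be derived from the four OpenCL event profiling timestamps (`CL_PROFILING_COMMAND_QUEUED`, `CL_PROFILING_COMMAND_SUBMIT`, `CL_PROFILING_COMMAND_START`, and `CL_PROFILING_COMMAND_END`), which are reported in nanoseconds. The sketch below is only an illustration: the `KernelEvent` class is hypothetical, and treating latency as the time from enqueue to start of execution is an assumption, since this page does not state how the tool defines it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelEvent:
    """Profiling timestamps of one kernel execution, in nanoseconds."""

    queued_ns: int  # command was enqueued by the host
    submit_ns: int  # command was submitted to the device
    start_ns: int   # kernel started executing
    end_ns: int     # kernel finished executing

    def latency_ns(self):
        # Assumption: latency is the delay between enqueue and start.
        return self.start_ns - self.queued_ns

    def execution_ns(self):
        # Time the kernel spent executing on the device.
        return self.end_ns - self.start_ns


# Hypothetical timestamps for a single execution.
event = KernelEvent(queued_ns=1000, submit_ns=1500, start_ns=4000, end_ns=9000)
print("latency (ns):", event.latency_ns())
print("execution (ns):", event.execution_ns())
```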