Host-Side Analysis Optimization Tips

Developer Guide for Intel® SDK for OpenCL™ Applications 2017

Download PDF

ID 773042

Date 10/22/2018

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Host-Side Analysis Optimization Tips

While you run the host-side performance analysis, the Code Analyzer identifies inefficient use of the OpenCL API. When the analysis is done, a TIPS screen appears, showing all the detected issues, each issue also has a short description.

Click a specific tip to open a related report and highlight the data within the report which is relevant to this tip. In addition a popup window appears, with a recommendation how to fix the reported issue:

The following table summarizes the recommendations that are reported from the Tips.

int Title	Description	Recommendation
Inefficient `clCreateBuffer`	The host program includes a call to `clCreateBuffer`, where flags includes `CL_MEM_COPY_HOST_PTR`.	There are two ways to ensure zero-copy path on memory objects mapping. For best results, allocate memory with `CL_MEM_ALLOC_HOST_PTR`, this method ensures that the memory is efficiently mirrored on the host. Another way is to allocate properly aligned and sized memory yourself and share the pointer with the OpenCL framework by using the `CL_MEM_USE_HOST_PTR` flag.
`clCreateBuffer` call where `host_ptr` is not 4K aligned.	The host program includes a call to `clCreateBuffer` where `host_ptr` is not 4K aligned.	For best results, align memory address to host memory page (4K bytes)
`clCreateBuffer` call where size is not a multiple of 64 bytes	The host program includes call to `clCreateBuffer` where size is not a multiple of 64 bytes.	For best results, make sure that the amount of memory you allocate and the size of the corresponding OpenCL buffer is a multiple of the cache line sizes (64 bytes).
Redundant calls to `clBuildProgram`	The host program includes several calls to `clBuildProgram` with the same arguments.	When possible, call `clGetProgramInfo` to retrieve binaries generated from calls to `clCreateProgramWithSource` and `clBuildProgram`.
Redundant calls to `clCompileProgram`	The host program includes several calls to `clCompileProgram` with the same arguments.	When possible, call `clGetProgramInfo` to retrieve previously compiled binaries.
Redundant calls to `clCreateContextFromType`.	The host program includes calls to `clCreateContextFromType` with the same arguments.	Consider using the same OpenCL context instead of recreating it.
Redundant calls to `clCreateContext`.	The host program includes calls to `clCreateContext` with the same arguments.	Consider using the same OpenCL context instead of recreating it.
Redundant calls to `clCreateCommandQueue`	The host program includes several calls to `clCreateCommandQueue` that refer to the same device	Consider using the same command-queue to access the device.
Redundant calls to `clCreateCommandQueueWithProperties`	The host program includes several calls to `clCreateCommandQueueWithProperties` that refer to the same device.	Consider using the same command-queue to access the device.
`clEnqueueReadBuffer` call	The host program includes several calls to `clEnqueueReadBuffer`	When possible, use `clEnqueueMapBuffer` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadBuffer` or `clEnqueueWriteBuffer`.
`clEnqueueWriteBuffer` call	The host program includes several calls to `clEnqueueWriteBuffer`	When possible, use `clEnqueueMapBuffer` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadBuffer` or `clEnqueueWriteBuffer`.
`clEnqueueReadImage` call	The host program includes several calls to `clEnqueueReadImage`	When possible, use `clEnqueueMapImage` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadImage` or `clEnqueueWriteImage`.
`clEnqueueWriteImage` call	The host program includes several calls to `clEnqueueWriteImage`	When possible, use `clEnqueueMapImage` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadImage` or `clEnqueueWriteImage`.
`clEnqueueReadBufferRect` call	The host program includes several calls to `clEnqueueReadBufferRect`	When possible, use `clEnqueueMapBuffer` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadBufferRect` or `clEnqueueWriteBufferRect`.
`clEnqueueWriteBufferRect` call	The host program includes several calls to `clEnqueueWriteBufferRect`	When possible, use `clEnqueueMapBuffer` and `clEnqueueUnmapMemObject` instead of calls to `clEnqueueReadBufferRect` or `clEnqueueWriteBufferRect`.
The work-group dimensions are defined as "column" work-group	The host program includes a call to `clEnqueueNDRange` where the work-group dimensions are defined as "column" work-group.	When reading from memory, best to reorganize the work-group to read in lines instead of columns.
Performance Information	Kernel register pressure is too high, spill fills will be generated. Additional surface needs to be allocated.	Consider simplifying your kernel.
Performance Information	Kernel private memory usage is too high and exhaust register space. Additional surface needs to be allocated.	Consider reducing the amount of private memory used, avoid using private memory arrays.
Performance Information	Local workgroup sizes selected for this workload may not be optimal	consider using a different local workgroup size,
Performance Information	Not aligned surface detected. Driver needs to disable L3 caching.
Performance Information	Kernel submission requires coherency with CPU, this may impact performance.
Performance Information	Null local workgroup size detected, Following sizes will be used for execution

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Developer Guide for Intel® SDK for OpenCL™ Applications 2017

Host-Side Analysis Optimization Tips