Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018
Public

6. Strategies for Improving NDRange Kernel Data Processing Efficiency

Consider the following kernel code:

__kernel void sum (__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
    // Each work-item handles the element at its own global index.
    size_t gid = get_global_id(0);

    answer[gid] = a[gid] + b[gid];
}

This kernel adds arrays a and b element by element. Each work-item adds one pair of elements, one from a and one from b, and stores the sum in the corresponding element of the array answer. Without optimization, the kernel performs one addition per work-item.
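To make the one-addition-per-work-item behavior concrete, the following plain C sketch emulates a one-dimensional NDRange launch of the sum kernel on the host. It is an illustration only, not part of the SDK: the function names sum_work_item and emulate_sum_ndrange are hypothetical, and the sequential loop stands in for work-items that real hardware may process in a pipelined or parallel fashion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one work-item of the sum kernel.
   The gid argument plays the role of get_global_id(0). */
void sum_work_item(size_t gid, const float *a, const float *b, float *answer)
{
    answer[gid] = a[gid] + b[gid];
}

/* Hypothetical sequential emulation of a 1-D NDRange launch with
   global_size work-items. Returns the number of additions performed,
   which equals the number of work-items. */
size_t emulate_sum_ndrange(size_t global_size, const float *a,
                           const float *b, float *answer)
{
    assert(a != NULL && b != NULL && answer != NULL);

    size_t additions = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        sum_work_item(gid, a, b, answer); /* one addition per work-item */
        ++additions;
    }
    return additions;
}
```

Calling emulate_sum_ndrange with a global size of N performs exactly N additions, one per emulated work-item. This is the baseline that the optimization techniques aim to improve.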

To maximize the performance of your OpenCL™ kernel, consider implementing the applicable optimization techniques to improve data processing efficiency.
