Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022

10.1.1. Reducing the Number of Kernels

Instead of partitioning your design across multiple kernels, consider consolidating the design into fewer kernels. For Intel® Stratix® 10 designs, Intel® recommends that you use separate kernels only for truly asynchronous execution.

The following example shows a producer kernel and a consumer kernel communicating via channels:

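#pragma OPENCL EXTENSION cl_intel_channels : enable

// Channel linking the two kernels. The name c0 and the int element type
// are illustrative; Produce() and Consume() stand in for your computation.
channel int c0;

kernel void producer(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      write_channel_intel(c0, Produce(i));
   }
}

kernel void consumer(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      Consume(i, read_channel_intel(c0));
   }
}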

The optimized code below merges the two kernels in the example above into a single kernel, which uses the computation results directly without channel accesses:

kernel void fused(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      Consume(i, Produce(i));
   }
}
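
Before you merge kernels, you might want to confirm that the fused loop computes the same results as the producer and consumer pair. The following plain C sketch models both forms on the host. It is an illustration only, not part of the SDK: the `produce` and `consume` bodies are hypothetical stand-ins for your own computation, the `fifo` array is an assumed software substitute for the channel, and `run_split` and `run_fused` are names invented for this example. The model runs sequentially, so it checks functional equivalence only and says nothing about hardware concurrency or performance.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Produce(): any pure function of i works here. */
static int produce(unsigned i) {
    return (int)(3u * i + 1u);
}

/* Hypothetical stand-in for Consume(): returns a value so that the
   effect of each call can be accumulated and compared. */
static long long consume(unsigned i, int value) {
    return (long long)value * (long long)(i + 1u);
}

/* Two-kernel model: the first loop plays the producer and writes every
   value into a FIFO that stands in for the channel; the second loop plays
   the consumer and reads the values back in the same order.
   Returns -1 if the FIFO cannot be allocated. */
long long run_split(unsigned n) {
    int *fifo = malloc((n ? n : 1u) * sizeof *fifo);
    if (fifo == NULL) {
        return -1;
    }
    for (unsigned i = 0; i < n; i++) {
        fifo[i] = produce(i);              /* models write_channel_intel */
    }
    long long sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += consume(i, fifo[i]);        /* models read_channel_intel */
    }
    free(fifo);
    return sum;
}

/* Fused model: each produced value is consumed directly in the same
   iteration, with no intermediate storage. */
long long run_fused(unsigned n) {
    long long sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += consume(i, produce(i));
    }
    return sum;
}
```

If `run_split(n)` and `run_fused(n)` return the same value for the range of `n` you care about, the consumer depends only on the index and the produced value, which is the condition under which fusing the two kernels preserves behavior.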