You might wonder if you can set the
to 1 (i.e.,
) to minimize the latency. In the figure above, you may notice small gaps between the kernels in the streaming design (for example, between
). These gaps come from the host-side overhead of launching kernels and detecting kernel completion. They add to the total processing time and therefore reduce the throughput of the design: compared to the offload design, it takes longer to process the same amount of data. If the gaps are negligible, the effect on throughput is negligible as well.
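
To make the trade-off concrete, the following is a minimal sketch of a cost model, not code from the design itself. It assumes that every kernel launch is followed by one fixed host-side gap and that gaps never overlap with compute. The names `TotalTimeNs`, `ThroughputPerNs`, `ns_per_element`, and `gap_ns` are hypothetical and chosen only for illustration; real launch overhead varies by platform and runtime.

```cpp
#include <cassert>

// Hypothetical cost model (an assumption, not measured from the design):
// the data is processed in chunks of `chunk_size` elements, one kernel
// launch per chunk, and each launch adds a fixed host-side gap of `gap_ns`.
double TotalTimeNs(long total_elements, long chunk_size,
                   double ns_per_element, double gap_ns) {
  assert(total_elements > 0 && chunk_size > 0);
  // Number of kernel launches, rounding up for a partial final chunk.
  long launches = (total_elements + chunk_size - 1) / chunk_size;
  // Time spent doing useful work is independent of the chunk size.
  double compute_ns = static_cast<double>(total_elements) * ns_per_element;
  // Time lost to launch and completion-detection gaps grows with the
  // number of launches.
  double overhead_ns = static_cast<double>(launches) * gap_ns;
  return compute_ns + overhead_ns;
}

// Throughput in elements per nanosecond under the same assumptions.
double ThroughputPerNs(long total_elements, long chunk_size,
                       double ns_per_element, double gap_ns) {
  return static_cast<double>(total_elements) /
         TotalTimeNs(total_elements, chunk_size, ns_per_element, gap_ns);
}
```

Under this model, shrinking the chunk size increases the number of launches and therefore the total gap time, while the compute time stays the same. When `gap_ns` is small relative to the compute time of one chunk, the two throughput figures converge, which matches the observation that negligible gaps have a negligible effect.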