Developer Guide

Prepinning Memory

When optimizing kernel memory accesses, you must also consider how data is transferred from the host to the device. In designs where data transfer takes longer than computation, the transfer can become the bottleneck. On devices that support transfer rates greater than PCIe Gen3 x8, prepinning host memory before the transfer allows the data to move at a higher bandwidth.
For example, on the Intel® FPGA Programmable Acceleration Card (PAC) D5005 (previously known as Intel® FPGA Programmable Acceleration Card (PAC) with Intel® Stratix® 10 SX FPGA), which has a PCIe Gen3 x16 transfer rate, memory transfer with prepinning achieves approximately 12 GB/s in half-duplex and 21 GB/s in full-duplex. The following is an example of how to copy the prepinned memory to the device global memory when using such a board:
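    intel::fpga_selector device_selector;
    auto device_queue = queue(device_selector);

    // Host allocation; most BSPs back malloc_host() with prepinned memory
    int* data = malloc_host<int>(1024, device_queue);
    … // initialize the data

    // Destination buffer in device global memory
    int* data_device = malloc_device<int>(1024, device_queue);

    // copy<T>(src, dest, count) takes the source pointer first: host to device
    device_queue.copy<int>(data, data_device, 1024);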
  • Most BSPs implement the Unified Shared Memory (USM) call malloc_host() using prepinned memory. Hence, prepinned memory is available only on devices that support USM host allocation.
  • SYCL USM host allocations are supported only by some BSPs, such as the Intel® FPGA Programmable Acceleration Card (PAC) D5005 (previously known as Intel® FPGA Programmable Acceleration Card (PAC) with Intel® Stratix® 10 SX FPGA). Check with your BSP vendor to see whether they support SYCL USM host allocations.
Pinned memory is a scarce resource on the system, so carefully consider which buffers you want to pin to avoid exceeding the system limit. In addition, pinning itself is an expensive operation, so for optimal performance, ensure that the creation of pinned buffers takes place outside the main compute loop.