5.2.12. Specifying Number of Compute Units

Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846

Date 10/04/2021

Public

To increase the data-processing efficiency of an OpenCL™ kernel, you can instruct the Intel® FPGA SDK for OpenCL™ Offline Compiler to generate multiple kernel compute units. Each compute unit is capable of executing multiple work-groups simultaneously.

CAUTION:

Increasing the number of kernel compute units can raise data throughput, but it also increases FPGA resource consumption and contention for global memory bandwidth among the compute units.

To specify the number of compute units for a kernel, insert the num_compute_units(N) attribute in the kernel source code.

For example, the code fragment below directs the offline compiler to instantiate two compute units for the kernel test:

__attribute__((num_compute_units(2)))
__kernel void test(__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
   size_t gid = get_global_id(0);
   answer[gid] = a[gid] + b[gid];
}

The hardware that the offline compiler generates distributes work-groups dynamically, at run time, across the specified number of compute units.
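Because work-groups are the unit of distribution, an NDRange with fewer work-groups than compute units leaves some compute units without work. The following host-side C sketch illustrates that arithmetic for a one-dimensional NDRange. The helper names work_group_count and can_occupy_all_compute_units are hypothetical and are not part of the SDK; the sketch only assumes the OpenCL 1.x rule that the global size must be a multiple of the work-group size.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the SDK: number of work-groups in a
 * one-dimensional NDRange. Returns 0 when the sizes are invalid, that is,
 * when local_size is 0 or does not evenly divide global_size. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0) {
        return 0;
    }
    return global_size / local_size;
}

/* Hypothetical helper: returns 1 when the NDRange yields at least one
 * work-group per compute unit, and 0 otherwise. This is a necessary
 * condition for using every compute unit, not a performance guarantee. */
int can_occupy_all_compute_units(size_t global_size,
                                 size_t local_size,
                                 size_t num_compute_units)
{
    size_t groups = work_group_count(global_size, local_size);
    return groups > 0 && groups >= num_compute_units;
}
```

For the two-compute-unit kernel shown earlier, a global size of 1024 with a work-group size of 64 produces 16 work-groups, so both compute units can receive work. A global size of 64 with the same work-group size produces a single work-group, so one compute unit stays idle.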

Note: To identify the specific compute unit on which a work-item is executing, call the get_compute_id() intrinsic function. Refer to Customization of Replicated Kernels Using the get_compute_id() Function for more information.

Related Information

Customization of Replicated Kernels Using the get_compute_id() Function
