oneAPI GPU Optimization Guide
Welcome to the oneAPI GPU Optimization Guide. This document gives tips for getting the best GPU performance for oneAPI programs.