General-Purpose Computing on GPU
Traditionally, GPUs have been used to create computer graphics such as images and video. Because their large number of execution units enables massive parallelism, modern GPUs are also used for computing tasks that are conventionally performed on the CPU. This is commonly referred to as General-Purpose Computing on GPU, or GPGPU.
Many high-performance computing and machine learning applications benefit greatly from GPGPU.
- Execution Model Overview
- Thread Mapping and GPU Occupancy
- Kernels
- Using Libraries for GPU Offload
- Host/Device Memory, Buffer and USM
- Host/Device Coordination
- Using Multiple Heterogeneous Devices
- Compilation
- OpenMP Offloading Tuning Guide
- Multi-GPU, Multi-Stack and Multi-C-Slice Architecture and Programming
- Level Zero
- Performance Profiling and Analysis
- Configuring GPU Device