OpenMP Offloading Tuning Guide
The Intel® LLVM-based C/C++ and Fortran compilers (icx, icpx, and ifx) support OpenMP offloading to GPUs. With OpenMP, the programmer inserts device directives in the code that tell the compiler which parts of the application to offload to the GPU. Offloading compute-intensive code can yield better performance.
This section covers topics related to OpenMP offloading and explains how to improve the performance of offloaded code.
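As an illustration of the device directives described above, the following is a minimal sketch of an offloaded loop in C++. The function name `saxpy_offload` and its parameters are hypothetical and chosen only for this example. The `target` construct marks the region to offload, and the `map` clauses describe which data is copied to and from the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: computes y[i] = a * x[i] + y[i].
// Assumes x holds at least as many elements as y.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    const float* xp = x.data();
    float* yp = y.data();
    const int n = static_cast<int>(y.size());

    // Offload the loop to the default device. x is copied to the device only;
    // y is copied to the device and back to the host when the region ends.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

With the Intel compilers, a file like this would typically be built with something like `icpx -fiopenmp -fopenmp-targets=spir64`. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs on the host, which is convenient for checking correctness before tuning.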