Using Libraries for GPU Offload
Several libraries available with the oneAPI toolkits simplify programming by providing specialized, optimized APIs. This section gives steps for using these libraries to accelerate applications, with code samples. Detailed information about each library, including its available APIs, is in that library's main documentation.