                            Using the Restrict Qualifier for Kernel Arguments
Consider using the restrict type qualifier (defined by C99) for pointer arguments in the kernel signature. The qualifier declares that the pointers do not alias each other, which helps the compiler limit the effects of pointer aliasing and aids caching optimizations.
__kernel void foo(__constant float* restrict a,
                  __constant float* restrict b,
                  __global float* restrict result)
  
NOTE: You can use the restrict qualifier only with kernel arguments. In the example above, it lets the compiler assume that the pointers a, b, and result point to different locations. The compiler does not verify this, so you must ensure that the pointers do not point to overlapping memory regions.
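The no-overlap requirement can be illustrated with a minimal sketch in plain C99, which is where the restrict qualifier originates. This is a host-side analog and not OpenCL code: the names regions_overlap and add_arrays are hypothetical, and the debug-time assert stands in for whatever check an application chooses to perform before passing memory to a kernel declared with restrict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL or the C library): returns 1 if
   the non-empty byte ranges [a, a + a_bytes) and [b, b + b_bytes) share
   at least one byte, and 0 otherwise. */
int regions_overlap(const void *a, size_t a_bytes,
                    const void *b, size_t b_bytes)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}

/* Plain C99 analog of the kernel signature above. The restrict qualifiers
   promise the compiler that a, b, and result do not alias; the asserts
   check that promise in debug builds only (they vanish with NDEBUG). */
void add_arrays(const float *restrict a, const float *restrict b,
                float *restrict result, size_t n)
{
    size_t bytes = n * sizeof(float);
    assert(!regions_overlap(a, bytes, result, bytes));
    assert(!regions_overlap(b, bytes, result, bytes));
    assert(!regions_overlap(a, bytes, b, bytes));

    for (size_t i = 0; i < n; ++i)
        result[i] = a[i] + b[i];
}
```

Calling add_arrays with three separate arrays satisfies the contract. Passing the same array as both an input and the output would violate it, and in a release build the result of such a call is undefined rather than diagnosed.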