Note on Intel® Quick Sync Video
Check and adjust the device load when dealing with transcoding pipelines. For example, running Intel® Quick Sync Video encoding might reduce the benefit of using Intel® Graphics for OpenCL™ frame preprocessing, because the encoding already loads the Intel® Graphics units substantially. In some cases, moving the OpenCL tasks to the CPU device relieves that load and improves overall performance. Experiment with both placements to find the best configuration for your pipeline.
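One way to run that experiment is to time the whole transcoding pipeline (preprocessing plus encoding) several times with the OpenCL preprocessing on each device, then compare the averages. The following is a minimal sketch of the comparison step only. The names `preproc_device`, `mean_seconds`, and `pick_preproc_device` are hypothetical helpers introduced for illustration and are not part of the OpenCL API or any Intel SDK. The timings are assumed to be end-to-end wall-clock seconds that you measured yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for where the OpenCL preprocessing kernels run. */
typedef enum { PREPROC_ON_GPU, PREPROC_ON_CPU } preproc_device;

/* Average of n timing samples, in seconds. */
static double mean_seconds(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += samples[i];
    return sum / (double)n;
}

/* gpu_runs and cpu_runs each hold n end-to-end pipeline timings (seconds),
 * measured while the encoder was running, with the preprocessing placed on
 * the Intel Graphics device and on the CPU device respectively.
 * Returns the placement with the lower average; ties keep the GPU. */
static preproc_device pick_preproc_device(const double *gpu_runs,
                                          const double *cpu_runs, size_t n)
{
    assert(gpu_runs != NULL && cpu_runs != NULL && n > 0);
    return mean_seconds(cpu_runs, n) < mean_seconds(gpu_runs, n)
               ? PREPROC_ON_CPU
               : PREPROC_ON_GPU;
}
```

Timing the complete pipeline, rather than the kernels alone, matters here: the encoder and the kernels compete for the same graphics units, so a kernel that is faster in isolation on Intel Graphics can still slow down the pipeline as a whole.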