Local Memory Usage
One typical GPU-targeted optimization uses local memory for caching of intermediate results. For CPU, all OpenCL™ memory objects are cached by hardware, so explicit caching by use of local memory just introduces unnecessary (moderate) overhead.