memory go through the L3 cache and LLC. In addition, memory accesses caused by register spills take the same path. If multiple OpenCL work-items in the same hardware thread make requests to the same L3 cache line, these requests are collapsed into a single request. This means that the effective memory bandwidth is determined by the number of distinct L3 cache lines that are accessed, not by the number of individual work-item requests.
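A rough model of the collapsing rule makes the bandwidth consequence concrete. The Python sketch below counts how many distinct cache lines the work-items of one hardware thread touch during a strided read. The 64-byte line size, the 16 work-items per hardware thread, the cache-line-aligned base address and the function name `l3_requests` are illustrative assumptions, not values taken from this text.

```python
# Assumptions (not stated in the source text): 64-byte L3 cache lines,
# 16 work-items per hardware thread, and a cache-line-aligned base address.
CACHE_LINE_BYTES = 64
WORK_ITEMS_PER_THREAD = 16


def l3_requests(elem_size: int, stride_elems: int,
                work_items: int = WORK_ITEMS_PER_THREAD,
                line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Count L3 requests issued by one hardware thread after collapsing.

    Work-item i reads the element at index i * stride_elems. Requests that
    fall in the same cache line are collapsed into one, so the request count
    equals the number of distinct cache lines touched.
    """
    lines = {(i * stride_elems * elem_size) // line_bytes
             for i in range(work_items)}
    return len(lines)


if __name__ == "__main__":
    # Compare contiguous and strided 4-byte loads.
    for stride in (1, 2, 4, 16):
        n = l3_requests(elem_size=4, stride_elems=stride)
        print(f"4-byte loads, stride {stride:2d}: {n:2d} L3 request(s)")
```

Under these assumptions, contiguous 4-byte loads need one L3 request per hardware thread, while a stride of 16 elements needs sixteen, so the same amount of useful data consumes sixteen times the L3 bandwidth.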