In the example above, created on the Intel microarchitecture code name Skylake, the
function as one of the biggest hotspots that took much CPU time.
detected that the back-end portion of the pipeline caused the stalls. For the back-end, the
Memory Bound > L1 Bound
issue as a dominant bottleneck. 14.6% of Clockticks used in this function was stalled missing L1 data cache. This means that if you focus on this function hotspot and optimize it, you can potentially gain ~15% speed-up for this function.