Many-Core Graph Workload Analysis



Graph applications have specific characteristics that are not common in other application domains and therefore require thorough analysis to guide future graph processing hardware design. In this paper, we analyze multiple graph applications on current multi and many-core processors, and provide conclusions and recommendations for future designs. We restate well-known characteristics of graph applications, such as a low compute to memory ratio and irregular memory access patterns, but we also provide new important insights on executing graph applications on many-core processors. Our main novel observations are (i) some memory streams do show locality, while others show no locality, (ii) thread imbalance becomes a major problem with many threads, and (iii) many threads are required to saturate high-bandwidth memories. The first observation calls for a selective memory access policy, where accesses with locality are cached and prefetched, while accesses without locality can remain uncached to save cache capacity, and can fetch only one element from memory instead of a full cache line to save on memory bandwidth. The last two observations are contradicting: more threads are needed, but they are not used efficiently due to thread imbalance. Our recommendation is therefore to thoroughly revise the graph analysis algorithms to provide more scalable parallelism to be able to exploit the potential of many-core architectures with high-bandwidth memory. In addition, providing a few high-performance cores can speed up sections with low parallelism.