• <More on Intel.com

Radix Sort on CPUs, GPUs and Intel® MIC Architectures

We are sorry, This PDF is available in download format only

Executive Summary
Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In [29], we had earlier presented a competitive analysis of comparison and non-comparison based sorting algorithms on CPUs and GPUs. In this report, we extend this comparison to the Intel Many Integrated Core (MIC) architecture. We evaluate radix sort on Knights Ferry (an implementation of Intel MIC architecture), obtaining a performance gain of 2.2X and 1.7X over the best sort performance on the Intel Core i7 CPU and GTX 280 respectively. We also improve the performance of GPU radix sort by 1.6X over previous results.

Introduction
Sorting is of fundamental importance in databases. Common applications of sorting in database systems include index creation, user-requested sort queries, and operations such as duplicate removal, ranking and merge-join operations. Sorting on large databases has traditionally focused on external sorting algorithms. However, the rapid increase in main memory capacity has made in- memory sorting feasible. In previous work, we evaluated how two main memory sorting algorithms, a radix and a merge sort, performed on CPUs and GPUs.

In this work, we evaluate radix sort on the upcoming Intel MIC architecture, a Larrabee based silicon platform for many-core research and software development. In particular, we implement our radix sort on Knights Ferry (KNF), an implementation of this architecture that has 32-cores each with 512-bit SIMD units. The implementation of merge sort on KNF is work-in-progress. We also improve the GPU radix performance by 1.6X from that reported in using techniques to improve SIMD efficiency and Instruction-Level Parallelism.

Read the full Radix Sort on CPUs, GPUs and Intel® MIC Architectures Report.