Big data analytics requires both high programmer productivity and high performance on large-scale clusters. However, current big data analytics frameworks (e.g., Apache Spark) incur prohibitive runtime overheads because they are library-based. We introduce a novel auto-parallelizing compiler approach that exploits characteristics of the data analytics domain, such as the map/reduce parallel pattern, and that is robust, unlike previous auto-parallelization methods...
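As an illustrative sketch (not taken from the paper), the map/reduce parallel pattern the abstract refers to can be expressed as a pure element-wise map followed by an associative reduction; it is this structure that makes such kernels amenable to automatic parallelization. The function and variable names below are hypothetical.

```python
# Hypothetical map/reduce-style analytics kernel: because the map step is
# element-wise and the reduction operator (+) is associative, a compiler
# can partition `values` across workers and combine partial sums.
from functools import reduce

def mean_squared(values):
    # map: square each element independently
    squared = map(lambda x: x * x, values)
    # reduce: combine with an associative operator
    total = reduce(lambda a, b: a + b, squared, 0.0)
    return total / len(values)

print(mean_squared([1.0, 2.0, 3.0]))  # (1 + 4 + 9) / 3
```

A data-parallel runtime can evaluate the map on disjoint chunks and merge the per-chunk sums, which is exactly the decomposition library frameworks like Spark expose explicitly and an auto-parallelizing compiler would infer.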