Letter from the Editor

Let’s Talk about High-Performance Data Analytics

October 2021

Recent issues of The Parallel Universe have emphasized oneAPI; namely, DPC++ and the component libraries like oneMKL. This issue focuses on data science; in particular, training machine learning and deep learning models. Our feature article, Getting Started with Habana Gaudi for Deep Learning Training, describes the Gaudi HPU (Habana Processing Unit) architecture and shows you how to use it. Speeding Up the Databricks Runtime for Machine Learning discusses Intel optimizations for doing artificial intelligence in the cloud. A Novel Scale-Out Training Solution for Deep Learning Recommender Systems presents the results of a recent collaboration with Facebook to improve the scalability of training. Finally, Cost Matters: On the Importance of Cost-Aware Hyperparameter Optimization presents the results of a recent collaboration with Facebook and Amazon to improve hyperparameter tuning.

From there, we look at another important part of the end-to-end data analytics pipeline: graph analytics. Intel has a long history in graph processing research and has active collaborations with many of the top practitioners, including the GraphBLAS specification, the LDBC Graphalytics benchmark, comprehensive graph analytics analyses, and the PIUMA architecture for efficient and scalable graph analysis. Data scientists have a great package, NetworkX, for graph and network analysis, but it’s not known for performance. Fortunately, our friends at Katana Graph just released a high-performance, parallel graph analytics library for Python programmers. Katana’s High-Performance Graph Analytics Library offers an alternative for compute-intensive operations on extremely large graphs.

The R programming language is popular with data scientists and statisticians, but like NetworkX, it’s not known for performance. Accelerate R Code with Intel® oneAPI Math Kernel Library shows you how to improve performance simply by linking the R programming environment to oneMKL. No code changes are required.

We close this issue with a follow-up to a previous article on vectorization, Optimization of Scan Operations Using Explicit Vectorization. Optimizing the Maxloc Operation Using AVX-512 Vector Instructions is another how-to guide to using vector intrinsics to accelerate common kernels; in this case, the maxloc reduction.

As always, don’t forget to check out Tech.Decoded for more information on Intel solutions for code modernization, visual computing, data center and cloud computing, data science, systems and IoT development, and heterogeneous parallel programming with oneAPI.

Henry A. Gabb
October 2021


