We are currently updating this container.
Optimized Analytics Package for Spark* Platform (OAP for Spark* Platform) is a project to optimize Apache Spark* in several areas, including cache, shuffle, execution engine, and MLlib. Currently, OAP for Spark* Platform includes the following optimizations:
- SQL Data Source Cache: Optimize Spark* SQL Data Source using PMem as an input data cache.
- RDD Cache PMem Extension: Optimize Spark* RDD Cache using PMem.
- Shuffle Remote PMem Extension: Optimize Spark* shuffle using remote PMem and RDMA.
- Remote Shuffle: A shuffle implementation that writes shuffle data to HDFS-compatible remote storage.
- Unified Arrow Data Source and Native SQL Engine: Optimize the SQL execution engine using vectorization, native code, and columnar data.
- OAP MLlib: Optimized implementations of a subset of MLlib algorithms.
Documentation and Sources
LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.