Pull Command
We are currently updating this container.
Description
Optimized Analytics Package for Spark* Platform (OAP for Spark* Platform) is a project that optimizes Apache Spark* in several areas, including caching, shuffle, the execution engine, and MLlib. Currently, OAP for Spark* Platform includes the following optimizations:
- SQL Data Source Cache: Optimize Spark* SQL Data Source using PMem as input data cache.
- RDD Cache PMem Extension: Optimize Spark* RDD Cache using PMem.
- Shuffle Remote PMem Extension: Optimize Spark* shuffle using remote PMem and RDMA.
- Remote Shuffle: Shuffle implementation that writes shuffle data to remote storage compatible with the HDFS file system.
- Unified Arrow Data Source and Native SQL Engine: Optimize the SQL execution engine using vectorization, native code, and columnar data.
- OAP MLlib: Optimized implementation of a subset of MLlib algorithms.
Documentation and Sources
Get Started
Docker* Repository
Main GitHub* Repository
Readme
Release Notes
Get Started Guide
Code Sources
Dockerfile
Report Issue
License Agreement
LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.