Accelerate Innovations of Unified Data Analytics and AI at Scale

Published: 06/13/2019  

Last Updated: 06/13/2019

By Jinquan Dai

As organizations seek to realize the full potential of artificial intelligence, we’re seeing increasing demand to move AI from experimentation to production. To achieve that, it is critical for developers to efficiently apply new deep learning technologies (computer vision, natural language processing, neural recommendation, generative adversarial networks, etc.) to production data analysis pipelines.

Today, deep learning and data analytics workloads typically run on separate software and hardware infrastructures, and developers struggle to write ad hoc glue code to manually “stitch together” separate components, such as TensorFlow*, Keras*, PyTorch*, Caffe*, Apache Hadoop*, Apache Hive*, and Apache Spark*.

Unfortunately, this fragmented workflow can lead to issues such as low developer productivity, complex infrastructure management, poor data governance, inefficient scaling, and many more.

To address this challenge, Intel has developed new open source software technologies that unify data analytics and AI in an integrated workflow: BigDL, a distributed deep learning framework for Apache Spark; and Analytics Zoo, a unified analytics-plus-AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. With these technologies, developers can write their programs with TensorFlow, Keras, or Spark as an end-to-end, integrated data analytics and AI pipeline, which can transparently scale out to the production data platform in a distributed fashion.
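The article includes no code, but the core idea of an “end-to-end, integrated pipeline” can be sketched abstractly. Below is a deliberately framework-free, hypothetical illustration (all function names are invented here, not part of any Analytics Zoo or BigDL API): data cleaning, feature engineering, and model inference are composed into one unit instead of being hand-stitched across separate systems. Analytics Zoo provides this kind of composition on top of Spark, TensorFlow, Keras, and BigDL, rather than plain Python functions.

```python
# Hypothetical, framework-free sketch of an integrated pipeline.
# Each stage is a plain function; the pipeline composes them so the
# same code path handles a laptop-sized sample or a full dataset.

def clean(records):
    # data-analytics stage: drop malformed (None) records
    return [r for r in records if r is not None]

def featurize(records):
    # feature-engineering stage: scale values into [0, 1]
    top = max(records)
    return [r / top for r in records]

def predict(features):
    # stand-in for a deep learning model: threshold at 0.5
    return [1 if f > 0.5 else 0 for f in features]

def pipeline(records):
    # the whole flow is one composable unit -- no glue code
    # between separate analytics and AI infrastructures
    return predict(featurize(clean(records)))

print(pipeline([4, None, 10, 2, 8]))
```

In a real Analytics Zoo program, the stages would be Spark transformations and a distributed TensorFlow/Keras/BigDL model, but the contract is the same: one pipeline object that runs unchanged from prototype to production.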

The Need for End-to-End, Integrated Data Analytics and AI Solutions

The life of an AI application usually begins with prototyping using sample data on the developer’s laptop. Once the prototype is working, the developer will experiment with historical data (e.g., the prior three months), typically stored in a production data system (such as Hive*) and processed on a distributed cluster architecture.

Once the developer is satisfied with the experimentation, they deploy the solution to production (for A/B testing), which usually needs to be integrated within production big data pipelines. As shown below, moving AI from experimentation to production is a huge and complex undertaking today, and it can often lead to code rewrites, data duplication, fragmented workflows, and poor scalability in the real world.

Figure: Prototype-to-deployment pipeline

To make it easy to build and productize end-to-end AI applications, a unified data analytics and AI platform like Analytics Zoo (see the following figure) is required. With Analytics Zoo, developers can easily prototype end-to-end pipelines on their laptops, then directly move their prototypes to run on the distributed cluster architecture, process production data in a scalable fashion, and seamlessly deploy on production data pipelines, all with almost zero code changes.


Early users of Analytics Zoo, including Midea*, Yunda*, BaoSight*, Microsoft Azure*, and CERN*, have already built end-to-end AI applications, including computer vision-based product inspection, unsupervised time series anomaly detection, an NLP-based customer service chatbot, and a particle classifier for high energy physics.

To further accelerate innovation and deployment for integrated data analytics and AI at scale, I am happy to share that Intel is establishing a Data Analytics and AI Innovation Center based in China. This virtual development and collaboration center will connect Intel analytics and AI experts with technology partners and customers to help them develop, optimize, and scale new use cases and solutions across various vertical industry segments. The center will also provide access to the latest Intel data-centric advancements in hardware platform technologies, as well as the related optimized libraries, software, and tools.

For more information, please visit the Analytics Zoo project on GitHub*.
