Intel has been developing big data analytics frameworks and libraries built on Software Defined Infrastructure with open-standard building blocks. From open, enterprise-ready software platforms to analytics building blocks, runtime optimizations, tools, benchmarks, and use cases, Intel® software makes big data and analytics faster, easier, and more insightful. Examples include optimized Apache Hadoop* and Spark* frameworks, the Intel® Data Analytics Acceleration Library (Intel® DAAL), and BigDL: Distributed Deep Learning on Apache Spark*, which runs on the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).
These Intel frameworks and libraries are being integrated with Intel® FPGA acceleration options. Customers can run unmodified applications that, at run time, execute on an Intel® Xeon® platform, an Intel® FPGA, or other Intel platforms. Intel is also providing FPGA acceleration frameworks with end-to-end orchestration, virtualization, and security. Together with a partner ecosystem, Intel offers acceleration for unstructured, NoSQL, and relational data stores using single multi-function Intel® FPGAs, which accelerate data streams, networking, data access, and algorithms.
Traditional relational databases can benefit significantly from inline acceleration and protocol offload of networking, data streaming, and data access. Inline accelerators include compression, filtering, and encryption. The FPGA can also be used for memory access tasks such as cache management or memory-mapped access. Indexing, lookups, and filtering run very fast because FPGAs excel at hashing and pattern matching with their flexible datapaths. A strong requirement is that customers' SQL applications and database schemas run without change.
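The inline stages described above can be modeled in software to show how data flows through them. The following is a minimal sketch, assuming a simple two-stage pipeline (filter, then compress) built on Python's standard `zlib` module. It is not FPGA code and not Intel's implementation; the function names `inline_filter_compress` and `decompress_rows` are hypothetical and exist only for this illustration.

```python
import zlib
from typing import Callable, Iterable, Iterator, List


def inline_filter_compress(
    rows: Iterable[bytes],
    predicate: Callable[[bytes], bool],
    level: int = 6,
) -> Iterator[bytes]:
    """Stream rows through a filter stage, then a compression stage.

    Software model only: on an FPGA these stages would sit in the datapath,
    so rows that fail the predicate never reach the CPU or the wire.
    """
    compressor = zlib.compressobj(level)
    for row in rows:
        if predicate(row):  # filter stage: drop non-matching rows early
            chunk = compressor.compress(row + b"\n")  # compression stage
            if chunk:
                yield chunk
    tail = compressor.flush()  # emit whatever the compressor still buffers
    if tail:
        yield tail


def decompress_rows(chunks: Iterable[bytes]) -> List[bytes]:
    """Undo the compression stage; used here only to check the result."""
    return zlib.decompress(b"".join(chunks)).splitlines()


if __name__ == "__main__":
    rows = [b"id=1,region=EU", b"id=2,region=US", b"id=3,region=EU"]
    chunks = list(inline_filter_compress(rows, lambda r: b"region=EU" in r))
    print(decompress_rows(chunks))
```

Because the stages are chained as a stream, the caller sees only the filtered, compressed output, which mirrors the requirement that the SQL application itself stays unchanged.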
Intel is developing better compression for the Hadoop/Spark reduce, or “shuffle”, phase with an approach that completely hides the FPGA by integrating it into the Intel frameworks. There are three additional opportunities for Spark acceleration: ingest (Kafka), BigDL, and machine learning with MLlib.
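One way a compression accelerator can stay hidden from the application is to select it through configuration rather than code. The sketch below assumes this route: it builds the standard Spark properties that govern shuffle compression (`spark.shuffle.compress`, `spark.shuffle.spill.compress`, `spark.io.compression.codec`) and renders them as `spark-submit` arguments. The codec class name `com.example.fpga.FpgaCompressionCodec` is hypothetical and stands in for whatever codec an accelerated framework might ship; the source does not say how Intel's integration is actually configured.

```python
from typing import Dict, List

# Hypothetical class name, for illustration only; not a real Intel or Spark codec.
HYPOTHETICAL_FPGA_CODEC = "com.example.fpga.FpgaCompressionCodec"


def shuffle_compression_conf(codec: str = "lz4") -> Dict[str, str]:
    """Return the Spark settings that control shuffle compression."""
    return {
        "spark.shuffle.compress": "true",        # compress map output files
        "spark.shuffle.spill.compress": "true",  # compress data spilled during shuffles
        "spark.io.compression.codec": codec,     # short name or fully qualified class name
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the settings as spark-submit command-line arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(shuffle_compression_conf(HYPOTHETICAL_FPGA_CODEC))))
```

Under this assumption, the Spark job is submitted with different `--conf` values and the application code is left untouched.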
A NoSQL database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. Motivations for this approach include simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), and finer control over availability.
As big data or machine learning initiatives graduate from research projects with small data sets and small server clusters to become an integral part of the business, the data sources leveraged by data scientists expand dramatically. The current scaling solutions for such growth, scale-up or scale-out, can be costly and have diminishing returns once they reach a certain size. The data analytics industry has turned to hardware accelerators, such as the Intel® Programmable Acceleration Card (Intel® PAC) and Acceleration Stack, to overcome this challenge for both enterprise and cloud implementations. Data scientists can leverage the acceleration capabilities of the Intel® PAC in pre-qualified servers without needing to know the intricacies of FPGA design. Bigstream’s hyper-acceleration technology automates the process of accelerating big data analytics platforms such as Spark SQL, so users can experience up to an order of magnitude performance gain without changing a single line of code in their application.
Bigstream’s software solution combined with the Intel® FPGA technology can significantly increase the computational power to run big data analytics faster and at a lower TCO than traditional approaches.
The Apache Cassandra* NoSQL database is used widely for the data-intensive use cases that are shaping the modern era, from IoT and fraud detection to personalization and financial services. While Cassandra meets many enterprise-class requirements, it reaches its limits when processing transactional and AI applications. rENIAC offers an innovative "intermediary" layer between Cassandra clients and database nodes. Composed of a Data Engine* and the rENIAC software, the solution brings storage closer to the network through intelligent caching. Running on Intel® architecture and taking advantage of the acceleration possible with Intel® FPGAs, rENIAC delivers outstanding performance, high throughput, and low latency for demanding workloads and applications, simplifying AI adoption.
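The general idea of a caching layer placed between clients and database nodes can be illustrated with a read-through cache. The sketch below is a generic software model with least-recently-used eviction. It is not rENIAC's design or API; the class `ReadThroughCache` and the `backend_read` callable are hypothetical names, and the eviction policy is an assumption made for this example.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class ReadThroughCache:
    """Intermediary that answers reads from memory and falls back to the database."""

    def __init__(
        self,
        backend_read: Callable[[Hashable], Optional[bytes]],
        capacity: int = 1024,
    ) -> None:
        self._backend_read = backend_read  # stand-in for a read against a database node
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Optional[bytes]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[bytes]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._backend_read(key)  # cache miss: go to the database
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key, for example after the client writes a new value."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    store = {"user:1": b"alice", "user:2": b"bob"}
    cache = ReadThroughCache(store.get, capacity=2)
    cache.get("user:1")
    cache.get("user:1")
    print(cache.hits, cache.misses)
```

Clients call `get` exactly as they would query the database, so repeated reads of hot keys are served without a round trip to a database node.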
Data demands on IT continue to increase, from delivering high availability and managing storage to conducting near-real-time analytics. Relational databases and SQL continue to be the backbone of enterprise-class data analytics. Swarm64 offers an innovative add-on to PostgreSQL* that works with most common database and storage applications. It enables IT to handle large amounts of high-velocity data and helps eliminate the risks and costs inherent in introducing new IT systems. Most importantly, Swarm64SDA* is designed to significantly speed up data processing and analytics for demanding workloads. Swarm64SDA supports the Intel® Programmable Acceleration Card (Intel® PAC) and associated Acceleration Stack in pre-qualified OEM servers to deliver industry-leading performance for analytics use cases.
The Swarm64* solution enables seamless cooperation between the CPU and the Intel® FPGA, overcoming the latency increase and bandwidth limitations of storage accessed via the network or in a typical cloud infrastructure. This decouples storage from compute, enabling resource elasticity and an excellent cost-to-performance ratio.