Critical Considerations for AI Deployments
Look Beyond the GPU
The AI industry is expected to grow to tens of billions of dollars by the mid-2020s, with most of that growth in AI inference.1 Intel Xeon Scalable processors represent approximately 70% of the processor units running AI inference workloads in the data center.2
GPUs are effective for training workloads but aren’t required for every stage of AI. In a 2021 study, 56% of respondents cited cost as their most significant challenge to implementing AI/ML solutions.3
While GPU solutions are impactful for training, AI deployment requires more.
When evaluating AI solutions, make sure you have all the right details to inform your decision.
Platform
With proprietary platforms, your enterprise architecture, data infrastructure, middleware, and application stacks must all be rearchitected by expert teams to maximize value from CAPEX- and OPEX-intensive resources.
Flexibility
Intel-based solutions provide the most workload flexibility for your organization, including AI. These solutions adapt to your existing organization, deployment context, enterprise architectures, data infrastructures, middleware, and application stacks, so rearchitecting isn’t required.
Support
Intel offers the most robust toolkit to support your AI needs. Here are the most important things to keep in mind when considering the implementation of AI solutions across the five stages of AI execution.
Critical Considerations for AI
1. Intel Accelerates Data Science
- Data science workflows require highly interactive systems that handle massive volumes of data in memory, using algorithms and tools designed for single-node processing; GPUs are generally a poor fit for these tasks.
- Intel platforms with Intel® Optane™ persistent memory (PMem) offer large memory for data science workflows. PMem can make it possible to load larger datasets into memory without falling back to disk.
- Data preprocessing today is done on the CPU, and many practitioners spend a significant amount of their time in the highly popular Pandas library.
Intel’s open-source Distribution of Modin accelerates Pandas applications by up to 90X.4
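The appeal of Modin is that it is a drop-in replacement: only the import line changes. Below is a minimal sketch of that pattern; the dataset path and column names are placeholders, and Modin is assumed to be installed with a Ray or Dask backend (pip install "modin[ray]").

```python
# Drop-in replacement: only the import line changes from stock Pandas.
# Modin partitions the DataFrame and runs operations in parallel.
import modin.pandas as pd  # instead of: import pandas as pd

# "transactions.csv" and the column names are placeholders for illustration.
df = pd.read_csv("transactions.csv")
summary = df.groupby("customer_id")["amount"].sum()
print(summary.head())
```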
2. Intel® Xeon® Scalable Enables Effective AI Data Preprocessing
Data infrastructure is already optimized for Intel architecture, enabling efficient data ingest.
The result is a completely optimized pipeline that scales from PC and workstation to cloud to edge; customers can scale AI everywhere by leveraging the broad, open software ecosystem and unique Intel tools.
If you are accessing and processing data, then storage and memory are critical; take advantage of a faster Intel storage subsystem that doesn’t require a GPU.
3. Intel Provides Scalability for AI Training
Habana Gaudi provides customers with cost-efficient AI training, ease of use, and system scalability; integration of the Gaudi platform eliminates storage bottlenecks and optimizes utilization of AI compute capacity.5
The Habana® Gaudi® AI Training Processor powers Amazon EC2 DL1 instances, delivering up to 40% better price performance than comparable Nvidia GPU-based training instances, according to AWS testing.6 The processor is also available for on-premises implementation with the Supermicro X12 Gaudi AI Training Server.
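For orientation, here is a minimal sketch of how a PyTorch training step targets a Gaudi device through Habana’s PyTorch bridge. It assumes the SynapseAI software stack is installed (as on DL1 instances); the model and batch are placeholders.

```python
# Sketch: moving a PyTorch training step onto a Gaudi ("hpu") device
# via Habana's PyTorch bridge. Requires the SynapseAI stack.
import torch
import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge

device = torch.device("hpu")
model = torch.nn.Linear(128, 10).to(device)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

inputs = torch.randn(64, 128).to(device)      # placeholder batch
labels = torch.randint(0, 10, (64,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
htcore.mark_step()  # flush accumulated ops to the Gaudi device
```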
Time-to-Train on Intel Xeon
Existing Intel Xeon Scalable processors scale well for intermittent training runs during off-peak cycles, such as overnight or on weekends.
The upcoming launch of Next Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) with Intel AMX and bfloat16 (BF16) support will deliver even higher performance and scalability.7
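BF16 is already usable from mainstream frameworks today. The hedged sketch below uses PyTorch’s CPU autocast with a placeholder model; on Xeon processors with AVX-512 BF16 (and AMX on the next generation), eligible operators run in bfloat16 for higher throughput at similar accuracy.

```python
# Illustrative bfloat16 use on a Xeon CPU via PyTorch autocast.
import torch

model = torch.nn.Sequential(   # placeholder model
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)
x = torch.randn(32, 512)       # placeholder batch

# Eligible ops (e.g., Linear) execute in bfloat16 inside this context.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
print(out.dtype)  # torch.bfloat16 for autocast-eligible outputs
```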
4. Intel Xeon Scalable Boosts Machine Learning Performance
Elevate effectiveness of machine learning workloads through the performance of Intel hardware.
New built-in acceleration capabilities in 3rd Generation Intel Xeon Scalable processors deliver up to 1.5X greater AI performance than other CPUs across 20 key customer workloads, the majority of which are machine learning workloads.8
The AI accelerators built into Xeon Scalable processors provide 10X to 100X performance improvements9 for AI frameworks and libraries such as Spark for data processing, TensorFlow, PyTorch, Scikit-learn, NumPy, and XGBoost.
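For Scikit-learn specifically, these speedups are typically unlocked through Intel Extension for Scikit-learn (pip install scikit-learn-intelex). A minimal sketch of the patching pattern follows, with a synthetic dataset standing in for real data.

```python
# Intel Extension for Scikit-learn: patch_sklearn() reroutes supported
# estimators to oneDAL-accelerated versions; the scikit-learn code
# below is unchanged stock API.
from sklearnex import patch_sklearn
patch_sklearn()  # must run before the sklearn imports below

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, random_state=0)
labels = KMeans(n_clusters=8, random_state=0).fit_predict(X)
print(labels[:10])
```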
You don’t need to break the bank for effective graph analytics; a single Xeon-based server with sufficient memory is a better, more cost-effective choice for large-scale, general-purpose graph analytics.
Get analytics insights with up to 2X faster graph analytics computations (Katana Graph) for recommender systems and fraud detection.
Get fraud detection up to 2X faster on average when using 3rd Gen Xeon Scalable processors with Intel Optane persistent memory 200 series.10
5. For Inference, Xeon Scalable Processors Are the Go-To Solution
AI deployment is about inference, and Intel is the most widely trusted hardware for inference: Intel Xeon Scalable processors represent approximately 70% of the processor units running AI inference workloads in the data center.2
The performance capabilities of Intel hardware can drive the inferencing success your business operation relies on. Here's why:
- Intel Xeon Scalable is the only x86 data center CPU with built-in AI acceleration. Use Xeon Scalable for more cost-effective inferencing instead of new Nvidia hardware that adds deployment and recurring costs (see the sketch after this list).
- 30% higher average AI performance (geomean of 20 workloads) with 3rd Gen Intel Xeon Scalable processors supporting Intel® DL Boost vs. Nvidia A100 GPU,11 without adding the cost and complexity of a GPU.
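One common way to exercise that built-in acceleration from PyTorch is post-training dynamic quantization, which stores weights as INT8 and routes Linear layers through FBGEMM, using AVX-512 VNNI (Intel DL Boost) instructions on supporting Xeon CPUs. A hedged sketch with a placeholder model:

```python
# Dynamic INT8 quantization in PyTorch for CPU inference.
import torch

model = torch.nn.Sequential(           # placeholder model
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
).eval()

# Quantize all Linear modules to int8; activations are quantized
# dynamically at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
with torch.no_grad():
    print(quantized(x).shape)
```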
Dual-socket servers with Next Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) can infer over 24K images per second, compared with 16K on an Nvidia A30 GPU.12
This means Intel can deliver better than 1.5X the performance of Nvidia’s mainstream inferencing GPU for 2022,12 strengthening the recommendation to standardize on Xeon processors; the next generation will provide even greater performance.
6. Intel Delivers End-to-End AI Performance
Optimize your workload for the Intel Xeon Scalable processors you already have installed to get better end-to-end performance without introducing added latency or integration burden.
- Leverage the Intel-based technologies you know.
- The complexity of integrating non-Intel hardware introduces extended latencies.
End-to-End Document Level Sentiment Analysis (DLSA); lower is better.13 Preprocessing can dominate time to solution, and the GPU is typically idle.
17X lower cost on Intel Xeon processor systems, without the added GPU complexity.14
7. Intel Open-Source Software Avoids Lock-in
Write once, use anywhere with open-source software. DL/ML framework users can reap full performance and productivity benefits through drop-in acceleration, without needing to learn new APIs or low-level foundational libraries, because many AI frameworks already run optimized on Intel.
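As an illustration of the drop-in pattern, the sketch below uses Intel Extension for PyTorch (pip install intel-extension-for-pytorch). The model is a placeholder; the only added call is ipex.optimize().

```python
# Drop-in acceleration with Intel Extension for PyTorch:
# ipex.optimize() applies operator fusion and layout optimizations
# without changing the model code itself.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(          # placeholder model
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
).eval()

model = ipex.optimize(model)          # one-line optimization pass

with torch.no_grad():
    print(model(torch.randn(8, 128)).shape)
```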
Maintain flexibility with oneAPI and OpenVINO.
- Intel’s end-to-end portfolio of AI tools and framework optimizations for customers is built on the foundation of the open, standards-based, unified oneAPI programming model and constituent libraries.
- Utilizing OpenVINO allows developers to write once and deploy anywhere with tools designed to optimize and deploy DL inference models (see the sketch after this list).
- The Intel oneDNN library is being widely adopted; it provides the building blocks for deep learning applications, with very fast performance across x86_64 processors and a wide breadth of performance optimizations for developers.
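Here is a minimal, hedged OpenVINO Runtime sketch of the write-once pattern: "model.xml" and the input shape are placeholders, and retargeting is just a matter of changing the device string.

```python
# Minimal OpenVINO Runtime inference sketch: the same model IR and
# code run on different targets by changing only the device string.
from openvino.runtime import Core
import numpy as np

core = Core()
model = core.read_model("model.xml")          # placeholder model IR path
compiled = core.compile_model(model, "CPU")   # swap "CPU" to retarget

# Placeholder input shape; match your model's actual input.
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)
```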
CUDA-based tools restrict developer choice and lock any models created into that platform.
- CUDA created a closed ecosystem that is challenging to grow beyond.
- Porting away is difficult without re-coding or support from CUDA engineers.
Along with developing Intel-optimized distributions for leading AI frameworks, Intel also upstreams optimizations into the main versions of these frameworks, delivering increased performance and productivity to your AI applications even when you use the default versions.
Faster performance vs. the prior generation with Intel-optimized TensorFlow:
Up to 11X higher batch AI inference performance on ResNet50 with 3rd Gen Intel Xeon Scalable processors.15
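Because these optimizations are upstreamed, stock TensorFlow can use oneDNN directly. A quick, hedged check is below: the TF_ENABLE_ONEDNN_OPTS environment variable toggles the oneDNN path in TensorFlow 2.x builds and must be set before TensorFlow is imported.

```python
# Stock TensorFlow 2.x ships with oneDNN optimizations; on recent
# Linux x86 builds they are on by default and can be toggled with
# TF_ENABLE_ONEDNN_OPTS (set before the TensorFlow import).
import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"  # must precede the TF import

import tensorflow as tf

# A small matmul just to exercise the runtime; oneDNN-enabled builds
# log an informational message about oneDNN custom operations.
a = tf.random.normal((1024, 1024))
b = tf.random.normal((1024, 1024))
print(tf.reduce_sum(tf.matmul(a, b)).numpy())
```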
8. Intel’s Extensive AI Portfolio
AI is a complex and varied ecosystem. Intel provides a portfolio of performant hardware and open-source software to meet evolving AI needs with maximum performance and cost efficiency for any workload. Explore Intel’s AI portfolio.
- Intel offers the broadest AI portfolio for customers, including CPUs, FPGAs, VPUs, ASICs, forthcoming discrete GPUs, and more, allowing Intel to position the right hardware for any customer use case.
Xeon Advantages
No matter the AI deployment type, the Intel portfolio provides the hardware and software capabilities you need for success.
Product and Performance Information
Based on Intel market modeling of the worldwide installed base of data center servers running AI Inference workloads as of December 2021.
See https://techdecoded.intel.io/resources/one-line-code-changes-to-boost-pandas-scikit-learn-and-tensorflow-performance/#gs.bzkn2n for workloads and configurations. Results may vary.
MLPerf results for Training v1.0 published on June 30, 2021. See https://mlcommons.org/en/training-normal-10/
See [43] at www.intel.com/3gen-xeon-config
See claim 4 at https://edc.intel.com/content/www/us/en/products/performance/benchmarks/intel-optane-persistent-memory-200-series/ for workloads and configurations. Results may vary.
See [44] at https://www.intel.com/3gen-xeon-config
See Key100 Sandra Rivera AITI001 Pradeep Dubey Slide 37 at https://edc.intel.com/content/www/us/en/products/performance/benchmarks/innovation-event-claims/
See www.intel.com/InnovationEventClaims, AI001, Meena Arunachalam, #25, for workloads and configurations.
See [100] at https://www.intel.com/3gen-xeon-config