How to Choose Hardware and Software for Edge Inference Solutions

Published: 12/28/2020

Considerations for Edge Inference Solutions

When building an edge inference platform, consider the following factors, especially to support IoT services that need low latency, local data processing, and local data storage:

  • Operating systems that support real-time applications and management techniques such as containers.
  • Ability to run image processing through an accelerator.
  • Compute and storage resource optimization.
  • Communications frameworks to allow edge systems to co-operate on control and data processing.

Hardware and Software Requirements

Edge requirements are driving more compact, powerful converged solutions. 

Hardware and Software for Edge Inference Solutions
Hardware
  • Processors and accelerators
  • Micro data centers
  • Advanced SoCs powerful enough to run a full-fledged operating system and complex algorithms
Software
  • Converged systems: compute, storage, network
  • Complete edge software stack with analytics and machine learning libraries

Intel offers general-purpose silicon platforms as well as purpose-built platforms tailored for edge AI. The general-purpose CPUs include Intel® Xeon® processors, Intel® Core™ processors, and Intel Atom® processors.

  • Second Generation Intel® Xeon® Scalable processors have a built-in accelerator that can replace almost any dedicated AI accelerator for AI algorithms including video analytics, speech, and natural language.
  • 11th Generation Intel® Core™ processors have Intel® Deep Learning Boost (DL Boost) and an integrated Xe GPU architecture, which add AI optimization to the low-cost, general-purpose platform expected of a Core CPU.
  • Intel® Movidius™ VPUs offer great performance efficiency at an optimal cost.

  • Intel also provides the OpenVINO™ toolkit, a tool suite for optimizing and deploying AI models with better inference performance. By using the OpenVINO™ toolkit to optimize your inference solution, you can upgrade your products and add new partner offerings without changing your hardware. The example below shows the software hierarchy for Gen 3 Intel® Movidius™ VPU.

Edge Inference Solutions in the Market

There are several edge inference solutions in the market:

  • Gen 3 Intel® Movidius™ VPU
  • Intel® Movidius™ Myriad™ X VPU
  • Nvidia* Jetson TX2
  • Nvidia* Jetson Xavier NX
  • Huawei* Atlas 200 (HiSilicon Ascend 310)
Deep Learning Inference Performance and Compute Efficiency for Different Products

| Products | Usage | ResNet-50 Performance | Performance per Watt | Gen 3 Intel® Movidius™ VPU Efficiency is: |
| --- | --- | --- | --- | --- |
| Gen 3 Intel® Movidius™ VPU (SKU 3400VE) | IP Camera, AI Appliance | 406 inferences/second | 139 inferences/second/watt | |
| Nvidia* Xavier NX | IP Camera, AI Appliance | 344 inferences/second | 69 inferences/second/watt | 2.0x vs. Nvidia* Xavier NX |
| Nvidia* Jetson Nano | AI Appliance | 20.3 inferences/second | 5.1 inferences/second/watt | 27x vs. Nvidia* Jetson Nano |
| HiSilicon Ascend 310 | AI Appliance | 319 inferences/second | 40 inferences/second/watt | 3.5x vs. HiSilicon Ascend 310 |

 

Note: Intel Performance results are based on testing as of 31-Oct-2019 and may not reflect all publicly available updates. No product or component can be absolutely secure.  Intel Configuration:  DL inference performance on ResNet-50 benchmark measured using INT8, batch size = 1, employing Gen 3 Intel® Movidius™ VPU’s native optimizations.  ResNet-50 performance shown reflects low-level optimizations for max performance capability measured as of 31-Oct-2019, with pre-production silicon and tools.  Measurement using single ResNet-50 network as standalone workload.  ResNet-50 model trained using weight sparsity at 50%. Indicated max performance benchmark expected to change, and customer results may vary based on forthcoming tools releases.  Power efficiency (inferences/sec/W) measured as of 31-Oct-2019 for Gen 3 Intel® Movidius™ 3400VE SKU.  All performance and power efficiency measurements may be updated with further changes to software tools. Competitor performance shown is measured performance for ResNet-50 (using INT8, Batch Size=1); power efficiency calculated as peak performance divided by power. HiSilicon measured as of 29-Aug-2019  and Nvidia measured as of 20-Aug-2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. See below for more configuration details.
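
The efficiency multipliers in the table can be reproduced from its performance-per-watt column. The following is a minimal sketch, assuming each multiplier is simply the Gen 3 VPU figure divided by the competitor figure; the dictionary keys and function name are illustrative and are not part of any Intel tool.

```python
# Performance-per-watt figures copied from the table above
# (inferences/second/watt, ResNet-50, INT8, batch size 1).
PERF_PER_WATT = {
    "Gen 3 Intel Movidius VPU (3400VE)": 139.0,
    "Nvidia Xavier NX": 69.0,
    "Nvidia Jetson Nano": 5.1,
    "HiSilicon Ascend (Atlas 200)": 40.0,
}

BASELINE = "Gen 3 Intel Movidius VPU (3400VE)"


def efficiency_ratios(perf_per_watt, baseline):
    """Return the baseline efficiency divided by each other product's efficiency."""
    base = perf_per_watt[baseline]
    return {
        name: base / value
        for name, value in perf_per_watt.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, ratio in efficiency_ratios(PERF_PER_WATT, BASELINE).items():
        print(f"{name}: {ratio:.1f}x")
```

The computed ratios agree with the published multipliers after rounding, which suggests the last column is derived directly from the performance-per-watt column.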

Configuration Details for Different Products

| Product | Gen 3 Intel® Movidius™ VPU | Nvidia* Xavier NX | Nvidia* Jetson Nano | HiSilicon Ascend 310 |
| --- | --- | --- | --- | --- |
| Tested | 10/31/2019 | 8/20/2020 | 8/20/2020 | 8/29/2019 |
| Precision | INT8 | INT8 | INT8 | INT8 |
| Batch Size | 1 | 1 | 1 | 1 |
| Product Type | Gen 3 Intel® Movidius™ VPU (pre-production) | Jetson Xavier NX developer kit | Jetson Nano developer kit | Atlas 200 developer kit |
| Memory (GB) | 65536 MB DDR-4-2666 | 8 | 8 | 15.5 |
| Processor | N/A | 6-core NVIDIA Carmel ARM v8.2 | 4-core ARM A57 | ARM* A53 x 8 |
| Graphics | N/A | NVIDIA Volta architecture with 384 NVIDIA CUDA cores and 48 Tensor cores | 128-core Maxwell | N/A |
| OS | N/A (CPU core not used) | Ubuntu 18.04 LTS (64-bit) | Ubuntu 18.04 LTS (64-bit) | Ubuntu 16.04 |
| Hard Disk | N/A | N/A | N/A | N/A |
| Software | Performance demo firmware | JetPack: 4.4 | JetPack: 4.4 | MindSpore Studio, DDK B883 |
| Listed TDP | N/A | 15W | 10W | 20W |

 

Why Intel?

  • Intel uniquely offers OEMs a single vendor that can supply silicon platforms end to end (camera, gateway, cloud, and client), along with many common software tools, memory, networking components, FPGAs, operating systems, security, and computer vision software components. Intel is delivering the Gen 3 Intel® Movidius™ VPU, a purpose-built, integrated SoC that provides unparalleled computer vision performance efficiency with a flexible architecture for workload partitioning and optimization.
    • Disruptive in-camera computer vision technology (both traditional video analytics and emerging neural network inferencing) that enables new use cases such as intelligent traffic solutions, retail analytics, digital safety and security, industrial automation, and VR/AR.
    • VPUs: World-leading CV+DL performance per watt as a pre-processor or co-processor to the AP in cameras, and as an offload engine in NVRs and servers (scaling up to arrays of many VPUs, with better performance per watt and performance per cost than most competitors).
    • SoCs: Performance, power, and cost optimized for leading-edge CV+DL performance per watt per dollar and high-volume smart DSS cameras.
    • A scalable VPU-based architecture (smart camera SoCs, plus VPU co-processors and pre-processors in IA gateways and cloud servers) that lets you develop distributed computer vision and media workloads across optimized systems, from camera to cloud.
    • Cross-platform, cross-generation APIs that let customers and ecosystem partners consolidate their software development effort, saving money and freeing them to focus on higher-value application software.
  • Solution ecosystem of external partners, such as board vendors, cloud software providers, and storage providers, that helps developers find or create missing parts of the overall solution.
  • Community support: Multiple forums where developers can share with and learn from others.
  • Robust software development tools and support. 
  • Uncompromised performance with hardened security.

Intel® Tools 

Each tool below is listed with a description and what you can do with it.

Intel® DevCloud for the Edge

  • A cloud-based development sandbox that lets you experience, develop, and test workload samples on Intel's latest hardware, without purchasing any hardware or development kits from Intel.

 
  • Run inference applications on multiple Intel hardware platforms for performance comparison.
  • Run your own code on Intel platforms without any installation.
  • Optimize your application to get the best performance on Intel platforms.
  • Learn about DL Streamer concepts and create a pipeline.
  • Benchmark pipeline performance.
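
Comparing platforms by benchmark comes down to timing repeated inference calls and reporting inferences per second. The following is a minimal, tool-agnostic sketch of that measurement. `measure_throughput` and `fake_inference` are hypothetical names, not part of Intel® DevCloud or DL Streamer, and the stand-in workload should be replaced with a real batch-size-1 inference call.

```python
import time


def measure_throughput(infer, iterations=100, warmup=10, clock=time.perf_counter):
    """Return inferences per second for a zero-argument callable.

    `infer` stands in for one batch-size-1 inference call. `clock` is
    injectable so the function can be exercised without real timing.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        infer()
    start = clock()
    for _ in range(iterations):
        infer()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive; increase iterations")
    return iterations / elapsed


def fake_inference():
    """Hypothetical stand-in workload; replace with a real inference call."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{measure_throughput(fake_inference):.1f} inferences/second")
```

Running the same function against the same model on each candidate platform gives figures that can be compared directly, provided precision and batch size are held constant.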

OpenVINO™ Deep Learning Workbench

  • Tune and profile your AI models to run on Intel platforms.
  • Convert your model into OpenVINO™ format.
  • Optimize models by quantizing to INT8 or by using high-compute algorithms for accelerated performance.
  • Create deployment packages with tuned model and OpenVINO™ runtime components for integration into your AI applications.
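
To illustrate what quantizing to INT8 means, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It shows the general idea only and is not the calibration procedure the Deep Learning Workbench uses; the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 0.25, -1.0, 1.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(codes, f"scale={scale:.6f}", f"max error={worst:.6f}")
```

Each value is stored in 8 bits instead of 32, at the cost of a rounding error of at most half the scale per value. Production tools typically choose scales from calibration data rather than from the raw maximum.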

OpenVINO™ Toolkit

  • Toolkit to run inference using pre-trained models for specific use cases.
  • Requires installation on Intel platforms with 6th to 8th generation processors.
  • Allows you to choose a pre-trained model from Open Model Zoo.
  • Use the Model Optimizer to perform optimizations that remove excess layers or, when possible, group operations into simpler, faster graphs.
  • Test, tune, and benchmark your inference models using the Deep Learning Workbench.
  • Use the Deployment Manager to package your model, IR files, application, and associated dependencies into a runtime package for your target device.
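
As a rough illustration of running a model with the toolkit, the sketch below loads an IR file and compiles it for the first available device from a preference list. It assumes a recent release of the `openvino` Python package, whose API differs from the one shipped when this article was published. `pick_device` and `compile_for_best_device` are hypothetical helpers, and the device preference order is only an example.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device found in `available`.

    Device names can carry a suffix such as "GPU.0", so match on the prefix.
    """
    for want in preferred:
        for name in available:
            if name == want or name.startswith(want + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def compile_for_best_device(model_path, preferred=("GPU", "CPU")):
    """Load a model file and compile it for the first preferred device.

    Assumes the `openvino` package is installed. It is imported lazily so
    that `pick_device` can be used and tested without it.
    """
    import openvino as ov  # assumption: a recent OpenVINO release

    core = ov.Core()
    device = pick_device(core.available_devices, preferred)
    model = core.read_model(model_path)  # for example, an IR .xml file
    return core.compile_model(model, device), device


if __name__ == "__main__":
    # Hypothetical device list, shown without needing OpenVINO installed.
    print(pick_device(["CPU", "GPU.0"]))
```

Keeping device selection separate from model loading means the same application code can move between hardware targets by changing only the preference list.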

Intel® Software Hub

  • Provides a list of pre-validated reference applications to deploy on your hardware.
  • These sample applications run inference using the pre-trained models from the OpenVINO™ toolkit.
  • Allows you to install and run simulations on the containerized reference applications.
  • Provides the basic components required to build a specific use case application.
  • Provides optional components to build a complete solution for your use case.
  • Manages the deployment of your software containers and nodes.

Developer kits and ready-to-use hardware

  • Provides a list of validated development kits and ready-to-use platforms you can purchase.
  • Built on pre-validated and certified Intel® architecture.
  • Includes an integrated software stack with an operating system, drivers, tools, libraries, and samples.
  • Kick start your targeted application development with a superior out-of-the-box experience.
  • Allows you to get up and running with your hardware and application deployment quickly and smoothly, saving you valuable time-to-market.

Gen 3 Intel® Movidius™ VPU (coming soon...)

Gen 3 Intel® Movidius™ VPU is the latest generation of Intel® Movidius™ VPU, a compute-efficient SoC with the following advantages:

  • More than 10 times the inference performance of the previous-generation Intel® Movidius™ Myriad™ X VPU.
  • Focused on deep learning inference and supported by the OpenVINO™ toolkit.
  • Provides high performance per watt per dollar.
  • Has an optimized hardware codec with acceleration for computer vision (CV) and deep learning (DL) in a one-chip solution.
  • Delivers a flexible architecture with the new Neural Compute Engine.

Gen 3 Intel® Movidius™ VPU supports both accelerator and standalone use. 

Gen 3 Intel® Movidius™ VPU Features

| Features | 3400VE | 3400VE | 3700VE |
| --- | --- | --- | --- |
| Summary | Edge AI processor (Accelerator mode) | Smart camera SoC (Camera mode) | Performance optimized, Edge AI processor |
| Process; VPU Clock Frequency | 12 nm TSMC; 500 MHz (Nominal) | 12 nm TSMC; 500 MHz (Nominal) | 12 nm TSMC; 700 MHz (Nominal) |
| ResNet-50 Performance; Max TOPS (AI Inference) | 406 inference/sec; 5.1 TOPS | 240 inference/sec; 3.0 TOPS | 565 inference/sec; 7.1 TOPS |
| Computer Vision Support | CV/Warp Acceleration, 1.0 GP/s | CV/Warp Acceleration, 1.0 GP/s | CV/Warp Acceleration, 1.4 GP/s |
| Video Codec | 4K75 (encode), 4K60 (decode); Decode: 10 channels of 1080 30 fps | 4K75 (encode), 4K60 (decode); Decode: 10 channels of 1080 30 fps | 4K75 (encode), 4K60 (decode); Decode: 10 channels of 1080 30 fps |
| SHAVE (Processors included) | 16 | 12 | 16 |

ISP: Up to 4 cameras, 500 MP/s HDR, TNF
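
When sizing a system, the feature figures above can be combined into a rough estimate of how many camera streams one VPU can serve. The sketch below assumes a ResNet-50-class model, 1080p streams at 30 fps, one inference per decoded frame, and no other bottleneck in the pipeline. These assumptions and the `max_streams` helper are illustrative and are not an Intel sizing method.

```python
# Figures copied from the feature table above.
SKUS = {
    "3400VE (accelerator mode)": {"resnet50_ips": 406, "decode_channels": 10},
    "3400VE (camera mode)": {"resnet50_ips": 240, "decode_channels": 10},
    "3700VE": {"resnet50_ips": 565, "decode_channels": 10},
}


def max_streams(resnet50_ips, decode_channels, fps=30, inferences_per_frame=1):
    """Streams sustainable when limited by both decode and inference.

    `inferences_per_frame` below 1 models skipping frames, for example
    0.5 means running inference on every second frame.
    """
    if fps <= 0 or inferences_per_frame <= 0:
        raise ValueError("fps and inferences_per_frame must be positive")
    inference_bound = int(resnet50_ips // (fps * inferences_per_frame))
    return min(decode_channels, inference_bound)


if __name__ == "__main__":
    for name, spec in SKUS.items():
        print(f"{name}: up to {max_streams(**spec)} streams at 30 fps")
```

Under these assumptions the two Edge AI configurations are limited by the 10 decode channels, while the camera-mode 3400VE is limited by inference throughput unless inference runs on only a subset of frames.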

Product and Performance Information

1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.