How to Choose Hardware and Software for Edge Inference Solutions

ID 672539
Updated 12/28/2020
Version Latest
Public


Considerations for Edge Inference Solutions

When building an edge inference platform, consider the following factors, especially to support IoT services that need low latency, local data processing, and local data storage:

  • Operating systems that support real-time applications and modern management techniques, such as containers.
  • Ability to run image processing through an accelerator.
  • Compute and storage resource optimization.
  • Communications frameworks that allow edge systems to cooperate on control and data processing.

Hardware and Software Requirements

Edge requirements are driving more compact, powerful converged solutions. 

Hardware and Software for Edge Inference Solutions
Hardware
  • Processors and accelerators
  • Micro data centers
  • Advanced SoCs powerful enough to run a full-fledged operating system and complex algorithms
Software
  • Converged systems: compute, storage, network
  • Complete edge software stack with analytics and machine learning libraries

Intel offers general-purpose silicon platforms as well as purpose-built platforms tailored for edge AI. The general-purpose CPUs include Intel® Xeon®, Intel® Core™, and Intel® Atom® processors.

  • New 2nd generation Intel® Xeon® Scalable processors include built-in AI acceleration (Intel® Deep Learning Boost) that can take the place of a dedicated AI accelerator for many workloads, including video analytics, speech, and natural language processing.
  • New 11th generation Intel® Core™ processors add Intel® Deep Learning Boost and an integrated Intel® Xe GPU architecture, providing additional AI acceleration in a low-cost, general-purpose platform built around the Core CPU.
  • Intel® Movidius™ VPUs offer great performance efficiency at an optimal cost.

  • Intel also provides OpenVINO™, a toolkit for optimizing and deploying AI inference with substantially better performance. By using the OpenVINO™ toolkit to optimize your inference solution, you can upgrade your products and add new partner offerings without changing your hardware, as the sketch below illustrates.
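
As a minimal sketch (assuming the 2020-era OpenVINO™ Inference Engine Python API, with hypothetical model and image files), loading an optimized model and running inference looks like this:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO 2020.x Python API

# Hypothetical IR files produced by the Model Optimizer
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))
n, c, h, w = net.input_info[input_blob].input_data.shape

# Resize a test image and rearrange it to the NCHW layout the network expects
image = cv2.imread("input.jpg")  # placeholder input
blob = cv2.resize(image, (w, h)).transpose((2, 0, 1))[np.newaxis, ...]

# Swap device_name for "GPU" or "MYRIAD" to retarget without code changes
exec_net = ie.load_network(network=net, device_name="CPU")
result = exec_net.infer({input_blob: blob})
print(result[output_blob].shape)
```

Because the target device is chosen by name at load time, the same application code can move across CPUs, integrated GPUs, and VPUs.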

Edge Inference Solutions in the Market

There are several edge inference solutions in the market:

  • Gen 3 Intel® Movidius™ VPU
  • Intel® Movidius™ Myriad™ X VPU
  • Nvidia* Jetson TX2
  • Nvidia* Jetson NX Xavier
  • Huawei* Atlas 200 (HiSilicon Ascend 310)
Deep Learning Inference Performance and Compute Efficiency for Different Products

Product | Usage | ResNet-50 Performance | Performance per Watt | Gen 3 Intel® Movidius™ VPU Efficiency Advantage
Gen 3 Intel® Movidius™ VPU (SKU 3400VE) | IP Camera, AI Appliance | 406 inferences/second | 139 inferences/second/watt | (baseline)
Nvidia* Xavier NX | IP Camera, AI Appliance | 344 inferences/second | 69 inferences/second/watt | 2.0x vs. Nvidia* Xavier NX
Nvidia* Jetson Nano | AI Appliance | 20.3 inferences/second | 5.1 inferences/second/watt | 27x vs. Nvidia* Jetson Nano
HiSilicon Ascend 310 | AI Appliance | 319 inferences/second | 40 inferences/second/watt | 3.5x vs. HiSilicon Ascend 310

Note: Intel Performance results are based on testing as of 31-Oct-2019 and may not reflect all publicly available updates. No product or component can be absolutely secure.  Intel Configuration:  DL inference performance on ResNet-50 benchmark measured using INT8, batch size = 1, employing Gen 3 Intel® Movidius™ VPU’s native optimizations.  ResNet-50 performance shown reflects low-level optimizations for max performance capability measured as of 31-Oct-2019, with pre-production silicon and tools.  Measurement using single ResNet-50 network as standalone workload.  ResNet-50 model trained using weight sparsity at 50%. Indicated max performance benchmark expected to change, and customer results may vary based on forthcoming tools releases.  Power efficiency (inferences/sec/W) measured as of 31-Oct-2019 for Gen 3 Intel® Movidius™ 3400VE SKU.  All performance and power efficiency measurements may be updated with further changes to software tools. Competitor performance shown is measured performance for ResNet-50 (using INT8, Batch Size=1); power efficiency calculated as peak performance divided by power. HiSilicon measured as of 29-Aug-2019  and Nvidia measured as of 20-Aug-2020. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. See below for more configuration details.

Configuration Details for Different Products

Attribute | Gen 3 Intel® Movidius™ VPU | Nvidia* Xavier NX | Nvidia* Jetson Nano | HiSilicon Ascend 310
Tested | 10/31/2019 | 8/20/2020 | 8/20/2020 | 8/29/2019
Precision | INT8 | INT8 | INT8 | INT8
Batch Size | 1 | 1 | 1 | 1
Product Type | Gen 3 Intel® Movidius™ VPU (pre-production) | Jetson Xavier NX developer kit | Jetson Nano developer kit | Atlas 200 developer kit
Memory | 64 GB (65536 MB) DDR4-2666 | 8 GB | 8 GB | 15.5 GB
Processor | N/A | 6-core NVIDIA Carmel ARM v8.2 | 4-core ARM A57 | 8-core ARM* A53
Graphics | N/A | NVIDIA Volta architecture with 384 NVIDIA CUDA cores and 48 Tensor cores | 128-core Maxwell | N/A
OS | N/A (CPU core not used) | Ubuntu 18.04 LTS (64-bit) | Ubuntu 18.04 LTS (64-bit) | Ubuntu 16.04
Hard Disk | N/A | N/A | N/A | N/A
Software | Performance demo firmware | JetPack 4.4 | JetPack 4.4 | MindSpore Studio, DDK B883
Listed TDP | N/A | 15 W | 10 W | 20 W

Why Intel?

  • Intel uniquely offers OEMs a single vendor that can supply end-to-end silicon platforms (camera, gateway, cloud, client) with many common software tools, memory, networking components, FPGAs, operating systems, and security and computer vision software components. Intel is delivering the Gen 3 Intel® Movidius™ VPU, a purpose-built, integrated SoC that provides unparalleled computer vision performance efficiency with a flexible architecture for workload partitioning and optimization.
    • Disruptive in-camera computer vision technology (both traditional video analytics and emerging neural network inferencing), enabling new use cases such as intelligent traffic solutions, retail analytics, digital safety and security, industrial automation, and VR/AR.
    • VPUs: World-leading CV+DL performance per watt as a pre- or co-processor to the application processor in cameras, and as an offload engine in NVRs and servers (scaling to arrays of many VPUs, with better performance/watt and performance/cost than most competitors).
    • SoCs: Performance, power, and cost optimized for leading-edge CV+DL performance per watt per dollar in high-volume smart DSS cameras.
    • A scalable VPU-based architecture (smart camera SoCs, plus VPU co-/pre-processors in IA gateways and cloud servers) that enables developing distributed computer vision and media workloads across optimized systems, from camera to cloud.
    • Cross-platform, cross-generation APIs that allow customers and ecosystem partners to consolidate software development, saving money and letting them focus on higher-value application software.
  • A solution ecosystem of external partners, such as board vendors, cloud software providers, and storage providers, that helps developers find or create missing parts of the overall solution.
  • Community support: multiple forums where developers can share and learn from others.
  • Robust software development tools and support. 
  • Uncompromised performance with hardened security.

Intel® Tools

Intel® DevCloud for the Edge

Description:
  • A development sandbox in the cloud that lets you experience, develop, and test workload samples on Intel's latest hardware, without purchasing any hardware or development kits from Intel.

What you can do:
  • Run inference applications on multiple Intel hardware platforms for performance comparison (see the sketch after this list).
  • Run your own code on Intel platforms without any installation.
  • Optimize your application to get the best performance on Intel platforms.
  • Learn about the DL Streamer concept and create a pipeline.
  • Benchmark pipeline performance.
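
The following sketch illustrates the kind of side-by-side comparison DevCloud enables; the model files are hypothetical placeholders, and the set of available devices varies by node:

```python
import time
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="resnet50.xml", weights="resnet50.bin")  # hypothetical IR
input_blob = next(iter(net.input_info))
shape = net.input_info[input_blob].input_data.shape
dummy = np.random.rand(*shape).astype(np.float32)  # synthetic input

for device in ("CPU", "GPU", "MYRIAD"):
    if device not in ie.available_devices:
        continue  # skip plugins this node does not expose
    exec_net = ie.load_network(network=net, device_name=device)
    start = time.perf_counter()
    for _ in range(100):
        exec_net.infer({input_blob: dummy})
    fps = 100 / (time.perf_counter() - start)
    print(f"{device}: {fps:.1f} inferences/second")
```
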
OpenVINO™ Deep Learning Workbench

Description:
  • A tool to tune and profile your AI models to run on Intel platforms.

What you can do:
  • Convert your model into the OpenVINO™ format.
  • Optimize models for accelerated performance, for example by quantizing to INT8.
  • Create deployment packages with the tuned model and OpenVINO™ runtime components for integration into your AI applications.
OpenVINO™ Toolkit

Description:
  • A toolkit to run inference using pre-trained models for specific use cases.
  • Requires installation on Intel platforms with 6th to 8th generation processors.

What you can do:
  • Choose a pre-trained model from the Open Model Zoo.
  • Use the Model Optimizer to remove excess layers and, where possible, group operations into simpler, faster graphs (see the sketch after this list).
  • Test, tune, and benchmark your inference models using the Deep Learning Workbench.
  • Use the Deployment Manager to bundle the model, IR files, application, and associated dependencies into a runtime package for your target device.
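
As a sketch of the conversion step (assuming a frozen TensorFlow model and the mo.py entry point that ships with the toolkit; all paths are hypothetical):

```python
import subprocess

# Convert a trained model into OpenVINO™ IR (.xml/.bin) files with the
# Model Optimizer, run from the toolkit's Model Optimizer directory.
subprocess.run(
    [
        "python", "mo.py",
        "--input_model", "frozen_resnet50.pb",  # hypothetical frozen model
        "--data_type", "FP16",                  # half precision suits VPU targets
        "--output_dir", "ir/",
    ],
    check=True,
)
```
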
Intel® Software Hub

Description:
  • Provides a list of pre-validated reference applications to deploy on your hardware.
  • These sample applications run inference using pre-trained models from the OpenVINO™ toolkit.

What you can do:
  • Install and run simulations on the containerized reference applications.
  • Use the basic components required to build a specific use case application.
  • Add optional components to build a complete solution for your use case.
  • Manage your software container and node deployments.
Developer kits and ready-to-use hardware

Description:
  • Provides a list of validated development kits and ready-to-use platforms you can purchase.
  • Built on pre-validated and certified Intel® architecture.
  • Includes an integrated software stack with an operating system, drivers, tools, libraries, and samples.

What you can do:
  • Kick-start your targeted application development with a superior out-of-the-box experience.
  • Get up and running with your hardware and application deployment quickly and smoothly, saving valuable time to market.

Gen 3 Intel® Movidius™ VPU (coming soon...)

Gen 3 Intel® Movidius™ VPU is the latest generation of Intel® Movidius™ VPU, a compute-efficient SoC with the following advantages:

  • More than 10 times the inference performance of the previous generation Intel® Movidius™ Myriad™ X VPU.
  • Focused on Deep Learning Inference and supported by the OpenVINO™ toolkit.
  • Provides high performance per watt per dollar.
  • Integrates an optimized hardware codec with computer vision (CV) and deep learning (DL) acceleration in a one-chip solution.
  • Delivers flexible architecture with the new Neural Compute engine.

Gen 3 Intel® Movidius™ VPU supports both accelerator and standalone use, as sketched below.
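
In accelerator use, the VPU appears to applications as an inference device. As a sketch, this is how current Intel® Movidius™ Myriad™ X VPUs are targeted through OpenVINO™'s MYRIAD plugin; the device name for Gen 3 hardware is an assumption until its tools are released:

```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # hypothetical IR

# List the inference devices OpenVINO can see, e.g. ['CPU', 'MYRIAD']
print(ie.available_devices)

# "MYRIAD" targets current Intel Movidius VPUs in accelerator mode
exec_net = ie.load_network(network=net, device_name="MYRIAD")
```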

Gen 3 Intel® Movidius™ VPU Features

Feature | 3400VE (Accelerator mode) | 3400VE (Camera mode) | 3700VE
Summary | Edge AI processor | Smart camera SoC | Performance-optimized edge AI processor
Process | 12 nm TSMC | 12 nm TSMC | 12 nm TSMC
VPU Clock Frequency | 500 MHz (nominal) | 500 MHz (nominal) | 700 MHz (nominal)
ResNet-50 Performance; Max TOPS (AI Inference) | 406 inferences/second; 5.1 TOPS | 240 inferences/second; 3.0 TOPS | 565 inferences/second; 7.1 TOPS
Computer Vision Support | CV/Warp acceleration, 1.0 GP/s | CV/Warp acceleration, 1.0 GP/s | CV/Warp acceleration, 1.4 GP/s
Video Codec | 4K75 encode, 4K60 decode; 10 channels of 1080p30 decode | 4K75 encode, 4K60 decode; 10 channels of 1080p30 decode | 4K75 encode, 4K60 decode; 10 channels of 1080p30 decode
ISP | N/A | Up to 4 cameras, 500 MP/s, HDR, TNF | N/A
SHAVE Processors Included | 16 | 12 | 16