Intel® Neural Compute Stick 2 and Half-Precision Floating Point (FP16)

Published: 05/31/2019  

Last Updated: 05/31/2019

With the Intel® Distribution of OpenVINO™ toolkit, the Intel® Neural Compute Stick 2 empowers deep learning developers to profile, tune, and deploy convolutional neural networks (CNNs) in low-power applications that require real-time inference. Advance the rapid development of high-performance computer vision solutions to enable fast, efficient deep learning workloads on Intel® platforms.

The Intel® NCS 2 is a USB stick with a dedicated neural network inference accelerator. With the Intel® Distribution of OpenVINO™ toolkit, the Intel® NCS 2 can help developers:

  • Address throughput or affordability challenges
  • Drive the next wave of innovation

The Intel® Distribution of OpenVINO™ toolkit supports half-precision floating point (FP16).

Half-precision floating point (FP16)

Small, compact hardware form factors for running computer vision applications are emerging.

Intel has released the USB stick-based Intel® NCS 2, which is essentially a Vision Processing Unit (VPU).

While CPUs, GPUs, and their APIs natively support single-precision (FP32) instructions, the extra precision of FP32 does not necessarily yield notably higher classification accuracy than half precision (FP16). FP16, on the other hand, halves the number of bits required for storage, reducing the exponent from 8 bits to 5 and the mantissa from 23 bits to 10.
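To make the bit layout concrete, here is a minimal Python sketch that uses NumPy's `float16` and `float32` types to split a value into its sign, exponent, and mantissa fields. The helper names `fp16_fields`, `fp32_fields`, and `fp16_value` are hypothetical, written only for this illustration; they are not part of the OpenVINO™ toolkit. The single sign bit and the exponent biases (15 for FP16, 127 for FP32) come from the IEEE 754 formats rather than from the text above.

```python
import numpy as np

def fp16_fields(x):
    """Round x to FP16 and return its (sign, exponent, mantissa) bit fields."""
    bits = int(np.array(x, dtype=np.float16).view(np.uint16))
    sign = bits >> 15               # 1 sign bit
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits, biased by 15
    mantissa = bits & 0x3FF         # 10 mantissa bits
    return sign, exponent, mantissa

def fp32_fields(x):
    """Round x to FP32 and return its (sign, exponent, mantissa) bit fields."""
    bits = int(np.array(x, dtype=np.float32).view(np.uint32))
    sign = bits >> 31               # 1 sign bit
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits, biased by 127
    mantissa = bits & 0x7FFFFF      # 23 mantissa bits
    return sign, exponent, mantissa

def fp16_value(sign, exponent, mantissa):
    """Rebuild a normal (not zero, subnormal, inf, or NaN) FP16 value from its fields."""
    return (-1) ** sign * 2.0 ** (exponent - 15) * (1 + mantissa / 2 ** 10)

print(fp16_fields(-2.5))               # (1, 16, 256)
print(fp32_fields(-2.5))               # (1, 128, 2097152)
print(fp16_value(*fp16_fields(-2.5)))  # -2.5
```

For 1.0 the same helpers return (0, 15, 0) and (0, 127, 0): the stored exponent equals the bias, which encodes 2 to the power 0 in both formats.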

Additional information about 16-bit (FP16) and 32-bit (FP32) floating point is found in the table below.

Using FP16 can help developers train deep learning models and run inference on them faster.
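The storage saving behind that speedup is easy to check. The sketch below rests on assumptions: a random 256 × 256 matrix stands in for one layer's trained weights, and NumPy's `astype` performs the conversion to FP16. It shows that the FP16 copy takes exactly half the bytes while the largest rounding error stays small for weights of this scale. Halving memory and bandwidth is one reason FP16 can run faster, but actual speedups depend on the hardware and are not measured here.

```python
import numpy as np

# Hypothetical layer: random values stand in for trained FP32 weights.
rng = np.random.default_rng(seed=0)
weights_fp32 = rng.normal(loc=0.0, scale=0.05, size=(256, 256)).astype(np.float32)

# Store the same weights in half precision.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 262144 (4 bytes per value)
print(weights_fp16.nbytes)  # 131072 (2 bytes per value)

# Largest absolute error introduced by rounding the weights to FP16.
max_abs_err = np.max(np.abs(weights_fp16.astype(np.float32) - weights_fp32))
print(f"max absolute rounding error: {max_abs_err:.2e}")
```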

Table 1. 16-Bit (FP16) and 32-Bit (FP32) Floating Point

FP16
  • Most weights and gradients fall within the FP16 range. For deep learning, in most cases, we don’t need all the precision or magnitude of FP32.
  • For gradients too small to fall within the FP16 range, scaling the gradients up helps training converge.
  • FP16 halves the number of bits: the exponent (magnitude) shrinks from 8 bits to 5, and the mantissa (precision) from 23 bits to 10.

FP32
  • Its range can represent numbers smaller and larger than what you need.
  • It has enough precision to distinguish numbers.
  • FP32 reserves 8 bits for the magnitude (exponent) and 23 bits for the precision (mantissa).
  • Most neural networks do not need all that precision or magnitude.
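The range, precision, and gradient-scaling points in Table 1 can be checked numerically. This sketch uses NumPy's `np.finfo` to report the FP16 and FP32 limits, then shows a small FP32 gradient underflowing to zero in FP16 and surviving once it is multiplied by a scale factor. The gradient value 1e-8 and the scale factor 2^14 are arbitrary choices for illustration. Gradient scaling is a training-time technique; the Intel® NCS 2 is an inference accelerator, so this applies when preparing a model elsewhere, not on the stick itself.

```python
import numpy as np

fp16 = np.finfo(np.float16)
fp32 = np.finfo(np.float32)

# Range: largest finite value and smallest normal (full-precision) value.
print("FP16 max:", float(fp16.max), "smallest normal:", float(fp16.tiny))
print("FP32 max:", float(fp32.max), "smallest normal:", float(fp32.tiny))

# Precision: gap between 1.0 and the next representable number.
print("FP16 eps:", float(fp16.eps), "FP32 eps:", float(fp32.eps))

# Gradient scaling: a tiny FP32 gradient rounds to zero in FP16 ...
grad = np.float32(1e-8)
print("unscaled in FP16:", float(np.float16(grad)))

# ... but scaling it up first keeps it within the FP16 range.
scale = np.float32(2.0 ** 14)  # illustrative scale factor
scaled = np.float16(grad * scale)
print("scaled in FP16:", float(scaled))

# Divide by the same factor in FP32 to recover the original gradient.
recovered = np.float32(scaled) / scale
print("recovered:", float(recovered))
```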

Why is this important?

One challenge with computer vision, especially on prototype boards, is having enough processing power to run machine vision applications. Vision accelerators such as the Intel® NCS 2 enable developers to bring products to market quickly.

  • Rapid prototyping with an accelerator.
  • Low power consumption: The Intel® NCS 2 is a low-power device designed to run on USB 2.0 or 3.0. The host board, for example a Raspberry Pi*, supplies power to the USB port, while the Pi itself is powered through micro USB.
  • Small form factor: Developers can add the accelerator to development boards such as the UP Squared* board and use the Intel® Distribution of OpenVINO™ toolkit with out-of-the-box FP16 pre-trained models to prototype solutions involving detection, recognition, and segmentation.
  • Low hardware cost with the Intel® NCS 2.


Use Cases for Intel® Neural Compute Stick 2 and Half-Precision Floating Point

There are several pre-trained models optimized to use FP16. For more information about available pre-trained models, visit the Pre-trained Models page.

The reference implementations below use half-precision floating point (FP16) models that can be deployed on the Intel® NCS 2 to address vertical use cases such as Digital Security and Surveillance (DSS), Retail, and Industrial Smart Factory.

Table 2. Pre-Built Projects: Open Source Reference Implementations

Intruder Detector
Build an application that alerts you when someone enters a restricted area. Learn how to use models for multi-class object detection.
Available implementations: Python* Intruder Detector, C++ Intruder Detector
Use cases:
  • Record and send alerts on activity in controlled spaces

Machine Operator Monitor
Send notifications when an employee appears to be distracted while operating machinery.
Available implementations: C++ Machine Operator Monitor, Google Go* Machine Operator Monitor
Use cases:
  • Industrial or manufacturing facilities
  • Construction sites
  • Warehouses

Restricted Zone Notifier
Secure work areas and send alerts if someone enters the restricted space.
Available implementations: Python Restricted Zone Notifier, C++ Restricted Zone Notifier, Go Restricted Zone Notifier
Use cases:
  • Track worker activity in proximity to heavy machinery
  • Develop safety solutions using computer vision technologies

Shopper Gaze Monitor
Build a solution to analyze customer expressions and reactions to product advertising collateral positioned on retail shelves.
Available implementations: Python Shopper Gaze Monitor, C++ Shopper Gaze Monitor
Use cases:
  • Measure active versus inactive user product engagement
  • Capture analytics on shopper reactions to visual ads

Shopper Mood Monitor
Detect the mood of shoppers when they look at a retail or kiosk display.
Available implementations: C++ Shopper Mood Monitor, Go Shopper Mood Monitor
Use cases:
  • Mall shoppers using an interactive or map kiosk
  • Grocery store shoppers viewing digital signage ads
  • Hospitals using a kiosk to assist patients or visitors

Store Traffic Monitor
Monitor three different video streams that count people inside and outside of a facility. This application also counts product inventory.
Available implementations: Python Store Traffic Monitor, C++ Store Traffic Monitor
Use cases:
  • Movement of people
  • Foot activity in retail or warehouse spaces
  • Inventory availability of products on shelves

Parking Lot Tracker
Receive or post information on available parking spaces by tracking how many vehicles enter and exit a parking lot.
Available implementations: C++ Parking Lot Counter, Go Parking Lot Counter
Use cases:
  • Track and analyze vehicle activity
  • Report on parking space availability

Conclusion

The Intel® Neural Compute Stick 2 is a cost-effective, low-power, portable device for prototyping simple solutions that can later be scaled. Because the Intel® Distribution of OpenVINO™ toolkit supports half-precision floating point (FP16), you can use the Intel® Neural Compute Stick 2 with pre-trained FP16 models.

Further reading and experimentation


Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.