Facilitating AI Power in Smart Retail Using Intel® Optimization for Caffe* on Intel® Xeon® Scalable Processors

Published: 06/24/2019  

Last Updated: 06/24/2019


Object classification and detection are central computer-vision tasks in the AI world. Many live applications take advantage of deep learning algorithms, and among them smart retail has proved to be a practical scenario that both enables and benefits from AI. Concrete examples include food and goods detection, package/label detection, goods counting, textile/cloth recognition, face recognition, face payment, and many more. Above all, the reaction time of these smart-retail solutions is critical both to the quality of the service and to the customer experience.

This article shows the inference performance gains that can be achieved using Intel® Optimization for Caffe* compared to open-source BVLC Caffe* on Intel® Xeon® Scalable processors for two main tasks: object classification and object detection. We tested various topologies such as GoogLeNet*, ResNet*, and PVANet* using both BVLC Caffe* and Intel® Optimization for Caffe* on Intel® Xeon® Scalable processors.

Solution Architecture and Design

Because object classification and detection serve many different uses, no single model/topology generalizes across all applications; each model must be matched to the end use where it performs best. Several popular topologies address these tasks: VGG and ResNet can be used as feature-extraction or classification models, whereas PVANet and Faster-RCNN serve as detection models.

The advantages of using Intel Optimization for Caffe are:

  1. It takes advantage of libraries such as Intel® Math Kernel Library (Intel® MKL) and Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) to accelerate matrix multiply and add computations.
  2. It also performs topology optimization: similar layers are fused together and computed only once, so subsequent layers can query and reuse the computed result very quickly.
  3. Training for both the classification and detection models was likewise done with Intel Optimization for Caffe, keeping the models and the end-to-end pipeline consistent.
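As an illustration of what layer fusion buys, one common case is folding a batch-normalization layer into the preceding convolution's weights, so the normalization costs nothing extra at inference time. The NumPy sketch below shows only the arithmetic of that fold, not Intel's actual implementation:

```python
import numpy as np

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into the preceding layer.

    weights: (out_channels, ...) kernel; bias: (out_channels,).
    gamma, beta, mean, var: per-output-channel BatchNorm parameters.
    Returns weights and bias that compute conv + BatchNorm in one step.
    """
    scale = gamma / np.sqrt(var + eps)
    # Broadcast the per-channel scale over the remaining kernel dimensions.
    folded_w = weights * scale.reshape(-1, *([1] * (weights.ndim - 1)))
    folded_b = (bias - mean) * scale + beta
    return folded_w, folded_b
```

After the fold, the fused layer produces the same output as convolution followed by batch normalization, with one fewer pass over the data.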

Our experiments were based on the Pascal VOC 2007 dataset: 9,963 images were used to evaluate classification inference, and 4,952 images were used to evaluate detection inference.

Our workflow is shown below:

Architecture workflow


As discussed above, we used several popular topologies in this experiment, listed as follows.

  • Classification: AlexNet*, GoogLeNet, VGG16, ResNet50, ResNet101
  • Detection: PVANet, Faster-RCNN

Hardware Configuration

Test Date 25/09/2018
Platform S2600BP
# Nodes 1
# Sockets 2
Processor Intel® Xeon® Gold 6140 processor (24.75M Cache, 2.30 GHz)
Cores/socket, Threads/socket 18, 36
ucode 0x200004d
Turbo On
BIOS Version SE5C620.86B.00.01.0014.070920180847
System DDR Mem Config: slots / cap / run-speed 6 slots / 16 GB / 2400 MHz
Total Memory/Node (DDR+DCPMM) 96 GB
Storage - boot SSD 480 GB
Storage - application drives SSD 480 GB (shared with boot)
OS CentOS 7.4.1708 (Core)
Kernel 3.10.0-693.17.1.el7.x86_64
IBRS (0=disable, 1=enable) 0
eIBRS (0=disable, 1=enable) 0
Retpoline (0=disable, 1=enable) 0
IBPB (0=disable, 1=enable) 0
PTI (0=disable, 1=enable) 0
Mitigation variants (1,2,3,3a,4,L1TF) https://github.com/speed47/spectre-meltdown-checker Mitigated
Compiler GCC 4.8.5
Libraries OpenCV 3.4
Frameworks version Intel® Optimization for Caffe* 1.1.0
BVLC Caffe* 1.0
Dataset Pascal VOC 2007
Topology AlexNet, GoogLeNet, VGG16, ResNet50, ResNet101, PVANet, Faster-RCNN
Batch size 1, 100
Training size / Test size 9963 / 4952 images

Software Used

Intel® Optimization for Caffe* v1.1.0
BVLC Caffe* v1.0
Python* v2.7
Classification AlexNet, GoogLeNet, VGG16, ResNet50, ResNet101
Detection PVANet, Faster-RCNN

Installing Required Software

Follow these links to install the required software:

Installing Intel® Optimization for Caffe*

Installing BVLC Caffe*

Test Command

caffe --phase TEST --iterations 100 --model <model.caffemodel> --engine <mkl/mkldnn>

The command above runs inference, where model.caffemodel stands for the trained Caffe model, and mkl/mkldnn selects either the Intel® MKL or the Intel® MKL-DNN engine.
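To run this command over every topology with both engines, a small driver script helps. The sketch below only builds the command strings; the model directory and file names are placeholders to adapt to your local setup:

```python
TOPOLOGIES = ["alexnet", "googlenet", "vgg16", "resnet50", "resnet101"]
ENGINES = ["mkl", "mkldnn"]

def benchmark_commands(model_dir="models", iterations=100):
    """Build one inference command per topology/engine pair."""
    cmds = []
    for topo in TOPOLOGIES:
        for engine in ENGINES:
            cmds.append(
                f"caffe --phase TEST --iterations {iterations} "
                f"--model {model_dir}/{topo}.caffemodel --engine {engine}"
            )
    return cmds
```

Each string can then be passed to `subprocess.run(..., shell=True)` or written to a shell script, so all topology/engine combinations are measured under identical settings.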

What Was Evaluated

  1. We compared the performance of Intel Optimization for Caffe against BVLC Caffe across different batch sizes and topologies. On the software side, BVLC Caffe (public Caffe*) served as the baseline and Intel Optimization for Caffe as the comparison target. (See Figure 1.)
  2. Classification and detection inference results are shown in Figure 2 and Figure 3.

Experiment Results

The table below presents the complete experiment results. We evaluated how much topology, batch size (bs), and the type of Caffe* affected inference performance.

Figure 1. Classification and detection inference result comparison by topologies, batch sizes and type of Caffe*

The graph below shows classification inference on the Pascal VOC dataset using Intel Optimization for Caffe with different topologies at batch size 1; GoogLeNet achieves a 21.37X improvement at 56.36 FPS.

Figure 2. Intel® Optimization for Caffe* classification inference FPS on Intel® Xeon® Gold 6140
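The speedup factor in Figure 2 is simply the ratio of optimized to baseline throughput, so the reported numbers also imply the baseline. A quick sanity check using the GoogLeNet figures from the source (batch size 1):

```python
def speedup(optimized_fps, baseline_fps):
    """Throughput ratio between the optimized and baseline builds."""
    return optimized_fps / baseline_fps

# GoogLeNet, batch size 1: 56.36 FPS at a 21.37X gain implies a
# BVLC Caffe baseline of roughly 56.36 / 21.37, i.e. about 2.64 FPS.
baseline_fps = 56.36 / 21.37
```

The same arithmetic recovers the baseline throughput for any topology in Figure 1 from its reported FPS and speedup.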

The last graph shows detection inference on the Pascal VOC dataset using BVLC Caffe and Intel Optimization for Caffe on the Intel® Xeon® Gold 6140. The results show that Intel® optimized Faster-RCNN achieved an 8.27X performance gain.

Figure 3. Intel® Optimization for Caffe* detection inference FPS on Intel® Xeon® Gold 6140


Commonly used classification topologies such as VGG16 and ResNet50/101 showed at least a 6X performance increase, AlexNet roughly 10X, and GoogLeNet more than 21X. Performance improved further with a larger batch size on the Intel® Xeon® Gold 6140 (see Figure 1). The optimized detection topology Faster-RCNN gained more than 8X, while even the non-optimized PVANet gained 3X.
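Since the reaction time of a smart-retail service is what customers actually perceive, it can help to restate throughput as per-image latency. A small conversion, using the GoogLeNet batch-size-1 number reported in Figure 2 and assuming a single-image stream:

```python
def latency_ms(fps):
    """Per-image latency in milliseconds for a single-image stream."""
    return 1000.0 / fps

# 56.36 FPS corresponds to roughly 17.7 ms per image, comfortably
# within interactive response budgets for retail use cases.
googlenet_latency = latency_ms(56.36)
```

Note this holds only at batch size 1; with larger batches, throughput rises but the latency of an individual image also includes the time to fill the batch.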

In conclusion, with well-optimized topologies such as AlexNet, GoogLeNet, ResNet50, and Faster-RCNN, Intel Optimization for Caffe can further unleash classification and detection performance, bringing a more efficient and robust boost to AI in smart retail.


References

1. Installing Intel® Optimization for Caffe*

2. Installing BVLC Caffe* on GitHub*

3. Intel® MKL

4. Intel® MKL-DNN on GitHub


Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. Configurations: Intel® Xeon® Gold 6140, Intel® Optimization for Caffe* 1.1.0. Test by ISV on 25/09/2018.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.