Executive summary

ST Engineering is a global technology, defense, and engineering group with a diverse portfolio of businesses across the aerospace, smart city, defense, and public security segments. Its smart city technologies help cities prepare for a more connected, resilient, and sustainable urban future by addressing connectivity, mobility, security, infrastructure, and environmental needs.

ST Engineering’s AGIL Video Analytics Platform (AGIL VAP) is an open architecture solution based on cloud technology that supports a range of video analytics engines for smart city applications involving recognition of people, vehicles, and objects. The scale and complexity of such systems may range from 100 to 10,000 video streams, with deployment of computing resources either in the cloud or at the edge of the network, depending on customer requirements. This flexible, single-platform solution enables customers to scale up or add video analytics engines to meet changing operational requirements.

To achieve industry-leading performance efficiency for AI-powered video analytics, ST Engineering leveraged deep learning inference models included with the Intel® Distribution of OpenVINO™ toolkit, as well as artificial intelligence (AI) acceleration capabilities built into 3rd Generation Intel® Xeon® Scalable processors.

Benchmarking data shows an over 12x increase in performance efficiency with a 90 percent decrease in power consumption when using the OpenVINO toolkit on 3rd Generation Intel Xeon Scalable processors, compared to TensorFlow on 2nd Generation Intel® Xeon® Scalable processors.¹

These results demonstrate the potential for developers of smart city video analytics (VA) solutions to deliver highly cost-effective AI performance by leveraging Intel® technology—while delivering the architectural flexibility needed to balance lower capital expenditures (CapEx) with reasonable long-term operating expenses (OpEx) over the life of the system.

Video analytics in smart cities

As city governments, transportation authorities, and public safety agencies adopt smart city solutions and technologies, they unlock new tools, practices, and insights that can improve responsiveness, services, and quality of life for their citizens.

To help make urban life safer, more efficient, and more sustainable, smart cities use embedded, intelligent technologies, including cameras and Internet of Things (IoT) sensors with integrated compute, to collect and analyze data from locations around the city in near-real time. Decision-makers then use the knowledge obtained from this data to monitor and protect the city’s properties, capital, and services.

Increasing threats to global and domestic security, along with the costs of hiring security personnel, are driving demand for VA, which uses AI deep learning inference to recognize people, vehicles, and objects in video footage. VA reduces the hefty cost and tedium of having human operators review large amounts of video data for abnormal incidents and behaviors.

The use of this technology must always be paired with an emphasis on data privacy and security, as well as consent and civil rights, by the relevant government, town council, business, or operator. Regardless of local laws, Intel takes a firm stance on eliminating bias in the collection and analysis of data. Use of data must be restricted to public safety and security purposes under a system of checks and balances. In many cases, subjects should be made aware they are being recorded and given a chance to opt in or out.

The availability of cost-effective high performance computing power, along with software toolkits optimized for deep learning inference, makes performing VA computation at the network edge increasingly feasible. VA at the edge lowers bandwidth demands and decreases communication delays, enabling end users to make quicker decisions in sensitive circumstances. Although data travels from endpoints to the cloud in 150 to 200 milliseconds, it takes only 10 milliseconds from endpoints to the edge, enabling more-efficient detection and reaction.

Additionally, public and private sector players responsible for developing smart city infrastructure face pressure to keep CapEx as low as possible. From this perspective, centralized deployment of cloud-based computing resources may be advantageous compared to the CapEx of deploying numerous AI-capable IoT devices around the city. On the other hand, cloud-based VA can be challenging to scale, with ongoing OpEx costs contributing to a higher total cost of ownership (TCO) over time.

From a privacy and security perspective, the implementation of a video analytics system based on a multitier architecture enables the segregation of different data layers to reduce the threat of intrusion and loss. When using public cloud infrastructure, certain use cases (e.g., vehicle and traffic related) may be deemed less sensitive, meriting a less-stringent security architecture.

The ideal VA solution for smart city applications, then, is one that lowers the cost of deployment while giving customers the flexibility to leverage any computing resources available—in the cloud, at the edge, or both—and to adjust that balance to meet a range of scales, budgets, privacy and security needs, and business requirements.

**Potential customers**

Potential customers for ST Engineering’s AGIL VAP include public safety and security agencies, security companies, building owners, mall operators, city planners, and municipal governments.

**System architecture**

ST Engineering’s AGIL VAP is an open architecture cloud-based system that manages multiple video analytics engines on a single platform. AGIL VAP is a full-stack solution that enables operators to seamlessly execute video analytics jobs, generating the necessary alerts to shorten the detection and response cycle.

The solution enables customers to scale up or add VA engines to meet changing operational requirements. VA engines can share a common pool of computing resources, enabling use of the right engine to achieve optimal insights for the application at hand. VA engines can also run on multiple types of hardware, so they can use whatever computing resources are available. AGIL VAP is customizable and can be agnostic to hardware architecture, which avoids the compatibility problems that commonly arise across engines.

As shown in Figure 1, AGIL VAP is built on a modern microservices software architecture, coupled with a library of optimized models that go through an automated machine learning operations (MLOps) pipeline and workflow. The codebase for the AI/ML models and the video processing pipeline is optimized at the silicon level for high performance and reliability.

AGIL VAP takes in video streams through the Real-Time Streaming Protocol (RTSP) and decodes them using video decoders built into the Intel® CPU. The decoded frames are then processed by a deep learning-based object detection algorithm built on models included with the Intel Distribution of OpenVINO toolkit, which optimizes deep learning inference through Vector Neural Network Instructions (VNNI) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) operations. The detection results are then passed to other microservices in AGIL VAP.

**Figure 1.** System architecture of ST Engineering's video analytics engine.

Software development and optimization

ST Engineering originally used TensorFlow to build a baseline generic pipeline that worked across numerous silicon architectures, including Intel® x86. The engineers then adopted the Intel Distribution of OpenVINO toolkit to explore cost optimization without an additional graphics accelerator, which also reduces potential points of failure during deployment.

By using Intel® performance optimization tools to analyze and tune reference models provided with the OpenVINO toolkit, ST Engineering achieved substantial improvements in performance efficiency and power consumption. In addition, they were able to develop a solution more quickly than would otherwise have been possible.

Intel Distribution of OpenVINO toolkit

OpenVINO is a comprehensive toolkit for quickly developing applications and solutions that solve various AI-powered tasks, such as VA. Based on the latest generations of artificial neural networks, including convolutional neural networks (CNNs) and recurrent and attention-based networks, the toolkit maximizes performance by extending computer vision and nonvision workloads across Intel® hardware.

ST Engineering’s OpenVINO-optimized pipeline enables the use of a range of accelerators and Intel CPU architectures with minimal code changes. OpenVINO enables developers to deploy the same application across combinations of host processors, accelerators, and environments, including CPUs, GPUs, VPUs, and FPGAs, whether on premises, on device, in the browser, or in the cloud.

OpenVINO’s Open Model Zoo provided ST Engineering with quantized pretrained models for VA, offering targeted cost and performance and reducing potential points of failure. The data type conversion from fp32 to int8 also came with the models downloaded from Open Model Zoo. In addition, OpenVINO provides a transfer-learning toolkit to accommodate specific retraining capabilities.
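To illustrate what an fp32-to-int8 conversion involves in general, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not the procedure used by Open Model Zoo or the OpenVINO tooling, which the paper does not describe; the function names and sample weights are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]              # hypothetical fp32 weights
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded for smaller data and faster integer arithmetic.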

OpenVINO is a well-documented framework with extensive tools and examples, enabling ST Engineering’s team to get up to speed quickly and dive deep into optimization. Intel provided access to experts in the VA domain, including the ability to speak directly to teams that developed the Intel hardware and software they were using. ST Engineering worked with Intel’s Internet of Things Group (IOTG) in Singapore and Malaysia, and with OpenVINO developers in India and China.

CPU upgrade

ST Engineering set out to benchmark AGIL VAP on 3rd Gen Intel Xeon Scalable processors in order to take advantage of built-in acceleration for training and inference workloads as well as an enhanced cache architecture and increased memory bandwidth and channels. Upgrading from 2nd Gen to 3rd Gen Intel Xeon Scalable processors brought significant performance improvements.¹

Intel® Deep Learning Boost (Intel® DL Boost) with VNNI

The second generation of Intel Xeon Scalable processors introduced a collection of features designed to accelerate AI/DL inference, packaged together as Intel Deep Learning Boost. These features include VNNI, which increases throughput for inference applications with support for int8 convolutions by combining multiple machine instructions from previous generations into one machine instruction.

Based on Intel AVX-512, Intel DL Boost VNNI delivers a significant performance improvement by combining three instructions into one—maximizing the use of compute resources, better utilizing the cache, and avoiding potential bandwidth bottlenecks.
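As a rough illustration of the fused operation, the sketch below emulates a single 32-bit lane in plain Python: four unsigned 8-bit activations are multiplied by four signed 8-bit weights, and the products are summed into a 32-bit accumulator in one step. To our understanding this mirrors the AVX-512 VNNI instruction VPDPBUSD, which replaces the earlier VPMADDUBSW, VPMADDWD, and VPADDD sequence. The function names are hypothetical, and 32-bit wraparound is deliberately not modeled.

```python
def dot4_accumulate(acc, activations_u8, weights_s8):
    """Emulate one 32-bit lane: acc += sum of four u8 * s8 products."""
    assert len(activations_u8) == len(weights_s8) == 4
    assert all(0 <= a <= 255 for a in activations_u8)
    assert all(-128 <= w <= 127 for w in weights_s8)
    return acc + sum(a * w for a, w in zip(activations_u8, weights_s8))


def int8_dot(activations_u8, weights_s8):
    """Dot product built from repeated 4-element fused multiply-accumulates."""
    assert len(activations_u8) == len(weights_s8)
    assert len(activations_u8) % 4 == 0
    acc = 0
    for i in range(0, len(activations_u8), 4):
        acc = dot4_accumulate(acc, activations_u8[i:i + 4], weights_s8[i:i + 4])
    return acc


# Hypothetical quantized activations and weights.
result = int8_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 2, -2, 3, -3, 4, -4])
```

A real 512-bit register holds 16 such lanes, so one instruction performs 64 multiplications and their accumulation at once.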

ST Engineering’s results leveraged special instructions, such as Intel AVX-512 and VNNI, which produced an uplift in instructions per cycle (IPC). The process node also helped the processor sustain higher frequencies while Intel AVX-512 and VNNI were in use.

The video analytics pipeline is an asynchronous process that leverages the multicore topology and noninclusive last-level cache (LLC) features of the 3rd Generation Intel Xeon Scalable processor.

An asynchronous process generally consumes a certain number of CPU cycles in the OS scheduler for context switching. With unoptimized scheduling of the video pipeline processes, the overhead of kernel context switching overwhelmed the CPU, as shown in Figure 3, resulting in lower efficiency and less data being processed by the application. (The report output shown in Figures 3 and 4 is from htop, an interactive process viewer.)

Pinning the application to specific cores and allocating memory local to those cores prevents the kernel from performing excessive context switching. It also reduces remote socket memory access, which tends to stall the CPU while it waits for memory transfers over Intel® Ultra Path Interconnect (Intel® UPI), as shown in Figure 4.

The user application is assigned to manage and process video streams on the socket local to the network interface controller (NIC) to prevent excessive cross-socket memory transfer, which might reduce the efficiency of the caching and home agent (CHA).

**Intel® VTune™ Profiler**

The ST Engineering team used Intel VTune Profiler to optimize application performance, system performance, and system configuration for AGIL VAP. The team also used Intel VTune to confirm that enough core utilization headroom remained to absorb additional demand.

Core pinning reduces core switching by restricting computations within certain virtual cores. This has a large performance effect on the GStreamer pipeline within the VA engine, which, as a multithreaded framework, otherwise automatically distributes thread-based workloads to cores across the entire server.
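The idea of core pinning can be sketched with the Python standard library. The example below splits a list of logical cores into non-overlapping groups, one per pipeline, and pins the calling process to a group through `os.sched_setaffinity`, which is available on Linux only. The core counts, the even split, and the function names are assumptions for illustration; the paper does not state how ST Engineering assigned cores to pipelines.

```python
import os


def partition_cores(core_ids, num_workers):
    """Split core IDs into contiguous, non-overlapping groups, one per worker."""
    per_worker, remainder = divmod(len(core_ids), num_workers)
    if per_worker == 0:
        raise ValueError("more workers than cores")
    groups, start = [], 0
    for worker in range(num_workers):
        # Spread any leftover cores across the first few workers.
        size = per_worker + (1 if worker < remainder else 0)
        groups.append(set(core_ids[start:start + size]))
        start += size
    return groups


def pin_current_process(cores):
    """Pin the calling process to `cores`; return False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return True


# Hypothetical layout: 80 logical cores shared evenly by 40 pipelines.
groups = partition_cores(list(range(80)), 40)
```

Each pipeline process would call `pin_current_process(groups[i])` at startup. Affinity alone does not control where memory is allocated; on Linux, a tool such as `numactl` with `--cpunodebind` and `--membind` is one common way to keep memory on the same socket as the pinned cores.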

A workload of 40 video analytics pipelines, each processing a video, was measured to produce the results shown in Figure 5. CPU workloads on pinned cores require much less computation to do the same amount of work because they use cores more effectively, whereas unpinned cores need to wait for the thread pool manager to instantiate context switching before they can run their workloads.

**Figure 5.** Summary of Intel® VTune™ Profiler analysis.

*Figures 3 and 4 show the pre- and postoptimization process scheduling.*

In postoptimization analysis, the scheduler CPU use is reduced by 160 percent, though it is still the most active function due to the asynchronous nature of the video analytic pipeline.
White Paper | Accelerate Video Analytics Performance

Effective CPU utilization histogram (panels: **Preoptimization** and **Postoptimization**). Each histogram displays the percentage of wall time during which a specific number of CPUs were running simultaneously. Spin and overhead time add to the idle CPU utilization value.

**Figure 6.** Histogram of Intel® VTune™ Profiler analysis.

Figure 6 is a histogram showing the number of simultaneously used logical CPU cores. In the preoptimization histogram, the platform uses only 30 to 50 cores out of 80 simultaneously within the 30 seconds of platform performance sampling, with the average number of cores being used at the same time around 21.

In the postoptimization histogram, the average logical core utilization is improved to about 51—more than double preoptimization. This demonstrates that the platform optimization technique can be effectively applied to the multiprocess video pipeline.
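The average quoted for each histogram can be read as a time-weighted mean of the number of simultaneously active logical cores. A minimal sketch of that calculation follows; the function name and the sample data are hypothetical and are not taken from the VTune output in the paper.

```python
def average_active_cores(histogram):
    """Time-weighted mean of simultaneously active logical cores.

    `histogram` maps a core count to the seconds of wall time spent
    with exactly that many cores running.
    """
    total_time = sum(histogram.values())
    if total_time == 0:
        raise ValueError("histogram contains no wall time")
    weighted = sum(cores * seconds for cores, seconds in histogram.items())
    return weighted / total_time


# Hypothetical 30-second sample (illustrative values only).
sample = {10: 6.0, 20: 12.0, 30: 12.0}
avg = average_active_cores(sample)
```

With this sample, the platform spends most of its time at 20 to 30 active cores, and the weighted mean lands between those bars rather than at the tallest one.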

**Figure 7.** Bottom-up function call analysis from Intel® VTune™ Profiler.

Figure 7 shows the VTune bottom-up function call analysis. The preoptimization analysis shows kernel scheduling using more than 68 percent of CPU time, indicating room for improvement by using core pinning to reduce remote socket memory access and UPI memory transfer.¹

In the postoptimization analysis, the most active functions belong to the actual user space workload, such as the Intel® Threading Building Blocks (Intel® TBB) library and GStreamer, instead of being crowded out by kernel process overhead.

Figure 8 summarizes the results of the optimization performed with Intel VTune, showing improvements in core utilization efficiency and kernel overhead percentage.³

**Figure 8.** Summary of Intel® VTune™ optimization.

Results

ST Engineering collaborated with Intel to measure performance benchmarks on the AGIL VAP code. By leveraging deep learning inference models included with the Intel Distribution of OpenVINO toolkit, along with AI acceleration capabilities built into 3rd Generation Intel Xeon Scalable processors, ST Engineering was able to achieve industry-leading performance efficiency for AI-powered video analytics.

As shown in Figure 9, benchmarking data showed a 5x improvement in performance efficiency from moving from TensorFlow to OpenVINO on 2nd Generation Intel Xeon Scalable processors with fp32, and a 2x improvement from moving from 2nd Generation to 3rd Generation Intel Xeon Scalable processors with int8, for an overall improvement of over 12x.¹

![Figure 9. Performance efficiency benchmarking results for ST Engineering’s AGIL Video Analytics Platform ($/FPS).](https://example.com/figure9)

Performance efficiency results are expressed in dollars per frame per second ($/FPS). $/FPS calculations are based on standardized hardware costs from a server vendor (including CPU, RAM, basic storage, networking, and chassis), divided by the number of frames that can be processed through the AGIL VAP software per second. This gives a good estimate of the cost of ownership for a large enough deployment to saturate the processing power of the server CPUs.
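The $/FPS metric described above is a simple ratio, sketched below. The server cost and throughput figures are hypothetical placeholders, not values from the benchmark, and the function names are ours.

```python
def cost_per_fps(hardware_cost_usd, frames_per_second):
    """Hardware cost divided by sustained throughput, in $/FPS."""
    if frames_per_second <= 0:
        raise ValueError("throughput must be positive")
    return hardware_cost_usd / frames_per_second


def efficiency_gain(baseline_cost_per_fps, optimized_cost_per_fps):
    """How many times cheaper each frame per second became."""
    return baseline_cost_per_fps / optimized_cost_per_fps


# Hypothetical numbers: the same 10,000 USD server before and after tuning.
baseline = cost_per_fps(10_000, 500)      # 20.0 $/FPS
optimized = cost_per_fps(10_000, 2_000)   # 5.0 $/FPS
gain = efficiency_gain(baseline, optimized)
```

Because the hardware cost is held fixed in this example, the efficiency gain equals the throughput gain; comparing two different servers would change both the numerator and the denominator.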

Figure 10 shows an 80 percent drop in power consumption in moving from TensorFlow to OpenVINO on 2nd Gen Intel Xeon Scalable processors with fp32 and a further drop of nearly 50 percent in moving from 2nd Gen to 3rd Gen Intel Xeon Scalable processors with int8, for an overall drop in power consumption of 90 percent.¹ Power consumption is expressed in watts per frame per second, or W/FPS.
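Successive percentage drops compound multiplicatively rather than adding up, which is how an 80 percent drop followed by a drop of about 50 percent yields roughly 90 percent overall. The short sketch below shows the arithmetic using the rounded figures quoted in the text; the function name is hypothetical.

```python
def combined_reduction(*step_reductions):
    """Overall fractional reduction after applying each step in sequence."""
    remaining = 1.0
    for reduction in step_reductions:
        remaining *= 1.0 - reduction   # fraction of W/FPS left after this step
    return 1.0 - remaining


# 80 percent drop (software), then about 50 percent drop (hardware and int8).
overall = combined_reduction(0.80, 0.50)
```

After the first step 20 percent of the original W/FPS remains, and halving that leaves 10 percent, for an overall reduction of about 90 percent.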

![Figure 10. Power consumption benchmarking results for ST Engineering’s AGIL Video Analytics Platform (W/FPS).](https://example.com/figure10)

Conclusion

In this paper, we have detailed the methods used by ST Engineering to achieve industry-leading improvements in performance efficiency for their Video Analytics Platform.

Benchmarking data shows an over 12x increase in performance efficiency with a 90 percent decrease in power consumption when using the OpenVINO toolkit on 3rd Generation Intel Xeon Scalable processors, compared to TensorFlow on 2nd Generation Intel Xeon Scalable processors.¹ This improvement was achieved in two ways:

- **Hardware**: Upgrading from 2nd Gen to 3rd Gen Intel Xeon Scalable processors brought significant performance improvements due to built-in acceleration for training and inference workloads as well as an enhanced cache architecture and increased memory bandwidth and channels.

- **Software**: Using Intel performance optimization tools to analyze and tune reference models provided with OpenVINO helped ST Engineering achieve substantial improvements in performance.¹ In addition, they were able to develop a solution more quickly than would otherwise have been possible.

These results show the potential for developers of smart city solutions who leverage Intel technology to provide a highly cost-effective path to VA solutions—with the flexibility to optimize and scale deployment across whatever computing resources are available, whether in the cloud, at the edge, or both.

Learn more

- ST Engineering’s Video Analytics Platform
- Intel Distribution of OpenVINO toolkit
- 3rd Generation Intel Xeon Scalable Processors
- Intel Deep Learning Boost


**Notices and disclaimers**

ST Engineering does not own nor have access to data as the developer. It is the responsibility of the customer to comply with the laws and regulations of the relevant jurisdictions.

Performance varies by use, configuration, and other factors. Learn more at intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Intel® Advanced Vector Extensions (Intel® AVX) provides higher throughput to certain processor operations. Due to varying processor power characteristics, using AVX instructions may cause, a) some parts to operate at less than the rated frequency and, b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration, and you can learn more at intel.com/go/turbo.

Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See Intel’s Global Human Rights Principles. Intel® products and software are intended only to be used in applications that do not cause or contribute to a violation of an internationally recognized human right.

Intel® technologies may require enabled hardware, software, or service activation.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.