Kongsberg Maritime: Accelerating Image Recognition

Kongsberg Maritime is looking to move its AI solutions from GPUs to CPUs so that it can simplify and consolidate its servers.

Kongsberg Maritime is pioneering autonomous ships and using artificial intelligence (AI) to support crews with navigation at sea. The company’s AI solutions have traditionally been based on GPUs, but Kongsberg Maritime would prefer to use CPUs so it can simplify and consolidate its servers more easily. Working with Intel, Kongsberg Maritime increased the performance of one of its demonstrator projects by 4.8x [1] on standard server hardware.

Challenge

  • Optimize AI performance, so that Kongsberg Maritime’s object recognition solution for marine navigation can process more images per second
  • Enable server consolidation and redundancy by meeting Kongsberg Maritime’s performance expectations on standard servers

Solution

  • The Intel® Distribution of OpenVINO™ toolkit was used to accelerate the performance of Kongsberg Maritime’s TensorFlow* model running on the Intel® Xeon® Platinum processor
  • The Intel® Distribution for Python* was used to share work across 64 threads on the two-socket server
  • Intel’s expert team optimized the server settings and modified the OpenVINO toolkit to enhance performance

Results

  • Image throughput was increased by 4.8x [1] compared to the unoptimized baseline
  • Kongsberg Maritime can look at using standard server hardware for its marine navigation solution, increasing redundancy in the architecture and smoothing the path to marine certification

Achieving Fast AI Inference
Over the last 10 years, 1,129 ships have been lost at sea [2]. Congested seas can be a significant risk factor in some regions, and human error accounts for three-quarters of all shipping insurance losses, totaling $1.6 billion [2] between 2011 and 2016.

Kongsberg Maritime has a vision to improve safety and increase the efficiency of shipping. The company plans to use AI to guide sailors on board, enable remote control from the shore, and ultimately to steer autonomous ocean-going vessels.

By 2025, the company plans to have a short sea vessel in operation that is driven using remote and autonomous controls. Beyond that, international regulation will be the biggest barrier to launching on the open seas.

Kongsberg Maritime has already demonstrated a fully autonomous car ferry, operating in Finnish waters [3]. In this demonstration, with 80 VIPs on board, Kongsberg Maritime technologies were used to navigate autonomously on the outbound journey, using sensors and cameras to detect and avoid objects. The ship berthed automatically, and remote control was used to steer the return journey.

The first step towards enabling fully autonomous vehicles, and the first commercially available product from Kongsberg Maritime for this, is called Intelligent Awareness*. It uses radar for long-distance object detection, lidar for a highly accurate analysis of the area nearer the ship, and high-definition cameras to capture a 180-degree view of the sea in front of the ship. The ship’s crew can use dashboards to see the waters around the ship, with the solution highlighting any potential hazards. The solution helps to mitigate navigator risk, especially in the dark, in adverse weather conditions, or during tricky maneuvers such as navigating congested waters or docking and undocking.

The solution currently uses GPUs for the real-time artificial intelligence analysis, which is known as inference. “We would prefer to get rid of those GPUs,” said Jaakko Saarela, project manager at Kongsberg Maritime. “One important reason is marine certification. It is much easier for us to get our servers certified if we do not use GPUs. Also, we would like to reduce our power consumption. It would be ideal if we could use generic server systems, which are all similar, too. We don’t need GPUs in all the servers, so it would be better if no servers used GPUs so that we have redundancy and can run any application on any server.”

The solution is based on about 10 server-class computers that run different parts of the application, with a high-speed internal network between the components. Kongsberg Maritime would now like to consolidate servers, so it has been investigating how the image processing can be carried out using CPUs instead of GPUs. “The neural network inference is the most challenging part,” said Saarela.

The challenge was to optimize the CPU-based solution so that it would be fast enough to detect potentially fast-moving objects at sea, such as motor boats passing across the bow of the ship.

Solution Details
The Intelligent Awareness solution uses TensorFlow*, a popular open source machine learning framework. Kongsberg Maritime has chosen to use a region-based fully convolutional network (R-FCN) model for object recognition, with ResNet-101* used for image classification in the back end. “We tried several architectures, and found that R-FCN provides a good trade-off between the computational performance (speed) and the inference accuracy,” said Saarela. “The big challenge is the scaling variance. The same objects can appear at different sizes, from 10 pixels square to 100,000 pixels square, depending on their distance.”

Intel worked with Kongsberg Maritime on optimizing the solution, with Kongsberg Maritime providing a pretrained AI model for Intel to use. The Intel Distribution of OpenVINO toolkit helped achieve higher throughput without sacrificing accuracy. The OpenVINO toolkit converts a trained model into an intermediate representation (IR), removing any operations that are only relevant to training and fusing together some of the inference operations so they can be computed more quickly. That intermediate representation is then processed by the OpenVINO inference engine, which returns information about identified objects to the Kongsberg Maritime application, as shown in Figure 1. The modifications made to OpenVINO R4 for this project have since been incorporated into OpenVINO R5.
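To illustrate what fusing inference operations can mean, the sketch below folds a batch-normalization step into the linear layer that precedes it, so that two operations become one at inference time. This is a generic, widely used fusion chosen purely for illustration. The source does not say which fusions were applied to Kongsberg Maritime's model, and the function names here are hypothetical, not OpenVINO APIs.

```python
import math


def linear(x, weights, bias):
    """Apply a linear layer: y[j] = sum_i(weights[j][i] * x[i]) + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed (already trained) statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weights, bias) of a single linear layer that is equivalent
    to the original linear layer followed by batch_norm."""
    # Per-output scale that batch norm applies once its statistics are frozen.
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_weights = [[w * sc for w in row] for row, sc in zip(weights, scale)]
    fused_bias = [(b - m) * sc + bt
                  for b, m, sc, bt in zip(bias, mean, scale, beta)]
    return fused_weights, fused_bias
```

After folding, `linear(x, fused_weights, fused_bias)` gives the same output as `batch_norm(linear(x, weights, bias), ...)` up to floating-point rounding, while performing one pass over the data instead of two.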

The solution is based on two Intel® Xeon® Platinum 8153 processors with 16 cores each. Each core can process two threads, so a total of 64 models can be processed in parallel (2 processors x 16 cores x 2 threads). To distribute the work across the threads, Intel used the mpi4py* library, which is included in the Intel® Distribution for Python* and is more commonly used for distributing work across separate servers.
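The arithmetic and the idea of sharing frames between workers can be sketched as follows. The round-robin split is an assumption made for illustration, since the source does not describe how frames were divided, and `frames_for_rank` is a hypothetical helper. Under mpi4py, the `rank` and `size` values would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; they are plain arguments here so the sketch runs without an MPI installation.

```python
SOCKETS = 2
CORES_PER_SOCKET = 16
THREADS_PER_CORE = 2

# 2 processors x 16 cores x 2 threads = 64 parallel workers
WORKERS = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE


def frames_for_rank(frames, rank, size):
    """Round-robin split: worker `rank` takes every `size`-th frame,
    starting at index `rank`. Each frame goes to exactly one worker."""
    if not 0 <= rank < size:
        raise ValueError("rank must be in the range [0, size)")
    return frames[rank::size]
```

For example, with 200 frames and 64 workers, worker 0 would receive frames 0, 64, 128, and 192, while worker 63 would receive frames 63, 127, and 191.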

To further increase performance, the Intel team made small modifications to the default processor settings and tuned hyperparameters, including pinning threads to specific cores.
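As a rough illustration of pinning, the sketch below maps a worker index to a logical CPU and restricts the calling process to it using Python's `os.sched_setaffinity`, which is available on Linux. The source does not say which mechanism the Intel team used, so this is one possible approach rather than a description of their method. The helper names and the simple worker-to-CPU mapping are assumptions.

```python
import os


def cpu_for_worker(worker, sockets=2, cores_per_socket=16, threads_per_core=2):
    """Map a worker index to a logical CPU index.

    Assumes logical CPUs are numbered 0..N-1; how those numbers map to
    physical cores and sockets depends on the operating system.
    """
    total = sockets * cores_per_socket * threads_per_core
    return worker % total


def pin_current_process(cpu):
    """Pin the calling process to one logical CPU.

    Returns True on success, or False if pinning is unsupported on this
    platform or the CPU is not in the process's current allowed set.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or Windows
    if cpu not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return True
```

Each worker would call `pin_current_process(cpu_for_worker(rank))` once at startup, so the scheduler does not migrate it between cores while it processes frames.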

Figure 1. High-level inference procedure using OpenVINO™ toolkit. Tasks performed in a deep learning framework are depicted in light blue, OpenVINO toolkit tasks are in blue, and the user application is in orange.

Intel Enables Transformation
The optimizations were carried out by the Artificial Intelligence Products Group at Intel. The group includes data scientists who work with customers to help them to create effective AI solutions based on Intel® technologies. Intel also provided hardware to enable Kongsberg Maritime to test the solution.

“The Intel team has all the expertise in how to optimize solutions for the Intel® Xeon® platform, and the team is easy to work with,” said Saarela. “I have been working with Intel people from many departments for two years now, and I’ve been really impressed with how professional and proactive they are. They offer us so many possibilities with new tools, and new ways to do things. We have been working with TensorFlow a lot, but the resources usually assume that you will be using GPUs. Working with Intel has enabled us to optimize our solution for CPUs, so we can benefit from using a more standardized server platform.”

Results
Following the optimization process, throughput (measured in frames per second) was increased by 4.8x [1] on one of Kongsberg Maritime’s demonstrator projects.
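A throughput comparison of this kind can be sketched as below: time how long a processing function takes over a batch of frames, convert that to frames per second, and divide the optimized figure by the baseline. The helper names are hypothetical, and this is not the benchmark harness Intel used. The source reports only the resulting ratios, not absolute frame rates.

```python
import time


def measure_fps(process_frame, frames):
    """Return throughput in frames per second for `process_frame`
    applied to every item in `frames`."""
    if not frames:
        raise ValueError("at least one frame is required")
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")  # timer resolution too coarse to measure
    return len(frames) / elapsed


def speedup(baseline_fps, optimized_fps):
    """Ratio of optimized to baseline throughput, e.g. 4.8 for '4.8x'."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return optimized_fps / baseline_fps
```

With purely hypothetical figures, a baseline of 10 frames per second and an optimized result of 48 frames per second would give `speedup(10.0, 48.0)`, a ratio of about 4.8.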

To show that the improvements generalize beyond the R-FCN topology, Intel also tested the optimizations using the single shot multibox detector (SSD) topology, which is typically less accurate than R-FCN. Throughput was increased by 4.5x [1] when the optimized platform was compared against the unoptimized platform. Using the Intel Distribution of OpenVINO toolkit alone increased performance by 2.4x [1].

“I’m impressed with the results,” said Saarela. “I had assumed we would always need GPUs, but this has changed my mind about what is possible using CPUs.”

Figure 2. Using the Intel® Distribution of OpenVINO™ toolkit and multithreading on an optimized platform, performance increased by 4.8x [1], measured in frames per second (FPS).

Technical Components of the Solution

  • Intel Xeon Platinum processor. Intel Xeon Platinum processors are the foundation for secure, agile, hybrid-cloud data centers. With exceptional multi-socket processing performance, these processors are built for mission-critical, real-time analytics, machine learning, artificial intelligence, and multi-cloud workloads. With trusted, hardware-enhanced data service delivery, this processor family delivers monumental leaps in I/O, memory, storage, and network technologies to harness actionable insights from our increasingly data-fueled world.
  • Intel Distribution of OpenVINO toolkit. Based on convolutional neural networks (CNN), the toolkit extends workloads across Intel® hardware (including accelerators) and maximizes performance. It helps developers to create solutions that emulate human vision.
  • Intel Distribution for Python. Using Intel Distribution for Python, you can achieve faster Python application performance with minimal code changes; accelerate the NumPy*, SciPy*, and scikit-learn* libraries; and access the latest vectorization and multithreading instructions.

Explore Related Products and Solutions

Intel® Xeon® Scalable Processors

Drive actionable insight, count on hardware-based security, and deploy dynamic service delivery with Intel® Xeon® Scalable processors.

OpenVINO™ Toolkit

Build end-to-end computer vision solutions quickly and consistently on Intel® architecture and our deep learning framework.

Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at https://www.intel.com.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit https://www.intel.com/benchmarks.

Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Product and Performance Information

[1] Results measured by Intel on a two-socket server with 2x Intel® Xeon® Platinum 8153 processors (2.00GHz, 16 cores per processor, two threads per core), 376GB memory, 4GB swap memory, 50GB SSD storage, running CentOS* Linux* 7 (Core) with OS kernel version 3.10.0-862.11.6.el7.x86_64. Work was completed 18th December, 2018 with security mitigations applied.
[2] Safety and Shipping Review 2018, Allianz Global Corporate & Specialty.