The Intel® Distribution of OpenVINO™ toolkit includes powerful features for using Deep Neural Networks (DNNs) for Computer Vision (CV). It also includes a whole array of more traditional CV filters and algorithms, all of which have been optimized for Intel® processors. Sometimes the best way to solve a particular computer vision problem is to combine both techniques. This article explains how you can use DNNs and CV algorithms together to create applications with Intel® Distribution of OpenVINO™ and OpenCV.
Counting On Parking
The application that we will consider is designed for a camera mounted at a parking area. It monitors the available parking space by counting the vehicles that enter and leave the area. One of the "Reference Platform" applications from Intel is designed to perform this task, and it is this application that we will study in detail in this article.
The full application code can be found here.
To perform this task, we first need to recognize when a vehicle is in view of the camera. But we also need to determine the vehicle's state. A vehicle can be one that we have seen before, or it can be a new vehicle that has just appeared in front of the camera.
If the vehicle has been seen previously, then we can analyze the detected vehicle to determine if it is parked or moving. If it is in motion, we need to determine in which direction it is headed. This way, once the vehicle has left the camera's view, we can use the directional data to determine if it was entering or leaving the parking area, and update the counts accordingly.
Detecting Moving Vehicles Using a DNN
There are a number of ways to detect objects such as vehicles using traditional computer vision techniques. Probably the most commonly used is known as a cascade classifier. A cascade classifier is an algorithm that uses a form of image analysis based on looking for Haar-like features. These features compare the summed pixel intensities of adjacent rectangular regions within the image.
Here is a simplified code example that uses the OpenCV cascade classifier:
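#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

Mat image;
image = cv::imread("carimage.jpg");
String haarFile = "vehicle-detector.xml";
CascadeClassifier classifier;
classifier.load(haarFile);
vector<Rect> cars;
// detect vehicles at multiple scales and store their bounding boxes
classifier.detectMultiScale(image, cars);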
After calling the above code, the variable cars will contain the rectangles of any detected vehicles.
However, there are a number of limitations to this approach. One is that cascade classifiers are very sensitive to object rotation. If the vehicle is not at almost exactly the same angle as the images used in the training set, the vehicle will not be recognized.
Another limitation is the consumption of computational resources. The cascade classifier requires more computation as more and more objects are recognized. So as a parking lot becomes more occupied with cars, it requires more computing resources for the calculations.
A different approach to solving the same problem is by using a DNN object detector to detect vehicles. The Intel® models included with the Intel® Distribution of OpenVINO™ toolkit include a few different models that can perform this task. The "pedestrian-and-vehicle-detector-adas-0001" model is the one used by the reference platform example.
This is a simplified version of the code in the sample used to detect vehicles:
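#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
using namespace cv;
using namespace cv::dnn;
using namespace std;

Mat image, blob;
image = cv::imread("carimage.jpg");
// the .bin file holds the trained weights, the .xml file describes the network
String model = "pedestrian-and-vehicle-detector-adas-0001.bin";
String config = "pedestrian-and-vehicle-detector-adas-0001.xml";
Net net = readNet(model, config);
// resize the image to the input size passed to the network
blobFromImage(image, blob, 1.0, Size(672, 384));
net.setInput(blob);
Mat prob = net.forward();
vector<Rect> cars;
// the output is a flat list of detections, 7 floats per detection
float* data = (float*)prob.data;
for (size_t i = 0; i < prob.total(); i += 7)
{
    float confidence = data[i + 2];
    if (confidence > 0.5)
    {
        // box corners are normalized to [0, 1], so scale them to pixels
        int left = (int)(data[i + 3] * image.cols);
        int top = (int)(data[i + 4] * image.rows);
        int right = (int)(data[i + 5] * image.cols);
        int bottom = (int)(data[i + 6] * image.rows);
        int width = right - left + 1;
        int height = bottom - top + 1;
        cars.push_back(Rect(left, top, width, height));
    }
}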
After calling the above code, the variable cars will contain the rectangles of any detected vehicles, just like the cascade classifier did. It is slightly more code, so what benefits does using a DNN over the traditional algorithmic approach provide?
A DNN is much better at handling vehicles that appear at different sizes and angles relative to the camera. Also, the amount of computation needed to process each frame is fixed, so we can more easily predict the hardware required to achieve a particular level of performance. Lastly, by using a DNN we can take advantage of the hardware acceleration of Intel® Distribution of OpenVINO™ to use a connected GPU or VPU without rewriting the detection code.
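One detail of the detection loop above is worth a closer look: the model detects pedestrians as well as vehicles, so an application that counts cars may also want to check the label stored in each detection. The following is a minimal sketch of that idea using only the standard library. The `Box` struct and the `parseDetections` function are hypothetical names introduced here, the 7-float layout follows the indices used in the loop above, and which numeric label the model assigns to vehicles is an assumption that should be checked against the model's documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box { int x, y, width, height; };

// Each detection is assumed to be 7 floats:
// [image_id, label, confidence, x_min, y_min, x_max, y_max],
// with the four coordinates normalized to the range [0, 1].
std::vector<Box> parseDetections(const std::vector<float>& data,
                                 int imageWidth, int imageHeight,
                                 float minConfidence, int wantedLabel) {
    assert(data.size() % 7 == 0);  // the buffer must hold whole detections
    std::vector<Box> boxes;
    for (std::size_t i = 0; i + 7 <= data.size(); i += 7) {
        int label = int(data[i + 1]);
        float confidence = data[i + 2];
        // skip other classes and low-confidence detections
        if (label != wantedLabel || confidence <= minConfidence) continue;
        int left = int(data[i + 3] * imageWidth);
        int top = int(data[i + 4] * imageHeight);
        int right = int(data[i + 5] * imageWidth);
        int bottom = int(data[i + 6] * imageHeight);
        boxes.push_back(Box{left, top, right - left + 1, bottom - top + 1});
    }
    return boxes;
}
```

Passing the label as a parameter keeps the assumption about class IDs in one visible place instead of burying it in the loop.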
Checking If We Have Seen This Vehicle Before
As we detect vehicles, we need to maintain a list of cars that have been seen. This allows the program to determine if a detected vehicle is new to the camera's view, or if we have already started tracking it.
One way that we can do this using a traditional CV approach is to use a series of calculations based on the vehicle's centroid. Simply put, a centroid is the "middle" of a geometric figure. More formally, it is the arithmetic mean position of all of the points in the figure.
Figure 1: "Triangle Centroid" courtesy of Wikimedia Commons is licensed under Public Domain.
Since the typical DNN object detector returns boxes, we can calculate the centroid for any given vehicle by adding half of the box's width to its left edge, and half of its height to its top edge.
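As a small illustration, the centroid of a bounding box can be computed from its top-left corner and its size. This is a sketch only: `Box` and `Pt` are hypothetical stand-ins for OpenCV's `Rect` and `Point`, and `boxCentroid` is a name introduced here.

```cpp
#include <cassert>

// Hypothetical stand-ins for cv::Rect and cv::Point so the sketch
// compiles without OpenCV.
struct Box { int x, y, width, height; };  // x, y: top-left corner
struct Pt  { int x, y; };

// Centroid of an axis-aligned box: top-left corner plus half the size.
Pt boxCentroid(const Box& b) {
    assert(b.width >= 0 && b.height >= 0);  // a detection box cannot have negative size
    return Pt{ b.x + b.width / 2, b.y + b.height / 2 };
}
```

For a box whose top-left corner is at (100, 50) and which is 40 pixels wide and 20 pixels tall, this gives a centroid of (120, 60).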
Now that we have determined the centroid, we can check it against a list of centroids for all of the already detected vehicles. But how can we tell if there is a match? If the object is in motion, then the current centroid would not match any centroids that we have already measured.
One way to handle this is to calculate the distance from the currently tracked object's centroid to the most recently measured centroid of each already detected car. If the distance is small enough, then we can decide that it is the same vehicle, even though it may have moved within the frame.
This allowable distance needs to be configurable by the application, in order to account for how far the camera might be from the vehicles being tracked.
The simplest way to calculate the distance between two points is to calculate the Euclidean distance. Most simply, the Euclidean distance is the length of a straight line connecting the two points:
Figure 2: "Euclidean Distance 2D" courtesy of Kmhkmh is licensed under Creative Commons Attribution 4.0 International.
This is one algorithm that we can use to calculate Euclidean distance:
Point p, q;
double dx = double(q.x - p.x);
double dy = double(q.y - p.y);
double dist = sqrt(dx*dx + dy*dy);
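Putting the distance calculation together with the configurable threshold described earlier, a matching step might look like the following. This is a minimal sketch rather than the reference application's code: `Pt`, `euclidean`, `matchVehicle`, and `maxDistance` are hypothetical names, and picking the closest candidate within the threshold is one reasonable policy among several.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { int x, y; };  // hypothetical stand-in for cv::Point

// Straight-line distance between two points.
double euclidean(const Pt& p, const Pt& q) {
    double dx = double(q.x - p.x);
    double dy = double(q.y - p.y);
    return std::sqrt(dx * dx + dy * dy);
}

// Returns the index of the closest last-known centroid that lies within
// maxDistance of the detected centroid, or -1 if none does (a new vehicle).
int matchVehicle(const Pt& detected,
                 const std::vector<Pt>& lastCentroids,
                 double maxDistance) {
    assert(maxDistance >= 0.0);
    int best = -1;
    double bestDist = maxDistance;
    for (std::size_t i = 0; i < lastCentroids.size(); ++i) {
        double d = euclidean(detected, lastCentroids[i]);
        if (d <= bestDist) {
            bestDist = d;
            best = int(i);
        }
    }
    return best;
}
```

With tracked centroids at (100, 100) and (300, 200) and a threshold of 20 pixels, a detection at (104, 103) matches the first vehicle, while a detection at (500, 500) matches none and would be added as a new vehicle.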
As a counterpoint, in order to determine if we have seen a vehicle before using a DNN, you would typically use a re-identification model. There are a couple of re-identification models included in the Intel models that are part of Intel® Distribution of OpenVINO™. However, they are intended for use either with people or with faces. Training a re-identification model on vehicles can be a substantial task, and is beyond the scope of this article. Not having to concern ourselves with this training is one benefit of the approach we have taken to solve the problem, using a combination of both DNN and traditional CV techniques.
Determining Direction of Vehicles In Motion
Once we have determined whether the car is one we've already tracked or has newly appeared, we can add the centroid to the list of tracked centroids for this vehicle. Once we have at least two centroids in that list, we can use them to calculate the car's most recent movement and direction.
First, we must calculate the movement. We can do this by calculating the mean position of all of the centroids for this vehicle along the axis of travel.
For example:
int carMovement(const vector<Point>& traject, const string& entrance) {
    // an empty trajectory has no mean, so return 0 rather than divide by zero
    if (traject.empty()) {
        return 0;
    }
    int mean_movement = 0;
    for (vector<Point>::size_type i = 0; i != traject.size(); i++) {
        // when movement is horizontal only consider trajectory along X axis
        if (entrance.compare("l") == 0 || entrance.compare("r") == 0) {
            mean_movement = mean_movement + traject[i].x;
        }
        // when movement is vertical only consider trajectory along Y axis
        if (entrance.compare("b") == 0 || entrance.compare("t") == 0) {
            mean_movement = mean_movement + traject[i].y;
        }
    }
    // calculate average centroid position along that axis
    mean_movement = mean_movement / static_cast<int>(traject.size());
    return mean_movement;
}
Notice that we calculate this value along the axis that corresponds to the entrance of the area being monitored. In other words, we track along the X axis for vehicles entering from the left or right, and along the Y axis for those entering from the top or bottom of the frame.
Once we have calculated the movement, we can then determine a vehicle's direction:
int carDirection(const Point& p, int movement, const string& entrance) {
    int direction = 0;
    // when movement is horizontal only consider trajectory along X axis
    if (entrance.compare("l") == 0 || entrance.compare("r") == 0) {
        direction = p.x - movement;
    }
    // when movement is vertical only consider trajectory along Y axis
    if (entrance.compare("b") == 0 || entrance.compare("t") == 0) {
        direction = p.y - movement;
    }
    return direction;
}
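To make the sign of the result concrete, here is a small worked sketch of the same two steps using only the standard library. It is a simplified restatement, not the reference code: `Pt`, `horizontalEntrance`, `meanPosition`, and `signedDirection` are hypothetical names, and any entrance value other than "l" or "r" is treated as vertical here for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Pt { int x, y; };  // hypothetical stand-in for cv::Point

// True when the entrance is on the left or right edge, so motion is
// tracked along the X axis; any other value is treated as vertical.
bool horizontalEntrance(const std::string& entrance) {
    return entrance == "l" || entrance == "r";
}

// Mean position of the stored centroids along the tracked axis.
int meanPosition(const std::vector<Pt>& trajectory, const std::string& entrance) {
    assert(!trajectory.empty());  // a mean needs at least one centroid
    long sum = 0;
    for (const Pt& p : trajectory) {
        sum += horizontalEntrance(entrance) ? p.x : p.y;
    }
    return int(sum / long(trajectory.size()));
}

// Signed offset of the newest centroid from that mean.
int signedDirection(const Pt& current, int mean, const std::string& entrance) {
    return (horizontalEntrance(entrance) ? current.x : current.y) - mean;
}
```

With stored centroids at x = 100, 120, and 140, the mean position is 120. A new centroid at x = 160 gives a direction of +40, meaning the vehicle is moving toward larger x values (to the right, in image coordinates), while a new centroid at x = 90 gives -30, meaning it is moving to the left.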
Determining Vehicles That Are No Longer In View
Any vehicle that is in our list of current vehicles, but has not had a centroid appear recently, is categorized as gone. In other words, that car is no longer in view. We can also count the number of times the car was not detected, to ensure that it has really left the area being monitored. The key question is: did it enter or exit the parking area? To determine that, we need to look at the last direction in which it was headed, relative to the entrance to the parking area. It turns out that we already have exactly that information from the previous step!
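The bookkeeping described above could be sketched as follows. Everything here is hypothetical and only illustrates the idea: the `TrackedVehicle` and `Counts` structs, the `maxMissedFrames` parameter, and in particular the convention that the named edge is where vehicles pass into the parking area, so that a vehicle last seen moving toward that edge is counted as entering. The reference application may use a different convention.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-vehicle tracking state.
struct TrackedVehicle {
    int lastDirection = 0;  // signed direction from the last frame it was seen
    int missedFrames = 0;   // consecutive frames without a matching detection
    bool gone = false;      // set once the vehicle is considered out of view
};

struct Counts { int entered = 0; int left = 0; };

// Assumed convention: moving toward the entrance edge means entering.
// "l" and "t" lie at small coordinates, so a negative direction heads
// toward them; any other value is treated like "r" or "b".
bool isEntering(int direction, const std::string& entrance) {
    bool entranceAtLowCoordinate = (entrance == "l" || entrance == "t");
    return entranceAtLowCoordinate ? direction < 0 : direction > 0;
}

// Call once per frame for each tracked vehicle with no matching detection.
void markMissed(TrackedVehicle& v, int maxMissedFrames,
                const std::string& entrance, Counts& counts) {
    assert(maxMissedFrames > 0);
    if (v.gone) return;                              // already counted
    if (++v.missedFrames < maxMissedFrames) return;  // may still reappear
    v.gone = true;
    if (v.lastDirection == 0) return;                // no clear direction, do not count
    if (isEntering(v.lastDirection, entrance)) {
        ++counts.entered;
    } else {
        ++counts.left;
    }
}
```

A complete tracker would also reset `missedFrames` to zero whenever the vehicle is matched to a detection again, so that a single missed frame does not remove a car that is still in view.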
Conclusion
There are many different ways to solve any given programming challenge, and the hybrid approach presented here is only one way to solve this one. It has served to illustrate that, in many cases, a very good way to perform a particular computer vision task is to combine both DNNs and more traditional CV algorithms. This article has shown one way in which you can use the Intel® Distribution of OpenVINO™ toolkit and OpenCV together to create these kinds of applications.