Ben: The Self-Driving Bot

Published: 04/17/2019  

Last Updated: 04/17/2019


Ben is an autonomous, self-driving robot built on Intel® architecture. It possesses some of the most essential features of a typical self-driving car: Ben can detect and classify vehicles, objects, and humans, and can navigate itself accordingly. Developed on an Intel® hardware and software stack, the UP Squared* AI Vision Developer Kit acts as Ben's brain, doing all the core processing needed for the bot, while the Intel® RealSense™ camera works as Ben's eyes, taking in its surroundings as a live video feed.

Being completely autonomous, Ben operates without any human intervention. It showcases the power of the UP Squared AI Vision Developer Kit as a single device that runs complex deep learning models and issues actuation signals. The Intel RealSense camera is also handy for moving the bot toward production: its night vision and 3D mapping capabilities give Ben even more power to see and understand its environment in any circumstances.

Since Ben is currently under development, we are continually working on adding new features to advance the bot even further – see the Future Work section below for more information.

Why do we need Ben?

In 2015, 35,092 people died in car accidents in the United States; on average, someone dies once every 88 million miles driven. That gives you about a 0.011% chance of dying in a car accident in any given year, or 0.88% over a lifetime. Most people maintain about 150 relationships, which means you will probably lose a friend in a car accident. Over 2.6 million people are injured in vehicles every year, accounting for billions of dollars in car repairs in deductibles alone. Self-driving cars are a promising way to reduce these losses. Let's now discuss how Ben fits into this context.

Ben provides developers with a simulated platform for building technology that can later be implemented in a real car. The algorithms involved in developing an actual self-driving car are almost the same as those involved in making the bot: deep learning, actuation, rule engines, and visual sensing can all be implemented in a much easier and more cost-efficient way in a bot like Ben.

The top 3 reasons for building a bot like Ben are:

  1. Cost efficiency

    Ben provides a cost-efficient way of developing a prototype of a self-driving vehicle.

  2. Faster prototyping

    One of the most important advantages of having an autonomous robot like Ben is the ability to do rapid prototyping. The technologies required to power a self-driving car can be implemented in the bot and, once tested, the same technology stack can be scaled up to a real car.

  3. Testing in a simulated environment

    Ben can be tested in a simulated environment by creating a track with road lanes, traffic lights, and signals similar to the real world. The idea is to create an environment that is as realistic as possible and test all the use cases to see how the bot behaves in different scenarios. This reduces the overall cost and risk of an accident while testing. Once the algorithms pass all the test cases on the bot in the simulated environment, the technology can be replicated into a real car, reducing the risk of failure to a large extent.

The Self-Driving Bot
Figure 1. Ben The Self-Driving Bot

Technologies Used



Design Architecture and Implementation


The main flow of the system is as follows: the bot gets motion commands from the UP Squared AI Vision Developer Kit, which handles the entire process. The kit receives a live feed from the camera and processes the input to detect other vehicles. Finally, based on the position of the objects in its trajectory, it decides on one of the following actions to execute.

  1. Move forward
  2. Left turn
  3. Right turn
  4. Stop
  5. Reverse
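A rule engine that picks one of these actions can be sketched as a small Python function. This is an illustrative sketch, not Ben's exact logic: the action names, thresholds, and the "react to the closest object" policy are assumptions for the example.

```python
# Illustrative rule engine: maps the position of the nearest detected
# object to one of the bot's motion commands. Thresholds are examples.

def decide_action(detections):
    """detections: list of (mid_x, apx_distance) tuples, both in [0, 1]."""
    if not detections:
        return "FORWARD"
    # React to the closest object in view.
    mid_x, distance = min(detections, key=lambda d: d[1])
    if distance > 0.3:
        return "FORWARD"                 # nothing close enough to matter
    if 0.3 < mid_x < 0.7:
        return "STOP"                    # obstacle dead ahead
    return "RIGHT" if mid_x <= 0.3 else "LEFT"  # steer away from its side
```

A fuller version would also use the Reverse command, for example when the bot is stopped and boxed in.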

As of now, Ben reads from a single camera, so the current model has many blind spots.

Basic workflow
Figure 2. Basic workflow

The image above depicts the basic flow: the bot receives input data from a USB camera and passes it to the UP Squared AI Vision Developer Kit. The kit then processes the data and sends the resulting decision to the motor driver for motion execution.

Let’s dig into the process of performing object recognition and making decisions, step by step.

Figure 3. Program flow of Python* script inside the UP Squared* AI Vision Developer Kit

The flow diagram in Figure 3 shows that we first capture the frames using OpenCV. After some pre-processing, we pass each frame to the TensorFlow* object recognition engine, which detects the position of each object.
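The pre-processing step is mostly about getting each frame into the shape the detection graph expects. A minimal sketch (OpenCV frames come in BGR channel order, and the SSD graph expects a batch dimension; reversing the channel axis here stands in for `cv2.cvtColor` so the snippet needs only NumPy):

```python
import numpy as np

def preprocess(frame_bgr):
    """Convert an OpenCV-style BGR frame into the [1, H, W, 3] RGB batch
    that the SSD detection graph expects."""
    rgb = frame_bgr[..., ::-1]          # BGR -> RGB by reversing channels
    return np.expand_dims(rgb, axis=0)  # add the batch dimension
```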

The rule engine takes a brute-force approach: it checks the direction of each detected object and, based on its position and whether it lies in the bot's trajectory, sends action commands to the motor controller over a serial port.

Finally, based on the action command received by the Arduino UNO* via serial communication, actuation signals are sent to the dual DC motor driver, which is directly connected to the bot's DC motors. This combination runs the motors and moves the bot in the desired direction.

Let’s discuss each of the above components in detail.

Capturing Frames for Processing

For any application involving the use of video, the video frames need to be captured. This is a very easy and straightforward process using OpenCV. Once the frames are captured, we iterate through each frame and pass it through the recognition engine for detection.

import cv2
cap = cv2.VideoCapture(0)
ret, image_np = cap.read()

The above piece of code captures a single video frame. The last line is the one that has to be placed inside a loop so that frames are captured continuously.
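Putting that line inside a loop gives the capture skeleton. A small generator makes the loop easy to test: the `cap` object here only needs a `read()` method, so a stub can stand in for real hardware.

```python
def iter_frames(cap):
    """Yield frames from a cv2.VideoCapture-like object until read() fails."""
    while True:
        ret, frame = cap.read()
        if not ret:
            break          # camera disconnected or stream ended
        yield frame
```

With a real camera this is simply `for frame in iter_frames(cv2.VideoCapture(0)): ...`.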

Object Recognition using TensorFlow*

For object recognition, we are using the TensorFlow Object Detection API with a model based on the SSD (Single Shot Detector) architecture. The model supports 90 classes, of which we use only four for our purpose.

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    while True:
      ret, image_np = cap.read()
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # The score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      if scores[0][0] >= 0.5 or scores[0][5] >= 0.5 or scores[0][7] >= 0.5:
        for i, b in enumerate(boxes[0]):
          # COCO class IDs: 3 = car, 6 = bus, 8 = truck
          if classes[0][i] == 3 or classes[0][i] == 6 or classes[0][i] == 8:
            if scores[0][i] >= 0.5:
              mid_x = (boxes[0][i][1] + boxes[0][i][3]) / 2
              mid_y = (boxes[0][i][0] + boxes[0][i][2]) / 2
              apx_distance = round((1 - (boxes[0][i][3] - boxes[0][i][1]))**4, 3)
              cv2.putText(image_np, '{}'.format(apx_distance), (int(mid_x*800), int(mid_y*450)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
              if mid_x > 0 and mid_x < 0.3:
                pass   # object on the left; left-turn command elided in this listing
              elif mid_x > 0.7 and mid_x < 1:
                pass   # object on the right; right-turn command elided in this listing
              if apx_distance <= 0.3:
                if mid_x > 0.3 and mid_x < 0.7:
                  cv2.putText(image_np, 'WARNING!!!', (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 3)

The above code is the main block for recognition: it captures the camera frames, runs them through the detection graph, and processes the results. The section below detects the position of each detected object; this is the rule engine, hard-coded with threshold values.

if mid_x > 0 and mid_x < 0.3:
  pass   # object on the left; left-turn command elided in this listing
elif mid_x > 0.7 and mid_x < 1:
  pass   # object on the right; right-turn command elided in this listing
if apx_distance <= 0.3:
  if mid_x > 0.3 and mid_x < 0.7:
    cv2.putText(image_np, 'WARNING!!!', (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 3)
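The `apx_distance` value used in the rule engine is a rough monotonic proxy for distance, not a calibrated measurement: the wider the bounding box (normalized to [0, 1]), the closer the object, and raising `1 - width` to the fourth power sharpens the drop-off as an object approaches. A worked example, factored into a small function for clarity:

```python
def apx_distance(x_min, x_max):
    """Heuristic distance proxy from a normalized bounding-box width."""
    return round((1 - (x_max - x_min)) ** 4, 3)

# A box spanning 60% of the frame width reads as "close":
near = apx_distance(0.2, 0.8)    # (1 - 0.6)^4 = 0.026
# A box spanning 10% of the frame width reads as "far":
far = apx_distance(0.45, 0.55)   # (1 - 0.1)^4 = 0.656
```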

From here, we use serial port communication to send the action commands to the motor controller so the bot can act.
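The serial hand-off can be as simple as writing one command byte per decision. A sketch using the pyserial package; the single-character command codes and the port name are illustrative assumptions, and the Arduino sketch must agree on the same mapping:

```python
# Illustrative serial command protocol: one byte per action.
COMMANDS = {
    "FORWARD": b"F",
    "LEFT":    b"L",
    "RIGHT":   b"R",
    "STOP":    b"S",
    "REVERSE": b"B",
}

def encode_command(action):
    """Map an action name to the byte the Arduino sketch expects."""
    return COMMANDS[action]

# With real hardware (requires pyserial and a connected Arduino):
#   import serial
#   port = serial.Serial('/dev/ttyACM0', 9600)
#   port.write(encode_command("STOP"))
```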

Sample Images from Test Run

On the left side of the image below is the agent. We used toy cars to create a virtual version of a common real-world scenario: other vehicles on the road. Behind that is the serial monitor for the Arduino UNO. As shown on the right side, the algorithm correctly detects the car, with the detection confidence also displayed.

The algorithm not only detects the vehicles surrounding it but also gives a warning.

Frame Captured During Test Run
Figure 4. Frame Captured During Test Run

More test cases are shown below from the sample run:

The Bot Detects a Car and Changes Its Path
Figure 5. The Bot Detects a Car and Changes Its Path

More test cases on sample run
Figure 6. More test cases on sample run

Test cases on sample run
Figure 7. More test cases on sample run

The above image (Figure 7) shows that a maximum accuracy of 97% is achieved at times.

The frames per second (FPS) rate is nearly 9-10 and can be improved by adding more computational power to the processing unit and by optimizing the algorithms. Initially, the algorithm gave approximately 3-4 FPS, but through optimization we were able to achieve better results.
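A simple way to measure the FPS figure quoted above is to time the processing loop itself. A minimal sketch (the class name and interface are our own, not part of OpenCV or TensorFlow):

```python
import time

class FpsMeter:
    """Running frames-per-second estimate over the measured interval."""
    def __init__(self):
        self.start = None
        self.frames = 0

    def tick(self, now=None):
        """Call once per processed frame."""
        now = time.time() if now is None else now
        if self.start is None:
            self.start = now
        self.frames += 1

    def fps(self, now=None):
        """Frames per second over the elapsed interval (interval count / time)."""
        now = time.time() if now is None else now
        elapsed = now - self.start
        return (self.frames - 1) / elapsed if elapsed > 0 else 0.0
```

In the capture loop, `meter.tick()` after each frame and an occasional `meter.fps()` gives the live rate.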

Workflow Steps

  1. The camera mounted on the robot captures the image frames. The camera can be replaced with an Intel® RealSense™ Depth Camera, which comes with night vision. In daylight any camera should work well for capturing the surroundings, but the primary challenge comes at night, where an Intel RealSense camera can be handy for practical prototyping. The image-capture FPS can be controlled from the Python code.
  2. Captured image frames are sent for object detection processing. The processing can be done on a laptop, an Intel® NUC, or an UP Squared AI Vision Developer Kit. The image frames can be sent to the processing unit over either a wired medium or a wireless protocol; a socket can be created to allow client-server communication between the bot and the processing unit. Replacing the laptop with an Intel NUC makes the setup faster and more flexible: the Intel NUC can easily be mounted on the bot itself, leaving the complexities of communication between the bot and the laptop behind.
  3. The captured images are then fed to the object detection algorithm. The TensorFlow object detection can be replaced by object detection done with the Intel® Distribution of OpenVINO™ toolkit, which is faster and more accurate. Once the object detection is done, an action message, or token, is sent to an Arduino UNO board on the robot that is responsible for the actuation of the bot. If an obstacle is detected in front of the bot, for example, a message from the processing unit is sent via serial communication to the actuation unit (the Arduino UNO board) to turn right or left (depending on the logic implemented).
  4. As soon as the signal is received, the Arduino UNO gives corresponding signals to the motor driver circuit that is directly connected to the motors of the bot, which controls the movement of the robot. The movement is controlled by the Arduino UNO and the motor driver; the processing is done by the processing unit (a laptop, Intel NUC, or UP Squared AI Vision Developer Kit).
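Step 2's client-server idea can be sketched with a plain TCP socket and a length-prefixed payload, so the processing unit knows where each frame ends. The framing scheme here (4-byte big-endian length prefix) is an illustrative choice, not Ben's exact protocol:

```python
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def send_frame(sock, payload):
    """Send one frame as a 4-byte big-endian length prefix plus the bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_frame(sock):
    """Receive one length-prefixed frame; returns None on a closed socket."""
    header = recv_exact(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    return recv_exact(sock, length)
```

On the bot side, each captured frame would be JPEG-encoded and passed to `send_frame`; the processing unit loops on `recv_frame` and runs detection on each payload.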


Working on this project was a great way to get to know the capabilities of autonomous robotics and deep learning. Hardware and software powered by Intel® architecture provided efficiency and high performance to the bot. Ben can be a great starting point for research and development in autonomous robotics, deep learning, and how a self-driving car works. Building a bot like Ben is an efficient way to learn, in terms of cost, rapid prototyping, and testing, for students and professionals who want to experiment and devote years of research to self-driving cars, autonomous robotics, computer vision, and AI.

Also, real problems may be revealed in the hardware when you take the neural network from the simulator and put it on a “real” car.

Future Work

Ben is under development, and new features are being added every day. We are continually working on making Ben a completely productionized self-driving bot so the same technology can be implemented in a real car. The following are a few areas where we are actively working:

  1. Using advanced computational power for processing. The UP Squared AI Vision Developer Kit can be replaced by an Intel NUC.
  2. Replacing Intel Optimization for TensorFlow for object detection (vehicles and pedestrians) with the Intel Distribution of OpenVINO toolkit for all computer vision algorithms.
  3. Implementing lane detection (already done and tested on a real vehicle) and a traffic sign classifier using the Intel Distribution of OpenVINO toolkit.
  4. Path planning, road segmentation, and localization.

We are continually researching and adding new features to Ben.
