Autonomous UAV Control and Mapping in Cluttered Outdoor Environments – Update 1

Published: 11/08/2017  

Last Updated: 11/08/2017

By Bruna Pearson

Autonomous and intelligent flight under the canopy of densely forested areas is a challenging problem that has yet to be solved. It requires giving an unmanned aerial vehicle (UAV) the ability to decide the best flight route through an unseen environment. This decision is made by processing, frame by frame, the RGB images captured by a forward-facing camera.
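The frame-by-frame decision loop can be sketched as follows. This is a minimal illustration, not the project's implementation: `classify_frame` is a hypothetical stand-in for a trained network, and the random array stands in for a camera capture.

```python
import numpy as np

def classify_frame(frame):
    """Hypothetical stand-in for a trained network: maps an RGB frame to
    probabilities for (turn left, go straight, turn right)."""
    # Toy scores from simple image statistics, just to produce a distribution.
    scores = np.array([frame[:, :1].mean(), frame.mean(), frame[:, -1:].mean()])
    exp = np.exp(scores - scores.max())      # softmax for a valid distribution
    return exp / exp.sum()

def steering_command(frame):
    """Pick the flight command with the highest predicted probability."""
    commands = ["turn_left", "go_straight", "turn_right"]
    probs = classify_frame(frame)
    return commands[int(np.argmax(probs))]

# One synthetic 64x64 RGB frame in place of a real camera capture.
frame = np.random.rand(64, 64, 3)
print(steering_command(frame))
```

In a real system, the loop would repeat this for every frame the forward-facing camera delivers, converting each prediction into a control command.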

The capability to perform autonomous and intelligent flight under the canopy is crucial for activities such as Search and Rescue (SaR) missions [1], visual exploration of disaster areas [2], [3], aerial reconnaissance and surveillance [4], [5], and assessment of forest structure [6], [7] or riverscapes [8].

Currently, we are investigating the use of Deep Learning (DL) to teach an algorithm to recognize trails and potential obstacles. Our main challenges are the variations in luminance and environmental conditions that are typical of unstructured environments such as densely forested areas (Figure 1).

Deep learning is a subset of Machine Learning (ML) methodologies that aims to simulate the way the human brain processes and learns new information [9]–[11]. ML algorithms typically have input and output layers, and raw data is often transformed before being fed to the input layer. In contrast, DL algorithms add one or more hidden layers between the input and output layers. As a result, the algorithm can extract features from the raw sensory information at multiple levels, without any preprocessing or filtering of the raw data [11]. High-quality features are thus learned autonomously and efficiently [12].

Deep learning algorithms can be modeled as feed-forward or as recurrent neural networks: the former have no feedback connections, while the latter feed outputs back into the model [12].

Feed-Forward Neural Networks (FFNN) consist of a large number of processing units, also known as nodes, organized into layers. Each unit in a layer is connected to units in the previous layer. Typically, the memory model is simple, storing only the hierarchical feature set (weights) and a few other parameters [13]. Units may share the same weight value or have different ones. Either way, the input data moves through the network layer by layer, transforming the data and accumulating knowledge until the output is derived in the final layer [13].
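The layer-by-layer flow described above can be shown with a minimal numpy sketch: one hidden layer, randomly initialized weights (no training), and a softmax output. All sizes here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer weights: 4 input features -> 8 hidden units -> 3 output classes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    """One feed-forward pass: data moves layer by layer, with no feedback."""
    h = np.maximum(0.0, x @ W1 + b1)         # hidden layer with ReLU activation
    logits = h @ W2 + b2                     # output layer
    exp = np.exp(logits - logits.max())      # softmax -> class probabilities
    return exp / exp.sum()

probs = forward(rng.normal(size=4))
print(probs)  # three probabilities summing to 1
```

Training would adjust `W1`, `b1`, `W2`, `b2` via backpropagation; the forward pass itself stays exactly this shape.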

The most common type of FFNN is the Convolutional Neural Network (CNN), whose structure is well suited to image classification [13]. Originating in the early 1990s with the development of LeNet, and aided by advances in computing power, a growing number of CNN variants have become available over the years, such as AlexNet (2012) [15], ZFNet (2013) [16], GoogLeNet (2014) [17], and VGGNet (2014), amongst others. Not surprisingly, CNNs are also the most commonly implemented model for training UAV control systems [18].
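The operation that gives CNNs their name is the convolution: a small kernel slid across the image, responding where a local pattern matches. The numpy sketch below implements a minimal valid-mode 2D convolution and applies a hand-picked vertical-edge kernel to a toy image; in a trained CNN, such kernels are learned rather than hand-designed.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation), the core CNN operation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A toy grayscale image: dark on the left, bright on the right.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A vertical-edge kernel: the response is strongest where intensity
# changes left-to-right, the kind of low-level feature early CNN layers learn.
edge_kernel = np.array([[-1.0, 1.0]])
print(conv2d(image, edge_kernel))
```

Stacking many such filters, interleaved with nonlinearities and pooling, yields the hierarchical features that make CNNs effective for image classification.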

A current state-of-the-art paper [18] demonstrates a deep, data-driven sensorimotor system that estimates the approximate direction of the trail by processing each frame through a Deep Neural Network (DNN).

The DNN presented in [18] receives an RGB input image and outputs three values representing the probability of the trail being located to the left, center, or right of the image. In contrast to the approach in [18], this project investigates the performance of the Inception-ResNet-v2 network [20] on the trail-identification problem. More details about Inception-ResNet-v2 will be presented in the next post; for now, we refer the reader to [20].
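To make the three-output scheme concrete, one simple (hypothetical) way to turn the left/center/right probabilities into a control signal is to steer toward the side the trail is predicted on, proportionally to the network's confidence. The mapping and the `max_yaw_deg` parameter below are illustrative assumptions, not the scheme used in [18] or in this project.

```python
def yaw_from_probs(p_left, p_center, p_right, max_yaw_deg=30.0):
    """Hypothetical mapping from the network's three class probabilities
    to a yaw command: positive means yaw right, negative means yaw left.
    A confident 'center' prediction leaves the heading nearly unchanged."""
    return max_yaw_deg * (p_right - p_left)

print(yaw_from_probs(0.1, 0.2, 0.7))  # trail predicted on the right -> yaw right
```

A continuous mapping like this avoids the jerky flight that hard left/straight/right switching would produce.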

During this project we will be using the Intel® Movidius™ Neural Compute Stick (NCS) and the Intel® Aero Ready to Fly Drone. In the first phase of the project we will train our model on the publicly available IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) trail dataset [18]. In the second phase, data will be gathered using Intel's drone. Finally, in the third phase we will explore the algorithm's ability to reproduce a trajectory previously recorded by the drone.

Our goal is to use the results and knowledge acquired during simulation as the base for further work, in which we aim to expand the system into a real-world application.


[1]      D. Câmara, “Cavalry to the Rescue: Drones Fleet to Help Rescuers Operations over Disasters Scenarios.”

[2]      G. Rémy, S.-M. Senouci, F. Jan, and Y. Gourhant, “SAR.Drones: Drones for Advanced Search and Rescue Missions.”

[3]      L. Apvrille, T. Tanzi, and J. L. Dugelay, “Autonomous drones for assisting rescue services within the context of natural disasters,” in 2014 31th URSI General Assembly and Scientific Symposium, URSI GASS 2014, 2014.

[4]      A. Gaszczak, T. P. Breckon, and J. Han, Real-time People and Vehicle Detection from UAV Imagery. 2011.

[5]      A. Puri, “A Survey of Unmanned Aerial Vehicles (UAV) for Traffic Surveillance,” Tech. Pap., pp. 1–29, 2005.

[6]      L. P. Koh and S. A. Wich, “Dawn of drone ecology: low-cost autonomous aerial vehicles for conservation,” Trop. Conserv. Sci., vol. 5, no. 2, 2012.

[7]      L. Wallace, A. Lucieer, Z. Malenovský, D. Turner, and P. Vopěnka, “Assessment of forest structure using two UAV techniques: A comparison of airborne laser scanning and structure from motion (SfM) point clouds,” Forests, 2016.

[8]      J. T. Dietrich, “Riverscape mapping with helicopter-based Structure-from-Motion photogrammetry,” Geomorphology, 2016.

[9]      A. Gulli and S. Pal, Deep Learning with Keras. Birmingham: Packt Publishing, 2017.

[10]    L. Tai and M. Liu, “Deep-learning in Mobile Robotics - from Perception to Control Systems: A Survey on Why and Why not,” 2016.

[11]    G. Zaccone, Getting Started with Tensorflow. Packt Publishing, Limited, 2016.

[12]    I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT Press, 2016.

[13]    S. Krig, Computer Vision Metrics: Textbook Edition. Springer International Publishing, 2016.

[14]    M. A. Nielsen, Neural Networks and Deep Learning, 2015. [Online]. [Accessed: 04-May-2017].

[15]    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.

[16]    M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” 2012.

[17]    C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” 2014.

[18]    A. Giusti, J. Guzzi, D. C. Cireşan, F.-L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, D. Scaramuzza, and L. M. Gambardella, “A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots,” IEEE Robot. Autom. Lett., vol. 1, no. 2, pp. 661–667, 2016.

[19]    K. Kelchtermans and T. Tuytelaars, “How hard is it to cross the room ? - Training (Recurrent) Neural Networks to steer a UAV,” 2017.

[20]    C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), 2017, pp. 4278–4284.
