Machine Learning in Path Planning for Manipulators

Published: 05/08/2017  

Last Updated: 05/08/2017

By Yeser Meziani


As robotics becomes ubiquitous in our daily lives through a multitude of applications and products, traditional approaches to modelling and motion planning come up short in the face of the diversity of real-world scenarios, and ultimately fail to capture the endless possibilities of heavily unstructured deployment environments. This calls for modelling and planning approaches that adapt dynamically.

In this era of explosive compute power and prolific programming tools, adopting artificial intelligence (AI) throughout the path-planning process has become a must. AI applied to vision-based control and visual servoing of manipulators has attracted increasing interest over the last couple of years, with many efforts to harness the power of AI in robotic solutions.

This work aims to illustrate how applying AI, through machine learning (ML), to generate paths through the robot's workspace eliminates the shortcomings of mathematical models, chiefly the singular configurations arising, in our case, from the 5-degrees-of-freedom (DoF) structure of the robot manipulator.

 EUROBTEC IR-50P Manipulator Robot

Path Planning:

Path planning is of major interest to most research bodies in the robotics field. This is easily justified: the execution of a given task by any robot (arm, wheeled robot, drone, etc.) is dictated to the robot by its path planner, which also incorporates sensory inputs from the robot's environment.

Predefined tasks can be laid out and modeled, then embedded on the robot for execution under the prescribed conditions.

This approach is still applicable, and abundant among industrial robots, where a robot spends its lifetime on a repetitive task. Yet the tendency towards more collaborative workspaces, where human-robot interactions are recurrent and frequent, stretches these approaches to their limit.

Figure 1: Demand for domestic robots continues to increase, with over 12 million units projected over the horizon of 2020. (Courtesy: IHS Markit)

Not only do robots in industrial environments need more dynamic decision making; the trend towards social and domestic robotics places these tools in dynamic environments where static, predefined decision sets cannot meet the demanding nature of their surroundings. A more generalized approach is needed to enable this transition and draw the most benefit from robot technologies. Artificial intelligence (AI) has become pervasive and unquestionably has its role in robotics: it is already the tool of choice for vision-based decoding of the robot's environment and for learning tasks. The possible combinations of bodies interacting with the robot, and the successions of their occurrences, are endless and cannot be enumerated, so AI is the only practical way to perceive and decide on the best course of action within these constraints.

The computational power of current CPUs, with Intel® Xeon Phi™ processors as a state-of-the-market example, can address the main problem with the AI approach: the need for many trials to tune algorithm parameters, which usually takes weeks of training runs and results in higher costs.

Machine Learning to The Rescue:

Machine learning offers a set of learning algorithms that generalize relatively well, provide a model-free approach to interacting with the environment, and can adapt to the ever-changing path parameterization of these dynamic setups.

Decoding information from the input and workspace data ultimately lets us choose the best succession of joint-variable configurations to map the required motion: the result is a generalized motion planner based on the robot's own attainable configurations. This relieves the planner of the task of checking for valid movements and singularity-free transitions between configurations.

The main target is to intelligently weigh the best move, based on real-time data acquired from sensors or images, so as to interact fluidly with changes in the robot's task space, be it a worker wanting to check the parts moved by the robot or a child playing catch with a humanoid toy. We start by exploring reinforcement learning, a technique with many interesting applications in robotics and in path planning in general.

Figure 2: A promising 2x market growth, suggesting higher demand for AI-powered robotics in unstructured environments (Courtesy: IHS Markit)

Reinforcement Learning:

In robotics, learning from demonstration (LfD) is very often used to bypass programming movements manually, which is challenging enough in itself.

Collecting data from the robot over multiple trials of the same movement gives us the learning data set; in our case, a 5-DoF arm yields a data set of size 5 × T × N, where T is the number of time steps and N is the number of trials.
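As a minimal sketch, the 5 × T × N data set can be assembled by stacking the joint-angle recordings of each trial. The trial count, time-step count, and random angles below are illustrative placeholders, not the actual recorded data:

```python
import numpy as np

# Illustrative sizes: 5 joints, 200 time steps per trial, 30 trials.
n_joints, T, N = 5, 200, 30

# Each trial records all 5 joint angles at every time step (here,
# random values stand in for real proprioceptive measurements).
trials = [np.random.uniform(-np.pi, np.pi, size=(n_joints, T)) for _ in range(N)]

# Stack along a new last axis to obtain the 5 x T x N learning data set.
dataset = np.stack(trials, axis=-1)

print(dataset.shape)  # (5, 200, 30)
```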

A policy is the mapping between the learning parameters and the trajectory approximation function.
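One common way to realize such a policy in policy-search RL is to express each joint's trajectory as a weighted sum of basis functions, so the learning parameters are the weights. The Gaussian basis, the basis count, and the normalization below are one illustrative choice among many, not the article's specific parameterization:

```python
import numpy as np

def policy(w, T=100, n_basis=10):
    """Map learning parameters w (n_joints x n_basis) to a T x n_joints
    joint trajectory via normalized Gaussian basis functions."""
    t = np.linspace(0.0, 1.0, T)
    centers = np.linspace(0.0, 1.0, n_basis)
    width = 1.0 / n_basis
    # Basis activation matrix: T x n_basis
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
    phi /= phi.sum(axis=1, keepdims=True)  # normalize activations per step
    return phi @ w.T  # weighted sum of bases for each joint

w = np.zeros((5, 10))   # parameters for a 5-DoF arm
traj = policy(w)
print(traj.shape)       # (100, 5)
```

Learning then amounts to searching over `w` rather than over raw joint angles at every time step, which keeps the parameter space small.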

The learning algorithm acts as an optimal solver calculating the best parameters to minimize a cost function defined by the user based on the task and the constraints applied to the movement.
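A toy version of this optimal solver is a simple stochastic search over the policy parameters: perturb the parameters, keep the perturbation if the user-defined cost drops. The quadratic cost and target vector below are invented for illustration; real RL algorithms (e.g. policy gradients) are far more sample-efficient:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(w):
    # Hypothetical user-defined cost: penalize distance of the last
    # parameter column from a target posture, plus a small smoothness
    # (effort) term on neighboring parameters.
    target = np.array([0.5, -0.3, 0.8, 0.0, 0.2])
    return np.sum((w[:, -1] - target) ** 2) + 1e-3 * np.sum(np.diff(w, axis=1) ** 2)

# Stochastic hill-climbing over policy parameters.
w = np.zeros((5, 10))
best = cost(w)
for _ in range(500):
    candidate = w + 0.1 * rng.normal(size=w.shape)
    c = cost(candidate)
    if c < best:
        w, best = candidate, c

print(best < cost(np.zeros((5, 10))))  # True: the cost improved
```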

Figure 3: The reinforcement learning framework; implementations vary depending on the choice of algorithm, the policies, and the constraints applied.

Learning Data:

Setting aside the outer context of our robot, we may start by considering the attainable configurations: the set of all possible combinations of the robot's joint angles (θ1, …, θn), where n is the number of articulations. For now, we rely on the robot's proprioceptive sensors to measure the current configuration, and on a predefined target configuration, as our path extremities.

This succession is also subject to a number of constraints: attainable configurations, imposed via points, a predefined path, the environment's configuration (e.g. the presence of obstacles), and optimality criteria (e.g. power-efficient movements).
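A minimal attainability check might combine joint limits with simple joint-space exclusion regions induced by obstacles. The limit values and box-shaped forbidden regions below are illustrative assumptions, not the IR-50P's actual specifications:

```python
import numpy as np

# Hypothetical joint limits (radians) for a 5-DoF arm; real values
# would come from the manipulator's data sheet.
LOWER = np.array([-2.6, -1.5, -2.0, -1.8, -3.1])
UPPER = np.array([ 2.6,  1.5,  2.0,  1.8,  3.1])

def is_attainable(q, forbidden=()):
    """Return True if configuration q respects joint limits and lies
    outside every forbidden (lower, upper) box in joint space."""
    q = np.asarray(q, dtype=float)
    if np.any(q < LOWER) or np.any(q > UPPER):
        return False  # violates a joint limit
    for lo, hi in forbidden:
        if np.all(q >= lo) and np.all(q <= hi):
            return False  # inside a region blocked by an obstacle
    return True

print(is_attainable([0, 0, 0, 0, 0]))    # True
print(is_attainable([3.0, 0, 0, 0, 0]))  # False: exceeds joint 1 limit
```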


The generalized path planner's task is to find the best succession of configurations starting from the current state and reaching the target state.
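In its simplest form, such a succession can be sketched as a straight-line interpolation in joint space, discretized into small steps. This is a toy baseline, not the learned planner discussed here; the step size and configurations are placeholders, and a real planner would also reject unattainable configurations along the way:

```python
import numpy as np

def plan(q_start, q_goal, step=0.05):
    """Generate a succession of configurations from the current state
    to the target by stepping along the joint-space straight line."""
    q_start = np.asarray(q_start, dtype=float)
    q_goal = np.asarray(q_goal, dtype=float)
    n_steps = int(np.ceil(np.max(np.abs(q_goal - q_start)) / step))
    if n_steps == 0:
        return q_start[None]  # already at the target
    # Linearly interpolate every joint over n_steps increments.
    return np.array([q_start + (q_goal - q_start) * k / n_steps
                     for k in range(n_steps + 1)])

path = plan([0, 0, 0, 0, 0], [0.5, -0.2, 0.3, 0.1, -0.4])
print(path.shape)  # (11, 5): 10 increments of at most 0.05 rad per joint
```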

When performing path generation, two scenarios should be distinguished and taken into account by the planner:

  • Path-following task: the path is a pattern or an exact predefined form, e.g. painting, writing, cutting.

  • Path-generation task: the path is generated as the best possible succession between the current position and the target, e.g. positioning the end-effector.

Often, executing a specified task involves both scenarios: moving the end-effector into position, then performing a predefined movement to carry out the task itself.

In the next blog post I will introduce the data generation strategies I used to collect my learning data sets, stressing the advantages engineering offers in terms of data context compared to other data-driven applications of machine learning and AI.
