Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Department

Computer Science and Engineering

First Advisor

Fillia Makedon


Abstract

The coexistence of humans and robots has been the aspiration of many scientific endeavors in the past century. Most anthropomorphic and industrial robots are highly articulated, complex machines designed to carry out tasks that often involve the manipulation of physical objects. Traditionally, robots learn to perform such tasks with the aid of a human programmer or operator. In this setting, the human acts as a teacher who provides a demonstration of a task. From the demonstration data, the robot must learn a state-action mapping that accomplishes the task; this mapping is commonly referred to in the literature as a policy [1]. Robot motor policies for a task are typically acquired with Learning from Demonstration (LfD) algorithms [1]. Initial LfD methods relied purely on supervised learning algorithms [2], while most modern paradigms rely on Reinforcement Learning (RL), reflecting a shift from supervised learning to goal-oriented algorithms [3]. The development of the Dynamic Movement Primitive (DMP) framework [4] was an essential contribution to this trend, as it provides an abstraction layer between the dimensions of state, action, and environment by computing a policy with distinct meta-parameters that affect the behavior of the robot [5]. One advantage of the DMP framework is its ability to learn motor policies by transforming motion trajectories (a high-dimensional space) into specific motion features (a low-dimensional latent space) via regression. The DMP framework learns policies at the trajectory level. Humans and other animals, however, are capable of learning new behaviors simply by observation. Robots need to achieve the same performance even when there is a substantial domain shift in environment, embodiment, and perspective between the robot and the teacher [6].
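To make the trajectory-to-feature transformation concrete, the sketch below implements a minimal one-dimensional DMP in Python: a demonstrated trajectory is collapsed by regression into a small set of basis-function weights (the low-dimensional motion features), and the motion is reproduced by integrating the system. The class name, gains, and basis count are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

class DMP1D:
    """Minimal one-dimensional Dynamic Movement Primitive (illustrative sketch)."""

    def __init__(self, n_basis=30, alpha_z=25.0, alpha_x=3.0):
        self.alpha_z = alpha_z            # spring gain of the transformation system
        self.beta_z = alpha_z / 4.0       # damper gain (critically damped)
        self.alpha_x = alpha_x            # decay rate of the canonical system
        # Gaussian basis centers spread along the phase variable x in (0, 1].
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        d = np.diff(self.c)
        self.h = np.concatenate([1.0 / d**2, [1.0 / d[-1]**2]])  # basis widths

    def _psi(self, x):
        return np.exp(-self.h * (x - self.c) ** 2)

    def fit(self, y, dt):
        """Learn forcing-term weights from one demonstrated trajectory: the
        high-dimensional motion collapses to n_basis weights (latent features)."""
        self.y0, self.g = y[0], y[-1]
        self.tau = (len(y) - 1) * dt                  # movement duration
        t = np.arange(len(y)) * dt
        x = np.exp(-self.alpha_x * t / self.tau)      # canonical phase x(t)
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        # Invert the transformation system to obtain the target forcing term.
        f_target = self.tau**2 * ydd - self.alpha_z * (
            self.beta_z * (self.g - y) - self.tau * yd)
        s = x * (self.g - self.y0)                    # goal-scaled phase (assumes g != y0)
        psi = np.exp(-self.h[None, :] * (x[:, None] - self.c[None, :]) ** 2)
        # Locally weighted regression: one weight per basis function.
        self.w = (psi * (s * f_target)[:, None]).sum(axis=0) / (
            (psi * (s**2)[:, None]).sum(axis=0) + 1e-10)
        return self

    def rollout(self, dt):
        """Reproduce the motion by Euler-integrating the DMP equations."""
        n = int(round(self.tau / dt)) + 1
        y, z, x = self.y0, 0.0, 1.0
        traj = np.empty(n)
        for i in range(n):
            traj[i] = y
            psi = self._psi(x)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (self.g - self.y0)
            zd = (self.alpha_z * (self.beta_z * (self.g - y) - z) + f) / self.tau
            y = y + dt * z / self.tau
            z = z + dt * zd
            x = x + dt * (-self.alpha_x * x / self.tau)
        return traj
```

Because the forcing term decays with the phase, the rollout converges to the goal even if the learned weights are imperfect, which is one reason the meta-parameters (goal, duration) can be changed after learning.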
Modern large deep neural network models can represent complex motor skills across different embodiments. We therefore propose a method to learn end-to-end visuomotor policies for robotic arms from demonstrations. The method computes state-action mappings via supervised learning from raw images and motor commands. At the core of the system, a Convolutional Neural Network (CNN) extracts image features and produces motion features, which encode and reproduce motor commands according to the DMP framework.
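The image-to-motion-feature pipeline can be sketched at the shape level as follows. This NumPy forward pass uses untrained random weights purely for illustration; the filter count, image size, and function names are assumptions, and in the proposed method the filters and regression head would be trained end-to-end on (image, motor command) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BASIS = 30                                          # DMP basis weights to predict
kernels = rng.standard_normal((8, 5, 5)) * 0.1        # 8 conv filters (untrained)
W_out = rng.standard_normal((N_BASIS + 1, 8)) * 0.1   # linear regression head
b_out = np.zeros(N_BASIS + 1)

def conv2d_valid(img, kernels):
    """Naive 'valid' 2-D convolution: (H, W) image x (K, kh, kw) -> (K, H-kh+1, W-kw+1)."""
    K, kh, kw = kernels.shape
    H, W = img.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(img[i:i + kh, j:j + kw] * kernels[k])
    return out

def visuomotor_features(img):
    """Map one raw image to DMP motion features: basis weights plus a goal value."""
    feat = np.maximum(conv2d_valid(img, kernels), 0.0)      # ReLU feature maps
    pooled = feat.reshape(feat.shape[0], -1).mean(axis=1)   # global average pool
    out = W_out @ pooled + b_out
    return out[:N_BASIS], out[N_BASIS]                      # (DMP weights, goal)
```

The design point this illustrates is the abstraction layer: the network never regresses raw motor commands, only the low-dimensional DMP parameters, which the DMP integrator then turns into a full trajectory.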


Keywords

Robotic manipulators, Computer vision, Machine learning, Reinforcement learning, Dynamic movement primitives


Disciplines

Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington