Graduation Semester and Year

2020

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Deokgun Park

Abstract

In reinforcement learning, an agent receives feedback from the environment in the form of an extrinsic reward and learns to take actions that maximize this reward. To start learning, however, the agent must first obtain feedback from the environment through random actions. This works in environments with frequent rewards, but in environments where rewards are sparse the probability of reaching any reward even once becomes very low. One way to explore an environment efficiently is for the agent to generate its own intrinsic reward using the prediction error of a model trained to predict the next state from the current state and action. This intrinsic reward resembles the phenomenon of curiosity and leads the agent to revisit states where the prediction error is large. Since predicting the next state in pixel space is not a trivial task, efforts have been made to reduce the complexity by extracting a smaller feature space in which to make the prediction. This thesis explores a few ways to stabilize training when a Variational Autoencoder (VAE) is used to reduce the complexity of the next-state prediction: it uses a memory to train the VAE so that it does not overfit to a single batch, adds a recurrent layer to improve the next-state prediction, and integrates the concept of Learning Progress so that the agent does not get stuck trying to predict something it cannot control.
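
The following is a minimal sketch, not the thesis implementation, of the curiosity-style intrinsic reward described above: a forward model predicts the next latent state produced by an encoder (standing in for the VAE), and the prediction error serves as the exploration bonus. All module names, network sizes, and dimensions are illustrative assumptions in a PyTorch-style setup.

import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, OBS_DIM = 32, 4, 64

class Encoder(nn.Module):
    # Stand-in for the VAE encoder: maps an observation to a latent vector
    # (a full VAE would also produce a log-variance; omitted for brevity).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT_DIM))

    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    # Predicts the next latent state from the current latent and the action.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM + ACTION_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT_DIM))

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

def intrinsic_reward(encoder, forward_model, obs, action, next_obs):
    # Prediction error in latent space; a larger error yields a larger
    # curiosity bonus, encouraging the agent to revisit such states.
    with torch.no_grad():
        z, z_next = encoder(obs), encoder(next_obs)
    z_pred = forward_model(z, action)
    return ((z_pred - z_next) ** 2).mean(dim=-1)

# Example usage with random tensors standing in for a batch of transitions.
enc, fwd = Encoder(), ForwardModel()
obs, next_obs = torch.randn(8, OBS_DIM), torch.randn(8, OBS_DIM)
action = torch.randn(8, ACTION_DIM)
print(intrinsic_reward(enc, fwd, obs, action, next_obs))

In practice this bonus would be added to the extrinsic reward during training, and the forward model's loss would be minimized on the same transitions.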

Keywords

Reinforcement learning, Intrinsic motivation

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
