Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Department
Computer Science and Engineering

First Advisor

Deokgun Park


Abstract
In Reinforcement Learning, an agent receives feedback from the environment in the form of an extrinsic reward and learns to take actions that maximize this reward. To start learning, however, the agent must first obtain feedback from the environment through random actions. This works in environments with frequent rewards, but in environments where rewards are sparse the probability of reaching any reward even once becomes very low. One way to explore an environment efficiently is for the agent to generate its own intrinsic reward from the prediction error of a model trained to predict the next state given the current state and action. This intrinsic reward resembles the phenomenon of curiosity and leads the agent to revisit states where the prediction error is large. Since predicting the next state in pixel space is not a trivial task, efforts have been made to reduce the complexity by extracting a smaller feature space in which to make the prediction. This thesis explores several ways to stabilize training when using a Variational Autoencoder (VAE) to reduce the complexity of the next-state prediction: it uses a memory to train the VAE so that it does not overfit to a single batch, adds a recurrent layer to improve the next-state prediction, and integrates the concept of Learning Progress so that the agent does not get stuck trying to predict something it cannot control.
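The core idea of prediction-error-based intrinsic reward can be sketched in a few lines. The snippet below is a minimal illustration, not the thesis's implementation: the VAE encoder and the learned forward model are replaced with fixed random linear maps, and the intrinsic reward is the mean squared error between the predicted and actual next latent state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned components: a frozen random
# "encoder" projecting pixel observations to a small latent space, and
# a linear "forward model" predicting the next latent from the current
# latent and action. In the thesis these would be a trained VAE encoder
# and a trained next-state predictor.
OBS_DIM, LATENT_DIM, ACTION_DIM = 64, 8, 4
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)
W_fwd = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) / np.sqrt(
    LATENT_DIM + ACTION_DIM
)

def encode(obs):
    """Project a raw observation into the latent feature space."""
    return W_enc @ obs

def intrinsic_reward(obs, action_onehot, next_obs):
    """Curiosity signal: forward-model prediction error in latent space."""
    z, z_next = encode(obs), encode(next_obs)
    z_pred = W_fwd @ np.concatenate([z, action_onehot])
    return float(np.mean((z_pred - z_next) ** 2))

obs = rng.normal(size=OBS_DIM)
next_obs = rng.normal(size=OBS_DIM)
action = np.eye(ACTION_DIM)[1]  # one-hot action
r_int = intrinsic_reward(obs, action, next_obs)
```

States whose transitions the forward model predicts poorly yield a larger `r_int`, so adding it to the extrinsic reward biases the agent toward the less-understood parts of the environment.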


Reinforcement learning, Intrinsic motivation


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington