Graduation Semester and Year

Fall 2025

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Manfred Huber

Second Advisor

Farhad Kamangar

Third Advisor

David Levine

Abstract

Model-based reinforcement learning promises improved sample efficiency by learning environment dynamics and using them for planning or policy improvement. However, the choice of neural architecture for dynamics prediction significantly impacts the model's ability to capture temporal dependencies and maintain long-term context, capabilities crucial for complex, open-world environments.

This thesis investigates three neural architectures for learning world models: a Transformer-based model, a GRU-based model, and a hybrid Transformer+GRU model. We evaluate these architectures on Crafter, a 2D open-world survival environment that requires long-horizon planning and sequential task completion. In Crafter, agents must perform hierarchical sequences of actions, such as collecting wood, placing a table, and crafting tools, where successful completion of complex tasks depends on remembering and executing prerequisite subtasks over extended time horizons.

Our primary goal is to explore how different architectures learn environment dynamics while maintaining long-term context in their hidden representations. Transformers leverage self-attention to flexibly model dependencies across a sequence, while GRUs compress temporal information into a recurrent hidden state. The hybrid approach combines both mechanisms, using the Transformer to capture long-range patterns within its context window and the GRU to maintain persistent memory of past states, which is particularly important when objects move in and out of the agent's limited field of view.
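To make the hybrid design concrete, the following minimal sketch shows one plausible way to combine the two mechanisms in a latent dynamics model. It assumes a PyTorch-style implementation; the class name, layer sizes, and the 17-action space (Crafter's default) are illustrative choices, not the architecture actually used in the thesis.

# Minimal, hypothetical sketch of a hybrid Transformer+GRU latent dynamics
# model; names and hyperparameters are illustrative, not from the thesis.
import torch
import torch.nn as nn

class HybridDynamicsModel(nn.Module):
    def __init__(self, latent_dim=128, num_actions=17, n_heads=4, n_layers=2):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, latent_dim)
        # Transformer encoder: self-attention over the recent window of
        # latent states captures long-range patterns within the context.
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # GRU: a persistent hidden state compresses the whole history,
        # retaining information that has left the attention window.
        self.gru = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.head = nn.Linear(latent_dim, latent_dim)

    def forward(self, latents, actions, hidden=None):
        # latents: (B, T, D) encoded observations; actions: (B, T) indices.
        x = latents + self.action_embed(actions)
        # Causal mask so each step attends only to earlier steps.
        # (Positional encodings omitted for brevity.)
        T = x.size(1)
        mask = torch.triu(torch.full((T, T), float('-inf'),
                                     device=x.device), diagonal=1)
        x = self.transformer(x, mask=mask)
        x, hidden = self.gru(x, hidden)
        return self.head(x), hidden  # predicted next latents + episode memory

In this sketch the Transformer attends over the recent window of latent states, while the GRU's hidden state persists across the episode, so information about objects outside the agent's current field of view can survive beyond the attention window.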

We compare these architectures across multiple dimensions: dynamics prediction accuracy over short and long horizons, computational efficiency, and their effectiveness in supporting policy learning for long-horizon tasks. Our evaluation examines how architectural choices affect the model's ability to handle partial observability, maintain spatial consistency when agents revisit locations, and capture the complex dependencies inherent in open-world survival scenarios. The findings provide insights into the trade-offs between different neural architectures for world modeling, offering guidance on when Transformers, recurrent networks, or hybrid approaches are most suitable for model-based reinforcement learning in environments requiring long-term planning and memory.
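As one concrete illustration of the short- versus long-horizon comparison, the sketch below evaluates a dynamics model open-loop: starting from a true latent state, it feeds the model's own predictions back as input and records the error at each horizon step. The function and the model interface (matching the sketch above) are hypothetical assumptions, not the thesis's evaluation code.

import torch

@torch.no_grad()
def rollout_error(model, latents, actions, horizon=15):
    # latents: (B, T, D) ground-truth encoded observations (T > horizon)
    # actions: (B, T) actions actually taken
    # Returns the mean squared prediction error at each open-loop step.
    pred = latents[:, :1]  # seed the rollout with the true first latent
    hidden = None
    errors = []
    for t in range(horizon):
        # Feed the model's own last prediction back in, one step at a time.
        out, hidden = model(pred[:, -1:], actions[:, t:t + 1], hidden)
        pred = torch.cat([pred, out], dim=1)
        errors.append(((out - latents[:, t + 1:t + 2]) ** 2).mean().item())
    return errors  # errors[k] = MSE after k+1 open-loop steps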

Keywords

Model-based reinforcement learning, world models, Transformer architectures, recurrent neural networks, GRU, hybrid neural architectures, dynamics prediction, latent space learning, long-horizon planning, open-world environments

Disciplines

Other Computer Engineering | Robotics

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
