Graduation Semester and Year
Fall 2025
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Manfred Huber
Second Advisor
Farhad Kamangar
Third Advisor
David Levine
Abstract
Model-based reinforcement learning promises improved sample efficiency by learning environment dynamics and using them for planning or policy improvement. However, the choice of neural architecture for dynamics prediction significantly impacts the model's ability to capture temporal dependencies and maintain long-term context, both of which are crucial for complex, open-world environments.
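As a concrete illustration of the dynamics-learning idea described above, the following is a minimal PyTorch sketch of one-step latent dynamics prediction; the class and member names (LatentDynamics, enc, dyn) are illustrative assumptions, not the thesis's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamics(nn.Module):
    """Hypothetical one-step latent dynamics model: encode observations
    into a latent space, then predict the next latent from the current
    latent and the action taken."""
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 128):
        super().__init__()
        self.enc = nn.Linear(obs_dim, latent_dim)   # observation -> latent state
        self.dyn = nn.Sequential(                   # (latent, action) -> next latent
            nn.Linear(latent_dim + act_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def loss(self, obs, act, next_obs):
        z = self.enc(obs)
        z_next = self.enc(next_obs)
        pred = self.dyn(torch.cat([z, act], dim=-1))
        # Regress onto the next latent; detaching the target stops
        # gradients from collapsing the encoder to a trivial solution.
        return F.mse_loss(pred, z_next.detach())

A model of this form can then be rolled forward in latent space for planning or to generate imagined trajectories for policy improvement.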
This thesis investigates three neural architectures for learning world models: a Transformer-based model, a GRU-based model, and a hybrid Transformer+GRU approach. We evaluate these architectures on Crafter, a 2D open-world survival environment that requires long-horizon planning and sequential task completion. In Crafter, agents must perform hierarchical sequences of actions, such as collecting wood, placing a table, and crafting tools, where successful completion of complex tasks depends on remembering and executing prerequisite subtasks over extended time horizons.
Our primary goal is to explore how different architectures learn environment dynamics while maintaining long-term context in their hidden representations. Transformers leverage self-attention to flexibly model dependencies across sequences, while GRUs use recurrent hidden states to compress temporal information. The hybrid approach combines both mechanisms, using Transformers to capture long-range patterns and GRUs to maintain persistent memory of past states, which is particularly important when objects move in and out of the agent's limited field of view.
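One way to realize such a hybrid is sketched below, again as a hypothetical PyTorch illustration rather than the thesis's exact model: a causally masked Transformer encoder attends over a window of recent latent states while a GRU carries a persistent hidden state, and the two streams are fused to predict the next latents.

import torch
import torch.nn as nn

class HybridDynamics(nn.Module):
    """Hypothetical hybrid dynamics model: self-attention over recent
    latents for long-range patterns, plus a GRU for persistent memory.
    A sketch of the idea, not the thesis's architecture."""
    def __init__(self, latent_dim: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=n_heads, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.gru = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.head = nn.Linear(2 * latent_dim, latent_dim)

    def forward(self, z_seq, h0=None):
        # z_seq: (batch, time, latent_dim) sequence of latent states.
        T = z_seq.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        attended = self.attn(z_seq, mask=causal)   # long-range dependencies
        recurrent, h_n = self.gru(z_seq, h0)       # compressed persistent memory
        fused = torch.cat([attended, recurrent], dim=-1)
        return self.head(fused), h_n               # next-latent predictions, new memory

In such a design, the GRU hidden state h_n can be carried across attention windows, which is what would let the model retain information about objects that have left the agent's current field of view.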
We compare these architectures across multiple dimensions: dynamics prediction accuracy over short and long horizons, computational efficiency, and their effectiveness in supporting policy learning for long-horizon tasks. Our evaluation examines how architectural choices affect the model's ability to handle partial observability, maintain spatial consistency when agents revisit locations, and capture the complex dependencies inherent in open-world survival scenarios. The findings provide insights into the trade-offs between different neural architectures for world modeling, offering guidance on when Transformers, recurrent networks, or hybrid approaches are most suitable for model-based reinforcement learning in environments requiring long-term planning and memory.
Keywords
Model-based reinforcement learning, world models, Transformer architectures, recurrent neural networks, GRU, hybrid neural architectures, dynamics prediction, latent space learning, long-horizon planning, open-world environments
Disciplines
Other Computer Engineering | Robotics
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Gadhiya, Vinal Jitendrabhai, "Transformer and Recurrent Architectures for Dynamics Prediction and Policy Learning on Long-Horizon Tasks" (2025). Computer Science and Engineering Theses. 537.
https://mavmatrix.uta.edu/cse_theses/537