Graduation Semester and Year
2012
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Manfred Huber
Abstract
Transfer learning and abstraction are among the most interesting current research topics in AI, addressing the use of previously learned knowledge to improve learning performance in subsequent tasks. While there has been significant recent work on this topic in fully observable domains, it has been less studied for partially observable MDPs (POMDPs). This thesis addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options) in two different kinds of algorithms: value iteration and expectation maximization. To do this, the thesis first proves that the optimal value function remains piecewise linear and convex when policies are built from high-level actions, and explains how value iteration algorithms must be modified to support options. The resulting modifications can be applied to all existing variations of value iteration, and their benefit is demonstrated in an implementation based on a basic value iteration algorithm. While value iteration is useful for smaller problems, it depends strongly on knowledge of the model. To address this, a second algorithm is developed: the expectation maximization algorithm is modified to learn faster from a set of sampled experiments instead of using exact inference calculations. The goal is not only to accelerate learning but also to reduce the learner's dependence on complete knowledge of the system model. Within this framework, it is also explained how to incorporate options into the model when learning the POMDP with a hierarchical EM algorithm. Experiments show how adding options can speed up the learning process.
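The piecewise-linear and convex (PWLC) property mentioned in the abstract is what makes exact value iteration tractable: the value function can be represented by a finite set of alpha vectors, and each backup builds new candidate vectors from the old set. The sketch below illustrates a standard one-step backup for the classic two-state "tiger" POMDP with primitive actions only; the parameters and names are illustrative textbook values, not taken from the thesis, and extending the backup to temporally extended options (the thesis's contribution) is not shown.

```python
import itertools

# Hypothetical "tiger" POMDP (standard textbook parameters, not from the thesis).
# States: 0 = tiger-left, 1 = tiger-right
# Actions: 0 = listen, 1 = open-left, 2 = open-right
# Observations: 0 = hear-left, 1 = hear-right
GAMMA = 0.95
R = [[-1.0, -1.0],        # listen costs 1
     [-100.0, 10.0],      # open-left:  bad if tiger-left, good if tiger-right
     [10.0, -100.0]]      # open-right: the reverse         (indexed R[a][s])
T = [[[1.0, 0.0], [0.0, 1.0]],          # listening leaves the tiger in place
     [[0.5, 0.5], [0.5, 0.5]],          # opening a door resets the problem
     [[0.5, 0.5], [0.5, 0.5]]]          # (indexed T[a][s][s'])
O = [[[0.85, 0.15], [0.15, 0.85]],      # listening is 85% accurate
     [[0.5, 0.5], [0.5, 0.5]],          # opening gives no information
     [[0.5, 0.5], [0.5, 0.5]]]          # (indexed O[a][s'][o])

def backup(alphas):
    """One exact value-iteration backup over the current alpha-vector set."""
    new = []
    for a in range(3):
        # One candidate vector per way of assigning an old alpha vector to
        # each observation: alpha(s) = R(s,a) + gamma * sum_{s',o} T O alpha_o(s')
        for choice in itertools.product(range(len(alphas)), repeat=2):
            vec = []
            for s in range(2):
                v = R[a][s]
                for o in range(2):
                    for s2 in range(2):
                        v += GAMMA * T[a][s][s2] * O[a][s2][o] * alphas[choice[o]][s2]
                vec.append(round(v, 10))
            new.append(tuple(vec))
    # Drop exact duplicates; a real solver would also prune dominated vectors.
    return [list(v) for v in set(new)]

def value(belief, alphas):
    """V(b) = max_alpha <b, alpha> -- piecewise linear and convex in b."""
    return max(sum(b * x for b, x in zip(belief, alpha)) for alpha in alphas)

alphas = backup([[0.0, 0.0]])               # horizon-1 value function from V_0 = 0
print(value([0.5, 0.5], alphas))            # listening is best when uncertain
print(value([1.0, 0.0], alphas))            # open the right door when certain
```

Each backup keeps V(b) expressible as a maximum over finitely many linear functions of the belief; the thesis's result is that this representation survives when the primitive actions above are replaced by options.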
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Janzadeh, Hamed, "Learning Partially Observable Markov Decision Processes Using Abstract Actions" (2012). Computer Science and Engineering Theses. 48.
https://mavmatrix.uta.edu/cse_theses/48
Comments
Degree granted by The University of Texas at Arlington