Graduation Semester and Year

2012

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Manfred Huber

Abstract

Transfer learning and abstraction are among the most active research topics in AI, addressing the use of previously learned knowledge to improve learning performance in subsequent tasks. While there has been significant recent work on this topic in fully observable domains, it has been studied less for partially observable MDPs (POMDPs). This thesis addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options) in two different kinds of algorithms: value iteration and expectation maximization. To do this, the thesis first proves that the optimal value function remains piecewise-linear and convex when policies are composed of high-level actions, and explains how value iteration algorithms should be modified to support options. The resulting modifications can be applied to all existing variations of value iteration, and their benefit is demonstrated in an implementation with a basic value iteration algorithm. While value iteration is useful for smaller problems, it depends strongly on knowledge of the model. To address this, a second algorithm is developed: the expectation maximization algorithm is modified to learn faster from a set of sampled experiences instead of using exact inference calculations. The goal here is not only to accelerate learning but also to reduce the learner's dependence on complete knowledge of the system model. Within this framework, it is also explained how options can be incorporated into the model when learning the POMDP with a hierarchical EM algorithm. Experiments show how adding options can speed up the learning process.
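To make the value-iteration result above concrete: a POMDP value function is piecewise-linear and convex (PWLC) when it can be written as a maximum over a finite set \(\Gamma\) of alpha-vectors, and value iteration improves it by Bellman backups over beliefs \(b\). The sketch below contrasts the standard primitive-action backup with a semi-MDP-style backup over options; the symbols \(\rho(b,o)\) (expected discounted reward while option \(o\) executes) and \(p(z \mid b,o)\) (terminal-observation model with the discounting over the option's random duration folded in, in the spirit of multi-time models) are illustrative notation, not necessarily the thesis's own.

\[ V(b) \;=\; \max_{\alpha \in \Gamma} \sum_{s \in S} \alpha(s)\, b(s) \qquad \text{(PWLC representation)} \]

\[ V'(b) \;=\; \max_{a \in A} \Big[ \sum_{s \in S} b(s)\, R(s,a) \;+\; \gamma \sum_{z \in Z} \Pr(z \mid b,a)\, V(b^{a}_{z}) \Big] \qquad \text{(primitive actions)} \]

\[ V'(b) \;=\; \max_{o \in O} \Big[ \rho(b,o) \;+\; \sum_{z \in Z} p(z \mid b,o)\, V(b^{o}_{z}) \Big] \qquad \text{(options)} \]

Intuitively, because the option backup has the same max-of-linear-functions form as the primitive-action backup (each bracketed term is linear in \(b\) under suitable definitions of \(\rho\) and \(p\)), the PWLC property carries over, which is what allows existing value iteration variants to be adapted to options.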

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
