Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Manfred Huber


Learning by trial and error and forming levels of abstraction from past experience have been important factors in how sentient beings develop intelligent behaviors and cope with an ever-changing environment. Complex control domains similarly often require interacting agents to learn adaptive control strategies for time-varying or potentially evolving systems. This dissertation will begin by investigating one such complex domain, Grid computing networks, through a collaborative effort in the design and implementation of a generic workload management system, PanDA-PF WMS, used in the ATLAS experiment. With the incentive of boosting the performance of PanDA-PF WMS and increasing its applicability in a general resource-sharing environment, we will subsequently motivate an automated and adaptive learning approach that optimizes computational resource usage. From the experience of developing Grid applications such as PanDA-PF, we found that even a flexible infrastructure still has performance limits from the perspectives of both high-performance computing (HPC) and high-throughput computing (HTC). The key is that an optimal resource allocation strategy is highly contingent upon many factors hidden in the intricate dynamics behind the scenes, including the task distribution, the real-time resource profile, and the compatibility between user tasks and the allocated machines. The reinforcement learning framework establishes a unique way of solving a wide range of control and planning tasks through a state space representation of the system, over which the control policy unfolds as a sequence of control decisions toward a maximum payoff.
Intuitively, reinforcement learning seems to be an ideal candidate among machine learning methods for developing an optimal resource allocation strategy that harvests free computational resources by learning their intricate dynamics. However, our hope of applying standard reinforcement learning to resource allocation is diminished by an inherent limitation in its representation. In particular, the control policy is often formulated from the perspective of decision-theoretic planning (DTP), such that actions, as control decisions, are assumed to be atomic with fixed action semantics. Consequently, the derived policy in general lacks the ability to adapt to variations in the action outcomes, or in the action set itself, that arise from the versatility of the system. This would be a major barrier to learning an ideal resource allocation strategy, where each compute resource is often characterized by time-varying properties that determine its performance. In addition, the available resources may be highly volatile, depending on the resource-sharing infrastructure. In a dynamic computational cluster, for instance, the underlying resources are acquired on demand in the form of distributed virtual machines that may not be persistently available to end users. As a consequence, an optimal task-assignment strategy learned earlier may not be strictly applicable in the future. Inspired by the challenges of complex domains such as optimal resource sharing, this dissertation will progressively develop an extended reinforcement learning framework with a concept-driven learning architecture that enables adaptive policy learning over abstractions of progressively evolving samples of experience. In particular, we provide an alternative view of reinforcement learning by establishing the notion of the reinforcement field through a collection of policy-embedded particles gathered during the policy learning process.
The reinforcement field serves as a policy generalization mechanism over correlated decisions, using kernel functions as a state correlation hypothesis in combination with Gaussian process regression as a value function approximator. Subsequently, by "kernelizing" the spectral clustering mechanism, the policy-learning experience retained in the memory of the agent can be further subdivided into a set of concept-driven abstract actions, each of which implicitly encodes a set of context-dependent local policies. We will show in a simulated task-assignment domain that our generalized reinforcement learning framework enables both the learning of an action-oriented conceptual model and the simultaneous derivation of an optimal policy from the high-level conceptual units. Moreover, to demonstrate the general applicability of our learning approach, we apply the work to a generalized navigation domain, the "gridworld without the grid," in which the agent is free to move in all directions with stochastic action outcomes, and we subsequently show the learning results in terms of both an improved learning curve and reinforcement field plots.
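The idea of pairing a kernel with Gaussian process regression to approximate values can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes a squared-exponential kernel, particles encoded as plain (state, action) feature tuples, and made-up value estimates. All function names, parameters, and numbers below are hypothetical.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel: correlation decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def cholesky_solve(low, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def fit_gp(particles, values, noise=1e-2, length_scale=1.0):
    """Return weights alpha = (K + noise * I)^-1 * values for the GP posterior mean."""
    n = len(particles)
    gram = [[rbf_kernel(particles[i], particles[j], length_scale)
             + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return cholesky_solve(cholesky(gram), values)

def predict_value(query, particles, alpha, length_scale=1.0):
    """GP posterior mean at a query point (zero prior mean assumed)."""
    return sum(a * rbf_kernel(query, p, length_scale)
               for a, p in zip(alpha, particles))

# Hypothetical particles: (state_x, state_y, action_feature) with value estimates.
particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
values = [0.1, 0.5, 0.4, 1.0]
alpha = fit_gp(particles, values)

# Value estimate for an unvisited state-action pair between the stored particles.
estimate = predict_value((0.5, 0.5, 0.5), particles, alpha)
```

In this sketch the kernel plays the role of the correlation hypothesis: a query inherits value from stored particles in proportion to its similarity to them, and a query far from every particle falls back to the zero prior mean.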


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington