ORCID Identifier(s)


Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

David Levine


Monitoring the health of large data centers is a major concern with the ever-increasing demand of grid/cloud computing and the higher need of computational power. In a High Performance Computing (HPC) environment, the need to maintain high availability makes monitoring tasks and hardware more daunting and demanding. As data centers grow it becomes hard to manage the complex interactions between different systems. Many open source systems have been implemented which give specific state of any individual machine using Nagios, Ganglia or Torque monitoring software. In this work we focus on the detection and prediction of data center anomalies by using a machine learning based approach. We present the idea of using monitoring data from multiple monitoring solutions and formulating a single high dimensional vector based model, which further is fed into a machine-learning algorithm. In this approach we will find patterns and associations among the different attributes of a data center, which remain hidden in the single system context. The use of disparate monitoring systems in conjunction will give a holistic view of the cluster with an increase in the probability of finding critical issues before they occur as well as alert the system administrator.


Machine learning, Data center


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington