Graduation Semester and Year
2014
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
David Levine
Abstract
A machine-to-machine (M2M) communications network hosts millions of heterogeneous devices used for applications such as vehicle tracking, medical services, and home automation and security. These devices exchange thousands of messages over cellular networks. The messages are Signaling System No. 7 (SS7) messages of various types, such as authentication and mobility management, and they produce terabytes of SS7 signaling traffic data over a period of days. The data generated is diverse, depending on factors such as device activity, hardware manufacturer, and radio/tower interaction. This inherent diversity makes anomaly detection in an M2M network challenging. With millions of messages to analyze, machines with high computing capacity are necessary.
In this thesis, an automated data mining framework on the cloud to detect anomalous devices in the traffic data is presented. Unsupervised learning is a useful tool here, given the lack of static behavioral patterns of devices and the numerous ways in which they can fail. Datasets extracted from a leading M2M service provider's network featuring millions of devices are analyzed. The case studies illustrate the anomaly patterns found. One case study identified 27% of devices with unusual behavior in a dataset of 23k devices. A second case study spotted 0.07% of devices with anomalous behavior in a dataset of 350k devices. This places the spotlight on a small subset of devices for further investigation. To find anomalous devices, the clusters are labeled based on a few assumptions that are part of unsupervised anomaly detection and cluster analysis techniques. Then, k-nearest neighbors (k-NN) binary classification is applied to evaluate the labeling. This categorizes each device as either "anomalous" or "normal" and yields quantitative results such as accuracy.
To generate actionable information, the identified devices are analyzed in light of domain expertise, past events, and auxiliary information such as the type of device and its purpose.
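The label-then-evaluate step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state which clustering algorithm or labeling assumptions were used, so the sketch assumes cluster assignments are already available and applies one common assumption from unsupervised anomaly detection, namely that devices in small clusters are anomalous. The function names, the `small_fraction` threshold, the two-feature device vectors, and the leave-one-out evaluation are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def label_clusters(cluster_ids, small_fraction=0.25):
    """Label each device by the relative size of its cluster.

    Assumption (not from the thesis): a cluster holding less than
    `small_fraction` of all devices is treated as anomalous.
    """
    sizes = Counter(cluster_ids)
    total = len(cluster_ids)
    return [
        "anomalous" if sizes[c] / total < small_fraction else "normal"
        for c in cluster_ids
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(features, labels, k=3):
    """Fraction of devices whose label is reproduced by k-NN on all other devices."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, x, k) == y:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: two numeric features per device.
features = [(1.0 + 0.1 * i, 1.0 + 0.05 * i) for i in range(12)]
features += [(9.0, 9.0), (9.2, 9.1), (9.1, 8.8)]
# Cluster assignments assumed to come from an earlier clustering step.
cluster_ids = [0] * 12 + [1] * 3

labels = label_clusters(cluster_ids)
accuracy = leave_one_out_accuracy(features, labels, k=3)
print(labels.count("anomalous"), "devices labeled anomalous; k-NN accuracy:", accuracy)
```

A high k-NN accuracy here only indicates that the cluster-derived labels are consistent with the geometry of the feature space. It does not confirm that the labeled devices are truly faulty, which is why the abstract pairs this step with review against domain expertise.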
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Datta Kumar, Prathibha, "A Cloud Based Automated Anomaly Detection Framework" (2014). Computer Science and Engineering Theses. 295.
https://mavmatrix.uta.edu/cse_theses/295
Comments
Degree granted by The University of Texas at Arlington