Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

David Levine


A machine-to-machine (M2M) communications network hosts millions of heterogeneous devices, such as those used for vehicle tracking, medical services, and home automation and security. These devices exchange thousands of messages over cellular networks. The messages are Signaling System No. 7 (SS7) messages of various types, such as authentication and mobility management, and they add up to terabytes of SS7 signaling traffic data over a period of days. The data generated is diverse, depending on factors such as device activity, hardware manufacturer, and radio/tower interaction. This inherent diversity makes anomaly detection in an M2M network challenging, and with millions of messages to analyze, machines with high computational capacity are necessary.
In this thesis, an automated data mining framework on the cloud to detect anomalous devices in the traffic data is presented. Unsupervised learning is a useful tool here, given that devices lack static behavioral patterns and can fail in numerous ways. Datasets extracted from a leading M2M service provider's network featuring millions of devices are analyzed, and case studies illustrate the anomaly patterns found. One case study identified 27% of devices with unusual behavior in a dataset of 23k devices. A second case study spotted 0.07% of devices with anomalous behavior in a dataset of 350k devices. This places the spotlight on a small subset of devices for further investigation. To find anomalous devices, the clusters are labeled based on a few assumptions drawn from unsupervised anomaly detection and cluster analysis techniques. Then, k-nearest neighbors (k-NN) binary classification is applied to evaluate the labeling, categorizing each device as either "anomalous" or "normal" and reporting quantitative results such as accuracy.
To generate actionable information, the identified devices are analyzed in light of domain expertise, past events, and auxiliary information such as the type of device and its purpose.
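The cluster-then-label-then-evaluate workflow described in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not name the clustering algorithm, the device features, or the exact labeling assumptions, so everything here is an assumption for illustration only: k-means with deterministic farthest-first seeding, a hypothetical "clusters holding less than `min_fraction` of devices are anomalous" rule, and made-up two-feature device vectors (messages per hour, authentication-failure ratio). The helper names (`label_clusters`, `knn_predict`, `evaluate_labeling`) are hypothetical, and real SS7-derived features would normally be normalized before computing distances.

```python
import math
import random
from collections import Counter

def farthest_first_init(points, k):
    # Deterministic seeding: start at points[0], then repeatedly add the
    # point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(iters):
        # Assign each device to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Recompute the centroid as the member mean; keep the old one if the cluster is empty.
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return labels

def label_clusters(cluster_ids, min_fraction=0.05):
    # Hypothetical assumption: devices in small clusters are "anomalous".
    sizes, n = Counter(cluster_ids), len(cluster_ids)
    return ["anomalous" if sizes[c] / n < min_fraction else "normal" for c in cluster_ids]

def knn_predict(train_x, train_y, x, k=3):
    # Majority vote among the k nearest labeled devices.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def evaluate_labeling(points, labels, k=3, test_fraction=0.3, seed=0):
    # Hold out part of the cluster-labeled devices and measure k-NN accuracy on them.
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    tx, ty = [points[i] for i in train], [labels[i] for i in train]
    correct = sum(knn_predict(tx, ty, points[i], k) == labels[i] for i in test)
    return correct / len(test)

# Synthetic, made-up data: 200 "typical" devices followed by 5 unusual ones.
rng = random.Random(42)
devices = [(rng.gauss(10, 2), rng.gauss(0.02, 0.01)) for _ in range(200)]
devices += [(rng.gauss(80, 5), rng.gauss(0.5, 0.1)) for _ in range(5)]

device_labels = label_clusters(kmeans(devices, k=2))
accuracy = evaluate_labeling(devices, device_labels)
print(device_labels.count("anomalous"), "anomalous devices; k-NN accuracy:", round(accuracy, 3))
```

Evaluating with k-NN here only checks that the cluster-derived labels are locally consistent in feature space; it does not confirm that a flagged device is actually faulty, which is why the abstract pairs the results with domain expertise and auxiliary device information.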


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington