Mohit Singhal

Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

David Levine


As websites have become a primary source of information, malicious activity on the web, especially drive-by downloads, has increased sharply. A drive-by download is the unintentional download of malicious code to a user's computer, which leaves the user open to a cyberattack. It has become the preferred distribution vector for many malware families. Malware is any software intentionally designed to cause damage to a user's computer. The purpose of this research is to analyze malware obtained by visiting approximately 100,000 malicious URLs, run the captured binaries in sandboxes, and analyze their runtime behavior with a software tool (YARA) to categorize them and identify the malware family to which each belongs. Of the 1414 program executables (binaries) that were captured, 1000 were executed and 99 were identified as false positives. Of the 1414 binaries that were extracted, 959 were executable, and 48% of the binaries were extracted from websites hosted in the US. We also found that 105 binaries had the same name but different hashes; that is, they were not identical. Of the 901 binaries, 867 were identified as Trojan horses, and we were able to identify 53 malware families. One particular family, Kryptik, accounted for 176 malware samples, which is about 19%, and about 4% of the malware families were not identified.
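
The abstract's observation that some binaries share a file name but differ in hash can be illustrated with a minimal sketch. It is not the thesis's actual tooling: the helper names (`sha256_of`, `names_with_differing_hashes`), the choice of SHA-256, and the in-memory sample data are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a binary's contents."""
    return hashlib.sha256(data).hexdigest()


def names_with_differing_hashes(samples):
    """Group samples by file name and keep names seen with more than one hash.

    `samples` is an iterable of (file_name, content_bytes) pairs
    (a hypothetical input shape chosen for this sketch).
    """
    hashes_by_name = defaultdict(set)
    for name, content in samples:
        hashes_by_name[name].add(sha256_of(content))
    # A name mapped to two or more distinct digests means the files
    # share a name but are not byte-for-byte identical.
    return {name: hashes for name, hashes in hashes_by_name.items() if len(hashes) > 1}


# Made-up sample data: two different payloads named setup.exe,
# and two identical payloads named update.exe.
samples = [
    ("setup.exe", b"payload A"),
    ("setup.exe", b"payload B"),
    ("update.exe", b"payload C"),
    ("update.exe", b"payload C"),
]
collisions = names_with_differing_hashes(samples)
print(sorted(collisions))
```

Comparing digests rather than names is what separates truly identical binaries from ones that only reuse a file name, which is the distinction the abstract draws.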


Malware, Drive-by download, Internet, Cyberattack, Software, Sandbox, Cuckoo, VMRay, YARA, Trojan horse


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington