Graduation Semester and Year
Spring 2026
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Habeeb Olufowobi
Second Advisor
Nadra Guizani
Third Advisor
Faysal Shezan
Fourth Advisor
Arkajyoti Mitra
Abstract
Graph neural network–based vulnerability detectors are typically evaluated on clean benchmark datasets, yet real-world code frequently undergoes semantics-preserving transformations such as identifier renaming, dead-code insertion, and control-flow restructuring. The extent to which such transformations affect detector reliability remains insufficiently understood. We evaluate ten vulnerability detectors from four architectural families across the Devign, Big-Vul, and DiverseVul datasets. To quantify robustness, we test each model at three transformation budgets: one transform, two transforms combined, and all three together. We find that token-based models degrade under identifier renaming and compound transformations, while models that read only code structure are largely unaffected. We further apply a greedy minimum-budget attack that adds transformations incrementally to determine the minimum number required to alter a model's prediction. We also examine DeepWukong as a case study on SARD, where each transformation breaks its pipeline at a different stage. We find that robustness depends mainly on how a model represents code rather than how well it performs on clean test data: models that rely on token information can often be flipped by a single transformation. Dead-code insertion improves performance on Devign but reduces performance on Big-Vul, showing that robustness results do not always transfer across datasets. Within this setting, these results indicate that benchmark accuracy on clean datasets alone is insufficient to assess deployment reliability and security, and that robustness to semantics-preserving transformations should be considered when designing and comparing vulnerability detectors.
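The greedy minimum-budget attack described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the transformation functions, the token-based "detector" `toy_score`, and all identifiers here are hypothetical stand-ins chosen for the example.

```python
# Illustrative sketch of a greedy minimum-budget attack: at each step,
# apply the remaining semantics-preserving transform that most lowers
# the detector's confidence, until the prediction flips or the budget
# (all three transforms) is exhausted.
# NOTE: toy stand-ins throughout; not the actual detectors or transforms.

TRANSFORMS = {
    # Hypothetical, simplified versions of the three transformations.
    "rename": lambda c: c.replace("buf", "v0"),            # identifier renaming
    "dead_code": lambda c: c + "\nint unused0 = 0;",       # dead-code insertion
    "cfg": lambda c: c.replace("if (", "if ((1) && "),     # control-flow restructuring
}

def toy_score(code):
    # Hypothetical token-based "detector": flags any code mentioning 'buf'.
    return 0.9 if "buf" in code else 0.2

def predict(code, threshold=0.5):
    # Binary vulnerable/non-vulnerable decision from the score.
    return toy_score(code) >= threshold

def greedy_min_budget_attack(code, score, predict, transforms, max_budget=3):
    """Greedily stack transforms; return (budget_used, final_code) once the
    prediction flips, or (None, final_code) if it never changes in budget."""
    original = predict(code)
    remaining = dict(transforms)
    for budget in range(1, max_budget + 1):
        # Pick the remaining transform whose result has the lowest score
        # (greedy step: most confidence reduction per transform applied).
        name, fn = min(remaining.items(), key=lambda kv: score(kv[1](code)))
        code = fn(code)
        del remaining[name]
        if predict(code) != original:
            return budget, code
        if not remaining:
            break
    return None, code
```

On the toy detector, a single identifier-renaming transform already flips the prediction, mirroring the abstract's observation that token-reliant models can often be altered with a budget of one.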
Keywords
vulnerability detection, code obfuscation, robustness evaluation, program representation, neural detectors, graph neural network
Disciplines
Other Computer Engineering
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Chumo, Jesse KS, "Evaluating the Robustness of GNN-Based Vulnerability Detectors Under Semantics-Preserving Code Obfuscation" (2026). Computer Science and Engineering Theses. 2.
https://mavmatrix.uta.edu/cse_theses2/2