ORCID Identifier(s)

ORCID 0009-0003-6665-0823

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Habeeb Olufowobi

Second Advisor

Nadra Guizani

Third Advisor

Faysal Shezan

Fourth Advisor

Arkajyoti Mitra

Abstract

Graph neural network–based vulnerability detectors are typically evaluated on clean benchmark datasets, yet real-world code frequently undergoes semantics-preserving transformations such as identifier renaming, dead-code insertion, and control-flow restructuring. How much such transformations affect detector reliability remains insufficiently understood. We evaluate ten vulnerability detectors from four architectural families on the Devign, Big-Vul, and DiverseVul datasets. To quantify robustness, we test each model at three transformation budgets: a single transformation, two transformations combined, and all three together. Token-based models degrade under identifier renaming and compound transformations, while models that read only code structure are largely unaffected. We further evaluate a greedy minimum-budget attack that applies transformations incrementally to determine the minimum number required to alter a model's prediction. We also examine DeepWukong in a case study on SARD, where each transformation breaks its pipeline at a different stage. We find that robustness depends mainly on how a model represents code rather than on how well it performs on clean test data: the predictions of models that rely on token information can often be altered by a single transformation. Dead-code insertion improves performance on Devign but reduces it on Big-Vul, showing that robustness results do not always transfer across datasets. In our setting, these results indicate that accuracy on clean benchmarks alone is insufficient to assess deployment reliability and security, and that robustness to semantics-preserving transformations should be considered when designing and comparing vulnerability detectors.
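
The greedy minimum-budget attack mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the three transformation functions, the toy token-based scorer, the 0.5 decision threshold, and all names below are hypothetical stand-ins chosen only to show the idea of adding one transformation at a time until the prediction flips.

```python
import re
from typing import Callable, Dict, List, Optional

Transform = Callable[[str], str]
Scorer = Callable[[str], float]  # assumed to return P(vulnerable) in [0, 1]


def rename_identifiers(code: str) -> str:
    # Toy stand-in: rename one known variable; a real tool would use a parser.
    return re.sub(r"\bbuf\b", "v0", code)


def insert_dead_code(code: str) -> str:
    # Toy stand-in: prepend a branch that never executes.
    return "if (0) { int unused = 0; }\n" + code


def restructure_control_flow(code: str) -> str:
    # Toy stand-in: wrap the snippet in a single-iteration loop.
    return "do {\n" + code + "\n} while (0);"


TRANSFORMS: Dict[str, Transform] = {
    "rename_identifiers": rename_identifiers,
    "insert_dead_code": insert_dead_code,
    "restructure_control_flow": restructure_control_flow,
}


def toy_token_scorer(code: str) -> float:
    # Hypothetical token-based detector: the score depends on which names appear.
    tokens = set(re.findall(r"[A-Za-z_]\w*", code))
    score = 0.2
    if "strcpy" in tokens:
        score += 0.2
    if "buf" in tokens:
        score += 0.3
    return score


def greedy_min_budget(
    code: str,
    scorer: Scorer,
    transforms: Dict[str, Transform],
    threshold: float = 0.5,
) -> Optional[List[str]]:
    """Apply one transformation per step, always choosing the one that pushes
    the score furthest toward the opposite label. Returns the names applied
    when the prediction flips, or None if it never flips."""
    original_label = scorer(code) >= threshold
    current = code
    remaining = dict(transforms)
    applied: List[str] = []
    while remaining:
        def gain(name: str) -> float:
            s = scorer(remaining[name](current))
            return -s if original_label else s

        best = max(remaining, key=gain)
        current = remaining.pop(best)(current)
        applied.append(best)
        if (scorer(current) >= threshold) != original_label:
            return applied
    return None


sample = "char buf[8];\nstrcpy(buf, input);"
print(greedy_min_budget(sample, toy_token_scorer, TRANSFORMS))
```

Because the search is greedy, the length of the returned list is an upper bound on the smallest number of transformations that would flip this toy scorer, not a guaranteed minimum. A scorer that ignores tokens entirely would exhaust all three transformations and return None, which mirrors the abstract's observation that structure-only models are largely unaffected.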

Keywords

vulnerability detection, code obfuscation, robustness evaluation, program representation, neural detectors, graph neural network

Disciplines

Other Computer Engineering

License

Creative Commons Attribution 4.0 International License
