Computer Science and Engineering Theses

Evaluating the Robustness of GNN-Based Vulnerability Detectors Under Semantics-Preserving Code Obfuscation

Jesse KS Chumo, University of Texas at Arlington

ORCID Identifier(s)

ORCID 0009-0003-6665-0823

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Habeeb Olufowobi

Second Advisor

Nadra Guizani

Third Advisor

Faysal Shezan

Fourth Advisor

Arkajyoti Mitra

Abstract

Graph neural network–based vulnerability detectors are typically evaluated on clean benchmark datasets, yet real-world code frequently undergoes semantics-preserving transformations such as identifier renaming, dead-code insertion, and control-flow restructuring. The extent to which such transformations affect detector reliability remains insufficiently understood. We evaluate ten vulnerability detectors from four architectural families across the Devign, Big-Vul, and DiverseVul datasets. To quantify robustness, we evaluate each model at three transformation budgets: one transformation, two transformations combined, and all three together. Token-based models degrade under identifier renaming and compound transformations, while models that read only code structure are largely unaffected. We further evaluate a greedy minimum-budget attack that applies transformations incrementally to determine the minimum number required to change a model's prediction. We also examine DeepWukong as a case study on SARD, where each transformation breaks its pipeline at a different stage. We find that robustness depends mainly on how a model represents code rather than on how well it performs on clean test data: the predictions of models that rely on token information can often be changed by a single transformation. Dead-code insertion improves performance on Devign but reduces performance on Big-Vul, showing that robustness results do not always transfer across datasets. In our setting, these results indicate that benchmark accuracy on clean datasets alone is insufficient to assess deployment reliability and security, and that robustness to semantics-preserving transformations should be considered when designing and comparing vulnerability detectors.
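
The greedy minimum-budget attack mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the three transformation functions, the toy token-counting scorer, the 0.5 decision threshold, and the rule for choosing the next transformation are all assumptions made for illustration only.

```python
import re
from typing import Callable, Dict, List, Optional

Transform = Callable[[str], str]
Scorer = Callable[[str], float]  # hypothetical: returns P(vulnerable) in [0, 1]


def rename_identifiers(code: str) -> str:
    """Toy stand-in for identifier renaming: only renames 'buf' to 'v0'."""
    return re.sub(r"\bbuf\b", "v0", code)


def insert_dead_code(code: str) -> str:
    """Toy stand-in for dead-code insertion: appends an unreachable block."""
    return code + "\nif (0) { int unused = 0; }"


def restructure_control_flow(code: str) -> str:
    """Toy stand-in for control-flow restructuring: wraps code in do/while(0)."""
    return "do {\n" + code + "\n} while (0);"


# Hypothetical transformation set, keyed by name.
TRANSFORMS: Dict[str, Transform] = {
    "rename_identifiers": rename_identifiers,
    "insert_dead_code": insert_dead_code,
    "restructure_control_flow": restructure_control_flow,
}


def toy_token_scorer(code: str) -> float:
    """Assumed token-based detector: score grows with suspicious token counts."""
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    hits = sum(tokens.count(t) for t in ("strcpy", "buf"))
    return hits / (hits + 2)


def greedy_min_budget(
    code: str,
    score: Scorer,
    transforms: Dict[str, Transform],
    threshold: float = 0.5,
) -> Optional[List[str]]:
    """Apply transformations one at a time until the predicted label changes.

    Returns the names of the applied transformations (its length is the
    budget), or None if the label never changes.
    """
    original_label = score(code) >= threshold
    remaining = dict(transforms)
    applied: List[str] = []
    current = code
    while remaining:
        candidates = {name: fn(current) for name, fn in remaining.items()}

        def margin(name: str) -> float:
            # Distance from the decision boundary on the original label's side;
            # smaller (or negative) means closer to a changed prediction.
            s = score(candidates[name])
            return s - threshold if original_label else threshold - s

        best = min(candidates, key=margin)
        current = candidates[best]
        applied.append(best)
        del remaining[best]
        if (score(current) >= threshold) != original_label:
            return applied
    return None


sample = "char buf[8]; strcpy(buf, src);"
token_result = greedy_min_budget(sample, toy_token_scorer, TRANSFORMS)
constant_result = greedy_min_budget(sample, lambda c: 0.9, TRANSFORMS)
```

In this toy setting the token-counting scorer changes its prediction after identifier renaming alone, while a scorer that ignores the source text entirely never changes, which loosely mirrors the contrast the abstract draws between token-based and structure-only models.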

Keywords

vulnerability detection, code obfuscation, robustness evaluation, program representation, neural detectors, graph neural network

Disciplines

Other Computer Engineering

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Chumo, Jesse KS, "Evaluating the Robustness of GNN-Based Vulnerability Detectors Under Semantics-Preserving Code Obfuscation" (2026). Computer Science and Engineering Theses. 2.
https://mavmatrix.uta.edu/cse_theses2/2

Included in

Other Computer Engineering Commons