Graduation Semester and Year
Fall 2025
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Habeeb Olufowobi
Abstract
Training and deploying Machine Learning (ML) models introduces significant data confidentiality risks, as modern models can inadvertently memorize and leak information about their training data. While attacks such as membership inference and model inversion are well studied, the literature remains fragmented, with inconsistent threat models and unclear relationships across attack classes and defenses. This work presents a Systematization of Knowledge (SoK) that unifies the landscape of training-data privacy attacks and defenses, aligning them with the NIST Adversarial Machine Learning (AML) taxonomy to enable standardized threat modeling and comparison. Our analysis shows that, despite significant progress in characterizing attack vectors, defenses against training-data privacy attacks remain incomplete and often fail to address the core utility–privacy trade-off, particularly for emerging paradigms such as foundation models (e.g., large language models, LLMs).
We provide a unified framework, identify structural gaps in existing defenses, and outline core principles to guide the development of practical, scalable, and privacy-preserving ML systems.
Keywords
Machine Learning security, ML security, training data privacy, membership inference attack, model inversion attack, SoK, systematization of knowledge, data privacy
Disciplines
Computer and Systems Architecture | Other Computer Engineering
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Saeed, Mohammad Sufyaan, "Training Data Privacy in Machine Learning: A Systematization of Attacks and Defenses" (2025). Computer Science and Engineering Theses. 536.
https://mavmatrix.uta.edu/cse_theses/536