ORCID Identifier(s)

0009-0003-3848-8910

Graduation Semester and Year

Fall 2025

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Habeeb Olufowobi

Abstract

Training and deploying Machine Learning (ML) models introduce significant data confidentiality risks, as modern models can inadvertently memorize and leak information about their training data. While attacks such as membership inference and model inversion are well studied, the literature remains fragmented, with inconsistent threat models and unclear relationships across attack classes and defenses. This work presents a Systematization of Knowledge (SoK) that unifies the landscape of training-data privacy attacks and defenses, aligning them with the NIST Adversarial Machine Learning (AML) taxonomy to enable standardized threat modeling and comparison. Our analysis shows that, despite significant progress in characterizing attack vectors, defenses against training-data privacy attacks remain incomplete and often fail to address the core utility–privacy trade-off, particularly for emerging paradigms such as foundation models (e.g., LLMs).

We provide a unified framework, identify structural gaps in existing defenses, and outline core principles to guide the development of practical, scalable, and privacy-preserving ML systems.

Keywords

Machine Learning security, ML security, training data privacy, membership inference attack, model inversion attack, SoK, systematization of knowledge, data privacy

Disciplines

Computer and Systems Architecture | Other Computer Engineering

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Saeed, Mohammad Sufyaan, "Training Data Privacy in Machine Learning: A Systematization of Attacks and Defenses" (2025). Computer Science and Engineering Theses - Archive. 536.
https://mavmatrix.uta.edu/cse_theses/536

Included in

Computer and Systems Architecture Commons, Other Computer Engineering Commons

Computer Science and Engineering Theses - Archive

Training Data Privacy in Machine Learning: A Systematization of Attacks and Defenses
