Graduation Semester and Year
Spring 2026
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Jeff Lei
Second Advisor
Faysal Hossain Shezan
Third Advisor
Christoph Csallner
Abstract
Anonymized summarization aims to produce useful summaries while replacing person names with placeholders such as [PERSON]. In practice, privacy is often evaluated by checking that no real names appear in the generated text. However, suppressing a name in the output does not guarantee that the model has removed identity information from its internal computation. This gap is critical in collaborative inference frameworks, where intermediate internal states (ISs) are exposed for safety auditing. We study this question in a controlled setting where a language model reads a document containing a real name and is required to generate a summary that replaces that name with [PERSON]. We ask two questions: does the model still internally encode the true identity at the point where anonymization occurs, and does generating the placeholder actually depend on that identity information? Across two instruction-tuned 7–8B decoder-only language models evaluated on CNN/DailyMail, we find that identity remains strongly and linearly recoverable from hidden representations during anonymized summarization. Yet removing identity-related signal from the internal state used for prediction does not impair the model’s ability to generate [PERSON]. When the model is instead required to produce the real name, the same intervention causes a substantial increase in generation loss, confirming that the method detects true dependence. These results show that output-level anonymization can hide names in text while leaving identity information intact inside the model, revealing a privacy risk that output inspection alone cannot detect.
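The two measurements described in the abstract, linear recoverability of identity and removal of identity-related signal, can be illustrated with a minimal sketch. The abstract does not specify the probe or the removal method used in the thesis, so this example swaps in a difference-of-means probe and an orthogonal projection, applied to synthetic vectors rather than real model states. All names here (`fit_mean_difference_direction`, `probe_accuracy`, `remove_direction`, `identity_axis`) are hypothetical.

```python
import numpy as np

def fit_mean_difference_direction(states, labels):
    """Unit vector along the difference of the two class means."""
    diff = states[labels == 1].mean(axis=0) - states[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def probe_accuracy(states, labels, direction):
    """Accuracy of a linear probe that thresholds the projection onto `direction`."""
    scores = states @ direction
    threshold = 0.5 * (scores[labels == 1].mean() + scores[labels == 0].mean())
    return float(((scores > threshold).astype(int) == labels).mean())

def remove_direction(states, direction):
    """Project each state onto the subspace orthogonal to `direction`."""
    return states - np.outer(states @ direction, direction)

def class_mean_gap(states, labels):
    """Euclidean distance between the two class means."""
    return float(np.linalg.norm(
        states[labels == 1].mean(axis=0) - states[labels == 0].mean(axis=0)))

# Synthetic stand-in for hidden states: Gaussian noise plus a planted
# "identity" offset along one axis (an assumption, not real model data).
rng = np.random.default_rng(0)
n, d = 400, 32
labels = np.arange(n) % 2            # two hypothetical identities, balanced
identity_axis = rng.normal(size=d)
identity_axis /= np.linalg.norm(identity_axis)
states = rng.normal(size=(n, d)) + 4.0 * np.outer(2 * labels - 1, identity_axis)

direction = fit_mean_difference_direction(states, labels)
acc_before = probe_accuracy(states, labels, direction)
gap_before = class_mean_gap(states, labels)

ablated = remove_direction(states, direction)
gap_after = class_mean_gap(ablated, labels)
residual = float(np.abs(ablated @ direction).max())

print(f"probe accuracy before removal: {acc_before:.3f}")
print(f"class-mean gap before / after: {gap_before:.3f} / {gap_after:.2e}")
```

In an actual audit, `states` would be hidden representations captured at the position where the placeholder is generated, and the effect of `remove_direction` would be judged by the change in the model's generation loss rather than by the class-mean gap used here.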
Keywords
AI/ML Interpretability, AI/ML Security and Privacy, Natural Language Processing, Large Language Models Privacy
Disciplines
Computational Engineering
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Recommended Citation
Begum, Samreen, "MECHANISTIC AUDITING OF PRIVACY RETENTION IN LLM SUMMARIZATION" (2026). Computer Science and Engineering Theses-Archive. 543.
https://mavmatrix.uta.edu/cse_theses/543