ORCID Identifier(s)

0009-0003-0638-7605

Graduation Semester and Year

Fall 2025

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Second Advisor

Shirin Nilizadeh

Third Advisor

Jun Yang

Fourth Advisor

Kenny Zhu

Abstract

The modern digital information ecosystem is defined by the rapid, large-scale production and dissemination of factual claims across social media and online news platforms. While this ecosystem enables unprecedented access to information, it simultaneously exacerbates two intertwined challenges that threaten both human understanding and the reliability of AI-driven systems. First, the sheer volume, redundancy, and topical diversity of factual claims render manual organization and analysis infeasible, while existing automated methods lack the semantic granularity and interpretability required for meaningful exploration. Second, even when individual statements are factually correct, selective presentation of evidence (commonly known as cherry-picking) can distort narratives, mislead audiences, and introduce subtle informational bias into the data pipelines that increasingly train and inform large language models (LLMs). This dissertation addresses these challenges by examining both the utility of LLMs for structuring factual claims at scale and their vulnerability to selectively presented information.

The first major contribution of this dissertation is LLMTaxo, a novel, end-to-end framework for the automated construction of fine-grained hierarchical taxonomies of factual claims within a topic domain from social media data. LLMTaxo is designed to transform massive, noisy collections of user-generated content into structured semantic representations that support interpretability, navigation, and downstream analysis. The framework integrates multiple components: 1) check-worthy claim detection to filter irrelevant content, 2) semantic clustering to identify distinct claims and reduce redundancy, and 3) prompt-based topic generation using LLMs to assign claims to a three-level hierarchy of broad, medium, and detailed topics. To stabilize topic generation and mitigate uncontrolled label proliferation, LLMTaxo incorporates a human-in-the-loop process for creating learning examples and a seed taxonomy, enabling few-shot prompting that guides LLMs toward consistent, conceptually coherent outputs.
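
As a rough illustration of these three stages, the sketch below strings together a generic embedding model and clustering step with a few-shot prompt. Here `is_checkworthy`, `llm_complete`, and the seed-taxonomy string are hypothetical placeholders; the model choice, threshold, and prompt wording are assumptions, not the dissertation's actual components.

```python
# Minimal sketch of the three LLMTaxo stages described above (illustrative only).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

def cluster_claims(claims: list[str], threshold: float = 0.35) -> list[int]:
    """Stage 2: group near-duplicate claims by embedding distance."""
    emb = SentenceTransformer("all-MiniLM-L6-v2").encode(
        claims, normalize_embeddings=True)
    return AgglomerativeClustering(
        n_clusters=None, metric="cosine", linkage="average",
        distance_threshold=threshold).fit_predict(emb).tolist()

PROMPT = """Assign the claim one three-level topic path (broad > medium > detailed).
Reuse a path from the seed taxonomy below whenever one fits; create a new path
only if none applies.

{seed}

Claim: {claim}
Topic path:"""

def build_taxonomy(posts, is_checkworthy, llm_complete, seed):
    claims = [p for p in posts if is_checkworthy(p)]        # stage 1: filter
    labels = cluster_claims(claims)                          # stage 2: dedupe
    reps = [claims[labels.index(c)] for c in sorted(set(labels))]
    return {rep: llm_complete(PROMPT.format(seed=seed, claim=rep))  # stage 3
            for rep in reps}
```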

This dissertation formally defines a taxonomy not merely as a set of labels, but as a structured semantic scaffold that encodes a general-to-specific interpretive pathway. Each topic inherits contextual meaning from its position in the hierarchy, allowing identical labels to represent distinct concepts under different parents. By adopting a single-inheritance tree structure, the taxonomy ensures unambiguous semantic trajectories, facilitating consistent annotation, aggregation, and evaluation.
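
For concreteness, one way to realize such a single-inheritance tree is sketched below; the class and method names are illustrative, not the dissertation's implementation. Because every node has exactly one parent, a topic's full meaning is its root-to-node path, so identical labels under different parents remain distinct concepts.

```python
# Illustrative single-inheritance taxonomy node (names are assumptions).
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    label: str
    parent: "TopicNode | None" = None
    children: dict = field(default_factory=dict)

    def add_child(self, label: str) -> "TopicNode":
        # One node per (parent, label) pair: single inheritance by construction.
        return self.children.setdefault(label, TopicNode(label, parent=self))

    def path(self) -> list[str]:
        # The unambiguous general-to-specific trajectory from the root.
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]

root = TopicNode("climate change")
a = root.add_child("mitigation").add_child("policy")
b = root.add_child("impacts").add_child("policy")   # same label, different parent
assert a.path() != b.path()                         # distinct concepts
```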

To rigorously assess taxonomy quality, this dissertation introduces new evaluation metrics. These metrics measure the taxonomy's clarity, hierarchical coherence, orthogonality, and completeness, as well as claim-topic alignment and topic granularity. Both automated and human-centered evaluations are adopted to reduce evaluation bias. Extensive experiments are conducted on three large-scale social media datasets from X and Facebook, spanning diverse domains, including COVID-19 vaccines, climate change, and cybersecurity. Results demonstrate that LLMTaxo produces compact, interpretable, and semantically coherent taxonomies, reducing topic fragmentation by up to 99.5% compared to prompting approaches without structural guidance. The evaluations show high reliability, with strong inter-annotator agreement, confirming that the generated taxonomies align well with human conceptual organization.
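
As an example of how one such metric could be operationalized, the sketch below scores claim-topic alignment as the mean embedding cosine similarity between each claim and its full topic path. This is a plausible stand-in under an assumed embedding model, not the dissertation's exact metric definition.

```python
# One plausible operationalization of claim-topic alignment (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def claim_topic_alignment(claims: list[str], topic_paths: list[list[str]]) -> float:
    """Mean cosine similarity between each claim and its assigned topic path."""
    path_texts = [" > ".join(p) for p in topic_paths]  # "broad > medium > detailed"
    c = model.encode(claims, normalize_embeddings=True)
    t = model.encode(path_texts, normalize_embeddings=True)
    return float(np.mean(np.sum(c * t, axis=1)))       # unit vectors: dot = cosine
```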

The fine-grained taxonomies produced by LLMTaxo enable new analytical capabilities that are difficult to achieve with coarse topic labels. This dissertation demonstrates how taxonomy-guided organization supports granular analysis of public truthfulness stances toward factual claims, revealing how user attitudes vary not only by claim veracity but also by specific topical subdomains. Building on this foundation, the dissertation presents TrustMap, an interactive system that integrates hierarchical claim organization with truthfulness stance detection and geospatial analysis to visualize how social media users across regions respond to true, false, and mixed claims. In addition, this dissertation presents a focused study of social media users' truthfulness stances across topics within the climate change domain. These applications illustrate how structured claim taxonomies can transform unstructured discourse into interpretable knowledge representations that support social sensing and misinformation research.

The second major contribution of this dissertation investigates cherry-picking as a critical yet understudied form of informational bias. Unlike outright misinformation, cherry-picking constructs misleading narratives through the selective inclusion of factually correct statements while omitting equally important counterevidence. This makes cherry-picking difficult to detect and particularly dangerous in the context of LLMs, which are trained on and conditioned by large corpora of online text. The dissertation proposes an importance-based computational methodology for detecting cherry-picked content in news articles by identifying missing but salient information necessary for a balanced presentation. A dedicated dataset is introduced to support the study of cherry-picking detection.
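
A heavily simplified sketch of the importance-based idea follows: given sentences from the article under scrutiny, a pool of statements from broader coverage of the same story, and per-statement importance scores, it flags statements that are salient yet unmatched in the article. The matching model, thresholds, and importance scores are illustrative assumptions, not the dissertation's method.

```python
# Hedged sketch: flag salient coverage statements missing from an article.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def omitted_salient(article_sents: list[str], coverage_sents: list[str],
                    importance: list[float],
                    match_thresh: float = 0.7, salience_thresh: float = 0.5):
    """Return coverage statements that are important yet unmatched in the article."""
    art = model.encode(article_sents, convert_to_tensor=True,
                       normalize_embeddings=True)
    cov = model.encode(coverage_sents, convert_to_tensor=True,
                       normalize_embeddings=True)
    best = util.cos_sim(cov, art).max(dim=1).values  # best in-article match per statement
    return [s for s, w, b in zip(coverage_sents, importance, best.tolist())
            if w >= salience_thresh and b < match_thresh]
```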

To further study the impact of cherry-picking on LLMs, the dissertation presents the first systematic investigation of how cherry-picked evidence influences the belief states of modern LLMs. Through controlled experimental designs, the work evaluates multiple state-of-the-art LLMs under varying evidence conditions, disentangling the effects of selective factual evidence from user stance. The results demonstrate that LLMs are consistently susceptible to informational bias introduced by cherry-picking, exhibiting significant shifts in belief when different factual evidence is provided. These findings reveal LLMs' vulnerability to cherry-picked information, highlighting the risks of deploying LLMs in high-stakes reasoning environments without safeguards against selectively presented truths.
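
In spirit, such a controlled probe can be as simple as querying the same claim under different evidence conditions and comparing the answers, as in the toy sketch below. Here `query_llm` is a hypothetical client; the dissertation's actual prompts, models, and belief-state measurements are not reproduced.

```python
# Toy version of a controlled belief probe under varying evidence conditions.
def belief_shift(claim: str, evidence_for: list[str], evidence_against: list[str],
                 query_llm) -> dict[str, str]:
    """Probe the same claim under three evidence conditions."""
    def ask(evidence: list[str]) -> str:
        context = "".join(f"- {e}\n" for e in evidence)
        prefix = f"Evidence:\n{context}\n" if evidence else ""
        return query_llm(prefix + "Is the following claim true? "
                         f"Answer Yes or No.\nClaim: {claim}")
    return {
        "no_evidence": ask([]),
        "one_sided_for": ask(evidence_for),         # cherry-picked supporting facts
        "one_sided_against": ask(evidence_against), # cherry-picked counter-facts
    }
# An evidence-robust model would answer consistently across conditions;
# the dissertation reports significant belief shifts instead.
```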

Overall, this dissertation advances the state of the art in computational analysis of factual claims by contributing: 1) a novel framework for fine-grained taxonomy construction using LLMs, 2) rigorous evaluation metrics for assessing taxonomy quality and claim–topic alignment, 3) real-world analytical systems that leverage hierarchical claim organization to study public discourse, and 4) a novel empirical understanding of how selective evidence undermines LLM belief reliability. By unifying structural organization and bias analysis, this work provides both practical tools and conceptual insights for building more interpretable, reliable, and trustworthy AI systems in an increasingly complex information landscape.

Keywords

Taxonomy, Factual claim, Social media, LLM

Disciplines

Artificial Intelligence and Robotics | Data Science | Social Media

License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Available for download on Saturday, January 09, 2027
