Graduation Semester and Year

Fall 2025

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Second Advisor

Kenny Zhu

Third Advisor

Negin Fraidouni

Abstract

This thesis studies how modern decoder-based large language models can be used for sentence-level check-worthiness classification and how demographic bias affects their predictions. Existing systems for identifying check-worthy claims rely mainly on encoder-based models and provide limited analysis of model fairness. To address these gaps, this work extends the ClaimBuster dataset with debate transcripts through the 2024 election cycle, creating an enlarged benchmark called ClaimHack. In addition, a counterfactual augmentation procedure is developed to generate multiple demographic variants of each claim, and a PolitiFact-based evaluation dataset is constructed to measure model sensitivity to protected-attribute substitutions.

Several decoder-based language models, including members of the LLaMA, Phi, Mistral, and Qwen families, are fine-tuned using parameter-efficient methods (QLoRA) for binary check-worthiness classification. The models are evaluated under regular and counterfactually augmented training regimes using standard metrics such as F1, precision, recall, and accuracy. A set of quantitative bias metrics is introduced to measure how much model predictions change when only demographic terms in a claim are altered.

Results show that decoder-based models achieve strong performance on the extended dataset after fine-tuning. While models exhibit measurable demographic sensitivity, counterfactual augmentation reduces bias across most protected groups without sacrificing accuracy. This study provides an updated dataset, a fairness evaluation framework, and practical guidelines for building trustworthy and fair check-worthiness detection systems.
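
As an illustration of the counterfactual idea summarized above, the following minimal Python sketch swaps demographic terms in a claim and measures how often a classifier's label changes. It is not the thesis's implementation: the term lists, the function names `counterfactual_variants` and `flip_rate`, and the flip-rate definition are all assumptions made for this example, and the matching is case-sensitive for simplicity.

```python
import re
from typing import Callable, Dict, List

# Hypothetical term lists, assumed for illustration only; the lexicon of
# protected attributes actually used in the thesis is not given in this record.
DEMOGRAPHIC_TERMS: Dict[str, List[str]] = {
    "gender": ["men", "women"],
    "religion": ["Christians", "Muslims", "Jews"],
}


def counterfactual_variants(claim: str) -> List[str]:
    """Return copies of `claim` with a demographic term replaced by
    another term from the same group (whole-word, case-sensitive)."""
    variants: List[str] = []
    for terms in DEMOGRAPHIC_TERMS.values():
        for source in terms:
            pattern = re.compile(rf"\b{re.escape(source)}\b")
            if not pattern.search(claim):
                continue
            for target in terms:
                if target != source:
                    variants.append(pattern.sub(target, claim))
    return variants


def flip_rate(claims: List[str], predict: Callable[[str], int]) -> float:
    """Fraction of (original, variant) pairs whose predicted label differs.

    `predict` is any function mapping a sentence to a binary label; this is
    one possible sensitivity measure, assumed here for illustration.
    """
    flips = 0
    total = 0
    for claim in claims:
        original_label = predict(claim)
        for variant in counterfactual_variants(claim):
            total += 1
            flips += int(predict(variant) != original_label)
    return flips / total if total else 0.0
```

A flip rate of 0 would mean the classifier's label never changed when only the demographic term was swapped; higher values indicate greater sensitivity to the substitution.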

Keywords

Classification, Check worthiness, Misinformation

Disciplines

Artificial Intelligence and Robotics | Data Science | Theory and Algorithms

License

Creative Commons Attribution 4.0 International License

Available for download on Friday, January 08, 2027
