Graduation Semester and Year
Fall 2025
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Second Advisor
Vassilis Athitsos
Third Advisor
Shirin Nilizadeh
Fourth Advisor
Kenny Zhu
Abstract
The proliferation of misinformation in the modern digital ecosystem poses a critical threat to public trust and informed decision-making. While automated fact-checking has emerged as a necessary solution, existing approaches often lack transparency and struggle to verify claims against high-volume, structured data sources. This dissertation addresses these challenges by integrating Frame Semantics into the fact-checking pipeline, leveraging its structured representations to enhance both the interpretability and effectiveness of automated verification systems. We present a comprehensive body of work spanning fundamental improvements in frame-semantic parsing, the development of frame-guided fact-checking systems for structured data, the creation of large-scale benchmarks for high-volume data verification, and the deployment of accessible tools for unstructured claim verification.
First, we advance the state of the art in Frame-Semantic Parsing (FSP), the foundational technology enabling our approach. We introduce a novel target identification algorithm that combines a Part-of-Speech (POS) tag-aware prefix tree with a RoBERTa-based classifier. This method effectively handles discontinuous multi-word targets, a long-standing challenge in the field, achieving a recall of 0.994 in candidate target generation and an F1 score of 0.775 in target identification, surpassing previous systems. For frame identification, we propose a negative sampling training strategy that significantly improves performance on under-represented and ambiguous frames, yielding a +2.9% accuracy gain on rare frames. Furthermore, we conduct an extensive study on the use of Large Language Models (LLMs) for argument identification. We demonstrate that while zero-shot performance is limited, fine-tuned LLMs, particularly Qwen 2.5, can achieve state-of-the-art results (+3.9% F1 improvement over previous systems) when using JSON-based input representations. We also explore unifying the FSP pipeline, finding that leveraging predicted frame elements as evidence for frame identification yields superior results on ambiguous targets (0.862 accuracy).
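To illustrate how a POS-aware prefix tree can propose discontinuous multi-word targets, the following is a minimal sketch under stated assumptions, not the dissertation's implementation. The `PosTrie` class, the `max_gap` parameter, the toy lexicon, and the hand-assigned lemmas and POS tags are all hypothetical, and the RoBERTa-based classifier that would filter the candidates is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # (lemma, pos) -> TrieNode
    is_end: bool = False  # True if a lexical unit ends at this node


class PosTrie:
    """Prefix tree keyed on (lemma, POS) pairs. Hypothetical design."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, unit):
        """Insert a lexical unit given as a list of (lemma, POS) pairs."""
        node = self.root
        for key in unit:
            node = node.children.setdefault(key, TrieNode())
        node.is_end = True

    def candidates(self, tokens, max_gap=2):
        """Return sorted tuples of token indices that match a lexical unit.

        Consecutive parts of a unit may be separated by up to `max_gap`
        intervening tokens (an assumed heuristic), which is what lets
        discontinuous targets such as "gave ... up" be proposed.
        """
        found = set()

        def walk(node, start, path):
            if node.is_end and path:
                found.add(tuple(path))
            # The first token must sit exactly at `start`; later tokens
            # may skip over up to `max_gap` tokens.
            window = max_gap if path else 0
            for i in range(start, min(len(tokens), start + window + 1)):
                child = node.children.get(tokens[i])
                if child is not None:
                    walk(child, i + 1, path + [i])

        for s in range(len(tokens)):
            walk(self.root, s, [])
        return sorted(found)


# Toy lexicon and sentence ("She gave the idea up"), lemmatized and
# POS-tagged by hand for the example.
trie = PosTrie()
trie.add([("give", "VERB")])
trie.add([("give", "VERB"), ("up", "ADP")])
sentence = [("she", "PRON"), ("give", "VERB"), ("the", "DET"),
            ("idea", "NOUN"), ("up", "ADP")]
print(trie.candidates(sentence))  # [(1,), (1, 4)]
```

Keying the tree on (lemma, POS) pairs rather than on surface forms means that a noun reading of a word does not trigger a verb-only lexical unit. A generator like this favors recall, leaving precision to a downstream classifier.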
Building on these parsing advancements, we introduce ClaimLens, an end-to-end fact-checking system that utilizes extracted semantic frames to guide evidence retrieval from structured databases. We evaluate ClaimLens through two detailed case studies: one focusing on U.S. Congressional voting records using the Vote frame and another on international statistics from the Organisation for Economic Co-operation and Development (OECD) using a set of seven frames. To support the Vote case study, we develop two novel data augmentation techniques, frame element interleaving and permutation, which significantly improve parsing robustness. Our experiments demonstrate that querying databases using extracted frame elements outperforms standard full-claim retrieval methods for both voting claims and OECD claims.
To address the scarcity of benchmarks for fact-checking against high-volume structured data, we present MegaTab, a large-scale, multilingual dataset comprising 78,503 claims linked to 434 OECD data tables. Unlike previous datasets that rely on small, curated tables, MegaTab reflects the complexity of real-world statistical databases. We develop a rigorous generation pipeline involving six distinct claim types inspired by common semantic frames and a dual-LLM judge verification process to ensure high quality. We establish a baseline system for this dataset that decomposes claims into atomic sub-claims and generates executable SQL queries. Our baseline significantly outperforms state-of-the-art tabular reasoning systems such as TabSQLify (36% vs. 6.4% accuracy), while the low absolute accuracy of both underscores the difficulty of the task. Additionally, our analysis reveals that LLMs possess negligible internal knowledge of these specific statistical facts (0.4% recall), confirming that performance on MegaTab is driven by reasoning and retrieval rather than memorization.
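The decompose-then-query idea behind the baseline can be sketched as follows. This is an illustrative example under stated assumptions, not the dissertation's system: the `unemployment` table, its schema, and its values are invented, the sub-claims and SQL are written by hand here rather than generated, and combining sub-claim results with a logical AND is an assumed aggregation rule.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with made-up values (not OECD data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE unemployment (country TEXT, year INTEGER, rate REAL)"
    )
    conn.executemany(
        "INSERT INTO unemployment VALUES (?, ?, ?)",
        [("A", 2019, 5.0), ("A", 2020, 7.5), ("B", 2020, 6.0)],
    )
    return conn


def verify(conn, sub_claims):
    """Run one boolean SQL query per atomic sub-claim.

    Each query is expected to return a single 0/1 value. The overall
    claim counts as supported only if every sub-claim holds (assumed rule).
    """
    results = []
    for text, sql in sub_claims:
        row = conn.execute(sql).fetchone()
        results.append((text, bool(row and row[0])))
    return all(ok for _, ok in results), results


# Claim: "Unemployment in A rose from 2019 to 2020 and exceeded B's in 2020."
# Hand-written decomposition into two atomic sub-claims.
SUB_CLAIMS = [
    (
        "A's rate rose from 2019 to 2020",
        "SELECT (SELECT rate FROM unemployment WHERE country='A' AND year=2020)"
        " > (SELECT rate FROM unemployment WHERE country='A' AND year=2019)",
    ),
    (
        "A's 2020 rate exceeded B's 2020 rate",
        "SELECT (SELECT rate FROM unemployment WHERE country='A' AND year=2020)"
        " > (SELECT rate FROM unemployment WHERE country='B' AND year=2020)",
    ),
]

conn = build_demo_db()
verdict, details = verify(conn, SUB_CLAIMS)
print(verdict, details)
```

Keeping one query per sub-claim makes the verdict traceable: each atomic statement is paired with the exact query and result that supported or refuted it.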
Finally, we introduce ClaimCheck, a complementary system designed for verifying claims against unstructured Web evidence. ClaimCheck employs a modular, LLM-guided architecture that mirrors human fact-checking workflows, including query planning, evidence summarization, and synthesis. Despite using a relatively small 4-billion-parameter model (Qwen3-4B), ClaimCheck achieves state-of-the-art accuracy (76.4%) on the AVeriTeC benchmark, outperforming systems that rely on significantly larger proprietary models. This demonstrates that efficient, transparent, and accessible fact-checking is achievable through careful system design. Together, the contributions of this dissertation, from robust semantic parsing algorithms to large-scale datasets and deployable systems, establish a new paradigm for interpretable, data-driven automated fact-checking.
Keywords
Frame semantics, Fact checking, Misinformation, Large language models, Machine learning
Disciplines
American Politics | Computational Linguistics | Computer Engineering | Semantics and Pragmatics
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Devasier, Jacob, "TOWARDS INTERPRETABLE AUTOMATIC FACT-CHECKING WITH FRAME SEMANTICS" (2025). Computer Science and Engineering Dissertations. 429.
https://mavmatrix.uta.edu/cse_dissertations/429
Included in
American Politics Commons, Computational Linguistics Commons, Computer Engineering Commons, Semantics and Pragmatics Commons