ORCID Identifier(s)

0009-0003-5277-8092

Graduation Semester and Year

Fall 2025

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Second Advisor

Vassilis Athitsos

Third Advisor

Shirin Nilizadeh

Fourth Advisor

Kenny Zhu

Abstract

The proliferation of misinformation in the modern digital ecosystem poses a critical threat to public trust and informed decision-making. While automated fact-checking has emerged as a necessary solution, existing approaches often lack transparency and struggle to verify claims against high-volume, structured data sources. This dissertation addresses these challenges by integrating Frame Semantics into the fact-checking pipeline, leveraging its structured representations to enhance both the interpretability and effectiveness of automated verification systems. We present a comprehensive body of work spanning fundamental improvements in frame-semantic parsing, the development of frame-guided fact-checking systems for structured data, the creation of large-scale benchmarks for high-volume data verification, and the deployment of accessible tools for unstructured claim verification.

First, we advance the state of the art in Frame-Semantic Parsing (FSP), the foundational technology enabling our approach. We introduce a novel target identification algorithm that combines a Part-of-Speech (POS) tag-aware prefix tree with a RoBERTa-based classifier. This method effectively handles discontinuous multi-word targets, a long-standing challenge in the field, achieving a recall of 0.994 in candidate target generation and an F1 score of 0.775 in target identification, surpassing previous systems. For frame identification, we propose a negative sampling training strategy that significantly improves performance on under-represented and ambiguous frames, yielding a +2.9% accuracy gain on rare frames. Furthermore, we conduct an extensive study on the use of Large Language Models (LLMs) for argument identification. We demonstrate that while zero-shot performance is limited, fine-tuned LLMs, particularly Qwen 2.5, can achieve state-of-the-art results (+3.9% F1 improvement over previous systems) when using JSON-based input representations. We also explore unifying the FSP pipeline, finding that leveraging predicted frame elements as evidence for frame identification yields superior results on ambiguous targets (0.862 accuracy).
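
As a rough illustration of how a POS tag-aware prefix tree can propose candidate targets, including discontinuous ones, consider the minimal sketch below. It is not the dissertation's implementation: the lexicon entries, the `max_gap` parameter, and the function names are assumptions made for this example, and the RoBERTa-based classifier that would filter the candidates is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children are keyed by (lemma, POS) pairs, so the tree is POS-aware.
    children: dict = field(default_factory=dict)
    terminal: bool = False


def build_trie(lexical_units):
    """Insert each lexical unit, a tuple of (lemma, POS) pairs, into a prefix tree."""
    root = TrieNode()
    for unit in lexical_units:
        node = root
        for key in unit:
            node = node.children.setdefault(key, TrieNode())
        node.terminal = True
    return root


def candidate_targets(tokens, root, max_gap=2):
    """Return sorted tuples of token indices that spell out a lexical unit.

    tokens is a list of (lemma, POS) pairs. Up to max_gap tokens may be
    skipped between consecutive parts of a multi-word target (an assumed
    heuristic for this sketch).
    """
    found = set()

    def walk(node, last, path):
        if node.terminal:
            found.add(tuple(path))
        stop = min(len(tokens), last + 2 + max_gap)
        for j in range(last + 1, stop):
            child = node.children.get(tokens[j])
            if child is not None:
                walk(child, j, path + [j])

    for i, tok in enumerate(tokens):
        child = root.children.get(tok)
        if child is not None:
            walk(child, i, [i])
    return sorted(found)


# Hypothetical lexicon: "give up" (verb + particle), "give", and "idea".
LEXICON = [
    (("give", "VERB"), ("up", "ADP")),
    (("give", "VERB"),),
    (("idea", "NOUN"),),
]

# "She gave the idea up", already lemmatized and POS-tagged.
TOKENS = [("she", "PRON"), ("give", "VERB"), ("the", "DET"),
          ("idea", "NOUN"), ("up", "ADP")]

print(candidate_targets(TOKENS, build_trie(LEXICON)))
# [(1,), (1, 4), (3,)]
```

The candidate `(1, 4)` covers the discontinuous target "gave ... up". In a pipeline of the kind described above, each candidate would then be accepted or rejected by a classifier.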

Building on these parsing advancements, we introduce ClaimLens, an end-to-end fact-checking system that utilizes extracted semantic frames to guide evidence retrieval from structured databases. We evaluate ClaimLens through two detailed case studies: one focusing on U.S. Congressional voting records using the Vote frame and another on international statistics from the Organisation for Economic Co-operation and Development (OECD) using a set of seven frames. To support the Vote case study, we develop two novel data augmentation techniques, frame element interleaving and permutation, which significantly improve parsing robustness. Our experiments demonstrate that querying databases using extracted frame elements outperforms standard full-claim retrieval methods for both voting and OECD claims.
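
The idea of querying with extracted frame elements rather than the full claim text can be sketched as follows. This is only an illustration under stated assumptions: the frame element names (`Agent`, `Issue`), the record fields, and the records themselves are hypothetical placeholders, not ClaimLens's actual schema or data.

```python
def retrieve_votes(records, frame_elements):
    """Filter voting records using frame elements extracted from a claim.

    A missing frame element becomes an empty string, which matches every
    record, so only the elements that were extracted constrain the search.
    """
    agent = frame_elements.get("Agent", "").lower()
    issue = frame_elements.get("Issue", "").lower()
    return [
        r for r in records
        if agent in r["member"].lower() and issue in r["bill_title"].lower()
    ]


# Placeholder records, invented for this example.
RECORDS = [
    {"member": "Rep. Example One", "bill_title": "Clean Water Act", "vote": "Yea"},
    {"member": "Rep. Example Two", "bill_title": "Clean Water Act", "vote": "Nay"},
    {"member": "Rep. Example One", "bill_title": "Farm Bill", "vote": "Nay"},
]

# Assumed parser output for a claim such as
# "Rep. Example One voted against clean water."
FRAME_ELEMENTS = {"Agent": "Example One", "Issue": "clean water"}

matches = retrieve_votes(RECORDS, FRAME_ELEMENTS)
print(matches)
```

Because the filter uses only the agent and the issue, the claimed vote position does not bias which records are retrieved; it can be compared against the retrieved `vote` field afterwards.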

To address the scarcity of benchmarks for fact-checking against high-volume structured data, we present MegaTab, a large-scale, multilingual dataset comprising 78,503 claims linked to 434 OECD data tables. Unlike previous datasets that rely on small, curated tables, MegaTab reflects the complexity of real-world statistical databases. We develop a rigorous generation pipeline involving six distinct claim types inspired by common semantic frames and a dual-LLM judge verification process to ensure high quality. We establish a baseline system for this dataset that decomposes claims into atomic sub-claims and generates executable SQL queries. Our baseline significantly outperforms state-of-the-art tabular reasoning systems like TabSQLify (36% vs. 6.4% accuracy), underscoring the difficulty of the task. Additionally, our analysis reveals that LLMs possess negligible internal knowledge of these specific statistical facts (0.4% recall), confirming that performance on MegaTab is driven by reasoning and retrieval rather than memorization.
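
To make the decompose-then-query idea concrete, the sketch below verifies a comparison claim by splitting it into two atomic lookups, each answered by one SQL query. Everything here is an assumption for illustration: the table schema, the toy values (not OECD figures), the verdict labels, and the hand-written SQL, which in the baseline described above would be generated by a model.

```python
import sqlite3

# Toy in-memory table; the values are invented placeholders, not real statistics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unemployment (country TEXT, year INTEGER, rate REAL)")
conn.executemany(
    "INSERT INTO unemployment VALUES (?, ?, ?)",
    [("A", 2020, 7.5), ("B", 2020, 5.0)],
)


def lookup(conn, country, year):
    """Atomic sub-claim: fetch a single value with one parameterized query."""
    row = conn.execute(
        "SELECT rate FROM unemployment WHERE country = ? AND year = ?",
        (country, year),
    ).fetchone()
    return None if row is None else row[0]


def verify_comparison(conn, left, right, year):
    """Check the claim 'left had a higher rate than right in the given year'."""
    a = lookup(conn, left, year)
    b = lookup(conn, right, year)
    if a is None or b is None:
        return "NOT ENOUGH INFO"
    return "SUPPORTED" if a > b else "REFUTED"


print(verify_comparison(conn, "A", "B", 2020))  # SUPPORTED
print(verify_comparison(conn, "B", "A", 2020))  # REFUTED
print(verify_comparison(conn, "A", "C", 2020))  # NOT ENOUGH INFO
```

Keeping each sub-claim as its own query means every intermediate value can be shown to the reader, which is one way a verdict can remain interpretable.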

Finally, we introduce ClaimCheck, a complementary system designed for verifying claims against unstructured Web evidence. ClaimCheck employs a modular, LLM-guided architecture that mirrors human fact-checking workflows, including query planning, evidence summarization, and synthesis. Despite using a relatively small 4-billion-parameter model (Qwen3-4B), ClaimCheck achieves state-of-the-art accuracy (76.4%) on the AVeriTeC benchmark, outperforming systems that rely on significantly larger proprietary models. This demonstrates that efficient, transparent, and accessible fact-checking is achievable through careful system design. Together, the contributions of this dissertation, from robust semantic parsing algorithms to large-scale datasets and deployable systems, establish a new paradigm for interpretable, data-driven automated fact-checking.

Keywords

Frame semantics, Fact checking, Misinformation, Large language models, Machine learning

Disciplines

American Politics | Computational Linguistics | Computer Engineering | Semantics and Pragmatics

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Devasier, Jacob, "TOWARDS INTERPRETABLE AUTOMATIC FACT-CHECKING WITH FRAME SEMANTICS" (2025). Computer Science and Engineering Dissertations. 429.
https://mavmatrix.uta.edu/cse_dissertations/429

Download

Available for download on Saturday, January 09, 2027

Included in

American Politics Commons, Computational Linguistics Commons, Computer Engineering Commons, Semantics and Pragmatics Commons
