ORCID Identifier(s)

0009-0002-8492-0551

Graduation Semester and Year

Summer 2025

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Second Advisor

Vassilis Athitsos

Third Advisor

Shirin Nilizadeh

Fourth Advisor

Gautam Das

Abstract

Misinformation on social media has become a pervasive issue that profoundly influences public opinion and decision-making. As false or misleading claims circulate widely online, there is a critical need for analytical tools to understand how people react to such claims. This dissertation introduces the concept of truthfulness stance as a key lens for social sensing. In essence, truthfulness stance captures whether a textual utterance conveys the belief that a factual claim is true or false, or expresses a neutral stance or no stance toward the claim. Leveraging stance in this manner fills an important gap in misinformation research: it enables us to gauge the public’s collective judgment on what is true or false, thereby offering a proxy measure of misinformation acceptance or rejection at scale. Understanding these stances is vital, as they can reveal how misinformation spreads and influences society, informing strategies for political decision-making and public health interventions.

Despite a rich body of work on stance detection, prior studies have used varying definitions of “stance,” leading to a fragmented understanding of the concept. This dissertation provides the first in-depth conceptual framework that unifies these definitions, generalizing stance across different studies and delineating how truthfulness stance relates to, and differs from, other stance types. In this framework, any stance instance involves four components: (1) an utterance (e.g., a social media post or a news article in which the stance is expressed), (2) a target (the entity, topic, or factual claim that the stance is about), (3) an orientation of the stance (e.g., positive, neutral/no stance, negative), and (4) the type of stance being expressed (e.g., favorability, likelihood, or truthfulness).
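To make the four components concrete, the following sketch models a stance instance as a small Python data structure. It is illustrative only: the class and enumeration names are our shorthand for the framework’s components, not code from the dissertation.

    from dataclasses import dataclass
    from enum import Enum

    class Orientation(Enum):
        POSITIVE = "positive"            # e.g., favors the target, or believes the claim
        NEUTRAL_NO_STANCE = "neutral/no stance"
        NEGATIVE = "negative"            # e.g., opposes the target, or disbelieves the claim

    class StanceType(Enum):
        FAVORABILITY = "favorability"    # approval or disapproval of a target
        LIKELIHOOD = "likelihood"        # perceived probability that something holds
        TRUTHFULNESS = "truthfulness"    # belief in a factual claim's veracity

    @dataclass
    class StanceInstance:
        utterance: str                   # the post or article expressing the stance
        target: str                      # the entity, topic, or factual claim
        orientation: Orientation
        stance_type: StanceType

Under this model, a truthfulness stance is simply an instance whose stance_type is TRUTHFULNESS and whose target is a factual claim.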

Since truthfulness stance differs from conventional definitions of stance, there is a pressing need for a dedicated dataset that captures this distinction. To address this gap, this dissertation introduces a novel labeled dataset, TSD-CT, which stands for Truthfulness Stance Detection for Claim–Tweet pairs. TSD-CT contains 5,331 claim–tweet pairs, each consisting of a factual claim and a corresponding social media post (specifically, a tweet) expressing a truthfulness stance toward that claim. The claims were sourced from PolitiFact, a popular fact-checking website, and for each claim we collected tweets that mention or discuss it. We then collected human annotations that label each tweet’s stance as “positive,” “negative,” “neutral/no stance,” or “different topic,” indicating, respectively, that the tweet believes the claim is true, believes it is false, is unsure or takes no stance, or is unrelated to the claim. A fifth category, “problematic,” was available during annotation to flag tweets that were nonsensical or otherwise invalid (e.g., pure sarcasm). The annotations were performed on an in-house web platform with rigorous quality control: annotators recruited from the university community were trained with detailed guidelines and examples for each stance category, and multiple mechanisms monitored agreement and consistency, including an administrative dashboard that tracked annotator performance in real time. The resulting dataset is publicly available on Zenodo, providing a resource for advancing research in computational social science and social media analysis.
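For illustration, a single TSD-CT pair and one way to consolidate its annotations might look as follows. The field names and the majority-vote helper are hypothetical; the released dataset’s exact schema may differ, and the “problematic” category is assumed here to be filtered out before aggregation.

    from collections import Counter

    # Hypothetical record layout for one claim–tweet pair (field names assumed,
    # not taken from the released TSD-CT schema).
    pair = {
        "claim": "A factual claim sourced from PolitiFact.",
        "tweet": "A tweet that mentions or discusses the claim.",
        "annotations": ["negative", "negative", "neutral/no stance"],
    }

    STANCE_LABELS = {"positive", "negative", "neutral/no stance", "different topic"}

    def aggregate_label(annotations: list[str]) -> str | None:
        """Majority vote over annotator labels; returns None on a tie."""
        counts = Counter(a for a in annotations if a in STANCE_LABELS)
        if not counts:
            return None
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None  # no clear majority
        return ranked[0][0]

    print(aggregate_label(pair["annotations"]))  # -> "negative"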

At the core of our approach to identifying the truthfulness stance of social media posts is RATSD, a novel framework for automated stance classification with respect to factual claims. RATSD stands for Retrieval-Augmented Truthfulness Stance Detection, reflecting its hybrid design that combines large language models (LLMs) with information retrieval techniques. The motivation behind RATSD is to overcome the key challenge in stance detection: contextual understanding. Social media posts such as tweets are often short, filled with colloquialisms or sarcasm, and may lack context, which makes it hard for a model to determine a tweet’s stance toward a claim in isolation. Our idea is to equip the model with additional knowledge and a reasoning process before it makes a stance judgment. Notably, to the best of our knowledge, this work is the first to apply retrieval-augmented generation (RAG) techniques to stance detection. Our experimental results demonstrate that injecting contextual knowledge in this way substantially improves a model’s performance on the task.
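The retrieval-augmented pattern behind RATSD can be sketched in a few lines. This is a minimal illustration of the general recipe (retrieve evidence, then prompt an LLM), not the dissertation’s actual pipeline: the retrieval backend, the prompt wording, and the llm_complete callable are all assumptions.

    def retrieve_context(claim: str, tweet: str, k: int = 5) -> list[str]:
        """Placeholder: fetch k passages relevant to the claim-tweet pair
        (e.g., news articles or fact-check reports) from an external corpus."""
        raise NotImplementedError

    def classify_stance(claim: str, tweet: str, llm_complete) -> str:
        """llm_complete is any callable mapping a prompt string to model text."""
        context = "\n".join(retrieve_context(claim, tweet))
        prompt = (
            f"Background passages:\n{context}\n\n"
            f"Claim: {claim}\n"
            f"Tweet: {tweet}\n\n"
            "Given the background, does the tweet's author believe the claim "
            "is true, false, or neither? Answer with exactly one of: "
            "positive, negative, neutral/no stance."
        )
        return llm_complete(prompt).strip().lower()

The key idea this mirrors is that the model sees retrieved evidence alongside the claim–tweet pair, rather than judging the tweet in isolation.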

We rigorously evaluated RATSD on multiple datasets to validate its effectiveness. First, we tested it on our TSD-CT dataset, using it as a benchmark for truthfulness stance detection. We also included three existing stance datasets: SemEval-2019 (tweets from a shared task on stance toward the veracity of rumors), WT–WT (Will-They-Won’t-They, a large stance dataset on merger-and-acquisition rumors in the financial domain), and COVIDLies (tweets annotated for stance toward COVID-19-related misinformation). We compared RATSD with several state-of-the-art baseline models from recent literature, and the results show a clear advantage: RATSD outperforms these baselines on all datasets, with particularly strong gains on TSD-CT. An ablation study confirmed that each component of the framework contributes to these gains: removing any component led to notable drops in performance.

A key aspect of this research is demonstrating how truthfulness stance detection can power practical tools for misinformation monitoring and analysis. We developed and deployed several proof-of-concept applications that use our stance detection approach to provide actionable insights.

We created TrustMap, a web-based interactive tool that visualizes the aggregate truthfulness stances of social media posts across geographic regions. TrustMap ingests streams of tweets about various factual claims, applies our RATSD framework to label each tweet as positive, neutral/no stance, or negative, and clusters and displays the tweets on a map of the United States, allowing users to explore how different regions respond to specific claims under certain topics. The geographic patterns unveiled by TrustMap help researchers and policymakers identify regional variations in misinformation belief. By connecting stance detection with geospatial analysis, TrustMap offers a novel perspective on public engagement with factual claims.

We also applied our stance detection framework to the topic of climate change. We collected climate-related factual claims and tweets that discuss them and, using a variant of the RATSD framework, analyzed public perceptions of climate-related issues. The findings were telling: the public tends to believe most climate-related claims are true, regardless of the claims’ actual veracity, indicating a concerning bias toward accepting information at face value. By identifying where people are overly credulous or confused about facts, stakeholders such as science communicators and environmental agencies can better target their educational efforts.

Another application is a COVID-19 misinformation dashboard, developed during the COVID-19 pandemic to help track and mitigate the so-called “COVID-19 misinfodemic.” We curated a catalog of known COVID-19 facts (e.g., “Vaccines reduce transmission”) and debunked myths (e.g., “5G spreads the virus”), and then monitored Twitter to observe how frequently these pieces of information were appearing and whether users were endorsing or rejecting them. Using a BERT-based stance detection model (an early version of our system), the dashboard could match each tweet to the closest factual claim or myth and determine the tweet’s stance toward it. This enabled the platform to display, for any U.S. region, which COVID-related rumors were most prevalent and whether the local Twitter discourse was pushing back against those rumors or amplifying them. Such a tool provided health officials and the public with situational awareness of the misinformation landscape during the pandemic. Collectively, these applications show that the methods developed in this dissertation are not merely theoretical; they can be operationalized to tackle real-world challenges.
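As an illustration of the dashboard’s claim-matching step, the sketch below uses the sentence-transformers library as a stand-in for the BERT-based matcher; the model name, similarity threshold, and catalog entries are assumptions, not the system’s actual configuration.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # A tiny claim/myth catalog using the examples mentioned above.
    catalog = [
        "Vaccines reduce transmission",   # curated fact
        "5G spreads the virus",           # debunked myth
    ]
    catalog_emb = model.encode(catalog, convert_to_tensor=True)

    def match_claim(tweet: str, threshold: float = 0.5) -> str | None:
        """Return the catalog entry most similar to the tweet, or None if no
        entry clears the threshold; a match would then go to stance detection."""
        tweet_emb = model.encode(tweet, convert_to_tensor=True)
        scores = util.cos_sim(tweet_emb, catalog_emb)[0]
        best = int(scores.argmax())
        return catalog[best] if float(scores[best]) >= threshold else None

A tweet that matches no catalog entry is simply skipped, so only posts plausibly about a known fact or myth reach the stance classifier.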

In conclusion, this dissertation contributes a novel perspective and a toolkit for understanding misinformation through the lens of truthfulness stance. We demonstrated a full pipeline from conceptual foundations and data resources to algorithms and deployment. The approach has proven effective in controlled experiments and useful in practical scenarios. Going forward, we envision truthfulness stance analysis becoming an integral part of misinformation research, helping scholars and practitioners chart the landscape of truth and lies in social media discourse. Ultimately, by shedding light on collective truth perceptions, our work aids in fostering a more informed and resilient society in the digital age.

Keywords

Social Sensing, Misinformation, Truthfulness Stance Detection

Disciplines

Artificial Intelligence and Robotics | Social Influence and Political Communication | Social Media | Theory and Algorithms

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Available for download on Wednesday, August 12, 2026
