ORCID Identifier(s)

0000-0002-3951-6886

Graduation Semester and Year

2016

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Unknown

Abstract

Public figures such as politicians make claims about "facts" all the time. These claims are often false, exaggerated, or misleading, whether through careless mistakes or deliberate manipulation of information. With technology and modern media spreading information to mass audiences through all types of channels, there is a pressing need to check the veracity of factual claims that matter to the public. Journalists and citizens spend considerable time doing so, and more and more dedicated platforms and institutes are being created for fact-checking. This nascent genre of investigative reporting has become a basic feature of political coverage, especially during elections, and plays an important role in improving political discourse and increasing democratic accountability.

Part of the goal of computational journalism is to use computing to automate fact-checking. Many computational and journalistic challenges stand in the way of a fully automated fact-checking system. This dissertation presents these challenges and focuses on the research areas where breakthroughs are needed. Toward automated fact-checking, we developed tools to find check-worthy factual claims in natural language sentences. Specifically, we prepared a U.S. presidential debate dataset and built classification models to distinguish check-worthy factual claims from non-factual claims and unimportant factual claims. We also identified the most effective features based on their impact on the classification models' accuracy. We built a platform, ClaimBuster, which uses the classification model to present check-worthy factual claims spoken during all of the 2016 U.S. presidential election primary debates.

As with automated fact-checking, advanced computational techniques are also necessary for discovering newsworthy facts, especially during live events. Reporters strive to surface attention-seizing factual statements backed by data, which may lead to news stories and investigations. Such facts can be drawn from data in domains beyond sports and social media, including stock prices, weather data, and criminal records, and they are useful not only to reporters but also to financial analysts, scientists, and citizens. Database and data mining researchers have started to push the frontiers of automated discovery and monitoring of significant facts.

This dissertation addresses the problem of monitoring significant facts during live events such as a basketball game or hourly weather updates. Technically, we consider an ever-growing table of objects with dimension and measure attributes. We define a situational fact as a "contextual" skyline tuple that stands out against historical tuples in a context when compared on a set of measure attributes. A context is specified by a conjunctive constraint on dimension attributes. New tuples are constantly added to the table, reflecting events happening in the real world in a live fashion. The goal is to discover the constraint-measure pairs that qualify a new tuple as a contextual skyline tuple, and to discover them quickly, before the event becomes yesterday's news. A brute-force approach requires exhaustive comparison with every tuple, under every constraint, and in every measure subspace. We design algorithms that counter these three sources of cost with three corresponding ideas: tuple reduction, constraint pruning, and shared computation across measure subspaces.
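To make the claim-spotting task above concrete, the following is a minimal sketch of a three-way sentence classifier in Python. The pipeline (TF-IDF features feeding a linear SVM), the label names, and the toy sentences are illustrative assumptions, not the dissertation's actual model, which is trained on a labeled debate corpus with a richer feature set.

    # Minimal sketch of a check-worthiness classifier in the spirit of the
    # claim-spotting task described above. The three labels (NFS: non-factual
    # sentence, UFS: unimportant factual sentence, CFS: check-worthy factual
    # sentence) and the toy training data are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    sentences = [
        "Thank you all for being here tonight.",          # NFS
        "I had dinner in Ohio last week.",                # UFS
        "Unemployment fell to 4.9 percent last month.",   # CFS
        "We will make America great again.",              # NFS
        "The bill cut taxes by 300 billion dollars.",     # CFS
    ]
    labels = ["NFS", "UFS", "CFS", "NFS", "CFS"]

    # TF-IDF over word unigrams and bigrams feeding a linear SVM.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(sentences, labels)

    print(model.predict(["The deficit doubled over the last four years."]))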
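The brute-force sketch below illustrates the definition of a situational fact for a single constraint-measure pair. The attribute names and data are hypothetical; the dissertation's contribution lies precisely in avoiding this exhaustive scan through tuple reduction, constraint pruning, and shared computation across measure subspaces.

    # Brute-force test of whether a new tuple is a "contextual skyline" tuple
    # (a situational fact) under one constraint-measure pair. Attribute names
    # (player, team, points, rebounds) are illustrative only.

    def dominates(t, u, measures):
        """t dominates u if t is >= u on all measures and > on at least one."""
        return (all(t[m] >= u[m] for m in measures)
                and any(t[m] > u[m] for m in measures))

    def is_situational_fact(new_tuple, history, constraint, measures):
        """constraint: dict of equality conditions on dimension attributes."""
        if not all(new_tuple[d] == v for d, v in constraint.items()):
            return False  # the new tuple must itself satisfy the context
        context = [t for t in history
                   if all(t[d] == v for d, v in constraint.items())]
        return not any(dominates(t, new_tuple, measures) for t in context)

    history = [
        {"player": "A", "team": "X", "points": 30, "rebounds": 10},
        {"player": "B", "team": "X", "points": 25, "rebounds": 15},
    ]
    new = {"player": "C", "team": "X", "points": 28, "rebounds": 16}

    # Within the context team = X, comparing on (points, rebounds):
    print(is_situational_fact(new, history, {"team": "X"},
                              ["points", "rebounds"]))  # True: not dominated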
Furthermore, we present an end-to-end system that includes fact ranking, fact-to-statement translation, and keyword-based fact search.

In addition to addressing the fact-checking and fact-monitoring problems, and thereby pushing the boundary of computational journalism forward, this dissertation also studies multi-dimensional Pareto-optimal analysis: given a set of multi-dimensional points, find the points that are not dominated by any other point, where one point dominates another if it is at least as good on every dimension and strictly better on at least one. This dissertation finds applications of Pareto-optimality and its variants in group recommendation, crowdsourcing, and other domains. Traditional Pareto frontier (skyline) computation is inadequate for queries that must analyze not only individual points but also groups of points. To fill this gap, this dissertation proposes a novel concept, skyline groups, which are groups that are not dominated by any other group. It also demonstrates applications of skyline groups through a web-based system for question answering, expert team formation, and paper reviewer selection.
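As a rough illustration of the skyline-groups concept, the sketch below enumerates all fixed-size groups and compares them by an aggregate vector. The choice of SUM as the aggregate function and the toy points are assumptions made for illustration; exhaustive enumeration of all k-subsets is exactly what efficient skyline-group algorithms must avoid.

    # Minimal sketch of skyline-group computation by exhaustive enumeration.
    # Groups of size k are compared via an aggregate vector (SUM assumed here).
    from itertools import combinations

    def dominates(a, b):
        """Vector a dominates b: >= everywhere, > somewhere."""
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    def skyline_groups(points, k):
        # Pair every k-subset with its per-dimension SUM vector.
        groups = [(g, tuple(map(sum, zip(*g)))) for g in combinations(points, k)]
        # Keep groups whose aggregate vector no other group's vector dominates.
        return [g for g, agg in groups
                if not any(dominates(agg2, agg) for _, agg2 in groups)]

    # Each point could be an expert scored on (skill, experience).
    points = [(9, 2), (7, 7), (3, 8), (5, 5)]
    for g in skyline_groups(points, 2):
        print(g)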

Keywords

Fact-checking, Fact-finding, Database, Data mining

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
