Graduation Semester and Year

2017

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

Fact-checking events such as presidential debates in real time is a challenging task. A fact-checking system must classify facts, identify topics, and perform related tasks both rigorously and accurately. The first and foremost task in fact-checking is to determine whether a sentence is factually check-worthy. The UTA IDIR Lab has deployed an automated fact-checking system named ClaimBuster, whose core functionality is identifying check-worthy factual sentences. Named entities are an important component of any textual data. To use named entities, they must be linked to labels such as person, location, and organization. If automated systems are to read and understand natural language as humans do, they must recognize the named entities mentioned in the text. In classifying the sentences of the presidential debates, the ClaimBuster project categorizes sentences into three types: check-worthy factual sentences (CFS), non-factual sentences (NFS), and unimportant factual sentences (UFS). This categorization frames the supervised classification problem as a three-class problem (or a two-class problem, by merging NFS and UFS). To identify check-worthy factual claims, ClaimBuster employs named entities as a feature in its classification models, along with sentiment, length, words (W), and part-of-speech (POS) tags. In this work, I evaluate three classification algorithms: the Naïve Bayes Classifier (NBC), the Support Vector Machine (SVM), and the Random Forest Classifier (RFC). The evaluation chiefly compares the performance of these classifiers with and without named entities as a feature. I also analyze the mistakes the classifiers make by comparing two sets of features at a time. The analysis therefore consists of 18 experiments: three classifiers, two classification types, and three feature-set comparisons. I find that named entities contribute very little to the classifiers, and that their contribution is subdued by better-performing features such as the POS tags.
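The with/without named-entity comparison lends itself to a short sketch. The Python code below is an illustration under stated assumptions, not the thesis's actual pipeline: the toy sentences and labels are hypothetical, scikit-learn and NLTK are assumed as the toolchain, and the sentiment feature is omitted for brevity.

# Minimal sketch (not the thesis's pipeline) of the feature-ablation
# experiment: compare NBC, SVM, and RFC with and without a named-entity
# feature. The sentences, labels, and feature choices are hypothetical.
# Requires the NLTK models punkt, averaged_perceptron_tagger,
# maxent_ne_chunker, and words (exact names vary with NLTK version).
import nltk
from scipy.sparse import hstack, csr_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy labeled sentences standing in for the real debate corpus.
sentences = [
    "We spend 400 billion dollars a year on the military.",  # CFS
    "Two million jobs were created in Ohio last year.",      # CFS
    "Thank you all very much for coming tonight.",           # NFS
    "I believe we can do better than this.",                 # NFS
    "The debate tonight is ninety minutes long.",            # UFS
    "My opponent and I have met twice before.",              # UFS
]
labels = ["CFS", "CFS", "NFS", "NFS", "UFS", "UFS"]

def pos_tags(sentence):
    """Represent a sentence by its sequence of POS tags."""
    return " ".join(t for _, t in nltk.pos_tag(nltk.word_tokenize(sentence)))

def ne_count(sentence):
    """Count named-entity chunks (PERSON, GPE, ORGANIZATION, ...)."""
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
    return sum(1 for node in tree if isinstance(node, nltk.Tree))

X_words = CountVectorizer().fit_transform(sentences)
X_pos = CountVectorizer(token_pattern=r"\S+").fit_transform(
    pos_tags(s) for s in sentences)
X_len = csr_matrix([[len(s.split())] for s in sentences], dtype=float)
X_ne = csr_matrix([[ne_count(s)] for s in sentences], dtype=float)

feature_sets = {
    "without NE": hstack([X_words, X_pos, X_len]).tocsr(),
    "with NE": hstack([X_words, X_pos, X_len, X_ne]).tocsr(),
}
classifiers = {
    "NBC": MultinomialNB(),
    "SVM": LinearSVC(),
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 2-fold CV on the toy data; the real experiments would use the full
# labeled debate corpus and a proper cross-validation protocol.
for clf_name, clf in classifiers.items():
    for feat_name, X in feature_sets.items():
        acc = cross_val_score(clf, X, labels, cv=2).mean()
        print(f"{clf_name} {feat_name}: mean accuracy = {acc:.3f}")

Repeating a loop of this shape for the three-class and two-class settings and for each pairwise feature-set comparison yields the 18 experiments described in the abstract.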

Keywords

Named entities, features, classifiers

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
