Graduation Semester and Year

2018

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

David Levine

Abstract

Social media platforms have become a major part of our daily lives. With this freedom of expression, however, there is no ready way to check the polarity of posts, tweets, and other expressions. Twitter is one of the largest microblogging platforms, so the experiment was carried out on it. Several topics that are popular on the internet, such as sports, politics, finance, and technology, were chosen as the source of the experiment. Tweets were collected over a span of more than 2 months via a cron job. Based on sentiment, every tweet can be placed in one of three categories: positive, negative, or neutral. Sentiment analysis relies heavily on Natural Language Processing for data processing steps such as stopword removal, lemmatization, tokenization, and POS tagging. This work focuses on the detection and prediction of sentiment in tweets associated with different topics. The analysis can be carried out in several ways, using libraries, APIs, classifiers, and tools. This thesis follows a data mining pipeline of data extraction, data cleaning, data storage, comparison with other reliable sources, and finally sentiment analysis. It presents a comparative study of sentiment analysis of tweets collected over time, using several techniques: a lexicon-based approach, a machine learning approach using a Random Forest classifier, the API-based Stanford NLP sentiment analyzer, and a tool called SentiStrength. The fifth method of analysis is an expert, i.e. a human carrying out the analysis. In this approach, the polarity of each tweet is determined and analyzed, and a confusion matrix is prepared. From that matrix, tweets are classified into four classes, namely False Positive, False Negative, True Positive, and True Negative, which are used to calculate metrics such as accuracy, precision, and recall.
The entire task is then turned into a cloud-based web interface hosted on Amazon Web Services, which carries out these operations on live data without human intervention.
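
The metrics named in the abstract follow directly from the four confusion-matrix counts. The sketch below is a minimal illustration of the standard definitions, not the thesis's own code; the function name `classification_metrics` and the example counts are hypothetical and chosen only for demonstration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of True Positive, False Positive,
    False Negative, and True Negative tweets. A ratio whose denominator
    is zero is reported as 0.0 to avoid division by zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative counts only (not results from the thesis).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

With three sentiment classes (positive, negative, neutral), one common choice is to compute these counts per class in a one-vs-rest fashion; whether the thesis does so is not stated in the abstract.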

Keywords

Sentiment analysis, Twitter, Cloud computing

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
