Graduation Semester and Year
2018
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
David Levine
Abstract
Social media platforms have become a major part of our daily lives. However, given the freedom of expression these platforms allow, there is no built-in way to determine the polarity of a post or tweet. Because Twitter is one of the largest microblogging platforms, the experiments were carried out on it. Several topics that are popular online, such as sports, politics, finance and technology, were chosen as the data sources. Tweets on these topics were collected over more than two months by a cron job. Based on sentiment analysis, each tweet can be assigned to one of three categories: positive, negative or neutral. Natural Language Processing (NLP) techniques such as stopword removal, lemmatization, tokenization and part-of-speech (POS) tagging are widely used to preprocess the data for sentiment analysis. This work focuses on detecting and predicting the sentiment of tweets associated with different topics. The analysis can be carried out in several ways, using libraries, APIs, classifiers and dedicated tools. This thesis follows a data mining pipeline consisting of data extraction, data cleaning, data storage, comparison with other reliable sources and, finally, sentiment analysis. A comparative study of the sentiment of the tweets collected over this period, obtained with several different techniques, is presented. The techniques used are a lexicon-based approach, a machine learning approach using a Random Forest classifier, the API-based Stanford NLP sentiment analyzer, and a tool called SentiStrength. The fifth method is analysis by an expert, i.e., a human labeling the tweets. In this approach, the polarity of each tweet is determined and analyzed, and a confusion matrix is prepared. From that matrix, tweets are broadly sorted into four categories, namely true positive, true negative, false positive and false negative, which are used to calculate metrics such as accuracy, precision and recall.
The entire process is also deployed as a cloud-based web interface hosted on Amazon Web Services (AWS), so that the analysis can run on live data without human intervention.
Keywords
Sentiment analysis, Twitter, Cloud computing
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Niyogi, Srijanee, "TEXT MINING ON TWITTER DATA TO EVALUATE SENTIMENT" (2018). Computer Science and Engineering Theses. 475.
https://mavmatrix.uta.edu/cse_theses/475
Comments
Degree granted by The University of Texas at Arlington