Graduation Semester and Year
2023
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Abstract
Health tweet classification is the task of identifying whether a given tweet is health-related. Existing research has made significant progress in classifying tweets within specific sub-domains of health, such as mental health, COVID-19, or particular diseases, but a more comprehensive approach that covers a broader range of health-related topics is still needed. This thesis addresses that need by proposing a diverse and comprehensive dataset that combines various existing health-related datasets, data collected through a keyword-based approach, and manually annotated data. A significant challenge for the task is that health-related keywords are often used figuratively or in non-health contexts. To overcome this challenge, the thesis explores Transformer-based models, such as BERT, BERTweet, RoBERTa, and DistilBERT, which can capture the contextual meaning of words, and experiments with them to assess their effectiveness in classifying health-related tweets. When evaluated on test data, BERT, DistilBERT, and RoBERTa achieved F1-scores of 0.882, 0.870, and 0.872, respectively. The highest F1-score, 0.900, was achieved by adding a BiLSTM layer to the BERTweet model and fine-tuning it on the proposed dataset and RHMD (a Reddit dataset). Additionally, an ablation analysis was conducted to highlight the significance of the BiLSTM layer and the RHMD dataset in enhancing the BERTweet model's performance for health tweet classification.
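The F1-scores reported in the abstract combine precision and recall into a single number. The snippet below is a minimal sketch of that computation for a binary health / non-health labelling. It assumes the health-related class is treated as the positive class, which the abstract does not state; the function name `f1_score` and the toy labels are hypothetical and are not taken from the thesis.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for one positive class (assumed here: 1 = health-related)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not data from the thesis).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

In this toy case there are four true positives, one false positive and one false negative, so precision and recall are both 0.8 and the F1-score is 0.8. Whether the thesis reports binary, macro or weighted F1 is not specified in the abstract.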
Keywords
Healthcare, Deep Learning, Twitter Data Analysis, Transformers, Classification
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Patel, Foram Pankajbhai, "Enhancing Health Tweet Classification: An evaluation of Transformer-based models for Comprehensive Analysis" (2023). Computer Science and Engineering Theses. 495.
https://mavmatrix.uta.edu/cse_theses/495
Comments
Degree granted by The University of Texas at Arlington