A Naive-Bayes Text Classifier using Laplace smoothing
Document Type
Article
Abstract
The text classifier was built on Kaggle (kaggle.com), a platform that provides GPU resources for training models on large datasets. Using this platform, I created a Naive Bayes text classifier in Python to classify articles according to whether people agree with them. The approach rests on the two ideas in the method's name. Bayes' theorem states that the probability of an event should be updated to account for the evidence related to that event. The "naive" part is an assumption rather than a theorem: the pieces of evidence (here, the words of an article) are treated as independent of each other given the class. Following this logic, I counted the number of times a particular word appears in a document and how many times that word appears across all the documents of the dataset. These statistics allow me to categorize articles by the words they contain. My classifier has an accuracy of 78%, which can be improved by adding more data and tuning the parameters of the model.
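The counting procedure described in the abstract can be illustrated with a minimal sketch. This is not the code from the article: the function names `train` and `predict`, the whitespace tokenizer, the `agree`/`disagree` labels, and the four toy documents are all assumptions made for illustration. The sketch applies Laplace (add-one) smoothing, as named in the title, so that a word never seen in a class still gets a small nonzero probability.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Count words per class. alpha is the Laplace smoothing constant (hypothetical default)."""
    class_counts = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)      # word frequencies per class
    vocab = set()                           # every distinct word in the training data
    for text, label in zip(docs, labels):
        tokens = text.lower().split()       # assumed tokenizer: lowercase + whitespace split
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(labels),
    }


def predict(model, text):
    """Return the class with the highest log posterior under the naive independence assumption."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, float("-inf")
    for label, n_docs_in_class in model["class_counts"].items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs_in_class / model["n_docs"])
        total_words = sum(model["word_counts"][label].values())
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # words never seen in training are skipped
            count = model["word_counts"][label][token]
            # Smoothed estimate: (count + alpha) / (total + alpha * |V|)
            score += math.log((count + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this example only.
docs = [
    "great insightful article",
    "agree with this great point",
    "poor misleading article",
    "disagree with this poor claim",
]
labels = ["agree", "agree", "disagree", "disagree"]

model = train(docs, labels)
print(predict(model, "great point"))
print(predict(model, "poor claim"))
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long documents. The smoothing constant `alpha` is one example of a parameter that could be tuned.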
Disciplines
Data Science
Publication Date
5-3-2023
Language
English
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Pudu, Prithvidhar, "A Naive-Bayes Text Classifier using Laplace smoothing" (2023). 2023 IDIR Machine Learning Text Classification Challenge. 1.
https://mavmatrix.uta.edu/utadatathon_2023textclassification/1