Graduation Semester and Year
2019
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Deokgun Park
Abstract
Advances in Natural Language Processing and Machine Learning have played a significant role in the rapid improvement of conversational Artificial Intelligence (AI). The use of text-based conversational AI, such as chatbots, has increased significantly in everyday communication with real people for a variety of tasks. Chatbots are deployed on almost all popular messaging platforms and channels. The rise of chatbot development frameworks based on machine learning makes it easier and faster to deploy chatbots. These frameworks use machine learning and natural language understanding (NLU) to interpret users' messages and intents and to respond appropriately to users' utterances. Since most chatbots are developed for domain-specific purposes, the performance of a chatbot is directly related to its training data. To broaden a chatbot's domain knowledge through its training data, the chatbot needs to know words and phrases similar to those in a user's message. Furthermore, there is no guarantee that a user will spell a word correctly; in written conversation, users often misspell at least some words. Thus, to include semantically similar words and misspellings in the training data, I have used word embedding to generate misspellings and similar words. These generated similar words and misspellings will be used as training data to train the model for chatbot development.
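The core idea in the abstract, finding words that lie close to a given word in a vector space, can be illustrated with a minimal sketch. The abstract does not say which embedding model the thesis uses, so this example substitutes character trigram count vectors for a trained word embedding. That is an assumption made only to keep the example self-contained. The function names and the small vocabulary are hypothetical. Vectors of this kind capture spelling similarity only. Finding semantically similar words would require a trained embedding.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=3):
    """Bag of character n-grams with boundary markers (a stand-in for a trained embedding)."""
    padded = f"<{word}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest(query, vocabulary, top_k=3):
    """Return the top_k vocabulary words closest to the query, with their scores."""
    q = char_ngrams(query)
    scored = [(w, cosine(q, char_ngrams(w))) for w in vocabulary if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical vocabulary, as might be observed in user messages.
vocabulary = ["restaurant", "restaraunt", "resturant", "weather", "wether", "booking"]

# Candidate misspellings of "restaurant" to add to the training data.
print(nearest("restaurant", vocabulary, top_k=2))
```

With a trained embedding in place of `char_ngrams`, the same nearest-neighbor lookup would return semantically related words as well as spelling variants, and the results could be added as extra training examples for an intent.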
Keywords
Chatbots, Conversational artificial intelligence, Machine learning, Rasa, Misspellings, Word embedding, Similar words
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Thapa, Sanjay, "USE OF WORD EMBEDDING TO GENERATE SIMILAR WORDS AND MISSPELLINGS FOR TRAINING PURPOSE IN CHATBOT DEVELOPMENT" (2019). Computer Science and Engineering Theses. 446.
https://mavmatrix.uta.edu/cse_theses/446
Comments
Degree granted by The University of Texas at Arlington