Sanjay Thapa

ORCID Identifier(s)


Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Department

Computer Science and Engineering

First Advisor

Deokgun Park

Abstract


Advances in Natural Language Processing and Machine Learning have played a significant role in the rapid improvement of conversational Artificial Intelligence (AI). The use of text-based conversational AI such as chatbots has increased significantly in everyday communication with real people across a variety of tasks. Chatbots are deployed on almost all popular messaging platforms and channels. The rise of chatbot development frameworks based on machine learning makes it possible to deploy chatbots easily and promptly. These frameworks use machine learning and natural language understanding (NLU) to interpret users' messages and intents and to respond appropriately to their utterances. Since most chatbots are developed for domain-specific purposes, the performance of a chatbot is directly related to its training data. To broaden a chatbot's domain knowledge through training data, the chatbot needs to know words or phrases similar to those in a user's message. Furthermore, there is no guarantee that a user will spell every word correctly; in written conversation, users often misspell at least some words. Thus, to include semantically similar words and misspellings in the training data, I have used word embeddings to generate misspellings and similar words. These generated similar words and misspellings will be used as training data to train the model for chatbot development.
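
The general idea of embedding-based augmentation can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the vectors in `TOY_VECTORS` are hand-written, hypothetical stand-ins for a pretrained embedding model, and the helper names `cosine`, `nearest_words`, and `augment` are assumptions introduced only for this example. The sketch looks up the nearest neighbors of each token by cosine similarity and substitutes them to produce extra training utterances.

```python
import math

# Hypothetical toy vectors, hand-written for illustration only.
# A real setup would load pretrained word embeddings instead.
TOY_VECTORS = {
    "hello": [0.90, 0.10, 0.00],
    "helo": [0.88, 0.12, 0.02],   # misspelling placed close to "hello"
    "hi": [0.80, 0.30, 0.10],
    "goodbye": [0.10, 0.90, 0.20],
    "bye": [0.15, 0.85, 0.25],
    "pizza": [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, vectors, top_n=2):
    """Return the top_n vocabulary words most similar to `word`."""
    if word not in vectors:
        return []  # out-of-vocabulary tokens yield no candidates
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


def augment(utterance, vectors, top_n=2):
    """Create variants of an utterance by swapping in similar words."""
    tokens = utterance.split()
    variants = []
    for i, token in enumerate(tokens):
        for neighbor in nearest_words(token.lower(), vectors, top_n):
            variants.append(" ".join(tokens[:i] + [neighbor] + tokens[i + 1:]))
    return variants


if __name__ == "__main__":
    for variant in augment("hello there", TOY_VECTORS):
        print(variant)
```

With these toy vectors, the utterance "hello there" yields the variants "helo there" and "hi there", which could then be added alongside the original example in the training data for an intent.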


Keywords

Chatbots, Conversational artificial intelligence, Machine learning, Rasa, Misspellings, Word embedding, Similar words


Disciplines

Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington