Graduation Semester and Year
2010
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Abstract
Identifying the semantic similarity between named entities has many applications in NLP, including information extraction and retrieval, word sense disambiguation, text summarization, and type classification. Similarity between named entities or terms is commonly determined using a taxonomy-based approach, but the limited scalability of existing taxonomies has led recent research to use Wikipedia's encyclopedic knowledge base to find similarity or relatedness. Existing Wikipedia-based methods have so far focused on relatedness and are not as well suited to finding similarity. In this thesis, we evaluate methods for determining the semantic similarity between named entities by associating each named entity with a specific Wikipedia article and then using the commonalities between the articles' Wikipedia category hierarchies as the similarity measure. To evaluate the effectiveness of these methods, we conducted a survey to obtain manually defined similarity scores for named entity pairs. The survey scores were then compared with the scores produced by both the implemented methods and existing relatedness measures.
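The general idea described in the abstract, scoring two entities by the overlap of their Wikipedia category hierarchies, can be illustrated with a minimal sketch. This is not the method evaluated in the thesis: the category data below is a hypothetical toy graph, the function names and the max_depth parameter are invented for illustration, and Jaccard overlap of ancestor categories is only one simple stand-in for a hierarchy-based similarity measure.

```python
from collections import deque

# Hypothetical toy data (not real Wikipedia output):
# article -> direct categories, and category -> parent categories.
ARTICLE_CATEGORIES = {
    "Albert Einstein": {"Theoretical physicists", "Nobel laureates in Physics"},
    "Niels Bohr": {"Theoretical physicists", "Nobel laureates in Physics",
                   "Danish physicists"},
    "Mount Everest": {"Mountains of Nepal"},
}
CATEGORY_PARENTS = {
    "Theoretical physicists": {"Physicists"},
    "Danish physicists": {"Physicists"},
    "Nobel laureates in Physics": {"Nobel laureates"},
    "Physicists": {"Scientists"},
    "Nobel laureates": {"Award winners"},
    "Mountains of Nepal": {"Mountains"},
    "Mountains": {"Landforms"},
}


def ancestor_categories(article, max_depth=3):
    """Collect the categories reachable from an article within max_depth levels."""
    seen = set()
    # Direct categories of the article sit at depth 1.
    queue = deque((c, 1) for c in ARTICLE_CATEGORIES.get(article, ()))
    while queue:
        category, depth = queue.popleft()
        if category in seen or depth > max_depth:
            continue
        seen.add(category)
        for parent in CATEGORY_PARENTS.get(category, ()):
            queue.append((parent, depth + 1))
    return seen


def category_similarity(article_a, article_b, max_depth=3):
    """Jaccard overlap of the two articles' ancestor category sets (an assumption)."""
    cats_a = ancestor_categories(article_a, max_depth)
    cats_b = ancestor_categories(article_b, max_depth)
    union = cats_a | cats_b
    if not union:
        return 0.0  # neither article has any known category
    return len(cats_a & cats_b) / len(union)


if __name__ == "__main__":
    print(category_similarity("Albert Einstein", "Niels Bohr"))
    print(category_similarity("Albert Einstein", "Mount Everest"))
```

In this toy graph the two physicists share most of their ancestor categories and score close to 1, while the physicist and the mountain share none and score 0. A real implementation would first need to resolve each named entity to a Wikipedia article and would read the category graph from Wikipedia itself.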
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Ashman, Jared M., "Measuring Named Entity Similarity Through Wikipedia Category Hierarchies" (2010). Computer Science and Engineering Theses. 133.
https://mavmatrix.uta.edu/cse_theses/133
Comments
Degree granted by The University of Texas at Arlington