Graduation Semester and Year

2010

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

Identifying the semantic similarity between named entities has many applications in NLP, including information extraction and retrieval, word sense disambiguation, text summarization, and type classification. Similarity between named entities or terms is commonly determined using a taxonomy-based approach, but the limited scalability of existing taxonomies has led recent research to use Wikipedia's encyclopedic knowledge base to find similarity or relatedness. Existing Wikipedia-based methods have so far focused on relatedness and are less well suited to measuring similarity. In this thesis, we evaluate methods for determining the semantic similarity between named entities by associating each named entity with a specific Wikipedia article and then using the commonalities between the articles' Wikipedia category hierarchies as the similarity measure. To evaluate their effectiveness, we conducted a survey to collect manually assigned similarity scores for named entity pairs. These scores were then compared with both the implemented methods and existing relatedness measures.
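
As a rough illustration of the category-based idea described in the abstract, the sketch below scores two entities by the overlap of their category hierarchies. It is not the thesis's implementation: the toy data, the names `ARTICLE_CATEGORIES`, `CATEGORY_PARENTS`, `category_closure`, and `category_similarity`, the depth limit, and the choice of Jaccard overlap are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy data standing in for Wikipedia's category graph.
# Maps each article (named entity) to the categories it belongs to.
ARTICLE_CATEGORIES = {
    "Albert Einstein": {"Theoretical physicists", "Nobel laureates in Physics"},
    "Niels Bohr": {"Theoretical physicists", "Nobel laureates in Physics",
                   "Danish physicists"},
    "Paris": {"Capitals in Europe", "Cities in France"},
}

# Maps each category to its parent categories (also hypothetical).
CATEGORY_PARENTS = {
    "Theoretical physicists": {"Physicists"},
    "Danish physicists": {"Physicists"},
    "Nobel laureates in Physics": {"Nobel laureates"},
    "Physicists": {"Scientists"},
    "Nobel laureates": {"Award winners"},
    "Capitals in Europe": {"Capitals"},
    "Cities in France": {"Cities"},
    "Capitals": {"Cities"},
}


def category_closure(article, max_depth=3):
    """Return the article's categories plus ancestors up to max_depth levels up."""
    seen = set()
    queue = deque((cat, 0) for cat in ARTICLE_CATEGORIES.get(article, ()))
    while queue:
        cat, depth = queue.popleft()
        if cat in seen:
            continue
        seen.add(cat)
        if depth < max_depth:
            for parent in CATEGORY_PARENTS.get(cat, ()):
                queue.append((parent, depth + 1))
    return seen


def category_similarity(a, b, max_depth=3):
    """Jaccard overlap of the two category closures (an assumed measure)."""
    cats_a = category_closure(a, max_depth)
    cats_b = category_closure(b, max_depth)
    union = cats_a | cats_b
    return len(cats_a & cats_b) / len(union) if union else 0.0


if __name__ == "__main__":
    for pair in [("Albert Einstein", "Niels Bohr"), ("Albert Einstein", "Paris")]:
        print(pair, round(category_similarity(*pair), 3))
```

In this toy graph the two physicists share most of their category ancestors and score high, while a physicist and a city share none. A real system would first need to resolve each named entity to the correct Wikipedia article and would read the category graph from Wikipedia data rather than from hard-coded dictionaries.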

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
