Graduation Semester and Year

2018

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

Many applications use large entity graphs. Because entity graphs serve a wide range of applications and are available from numerous data sources, it is challenging to select a single entity graph that suits a particular need. To give a comprehensible overview of an entity graph, we can project it into preview tables, a compact representation in which each preview table corresponds to a single entity type in the dataset. To show the coverage of a dataset, we need to find representative entities for each entity type in the entity graph. In this thesis, we propose a method for finding the representative entities of a given entity type. Each entity of that type is represented by a multi-dimensional label vector built from its neighboring nodes. We apply the k-means clustering algorithm to the label vectors of entities of the same type, which partitions those entities into k disjoint clusters. The entity nearest to the centroid of each cluster is selected as a representative entity for the given type. We performed experiments on the Freebase dataset and obtained diverse and important representative entities for the TV, film, and location domains. These representative entities can be used to generate preview tables, helping data workers understand the coverage of a particular entity type in the dataset.
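The pipeline described in the abstract (label vectors from neighboring nodes, k-means, then the entity nearest each centroid) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the label vector is assumed to be a count of the labels on an entity's neighboring nodes, distances are squared Euclidean, and k-means uses a deterministic farthest-point initialization chosen here only for reproducibility. The function names (`label_vectors`, `kmeans`, `representatives`) and the toy entity data are hypothetical. Plain Python is used so the sketch has no dependencies.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def label_vectors(entities: Sequence[str],
                  neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Encode each entity as counts of the labels on its neighboring nodes.

    Assumption: a simple count encoding; the thesis's exact vector may differ.
    """
    vocab = sorted({lab for e in entities for lab in neighbors.get(e, [])})
    index = {lab: i for i, lab in enumerate(vocab)}
    vectors = {}
    for e in entities:
        vec = [0.0] * len(vocab)
        for lab in neighbors.get(e, []):
            vec[index[lab]] += 1.0
        vectors[e] = vec
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    # Initialization (assumption): start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, assign


def representatives(entities: Sequence[str],
                    neighbors: Dict[str, List[str]], k: int) -> List[str]:
    """Return, for each non-empty cluster, the entity nearest its centroid."""
    vectors = label_vectors(entities, neighbors)
    points = [vectors[e] for e in entities]
    centroids, assign = kmeans(points, k)
    reps = []
    for c, centroid in enumerate(centroids):
        members = [e for e, a in zip(entities, assign) if a == c]
        if members:
            reps.append(min(members, key=lambda e: sq_dist(vectors[e], centroid)))
    return reps


# Hypothetical toy data: film entities and the labels of their neighbors.
films = ["f1", "f2", "f3", "f4", "f5"]
film_neighbors = {
    "f1": ["genre:drama", "country:US"],
    "f2": ["genre:drama", "country:US"],
    "f3": ["genre:drama", "country:US", "award:oscar"],
    "f4": ["genre:anime", "country:JP"],
    "f5": ["genre:anime", "country:JP"],
}
print(representatives(films, film_neighbors, k=2))  # ['f1', 'f4']
```

On this toy data the two clusters separate the US dramas from the Japanese anime films, and one entity from each group is returned. In practice a library implementation of k-means (with several random restarts) would likely replace the hand-written loop; it is spelled out here only to make each step of the method visible.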

Keywords

Representative entities, Entity similarity, Graph mining, Entity graph

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington