Author

Ning Yan

Graduation Semester and Year

2013

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

We witness an unprecedented proliferation of entity graphs that capture entities (e.g., persons, products, organizations) and their relationships. Such entity data gradually evolves into the most comprehensive knowledge graph ever built by human beings. The flourish of entity data also boosts a growing demand for entity-centric information exploration tasks. Different from traditional information retrieval tasks that are based on statistics of text terms, entity-centric information exploration tasks are usually based on metrics over entities, their relationships, or graph structures. In this work, we identified two essential tasks in entity-centric information exploration: 1) how to do faceted navigation over a set of homogeneous entities, utilizing entity relationships and concept hierarchies; 2) how to assist users in attaining a quick and rough preview of huge entity graphs. As an approach for the first problem, we proposed a novel method that dynamically discovers a query-dependent faceted interface for a set of Wikipedia articles resulting from a keyword query. We further extended this method into a general framework for faceted interface discovery for Web documents. Our model leverages the collaborative vocabularies in Wikipedia, such as its category hierarchy and intensive internal hyperlinks, for building faceted interfaces. We proposed metrics for ranking both individual facet hierarchies and faceted interfaces (each with k facet hierarchies). We then developed faceted interface discovery algorithms that optimize these ranking metrics. As an approach for the second problem, we proposed a novel method that generates preview tables for entity graphs. The preview tables consist of important entities and relationships from entity graphs, which can help users quickly understand such graphs. We studied various scoring measures for the goodness of preview tables. Based on these scoring measures, we formulated several optimization problems that look for the optimal previews with the highest aggregated scores, under various constraints on preview size and distance between preview tables. We proved that the optimization problems under distance constraint are NP-complete. We then designed a dynamic-programming algorithm and an Apriori-style algorithm for finding optimal previews. For both problems, we conducted experiments as well as user studies for evaluating our proposed models, ranking measures, and algorithms. The results demonstrated the effi- ciency and effectiveness of our proposed methods for discovering query-dependent faceted interfaces and generating preview tables.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS