Graduation Semester and Year
2013
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Abstract
We witness an unprecedented proliferation of entity graphs that capture entities (e.g., persons, products, organizations) and their relationships. Such entity data gradually evolves into the most comprehensive knowledge graph ever built by human beings. The flourish of entity data also boosts a growing demand for entity-centric information exploration tasks. Different from traditional information retrieval tasks that are based on statistics of text terms, entity-centric information exploration tasks are usually based on metrics over entities, their relationships, or graph structures. In this work, we identified two essential tasks in entity-centric information exploration: 1) how to do faceted navigation over a set of homogeneous entities, utilizing entity relationships and concept hierarchies; 2) how to assist users in attaining a quick and rough preview of huge entity graphs. As an approach for the first problem, we proposed a novel method that dynamically discovers a query-dependent faceted interface for a set of Wikipedia articles resulting from a keyword query. We further extended this method into a general framework for faceted interface discovery for Web documents. Our model leverages the collaborative vocabularies in Wikipedia, such as its category hierarchy and intensive internal hyperlinks, for building faceted interfaces. We proposed metrics for ranking both individual facet hierarchies and faceted interfaces (each with k facet hierarchies). We then developed faceted interface discovery algorithms that optimize these ranking metrics. As an approach for the second problem, we proposed a novel method that generates preview tables for entity graphs. The preview tables consist of important entities and relationships from entity graphs, which can help users quickly understand such graphs. We studied various scoring measures for the goodness of preview tables. Based on these scoring measures, we formulated several optimization problems that look for the optimal previews with the highest aggregated scores, under various constraints on preview size and distance between preview tables. We proved that the optimization problems under distance constraint are NP-complete. We then designed a dynamic-programming algorithm and an Apriori-style algorithm for finding optimal previews. For both problems, we conducted experiments as well as user studies for evaluating our proposed models, ranking measures, and algorithms. The results demonstrated the effi- ciency and effectiveness of our proposed methods for discovering query-dependent faceted interfaces and generating preview tables.
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Recommended Citation
Yan, Ning, "Novel Methods For Entity-centric Information Exploration" (2013). Computer Science and Engineering Dissertations. 234.
https://mavmatrix.uta.edu/cse_dissertations/234
Comments
Degree granted by The University of Texas at Arlington