Graduation Semester and Year

2006

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Gautam Das

Abstract

Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this thesis, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular given specific topk algorithms we are interested in studying their progress towards identification of the correct result at any point of the algorithms' execution. We adopt a probabilistic approach where we seek to report at any point the scores of the top-k results the algorithm has identified, as well as associate a confidence with this prediction. Such functionality can be a valuable asset when one is interested to reduce the runtime cost of top-k computations. We show analytically that such probability and confidence are monotone in expectation. We present a thorough experimental evaluation to validate our techniques using both synthetic and real data sets.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS