Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

Gautam Das


Deep web databases are pillars of today’s internet services, hidden behind HTML forms and Top-K search interfaces. While Top-K search interfaces provide a good way to retrieve information, they still fall short in addressing the diverse preferences of users. Because of the query rate limit, i.e., the maximum number of k-nearest-neighbor queries a user or IP address can issue over a specific period of time, it is often impossible to access all the tuples in the backend database. With the query rate limit in mind, our motivation is twofold: (i) to enable users to obtain individual records from these databases and rank them according to the user’s preference, and (ii) to enable users to access aggregate information over these databases. We introduce QR2 and DBLoc. Both systems access the hidden databases via their public search interfaces and operate without any knowledge of the underlying system ranking function. While QR2 supports ranked retrieval of individual tuples, DBLoc supports aggregating information over location-based services (LBS). QR2 enables on-the-fly processing of queries with any user-specified ranking function (with or without selection conditions), whether or not that ranking function is supported by the database. Using DBLoc, users can perform density-based clustering over the backend database of an LBS. Thus, DBLoc aims to mine from the LBS a cluster assignment function f(·). For both problems, we have developed efficient systems designed to be scalable, reliable, and secure. We also support multi-user access for both systems and illustrate how to deploy them efficiently in industry.


Information retrieval, Query processing, Data exploration


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington