Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Gautam Das


In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. For example, assume one wants to sell an iPod in e-commerce site, but the title allows only 12 characters of space, should he write "Purple iPod", "Apple iPod" or "iPod 30g"? Which title is good? Do people care more about color, brand or size? We refer this problem as “attributes selection” problem. Package design based on user input is a problem that has also attracted recent interest. Given a set of elements, and a set of user preferences (where each preference is a conjunction of positive or negative preferences for individual elements), we investigate the problem of designing the most “popular package”, i.e., a subset of the elements that maximizes the number of satisfied users. Numerous instances of this problem occur in practice. For example, a vacation package consisting of a subset of all possible activities may need to be assembled, that satisfies as many potential customers as possible, where each potential customer may have expressed his preferences (positive or negative) for certain activities. Likewise, selecting topics for a social network group based on users' preferences as well as the problem of designing new products, i.e., deciding which features to add to a new product (e.g., to an iPod) that satisfies as many potential customers as possible, also falls under this framework. We refer this later problem as “package design” problem. We develop several formulations of both the problems. Even for the NPcomplete problems, we give several exact (optimal) and approximation algorithms that work well in practice. For “attributes selection” problem, one type of exact algorithms is based on Integer Programming (IP) formulations. Another class of exact methods is based on Maximal Frequent Itemset mining algorithms. For “package design” problem, the exact algorithm is based on Signature Tree data structure that is feasible for relatively moderate problem instances. Further, we present approximate algorithms, and study the performance, both theoretically and experimentally. Our experimental evaluation on real and synthetic datasets shows that the optimal and approximate algorithms are efficient for moderate and large datasets respectively, and also that the approximate algorithms have small approximation error.


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington