Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Gautam Das


The widespread use and growing popularity of online collaborative content sites (e.g., Yelp, Amazon, IMDB) have created rich resources for consumers to consult when making purchasing decisions about items such as restaurants, e-commerce products, and movies. They have also created new opportunities for producers of such items to improve their business by designing better products, composing succinct advertisement snippets, building more effective personalized recommendation systems, etc. This motivates us to develop a framework for exploratory mining of user feedback on items in collaborative social content sites. Typically, the amount of user feedback (e.g., ratings, reviews) associated with an item (or a set of items) can easily reach hundreds or thousands, resulting in an overwhelming amount of information (information explosion) that users may find difficult to cope with (information overload). For example, popular restaurants listed on the review site Yelp routinely receive several thousand ratings and reviews, which makes decision making cumbersome. Moreover, most online activities involve interactions between multiple items and different users, and interpreting such complex user-item interactions becomes intractable as well. Our research concerns developing novel data mining and exploration algorithms to formally analyze how user and item attributes influence user-item interactions. In this dissertation, we focus on short user feedback (i.e., ratings and tags) and reveal how it, in conjunction with structural attributes associated with items and users, opens up exciting opportunities for performing aggregated analytics.
The goal of this aggregate analysis is two-fold: (i) exploratory mining that helps content consumers make more informed judgments (e.g., whether a user will enjoy eating at a particular restaurant), and (ii) exploratory mining that helps content producers conduct better business (e.g., redesigning a menu to attract more people of a certain demographic group). We identify a family of mining tasks and propose a suite of algorithms (exact algorithms, approximation algorithms with theoretical properties, and efficient heuristics) for solving the problems. Performance evaluation over synthetic data and real data crawled from the web validates the utility of our framework and the effectiveness of our algorithms.


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington