Kamal Taha

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Ramez Elmasri


This dissertation research focuses on three aspects of querying XML data: (1) improving the accuracy of XML keyword queries by modeling the contexts of XML elements; (2) enhancing XML-based personalized search by using group profiling to determine individual preferences; and (3) improving the performance of distributed XML querying by caching frequently used query results. For each of these three focus areas, we developed formal concepts and algorithms that lead to improved accuracy and performance. Our contributions are as follows:

1. Improving the accuracy of XML keyword queries: We improve search accuracy by utilizing nodes' contexts in an XML tree. Overlooking nodes' contexts when building relationships between the nodes may lead to erroneous query results. The context of a data node is determined by its parent node. By treating each set of nodes consisting of a parent and its child data nodes as one unified entity, and then determining the relationships between the different unified entities, an XML system can build much more accurate relationships between data nodes in less processing time, resulting in more accurate query results.

2. Enhancing XML-based personalized search: By pre-defining and categorizing social groups based on demographic, ethnic, cultural, religious, or other characteristics, a user profile can be inferred from the profiles of the social groups to which the user belongs. This simplifies personalized search and makes the process more efficient. We implemented this approach in an XML-based recommender system. The system outputs ranked lists of content items that take into account not only the initial preferences of the user, but also the preferences of the user's various social groups.

3. Improving performance of distributed XML querying: Distributed XML documents are too big and complicated to be queried rapidly every time a user submits a query, because of the overhead involved in decomposing the queries, sending the decomposed queries to remote site(s), and executing structural join operations to compose the results. We investigated strategies and mechanisms to tackle these problems. We then implemented these mechanisms in a query processor and compared their performance to that of standard XML query processors.
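The idea behind the first contribution, treating a parent node and its child data nodes as one unified entity, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function names (`unified_entities`, `entities_matching`), the sample document, and the simple substring matching are all assumptions made for illustration, and the sketch does not cover how relationships between entities are determined.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tags and values are illustrative only.
SAMPLE = """
<department>
  <name>Computer Science</name>
  <student>
    <name>Alice</name>
    <major>Databases</major>
  </student>
  <course>
    <title>Databases</title>
    <credits>3</credits>
  </course>
</department>
"""

def unified_entities(root):
    """Group each parent element with its child data nodes.

    A "data node" is taken here to be a leaf element with non-empty text
    (an assumption). Repeated child tags under one parent overwrite each
    other, which is a simplification of this sketch.
    """
    entities = []
    for parent in root.iter():
        data_children = {
            child.tag: child.text.strip()
            for child in parent
            if len(child) == 0 and child.text and child.text.strip()
        }
        if data_children:
            entities.append((parent.tag, data_children))
    return entities

def entities_matching(entities, keyword):
    """Return the entities in which some data node contains the keyword."""
    kw = keyword.lower()
    return [e for e in entities if any(kw in v.lower() for v in e[1].values())]

root = ET.fromstring(SAMPLE)
entities = unified_entities(root)
hits = entities_matching(entities, "databases")

for context, data in hits:
    # The parent tag supplies the context that separates the two matches.
    print(context, data)
```

In this sample the keyword occurs under both `student` and `course`. Keeping the parent tag with its data nodes lets a query processor tell a student's major apart from a course title, which a match on the bare text value cannot do.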


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington