Feng Ji

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Ramez Elmasri


In bioinformatics research, scientists usually face the problems of modeling complex data types and integrating diverse resources. Traditional data models such as EER lack the expressing power to capture many characteristics that are common in bioinformatics data. We first propose extensions to the ER model that allow accurate representation of many of these characteristics. We then utilize these concepts in an integrative system to provide an easy-to-use interface for biologists to construct queries. Our research utilizes the enhanced conceptual modeling concepts to create a prototype mediator for querying multiple data sources. The various relationships between different biological entities are all semantically represented as domain ontologies stored in the mediator for experts to analyze and correlate the integrated query results. The following research has been conducted: (1) We first propose new EER schema notation to represent the common occurring biological concepts: the ordering properties of the DNA sequences, the 3D structure of proteins and the functional processes of metabolic pathways. (2) Then, we utilize these new relationships in the development of the mediated domain ontology, which helps the interface design and query processor implementation of our mediator system. Our mediated schema features are based on a hybrid of taxonomy ontologies (core concepts and external classification/annotation concepts) for interpretation of raw data sets (protein and gene sequences) in the context of molecular interactions, biochemical pathways and biological processes. We adopt the RDF data model to implement the mediation data. Our mediator mainly takes a browsing-based approach to integrate different data sources. Extra data can be dynamically retrieved through the web service. By browsing the ontology tree in the query interface, users can select concepts of interest and associated attributes to formulate queries based on their domain knowledge. The query result is a set of various database entry accessions with associated attribute values. Users can click each link of the accessions to see the detailed reports, or cross-compare attributes of these data instances. Query usability and performance experiments are tested for real data sets from UniProt [30], ENZYME [8], CATH [23], and GO [29].


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington