Author

Jacek Kukluk

Graduation Semester and Year

2007

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Engineering

Department

Computer Science and Engineering

First Advisor

Lawrence B Holder

Abstract

In this dissertation we study the inference of node and edge replacement graph grammars. The approach is based on previous research in frequent isomorphic subgraphs discovery. We extend the search for frequent subgraphs by checking for overlap among the instances of the subgraphs in the input graph. If subgraphs overlap by one node, we propose a node replacement graph grammar production. If subgraphs overlap by two nodes or two nodes and an edge, we propose an edge replacement graph grammar production. We also can infer a hierarchy of productions by compressing portions of a graph described by a production and then inferring new productions on the compressed graph. We validate the approach to node replacement grammar inference in experiments where we generate graphs from known grammars and measure how well the approach infers the original grammar from the generated graph. We show how this method performs in extracting the organization of XML files. We convert an XML file into a tree and infer a graph grammar from it. We compare the inferred graph grammar to the Document Type Definition of the XML file. We report the graph grammar we found from XML files used in the National Library of Medicine, the United States Patent and Trademark Office, and major baseball leagues. We also apply the algorithm to biological domains. We show the graph grammars found in biological molecules and in biological networks, and analyze learning curves of the algorithm as we increase the number of biological networks input to the method. We also describe an algorithm and experiments for inference of edge replacement graph grammars. This method generates candidate recursive graph grammar productions based on finding isomorphic subgraphs which overlap by two nodes. If there is no edge between the two overlapping nodes, the method generates a recursive graph grammar production with a virtual edge. We guide the search for the graph grammar using the Minimum Description Length (MDL) of a graph and the size of a graph. We show experiments where we generate graphs from known graph grammars, use our method to infer the grammar from the generated graphs, and then measure the error between the original and inferred grammars. Experiments show that the method performs well on several types of grammars, and specifically that error decreases with increased numbers of unique labels in the graph. We briefly discuss other grammar inference algorithms indicating that our study extends classes of learnable graph grammars.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS