Author

Anish Rai

ORCID Identifier(s)

0000-0003-4488-2706

Graduation Semester and Year

2020

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Sharma Chakravarthy

Abstract

Substructure discovery is well-researched for single graphs (both simple and attribute) as it is an important component of knowledge discovery for many applications such as finding the core substructure in a protein, important concept in a large graph, etc. However, multilayer networks or MLNs (instead of attribute graphs) have been shown to be better for modeling complex data sets that have multiple entity and feature types. This model provides more clarity on semantics of data sets as well as the ability to use an arbitrary subset of layers for analysis. However, the challenge is that many algorithms such as community and centrality detection as well as substructure discovery need to be extended to MLN representation. With the representation of complex data sets as MLNs brings new challenges in terms of fi nding substructures in MLNs or a subset of MLNs. A naive approach would be to collapse (or aggregate) all (or a subset of) layers into a single attribute graph and use extant algorithms. There are a number of substructure discovery algorithms ranging from memory-based, disk-based, SQL-based, and partitioned using map/reduce framework. While substructure discovery has been widely used for the analysis of single networks, attribute graphs, and forests, the development of an efficient substructure discovery methods for multilayer networks without aggregation is currently not available. This thesis proposes a new decoupling approach-based substructure discovery algorithm for homogeneous MLNs (or HoMLNs). HoMLNs are MLNs where each layer has the same set of nodes but different connectivity in each layer. In this dissertation, we propose a decoupling approach-based algorithm for HoMLNs where aggregation is not needed. Further, the algorithm has been implemented using the Map/Reduce framework in order to handle arbitrary number of layers and improving the response time through parallelism. Each layer is processed individually/separately in parallel, but the substructures generated for each layer are combined after each iteration to identify substructures across layers (or MLN). The focus is on correctness of the algorithm and resource utilization with respect to number of layers. The proposed algorithm is validated analytically as well through extensive experimental analysis on large real-world and synthetic graphs.

Keywords

Substructure discovery, Multilayer networks, Graph mining, MapReduce

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS