ORCID Identifier(s)

0000-0001-6654-4551

Graduation Semester and Year

2017

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Ramez Elmasri

Abstract

Graph databases are a popular choice for representing data with relationships. They allow the relational information to be modified easily, without the structural redefinition that relational databases require. Exponentially growing graph sizes demand efficient querying despite memory limitations. Indexes, which speed up query processing, are integral to databases. Existing work has relied on in-memory approaches that are limited by main memory size. This thesis proposes a way to combine a graph representation, an indexing technique, and secondary memory to answer queries efficiently. Unstructured textual data is parsed to identify entities, and each entity is assigned a unique identifier. The entities and relationships are assembled into a graph representation in the form of key-value pairs. The key-value pairs are hashed into redundant Berkeley DB stores, clustered on relationships and entities. The Berkeley DB key-value store uses primary memory in conjunction with secondary memory. Redundancy is affordable because main memory size is no longer a limitation. The redundant key-value hash stores allow many queries to be processed quickly in multiple directions.
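
The idea of redundant, disk-backed key-value stores clustered on entities and relationships can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes Python's standard-library dbm.dumb module as a stand-in for Berkeley DB, and the class name RedundantTripleStore, its method names, and the JSON value encoding are all hypothetical choices made for illustration. Each (subject, relation, object) edge is written three times, once per clustering key, so that a lookup in any direction is a single hash access.

```python
import dbm.dumb
import json
import os
import tempfile


class RedundantTripleStore:
    """Keeps every edge in three disk-backed hash stores (illustrative only)."""

    def __init__(self, directory):
        # Assumption: dbm.dumb stands in for Berkeley DB hash stores.
        self.by_subject = dbm.dumb.open(os.path.join(directory, "by_subject"), "c")
        self.by_relation = dbm.dumb.open(os.path.join(directory, "by_relation"), "c")
        self.by_object = dbm.dumb.open(os.path.join(directory, "by_object"), "c")

    @staticmethod
    def _append(store, key, value):
        # Values are JSON-encoded lists, so one key can hold many edges.
        k = key.encode("utf-8")
        existing = json.loads(store[k].decode("utf-8")) if k in store else []
        if value not in existing:
            existing.append(value)
        store[k] = json.dumps(existing).encode("utf-8")

    @staticmethod
    def _lookup(store, key):
        k = key.encode("utf-8")
        return json.loads(store[k].decode("utf-8")) if k in store else []

    def add(self, subject, relation, obj):
        # The same edge is stored redundantly under each clustering key.
        self._append(self.by_subject, subject, [relation, obj])
        self._append(self.by_relation, relation, [subject, obj])
        self._append(self.by_object, obj, [relation, subject])

    def outgoing(self, subject):
        """Edges leaving an entity, as [relation, object] pairs."""
        return self._lookup(self.by_subject, subject)

    def incoming(self, obj):
        """Edges arriving at an entity, as [relation, subject] pairs."""
        return self._lookup(self.by_object, obj)

    def pairs(self, relation):
        """All [subject, object] pairs linked by a relationship."""
        return self._lookup(self.by_relation, relation)

    def close(self):
        for store in (self.by_subject, self.by_relation, self.by_object):
            store.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        graph = RedundantTripleStore(directory)
        # "e1", "e2" are placeholder entity identifiers.
        graph.add("e1", "advises", "e2")
        print(graph.outgoing("e1"))
        print(graph.incoming("e2"))
        graph.close()
```

The trade-off in this sketch mirrors the one the abstract describes: writes cost three times as much storage, but forward, backward, and relationship-centred queries each need only one keyed lookup.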

Keywords

Key-value store, Unstructured data

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

27194-2.zip (581 kB)
