ORCID Identifier(s)

0000-0001-6654-4551

Graduation Semester and Year

2017

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Ramez Elmasri

Abstract

Graph databases are a popular choice for representing data with relationships. They allow relational information to be modified without the structural redefinition that relational databases require. Exponentially growing graph sizes demand efficient querying despite memory limitations. Indexes, which speed up query processing, are integral to databases. Existing works have used in-memory approaches that are limited by main memory size. This thesis proposes a way to combine a graph representation, an indexing technique, and secondary memory to answer queries efficiently. Unstructured textual data is parsed to identify entities and assign each a unique identifier. The entities and relationships are assembled into a graph representation in the form of key-value pairs. The key-value pairs are hashed into redundant Berkeley DB stores, clustered on relationships and entities. The Berkeley DB key-value store uses primary memory in conjunction with secondary memory. Redundancy is affordable because main memory size is not a limitation. Redundant key-value hash stores facilitate fast processing of many queries in multiple directions.
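
The idea of redundant key-value stores clustered on entities and relationships can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes Python's standard-library `dbm.dumb` as a disk-backed stand-in for Berkeley DB, and the class name `RedundantTripleStore`, its method names, the store file names, and the JSON value encoding are all hypothetical choices made for illustration. Each edge is written three times, once per store, so a lookup from either endpoint or from the relationship is a single key fetch.

```python
import dbm.dumb
import json
import os
import tempfile


class RedundantTripleStore:
    """Hypothetical sketch: each (subject, relation, object) edge is stored
    redundantly in three disk-backed key-value stores, one per lookup direction."""

    def __init__(self, directory):
        # dbm.dumb stands in for Berkeley DB here (an assumption of this sketch).
        def open_store(name):
            return dbm.dumb.open(os.path.join(directory, name), "c")

        self.by_subject = open_store("by_subject")    # entity -> outgoing edges
        self.by_object = open_store("by_object")      # entity -> incoming edges
        self.by_relation = open_store("by_relation")  # relation -> entity pairs

    @staticmethod
    def _read(store, key):
        # Values are JSON-encoded lists; a missing key means no edges.
        k = key.encode("utf-8")
        return json.loads(store[k]) if k in store else []

    @classmethod
    def _append(cls, store, key, value):
        items = cls._read(store, key)
        items.append(value)
        store[key.encode("utf-8")] = json.dumps(items).encode("utf-8")

    def add(self, subject, relation, obj):
        # The same edge is written to all three stores (the redundancy).
        self._append(self.by_subject, subject, [relation, obj])
        self._append(self.by_object, obj, [relation, subject])
        self._append(self.by_relation, relation, [subject, obj])

    def outgoing(self, entity):
        return self._read(self.by_subject, entity)

    def incoming(self, entity):
        return self._read(self.by_object, entity)

    def pairs(self, relation):
        return self._read(self.by_relation, relation)

    def close(self):
        for store in (self.by_subject, self.by_object, self.by_relation):
            store.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        graph = RedundantTripleStore(directory)
        graph.add("e1", "works_at", "e2")  # entity IDs are made-up examples
        graph.add("e3", "works_at", "e2")
        print(graph.outgoing("e1"))
        print(graph.incoming("e2"))
        print(graph.pairs("works_at"))
        graph.close()
```

The trade-off in this sketch mirrors the one the abstract describes: every insert costs three writes and roughly three times the storage, which is acceptable when the stores live on disk, and in exchange queries in either direction along an edge avoid a scan.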

Keywords

Key-value store, Unstructured data

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
