Graduation Semester and Year
2017
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Ramez Elmasri
Abstract
A graph database is a popular choice for representing data with relationships. It makes it easy to modify relational information without redefining the schema, as is required in relational databases. Exponentially growing graph sizes demand efficient querying despite main-memory limitations. The use of indexes to speed up query processing is integral to databases, but existing work has relied on in-memory approaches that are limited by main-memory size. This thesis proposes an approach that combines a graph representation, an indexing technique, and secondary memory to answer queries efficiently. Unstructured textual data is parsed to identify entities, and each entity is assigned a unique identifier. The entities and relationships are assembled into a graph representation in the form of key-value pairs. These key-value pairs are hashed into redundant Berkeley DB stores, clustered on relationships and on entities. The Berkeley DB key-value store uses primary memory in conjunction with secondary memory, so main-memory size is no longer the limiting factor and redundancy becomes affordable. The redundant key-value hash stores allow many queries to be processed quickly in multiple directions.
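The idea of redundant stores clustered on entities and on relationships can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: plain Python dicts stand in for the Berkeley DB hash stores, entity extraction from text is assumed to have already produced (subject, relation, object) triples, and the class and method names (`RedundantGraphStore`, `add_edge`, `outgoing`, `incoming`, `pairs`) are hypothetical.

```python
from collections import defaultdict


class RedundantGraphStore:
    """Hypothetical sketch: redundant hash stores keyed by subject, object, and relation.

    Plain dicts stand in for Berkeley DB hash stores (illustrative assumption only).
    """

    def __init__(self):
        self.ids = {}      # entity name -> unique integer id
        self.names = []    # unique integer id -> entity name
        self.by_subject = defaultdict(list)   # subject id -> [(relation, object id)]
        self.by_object = defaultdict(list)    # object id -> [(relation, subject id)]
        self.by_relation = defaultdict(list)  # relation -> [(subject id, object id)]

    def entity_id(self, name):
        # Assign the next unused integer id the first time an entity is seen.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]

    def add_edge(self, subject, relation, obj):
        s, o = self.entity_id(subject), self.entity_id(obj)
        # The same edge is written to all three stores (deliberate redundancy),
        # so each query direction below is answered by a single hash lookup.
        self.by_subject[s].append((relation, o))
        self.by_object[o].append((relation, s))
        self.by_relation[relation].append((s, o))

    def outgoing(self, subject):
        # Edges leaving an entity, read from the subject-clustered store.
        s = self.ids.get(subject)
        return [(r, self.names[o]) for r, o in self.by_subject.get(s, [])]

    def incoming(self, obj):
        # Edges arriving at an entity, read from the object-clustered store.
        o = self.ids.get(obj)
        return [(r, self.names[s]) for r, s in self.by_object.get(o, [])]

    def pairs(self, relation):
        # All entity pairs linked by a relation, read from the relation-clustered store.
        return [(self.names[s], self.names[o]) for s, o in self.by_relation.get(relation, [])]


store = RedundantGraphStore()
store.add_edge("Alice", "works_at", "UTA")
store.add_edge("Bob", "works_at", "UTA")
store.add_edge("Alice", "knows", "Bob")
print(store.outgoing("Alice"))  # [('works_at', 'UTA'), ('knows', 'Bob')]
print(store.incoming("UTA"))    # [('works_at', 'Alice'), ('works_at', 'Bob')]
print(store.pairs("knows"))     # [('Alice', 'Bob')]
```

Each edge is stored three times, trading space for the ability to answer forward, reverse, and relationship-centric queries without scanning. A persistent version would need to map each key's list of values onto Berkeley DB records, for example by serializing it; that mapping is not shown here.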
Keywords
Key-value store, Unstructured data
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Varghese, Jasmine Manoj, "SCALABLE CONVERSION OF TEXTUAL UNSTRUCTURED DATA TO NoSQL GRAPH REPRESENTATION USING BERKELEY DB KEY-VALUE STORE FOR EFFICIENT QUERYING" (2017). Computer Science and Engineering Theses. 390.
https://mavmatrix.uta.edu/cse_theses/390
Comments
Degree granted by The University of Texas at Arlington