ORCID Identifier(s)

0000-0003-4561-6557

Graduation Semester and Year

2016

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

Querying graph data can be difficult because it requires the user to know both the underlying schema and the query language. Visual query builders let users formulate the intended query by drawing the nodes and edges of a query graph, which is then translated into a database query, so users need no knowledge of the query language or the underlying schema. To the best of our knowledge, none of the currently available visual query builders suggests to users which nodes or edges to include in their query graph. We provide such suggestions via machine learning algorithms to help users formulate their intended query. No readily available dataset can be used directly to train our algorithms, so we simulate training data using Freebase, DBpedia, and Wikipedia. We also compare the performance of four machine learning algorithms in suggesting edges to add to the query graph: Naïve Bayes (NB), Random Forest (RF), Classification based on Association Rules (CAR), and a recommendation system based on SVD (SVD). On average, CAR requires 67 suggestions to complete a query graph on Freebase, while the other algorithms require 83-160 suggestions. On DBpedia, Naïve Bayes requires an average of 134 suggestions to complete a query graph, while the other algorithms require 150-171 suggestions.

Keywords

Machine learning, Data mining, Querying data graphs, Visual query builders

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
