ORCID Identifier(s)

0000-0002-1610-3843

Graduation Semester and Year

2022

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Gautam Das

Abstract

Data is generated at an unprecedented rate surpassing our ability to analyze them. In real applications, it is often impractical to find an exact answer by traversing the entire data. As a result, Approximate Query Processing (AQP) is getting extremely popular, which finds an approximate answer in a quick time by sacrificing a fraction of accuracy. This dissertation focuses on developing different AQP techniques to solve fundamental database problems using deep learning. Moreover, we build a fast and scalable algorithm for Quantile Regression, a well-known regression technique that can help minimize the uncertainty in the recent deep learning-based AQP solutions, including ours. First, we develop solutions for a fundamental database problem called Selectivity Estimation- the problem of estimating the result size of queries. Poor selectivity estimation can seriously impair query optimizer, hence an inefficient query execution. The traditional database techniques often perform poorly for multi-attribute queries involving many attributes. The deep learning-based solution is a promising area of research for Selectivity Estimation. However, many existing approaches suffer from high inaccuracy for low-selectivity queries. We propose two deep learning-based solutions that perform exceptionally well for low-selectivity multi-attribute queries involving many predicates. Our first approach treats selectivity estimation as a density estimation problem and learns the distribution using an unsupervised deep learning technique from data. The second approach is a supervised technique that learns from queries. Both of our approaches are fast and accurate, even for a large number of predicates. The second problem we tackle in this dissertation is one of the most important traditional database problems widely known as Approximate Query Processing (AQP) for aggregate queries like Sum, Average, and Count, which a data scientist widely uses for real-time data analysis. We propose a deep learning-based technique that learns the data distribution to generate samples representative of the input data from the model. Once we have a learned model, samples can be generated as needed, and aggregate queries can be answered without affecting the database. Moreover, the sample quality can be improved by our proposed techniques for minimizing model bias and ensemble approaches. Finally, we propose a scalable algorithm for a famous regression problem, Quantile Regression, which can help minimize the uncertainty of deep learning-based solutions, including our proposed solutions. We use computational geometry concepts of arrangement and duality to design an algorithm to calculate objective cost from a neighboring point in the arrangement in O(d). Moreover, we propose an algorithm for 2-dimensional Quantile Regression, which is better than any other existing techniques. While our deep learning-based approaches' efficacy is demonstrated with extensive experiment results, our Quantile Regression algorithm for 2-dimension is theoretically superior to all existing techniques.

Keywords

Approximate query processing, Selectivity estimation, Quantile regression

Disciplines

Computer Sciences | Physical Sciences and Mathematics

License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.

Comments

Degree granted by The University of Texas at Arlington

Recommended Citation

Hasan, Shohedul, "Approximate Query Processing Using Deep Learning and Database Techniques" (2022). Computer Science and Engineering Dissertations. 330.
https://mavmatrix.uta.edu/cse_dissertations/330

31033-2.zip (846 kB)

Download

Included in

Computer Sciences Commons

COinS

Computer Science and Engineering Dissertations

Approximate Query Processing Using Deep Learning and Database Techniques