Author

Sona Hasani

Graduation Semester and Year

2020

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Gautam Das

Abstract

Machine learning (ML) has been widely adopted in the last few years and has had an undeniable impact on the ways many organizations make decisions. While great advances have been made in developing new ML algorithms and applications, there is a pressing need for scalable ML solutions to meet the demands of the big data era. In this dissertation, we use database techniques to improve the efficiency of two main machine learning tasks: i) constructing machine learning models, and ii) explaining machine learning models for multiple predictions.

First, we study the application of machine learning in complex analytic processing. Recently, there has been extensive interest in the database community in supporting quick, interactive, ad-hoc analytic queries on ML models trained over large datasets. Data is typically stored in large data warehouses with multiple dimension hierarchies. We investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. We propose algorithms that support a wide variety of ML models, such as generalized linear models for classification as well as K-Means and Gaussian mixture models for clustering. We also propose a cost-based optimization framework that identifies appropriate ML models to combine at query time.

The second ML problem we tackle in this dissertation is explanation. ML algorithms are increasingly used for automated decision making in diverse domains, and their widespread use has necessitated the development of algorithms for explaining their predictions. Concise and accurate explanations often increase user trust in, and understanding of, the model's predictions. The research community has mobilized to develop sophisticated algorithms for generating explanations.
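To make the model-reuse idea from the first problem concrete, here is a minimal sketch. It is not the dissertation's algorithm: it assumes, hypothetically, that a K-Means model has been materialized for each partition of the warehouse (for example, one per region) and stored as its centroids plus cluster sizes. It then approximates the model for a query over several partitions by running a weighted K-Means over the stored centroids instead of over the raw rows. The names `weighted_kmeans` and `combine_materialized` are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 50, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm in which each point counts `weight` times."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        totals = [0.0] * k
        # Assignment step: each (weighted) point goes to its nearest center.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: _dist2(p, centers[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Update step: weighted mean of the assigned points (keep empty centers).
        new_centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def combine_materialized(models, k: int) -> List[Point]:
    """models: list of (centroids, cluster_sizes), one per materialized partition."""
    points: List[Point] = []
    weights: List[float] = []
    for centroids, sizes in models:
        points.extend(centroids)
        weights.extend(sizes)
    return weighted_kmeans(points, weights, k)
```

For example, combining a partition with centroids (0, 0) and (10, 10) (sizes 100 and 50) with another whose centroids are (0.5, 0.5) and (10.5, 9.5) (sizes 80 and 70) yields centroids near (0.222, 0.222) and (10.292, 9.708), without touching the underlying rows. How good such an approximation is depends on how well the stored centroids summarize each partition; choosing which materialized models to combine, which the cost-based framework above addresses, is not modeled here.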
Implementations of popular explanation algorithms are usually highly optimized for a single prediction. In practice, however, explanations often have to be generated in a batch for multiple predictions at a time. We propose a principled and lightweight approach for identifying redundant computations, along with several effective heuristics for speeding up the generation of multiple explanations. Our approach is inspired by multi-query optimization. Our techniques are general and can be applied to a wide variety of explanation algorithms. For both problems, we provide extensive experiments over real-world and synthetic datasets, using popular ML algorithms and popular explainers.
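The idea of sharing redundant computation across a batch of explanations can be illustrated with a deliberately simple explainer. The sketch below swaps in occlusion attribution (feature i's score is f(x) minus the prediction with feature i replaced by a baseline value) rather than any explainer evaluated in the dissertation, and it captures only one kind of redundancy: identical perturbed rows that recur across instances in a batch. A hypothetical `CachedModel` wrapper memoizes predictions for the whole batch, so each distinct row is evaluated once, in the spirit of sharing common subexpressions across queries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


class CachedModel:
    """Wraps a prediction function and memoizes it across a whole batch."""

    def __init__(self, predict: Callable[[Row], float]):
        self.predict = predict
        self.cache: Dict[Row, float] = {}
        self.calls = 0  # counts real model evaluations (cache misses only)

    def __call__(self, row: Row) -> float:
        if row not in self.cache:
            self.calls += 1
            self.cache[row] = self.predict(row)
        return self.cache[row]


def occlusion_explain(model: CachedModel, x: Row, baseline: Row) -> List[float]:
    """Score of feature i = f(x) - f(x with feature i set to baseline[i])."""
    fx = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = x[:i] + (baseline[i],) + x[i + 1:]
        scores.append(fx - model(perturbed))
    return scores


def explain_batch(predict: Callable[[Row], float], batch: Sequence[Sequence[float]],
                  baseline: Sequence[float]):
    # One shared cache for the whole batch, instead of one explainer run per row.
    model = CachedModel(predict)
    explanations = [occlusion_explain(model, tuple(x), tuple(baseline)) for x in batch]
    return explanations, model.calls
```

For the linear model f(r) = 2r0 + 3r1 - r2 with a zero baseline and the batch (1, 2, 3), (1, 2, 4), (1, 5, 3), explaining each instance independently takes 12 model calls; the shared cache needs 10, because (1, 2, 0) and (1, 0, 3) are perturbed rows of two instances each. With sampling-based explainers, such duplicates only arise if perturbations are drawn in a coordinated way across the batch, which this sketch does not attempt.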

Keywords

Machine learning, Database, Explainable AI

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
