ORCID Identifier(s)

0000-0002-4275-5247

Graduation Semester and Year

2019

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Gautam Das

Abstract

Machine learning (ML) has become an essential tool for answering complex predictive analytic queries. Building models on large-scale datasets is one of the most time-consuming parts of the data science pipeline, and during the exploratory phase data scientists are often willing to sacrifice some accuracy to speed it up. In this report, we demonstrate ApproxML, a system that uses model materialization and reuse to efficiently construct approximate ML models for new queries from previously constructed ML models. ApproxML supports a wide variety of ML models, including generalized linear models for supervised learning and K-Means and Gaussian mixture models for unsupervised learning. Because ApproxML is a cost-based optimization framework that identifies the best reuse strategy at query time, the implementation is compatible with different datasets and ML algorithms.

Keywords

Machine learning, Model merging, Coreset, K-means, SVM, Gaussian mixture model, Linear regression

Disciplines

Computer Sciences | Physical Sciences and Mathematics

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Comments

Degree granted by The University of Texas at Arlington

Recommended Citation

Ghaderi, Faezeh, "ApproxML: Efficient Approximate Ad-Hoc ML Models Through Materialization and Reuse" (2019). Computer Science and Engineering Theses. 25.
https://mavmatrix.uta.edu/cse_theses/25

Included in

Computer Sciences Commons

ApproxML: Efficient Approximate Ad-Hoc ML Models Through Materialization and Reuse