Author

Di Ming

Graduation Semester and Year

2020

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chris Ding

Abstract

Feature selection and data reconstruction are important topics in machine learning. In today's big data environment, data are often high-dimensional and contaminated by noise and corruption. We therefore develop robust and flexible learning models that select relevant features from high-dimensional data spaces and reconstruct the original clean data from corrupted input more efficiently and effectively. To resolve the inflexibility of widely used class-shared feature selection methods such as the L21-norm, we derive LASSO from probabilistic selection on ridge regression, which provides a point of view independent of the usual sparse coding one, and further propose probability-derived L12-norm based feature selection to select discriminative features. In addition, we propose a novel "exclusive L21" regularization to select robust and flexible features. Exclusive L21 regularization produces joint sparsity at the inter-group level and exclusive sparsity at the intra-group level simultaneously. As a result, it combines the advantages of both the L21-norm (increased robustness) and the L12-norm (added flexibility). To automatically recover the original clean data from noisy input in an unsupervised fashion, we propose a deep robust data reconstruction method in the form of autoencoder networks using L1 loss, and introduce a smoothed ReLU (sReLU) activation function to resolve the black spot problem that appears in the network outputs when L1 loss is naively combined with the popular ReLU. We also propose a robust PCA based low-rank and sparse data reconstruction method, and theoretically prove the underlying connection between the regularization and the robustness. To solve the corresponding multivariate optimization problem efficiently, we introduce an "exact solver" based optimization algorithm that minimizes robust L1-PCA models via an alternating optimization strategy.
Experimental results on benchmark datasets show that (i) the features selected by the robust and flexible learning models achieve higher accuracy in classifying multi-class data; and (ii) the data reconstructed by the robust and flexible learning models obtain a smaller noise-free error in recovering corrupted noisy data. The proposed robust and flexible learning models thus obtain better performance than state-of-the-art methods in real-world applications.
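
The contrast the abstract draws between the class-shared L21-norm and the more flexible L12-norm can be illustrated with a small sketch. It is not taken from the dissertation: it assumes a weight matrix W whose rows are features and whose columns are classes, and it uses one common convention for each norm (L21 as the sum of row-wise Euclidean norms, L12 as the square root of the sum of squared column-wise L1 norms). The dissertation's exact definitions may differ, and the function names and example matrices are hypothetical.

```python
import math

# Assumption: W is a list of rows; rows = features, columns = classes.

def l21_norm(W):
    """Sum over rows (features) of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def l12_norm(W):
    """Square root of the sum over columns (classes) of each column's squared L1 norm."""
    n_cols = len(W[0])
    col_l1 = [sum(abs(row[j]) for row in W) for j in range(n_cols)]
    return math.sqrt(sum(c * c for c in col_l1))

# Hypothetical 3-feature, 2-class weight matrices with identical entry magnitudes.
W_shared = [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]    # both classes use feature 0
W_flexible = [[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]]  # each class uses its own feature

print(l21_norm(W_shared), l21_norm(W_flexible))  # 5.0 7.0
print(l12_norm(W_shared), l12_norm(W_flexible))  # 5.0 5.0
```

Under these assumed conventions, the L21 penalty is lower when all classes share the same feature row, whereas the L12 penalty is the same for both matrices, so it does not discourage each class from choosing its own features.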

Keywords

Robust, Flexible, Feature selection, Data reconstruction, Probabilistic LASSO, Exclusive L21, L1-Autoencoder, L1-PCA

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
