Supervised Sparse Learning with Applications in Bioinformatics

Kin Ming Puk

Degree granted by The University of Texas at Arlington

Abstract

In machine learning and mathematical optimization, sparse learning is the use of mathematical norms such as the L1-norm, group norm, and L21-norm to seek a trade-off between a goodness-of-fit measure and the sparsity of the result. Sparsity leads to a parsimonious learning model; in other words, only a few features from the data matrix are needed to build the model and to interpret it. The motivations for employing sparse learning in bioinformatics are two-fold: first, a parsimonious learning model enhances explanatory power; and second, a parsimonious model generally predicts better and generalizes better to new data. This dissertation is a collection of recent advances in sparse learning in bioinformatics, and consists of 1) L21-regularized multi-target support vector regression (L21-MSVR), 2) the application of L21-MSVR to predicting the optimal tibial soft-tissue insertion site of the human knee, 3) hierarchical sparse group lasso (HSGL), which improves the hierarchical lasso by incorporating an extra group-norm regularization, and 4) the use of HSGL on an electroencephalography (EEG)-based emotion recognition problem. The common thread across these works is the use of mathematical norms and the improvement of existing optimization formulations in order to learn better and to allow a clearer interpretation of feature selection.
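To make the norms concrete, here is a minimal sketch (not from the dissertation; the weight matrix is a made-up example) contrasting the element-wise L1-norm with the row-wise L21-norm on a feature-by-target weight matrix. Penalizing the L21-norm drives entire rows to zero, so a feature is kept or discarded jointly for all targets, which is the structured sparsity exploited by methods such as L21-MSVR:

```python
import numpy as np

# Hypothetical 4-feature x 3-target weight matrix; rows 1 and 3 are
# entirely zero, i.e. those features are dropped for every target.
W = np.array([
    [1.0, -2.0, 2.0],
    [0.0,  0.0, 0.0],
    [3.0,  0.0, 4.0],
    [0.0,  0.0, 0.0],
])

# L1-norm: sum of absolute values of all entries.
# Penalizing it encourages element-wise (unstructured) sparsity.
l1 = np.abs(W).sum()

# L21-norm: sum over rows of the Euclidean (L2) norm of each row.
# Penalizing it encourages whole rows to vanish, i.e. joint feature
# selection across all targets (structured / group sparsity).
l21 = np.linalg.norm(W, axis=1).sum()

print(l1)   # 12.0  (1+2+2 + 3+4)
print(l21)  # 8.0   (3 + 0 + 5 + 0)
```

In a regularized objective, either norm is added to the goodness-of-fit loss with a weight that controls the trade-off between fit and parsimony.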