Author

Deguang Kong

Graduation Semester and Year

2013

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chris Ding

Abstract

To sense the environment and understand human behavior, data analysis plays an increasingly important role in handling heterogeneous data from different domains, e.g., image categorization/annotation, customer segmentation, traffic prediction, ad optimization, recommendation systems, and privacy analysis. The large amount of multivariate data raises a fundamental problem of data mining: how to discover meaningful, compact patterns hidden in high-dimensional, noisy observations. One approach is dimension reduction, which finds a low-dimensional subspace and thus encodes the data in a low-dimensional structure. The other approach is feature selection or feature engineering, which manipulates the features to capture the most discriminative patterns for classification/clustering tasks.

In this thesis, to further improve low-dimensional embedding results, an iterative locally linear embedding algorithm is proposed, which captures the global structure of a non-linear manifold by iteratively updating the embedding. To handle classification of noisy data (e.g., data with missing or corrupted values), a robust data recovery model via the Schatten-p norm is proposed to preprocess the noisy data, in which the rank of the data is implicitly decreased. To utilize feature structure with constraints, an efficient feature learning algorithm via group lasso is proposed to handle features with arbitrary structure, and its convergence can be rigorously proved. To handle the problem of limited labeled data in image categorization/annotation tasks, efficient maximum consistency label propagation methods are proposed to improve the performance of graph-based semi-supervised learning methods; they utilize both the labeled data information and the graph manifold information. Extensive experiments indicate the good performance of the proposed algorithms.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
