Graduation Semester and Year
2013
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Chris Ding
Abstract
Data analysis plays an increasingly important role in sensing the environment and understanding human behavior. It must handle heterogeneous data from many domains, e.g., image categorization/annotation, customer segmentation, traffic prediction, ad optimization, recommendation systems, and privacy analysis. The large amount of multivariate data raises the fundamental problem of data mining: how can meaningful, compact patterns hidden in high-dimensional, noisy observations be discovered? One approach is dimension reduction, which finds a low-dimensional subspace and thus encodes the data in a low-dimensional structure. The other approach is feature selection or feature engineering, which manipulates the features to capture the most discriminative patterns for classification and clustering tasks.
In this thesis, to further improve low-dimensional embedding results, an iterative locally linear embedding algorithm is proposed, which captures the global structure of a non-linear manifold by iteratively updating the embedding. To handle the classification of noisy data (e.g., data with missing or corrupted values), a robust data recovery model based on the Schatten-p norm is proposed to preprocess the noisy data, where the rank of the data is implicitly decreased. To utilize feature structure with constraints, an efficient feature learning algorithm via group lasso is proposed to handle features with arbitrary structure; its convergence can be rigorously proved. To handle the problem of limited labeled data in image categorization/annotation tasks, efficient maximum consistency label propagation methods are proposed to improve the performance of graph-based semi-supervised learning methods; these methods utilize both the labeled data information and the graph manifold information. Extensive experiments indicate the good performance of the proposed algorithms.
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Kong, Deguang, "Image Annotation And Feature Engineering Via Structural Sparsity And Low Rank Approximation" (2013). Computer Science and Engineering Dissertations. 228.
https://mavmatrix.uta.edu/cse_dissertations/228
Comments
Degree granted by The University of Texas at Arlington