Graduation Semester and Year
2018
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Vassilis Athitsos
Abstract
A popular method in machine learning is the Convolutional Neural Network (CNN). CNNs were of high interest to the research community in the 1990s, but after that their popularity receded compared to the Support Vector Machine (SVM) [1]. One of the reasons was the relatively lower computational demands of SVMs: training CNNs requires significantly more computational power, time, and data than training SVMs. An important factor in demonstrating the power of CNNs has been the availability of large amounts of data and the introduction of large datasets. With the increased availability of powerful GPU processing, several improvements in network structure, and much more data, Krizhevsky et al. [2] used a CNN to achieve the highest image classification accuracy on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [3]. After that result, CNNs became widely popular in the computer vision and pattern recognition community, and have been applied to a variety of classification problems, including detection and localization. CNNs have achieved the best results for detection on the PASCAL VOC dataset [1], and for classification on the Caltech-256 [4] and Caltech-101 datasets [4, 5]. Based on such results, CNNs have emerged as a leading method for machine learning, and the term Deep Learning came into use. The origin of deep learning is in computer vision. However, researchers have found that deep learning is a very powerful tool for solving many problems in other areas, such as forecasting, finance, human pose estimation, and Natural Language Processing (NLP). Deep learning based methods have shown strong performance relative to other available methods. We have tried to improve deep learning methods and to use them for solving problems in different areas.
In this thesis, we use deep learning techniques to solve problems in different areas, such as unsupervised learning, object classification, forecasting, cognitive behavior assessment, and face recognition. In the computer vision part, we propose a novel method for unsupervised feature learning for image classification. Training a CNN requires a huge amount of data, so methods for training CNNs with unlabeled data are very promising. In the second part, we propose a new deep learning based framework for forecasting. Forecasting is a challenging task with many applications in finance, meteorology, etc. Our framework targets cases in which many nodes generate data. One application of our framework is the prediction of wind speed for multiple stations around the country. Another problem that we use Deep Learning (DL) to solve is face recognition at scale. Face recognition is in high demand in both academia and industry. We applied DL to face recognition for more than 600,000 identities. We also used DL to improve the performance of a system for behavioral assessment. This thesis makes the following contributions. First, we proposed a method for unsupervised feature learning for object classification. Because training neural networks requires a huge amount of labeled data, unsupervised learning is very appealing for CNN training. Representation learning with unlabeled data is an interesting and open problem in the machine learning community. We used transfer learning to transfer knowledge from a network trained on one dataset to test samples from another dataset. The results are promising, and we compare them to other methods. Several ideas for improving the results remain, which we plan to implement in the future. The paper was published at ICPR 2016. Second, we solved a forecasting problem by proposing a new deep learning based framework.
We presented a spatio-temporal wind speed forecasting algorithm using DL and, in particular, Recurrent Neural Networks (RNNs). We modeled the spatio-temporal information by a graph whose nodes are data-generating entities and whose edges model how these nodes interact with each other. Available forecasting methods propose models that forecast wind speed for only one node. One of the main contributions of our work is that we obtain forecasts for all nodes of the graph at the same time, based on one framework. Our paper on this project was published at the ICML Time Series workshop 2017. Third, we improved the motion analysis module for HTKS assessment. HTKS [6] is a game-like cognitive assessment method designed for children between four and eight years of age. During the HTKS assessment, a child responds to a sequence of requests, such as “touch your head” or “touch your toes”. The cognitive challenge stems from the fact that the children are instructed to interpret these requests not literally, but by touching a different body part than the one stated. In prior work, we developed the CogniLearn system, which captures data from subjects performing the HTKS game and analyzes the motion of the subjects. We propose specific improvements that make the motion analysis module more accurate. As a result of these improvements, the accuracy in recognizing cases where subjects touch their toes has gone from 76.46% in our previous work to 97.19%. The paper was published at PETRA 2017. Finally, we proposed a method for face recognition at scale for a large number of identities. We used the triplet loss function to train the neural network for feature learning. In our face recognition problem there is a huge number of classes, so we cannot use softmax in the last layer of the network as is done for usual classification problems. Instead, we used the triplet loss function to train the network to produce features, and then we used a classifier on top of the features.
The triplet loss function tries to minimize the distance between samples of the same class and maximize the distance between samples of different classes. Using the CNN for representation learning, each image is converted to a 128-dimensional feature vector. We ran experiments with different numbers of classes on different datasets, such as LFW, MegaFace, and FaceScrub. The numbers of classes are 500, 5K, 10K, 20K, 100K, and 663,386.
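The triplet loss idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the squared Euclidean distance, the margin value of 0.2, the L2 normalization, and the helper names (squared_distance, l2_normalize, triplet_loss) are all assumptions chosen for illustration. Only the 128-dimensional embedding size comes from the abstract.

```python
import math
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (an assumed preprocessing step)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    pos_dist = squared_distance(anchor, positive)  # same identity
    neg_dist = squared_distance(anchor, negative)  # different identity
    return max(pos_dist - neg_dist + margin, 0.0)


# Toy 128-dimensional embeddings standing in for CNN outputs.
random.seed(0)
dim = 128
anchor = l2_normalize([random.gauss(0, 1) for _ in range(dim)])
positive = l2_normalize([a + 0.01 * random.gauss(0, 1) for a in anchor])
negative = l2_normalize([random.gauss(0, 1) for _ in range(dim)])

print("triplet loss:", triplet_loss(anchor, positive, negative))
```

In training, a loss of this kind would be averaged over many triples and minimized with respect to the network weights, so that embeddings of the same identity move closer together and embeddings of different identities move apart.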
Keywords
Machine learning, Deep learning
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Recommended Citation
Ghaderi, Amir, "Deep Learning for Recognition of Objects, Activities, Faces, and Spatio-Temporal Patterns" (2018). Computer Science and Engineering Dissertations. 333.
https://mavmatrix.uta.edu/cse_dissertations/333
Comments
Degree granted by The University of Texas at Arlington