Author

Xin Miao

ORCID Identifier(s)

0000-0002-7891-1627

Graduation Semester and Year

2020

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Vassilis Athitsos

Abstract

Owing to its powerful feature representation capabilities, deep learning has become a key tool in computer vision. For high-dimensional images in particular, deep learning achieves much faster inference than most traditional methods. This dissertation focuses on designing efficient neural networks and applying them to two high-dimensional imaging applications: facial landmark detection in video and snapshot compressive imaging. The first part addresses landmark detection in video facial images. Existing facial landmark detection methods mainly rely on cascaded regression, an indirect approach that progressively estimates shape increments over several iterations. Moreover, cascaded models extract handcrafted features and therefore fail to leverage the strength of convolutional neural networks. In addition, these local descriptors must be recomputed at every iteration from the updated shape, which is time-consuming and makes it hard to integrate feature learning into a single architecture trained end to end. This dissertation proposes a direct shape regression network (DSRN) for fast facial landmark prediction. Specifically, by deploying a doubly convolutional layer and the Fourier feature pooling layer proposed in this work, DSRN efficiently constructs strong representations that disentangle the highly nonlinear relationship between images and shapes. It runs at about 500 frames per second, excluding face detection, on an NVIDIA GTX 1080Ti GPU, which is promising for practical video facial landmark detection. The second part proposes a deep learning framework for reconstructing high-dimensional images in snapshot compressive imaging systems.
Snapshot compressive imaging (SCI) refers to compressive imaging systems in which multiple frames are mapped into a single measurement; video compressive imaging and hyperspectral compressive imaging are two representative applications. In this manner, a two-dimensional (2D) monochromatic camera can sample scenes at video rate, significantly saving memory, bandwidth, and cost. Despite these advantages, SCI requires an algorithm to reconstruct the 3D data cube from each snapshot measurement after the sensing process. Existing algorithms are either too slow or not accurate enough, which precludes wide application of SCI. In this dissertation, we develop a dual-stage deep learning model to reconstruct the desired 3D signal in SCI. The same model can be used for both video and hyperspectral image reconstruction; the only difference lies in how the network is trained. Results on both simulated and real datasets demonstrate the significant advantages of our network, which achieves a large PSNR improvement on simulated data over the current state of the art. Furthermore, our network completes the reconstruction in under a second instead of the hours taken by the recently proposed DeSCI algorithm, a speedup of more than 1000 times.
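The abstract describes the SCI measurement process only in words. Below is a minimal sketch of the commonly used SCI forward model, assuming that each of T frames is modulated element-wise by its own binary coding mask and that the modulated frames are summed onto one 2D detector, y = Σ_t C_t ⊙ x_t. The names `sci_forward` and `psnr`, the random masks, the synthetic data, and the naive "normalized measurement" baseline are illustrative assumptions; none of this is the dual-stage network proposed in the dissertation.

```python
import numpy as np

def sci_forward(frames, masks):
    """Compress a (T, H, W) data cube into a single (H, W) snapshot.

    Each frame is multiplied element-wise by its coding mask and the
    modulated frames are summed along time: y = sum_t C_t * x_t.
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (T, H, W)")
    return np.sum(masks * frames, axis=0)

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals scaled to [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
T, H, W = 8, 64, 64
frames = rng.random((T, H, W))                            # synthetic video cube
masks = rng.integers(0, 2, size=(T, H, W)).astype(float)  # random binary masks

y = sci_forward(frames, masks)  # one 2D measurement for all T frames
print(y.shape)  # (64, 64)

# Naive baseline (hypothetical, for illustration only): divide y by the
# per-pixel mask sum and copy the result into every frame.
mask_sum = masks.sum(axis=0)
mask_sum[mask_sum == 0] = 1.0  # avoid division by zero where no mask is open
naive = np.repeat((y / mask_sum)[None], T, axis=0)
print(psnr(frames, naive))
```

The naive baseline recovers only a per-pixel average of the frames and discards their temporal differences, so its PSNR stays low; a learned reconstruction, such as the dual-stage network described above, aims to recover the individual frames from y instead.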

Keywords

Deep learning, High Dimensional Images, Snapshot compressive imaging, Facial landmarks detection

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
