Graduation Semester and Year
2022
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Vassilis Athitsos
Abstract
Hand analysis using vision systems is necessary for interaction between people and digital devices and is therefore crucial in many applications of computer vision and human-computer interaction (HCI). This dissertation explores hand analysis from depth images along two lines: hand part segmentation and 3D hand pose estimation. First, we investigate hand part segmentation from depth images, formulated as a semantic segmentation task. We explore a method that determines, for every pixel, which hand part it belongs to. The method is designed to work without ground-truth segmentation labels for training; instead, it uses the 3D hand pose annotations already provided with hand pose datasets as a form of weak supervision. Both qualitative and quantitative experiments confirm the effectiveness of the proposed method. Second, we investigate a method that enables accurate 3D hand pose estimation from depth images. This is achieved by a novel formulation that decomposes 3D hand pose estimation into the estimation of 2D joint locations in the depth image space (UV) and the estimation of their corresponding depths, aided by two complementary attention maps. This decomposition prevents depth estimation, which is a more difficult task, from interfering with the UV estimates at both the prediction and feature levels. We empirically show that this decomposition, together with its interaction with the two complementary attention maps that the model estimates in two separate branches, leads to state-of-the-art accuracy on three public 3D hand pose estimation benchmark datasets. Finally, we explore a semi-supervised method for 3D hand pose estimation from depth images. This method aims to reduce the reliance of the model's training on ground-truth annotations, which are costly to acquire, by adopting a student-teacher framework. The teacher network is trained using consistency training and by adapting recent advances in semi-supervised image classification. It generates pseudo-labels for training the student network. As training progresses, the teacher network improves and generates more accurate pseudo-labels, which in turn further improve the student network. At test time, only the student network is used for inference; the teacher network is discarded after training. We conduct several experiments to demonstrate the effectiveness of the proposed framework.
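As a rough illustration of what splitting a 3D joint into an image-space location (UV) and a depth value means, the sketch below back-projects (u, v, depth) triples into camera-space coordinates with a standard pinhole camera model. It also shows one generic way an attention map could pool a per-pixel depth map into a single value. Neither function is taken from the dissertation: the names `uvd_to_xyz` and `attention_weighted_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the weighted-average pooling are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def uvd_to_xyz(
    joints_uvd: Sequence[Tuple[float, float, float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[float, float, float]]:
    """Back-project (u, v, depth) joints to camera-space (x, y, z).

    Uses the standard pinhole model; the intrinsics are hypothetical
    inputs, not values from the dissertation.
    """
    xyz = []
    for u, v, d in joints_uvd:
        x = (u - cx) * d / fx  # horizontal offset scaled by depth
        y = (v - cy) * d / fy  # vertical offset scaled by depth
        xyz.append((x, y, d))
    return xyz


def attention_weighted_depth(
    depth_map: Sequence[Sequence[float]],
    attention: Sequence[Sequence[float]],
) -> float:
    """Pool a depth map into one value using non-negative attention weights.

    Illustrative assumption only: a normalized weighted average, which is
    not necessarily how the dissertation combines its attention maps.
    """
    total = sum(w for row in attention for w in row)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    weighted = sum(
        d * w
        for d_row, w_row in zip(depth_map, attention)
        for d, w in zip(d_row, w_row)
    )
    return weighted / total


# Hypothetical intrinsics and a single joint 50 pixels right of the image center.
joints_xyz = uvd_to_xyz([(210.0, 120.0, 500.0)], fx=250.0, fy=250.0, cx=160.0, cy=120.0)
pooled = attention_weighted_depth([[400.0, 500.0], [600.0, 700.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch an error in the depth value `d` scales `x` and `y` only at the back-projection step, while `u` and `v` themselves are untouched, which gives some intuition for why the two estimates can be handled separately.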
Keywords
3D hand pose estimation, Hand part segmentation, Deep learning, Semi-supervised learning
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Rezaei, Mohammad, "HAND ANALYSIS FROM DEPTH IMAGES" (2022). Computer Science and Engineering Dissertations. 281.
https://mavmatrix.uta.edu/cse_dissertations/281
Comments
Degree granted by The University of Texas at Arlington