Author

Sakher Ghanem

Graduation Semester and Year

2020

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Vassilis Athitsos

Abstract

Accurate hand segmentation is vital in many applications in which the hands play a central role, such as sign language recognition, action recognition, and gesture recognition. A relatively unexplored obstacle to correct hand segmentation arises when the hand overlaps the face. The lack of a dataset for this research area was one motivation for this work. This dissertation investigates the hand-over-face segmentation task and proposes improvements for it. Toward an in-depth study of the hand segmentation problem, it makes several contributions. First, it presents a survey of sign language recognition systems on mobile phones, which offers a recent practical example of the need for a hand segmentation dataset and for comprehensive research on the problem. Second, it presents a literature review that covers and summarizes all available hand segmentation datasets. Third, I provide a public dataset (VLM-HandOverFace) for the hand segmentation task. This newly constructed dataset contains 4384 labeled frames and includes color, depth, and infrared streams recorded by a Kinect. Several state-of-the-art architectures are evaluated on the VLM-HandOverFace dataset. Furthermore, this dissertation proposes the Multi-level Pyramid Scene Parsing Network (MPSPNet) for semantic segmentation. I also provide a thorough discussion and evaluation of the proposed model and of the characteristics that make it applicable to the hand-over-face segmentation challenge. Several experiments were conducted to examine MPSPNet using two object segmentation datasets and two hand segmentation datasets. The results show that the proposed method achieves at least a 6% improvement in mIoU compared with all state-of-the-art methods. Finally, various experiments were conducted to measure the impact of including temporal motion information in MPSPNet.

Keywords

Computer vision, Machine learning, CNN, DNN, MPSPNet

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
