Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Vassilis Athitsos


Recognizing a sign in a sign language video is a well-known, challenging problem in the computer vision community. The difficulty arises from many factors, including inconsistent sign performance, noisy backgrounds, and differences in image transformations between the training and testing sets, such as scale, rotation, and illumination. One of the most difficult problems, however, is capturing the core informative features. In most cases, hands are considered the dominant features, since sign language usually involves hand movements and shapes. A large-scale sign database also creates another issue: expensive look-up time. As with the majority of machine learning applications, sign language recognition generally uses a one-vs-all approach, in which we compute the compatibility score between the given query and every class model and label the query with the class that has the highest score. With a large number of classes, this results in very inefficient look-up. As such, efficient indexing is a requirement for the application. In this dissertation, a sign language recognition application for a large-scale system is proposed. The contributions are a random forest hand detector and a fast retrieval indexing method based on hashing. The random forest hand detector extends the work of Shotton et al. [1] to support RGB videos. The main goal is to label each pixel as either a hand pixel or a non-hand pixel. Since the focus is on sign language videos, the random offset introduced in Shotton et al. [1] is extended to a 3D space in which the third dimension is time, which increases the amount of feature information. The differences between the proposed work and the original work [1] are that i) our work uses RGB images as input, which makes accurate detection harder because depth information is not available and background segmentation is more difficult, and ii) the proposed approach uses a 3D offset space.
Utilizing time-domain information in this way should result in better accuracy than using only 2D spatial offsets. The proposed indexing method is based on the filter-and-refine approach, in which candidate signs are first filtered through a hash table. Then, the nearest neighbors are found among these candidates. The filtering step is fast since it involves only computing the hash functions of a given query sign. The bottleneck is the refine step, where the actual expensive distance/classification is computed between the query and all objects in the candidate set. The contribution is how to define hash functions such that neighboring objects are likely to fall into the same hash bucket while minimizing the number of objects that fall into the same bucket, so as to reduce the candidate set size. The proposed approach, Distance Based Hashing (DBH), adapts basic geometric properties and machine learning concepts to learn such functions. The experiments are conducted on an American Sign Language (ASL) dataset containing 1,113 unique signs from 3 signers, for a total of 3,339 videos of signs. They are done in a user-independent scenario, in which signers used in the training set never appear in the testing set.
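
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes a pivot-based line-projection hash, in which each bit is computed from distances to two randomly chosen database objects and thresholded at the median, and it uses Euclidean distance on toy points as a stand-in for the expensive sign-to-sign distance. All function names (`make_projection`, `build_hash_functions`, `hash_key`, `build_table`, `query`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    # Stand-in for the expensive distance between two sign videos (assumption).
    return math.dist(a, b)


def make_projection(x1, x2, dist):
    # Project an object onto the "line" through pivots x1 and x2 using only
    # pairwise distances (law of cosines), so no vector representation is needed.
    d12 = dist(x1, x2)

    def project(x):
        return (dist(x, x1) ** 2 + d12 ** 2 - dist(x, x2) ** 2) / (2 * d12)

    return project


def build_hash_functions(database, k, dist, rng):
    # Each binary hash function is a projection plus a median threshold, so
    # roughly half of the database falls on each side (balanced buckets).
    # Assumes the database holds at least two distinct objects.
    functions = []
    while len(functions) < k:
        x1, x2 = rng.sample(database, 2)
        if dist(x1, x2) == 0:
            continue  # identical pivots would divide by zero
        project = make_projection(x1, x2, dist)
        values = sorted(project(x) for x in database)
        functions.append((project, values[len(values) // 2]))
    return functions


def hash_key(x, functions):
    # Concatenate the k bits into one bucket key.
    return tuple(int(project(x) >= t) for project, t in functions)


def build_table(database, functions):
    table = defaultdict(list)
    for idx, x in enumerate(database):
        table[hash_key(x, functions)].append(idx)
    return table


def query(q, database, table, functions, dist):
    # Filter: only objects in the query's bucket become candidates.
    candidates = table.get(hash_key(q, functions), [])
    if not candidates:
        return None
    # Refine: the expensive distance is computed on the candidates only.
    return min(candidates, key=lambda i: dist(q, database[i]))
```

In this sketch a larger `k` shrinks the candidate set but makes it more likely that the true nearest neighbor lands in a different bucket. A common remedy, assumed here rather than taken from the abstract, is to build several independent tables and take the union of their candidates.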


Hands detection, Sign language recognition, Indexing


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington