Wei Xiang

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Vassilis Athitsos


With the massive storage of multimedia data and the increasing computational power of mobile devices, developing scalable computer vision applications has become a primary motivation for both the research and industrial communities. Among these applications, object detection and semantic segmentation are two of the most popular topics, and they serve as fundamental components of many computer vision systems on platforms such as mobile, healthcare, and autonomous driving. Inspired by this current and foreseeable trend, this thesis focuses on developing object detection and semantic segmentation models that are both effective and efficient, using large-scale, publicly available data sets sourced for various applications. In the last several years, object detection and semantic segmentation have received considerable attention in the literature and have advanced significantly with the emergence of deep learning methods. In particular, by applying Convolutional Neural Networks (CNNs), researchers have leveraged learned features in modeling, which greatly simplifies the tasks of classification and regression compared to traditional approaches that rely solely on hand-crafted features. In object detection, however, many research problems remain open, such as integrating contextual information into existing models and the missing relationship between proposal scales and receptive field sizes for different CNNs. In this thesis, we study this relationship extensively and further demonstrate that our statistical results can serve as a guideline for designing new detection models heuristically and efficiently, with improved detection accuracy, particularly for small objects.
In semantic segmentation, we investigate many of the state-of-the-art methods and find that current research has largely focused on complicated backbones combined with popular meta-architectures and designs, which in turn leads to overfitting and makes the models unsuitable for real-time tasks. To overcome this issue, we propose Turbo Unified Network (ThunderNet), which builds on a minimal backbone followed by a pyramid pooling module and a customized, two-level lightweight decoder. Our experimental results show that ThunderNet is among the fastest models currently available while achieving accuracy comparable to that of the majority of methods in the literature. We also test ThunderNet on a GPU-powered embedded platform, the NVIDIA Jetson TX2; the results indicate that ThunderNet is sufficiently fast and accurate to meet the demands of embedded systems. Finally, this thesis surveys joint calibration methods for RGB-D sensors. We summarize the related work and then present our quantitative evaluation results.


Computer vision, Object detection, Semantic segmentation, RGB-D calibration


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington