Jinyu Yang

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science


Department

Computer Science and Engineering

First Advisor

Junzhou Huang


Abstract

Deep neural networks (DNNs) have achieved unprecedented success on a wide range of machine learning problems and applications. However, this performance relies heavily on massive amounts of labeled data, which require considerable time and labor to collect and annotate. To remedy this limitation, unsupervised domain adaptation (UDA) has attracted growing attention over the past decade, owing to its ability to transfer knowledge learned from a labeled source domain to an unlabeled target domain. UDA has proven widely applicable to vision tasks such as image classification and semantic segmentation. Despite this success, existing UDA methods have four limitations: i) simply performing global feature alignment, as in previous studies, cannot guarantee the consistency of the joint distribution in the target domain; ii) context dependency is essential for semantic segmentation, yet its transferability is still not well understood; iii) the robustness of UDA methods for semantic segmentation remains unexplored, which poses a security concern in this field; and iv) previous work mainly builds on convolutional neural networks (CNNs) to learn domain-invariant representations, whereas the transferability of the convolution-free Vision Transformer (ViT) is still an open problem.

To address these limitations, this dissertation makes four contributions: i) we use a reconstruction network to reconstruct both source and target images from their predicted labels, which encourages cross-domain features of the same category to lie close to each other; ii) we design two cross-domain attention modules that adapt context dependencies from both spatial and channel views: the spatial attention module captures local feature dependencies between each position in the source image and each position in the target image, and the channel attention module models semantic dependencies between each pair of cross-domain channel maps, so that contextual information can be aggregated and adapted across domains; iii) we comprehensively evaluate the robustness of existing UDA methods and propose a robust UDA approach that maximizes the agreement between clean images and their adversarial examples through a contrastive loss in the output space; and iv) we perform a first-of-its-kind investigation of ViT's generalization ability on commonly used benchmarks and propose a new UDA method that explicitly takes into account the intrinsic merits of the transformer architecture.
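The two cross-domain attention views summarized above can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation. The function names `cross_domain_spatial_attention` and `cross_domain_channel_attention` are hypothetical. The sketch also makes several simplifying assumptions: the query/key/value projections are identities (a real module would typically learn them), target features act as queries while source features act as keys and values, both feature maps share the channel count C, and for the channel view they also share the same spatial size.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_domain_spatial_attention(feat_tgt: np.ndarray, feat_src: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: every target position attends over all source positions.

    feat_tgt: (C, H_t, W_t), feat_src: (C, H_s, W_s). Returns (C, H_t, W_t).
    Identity projections are an illustrative assumption.
    """
    c, h, w = feat_tgt.shape
    q = feat_tgt.reshape(c, -1)        # (C, N_t) target positions as queries
    k = feat_src.reshape(c, -1)        # (C, N_s) source positions as keys/values
    energy = q.T @ k / np.sqrt(c)      # (N_t, N_s) similarity of each target/source position pair
    attn = softmax(energy, axis=-1)    # each row sums to 1 over source positions
    out = k @ attn.T                   # (C, N_t) source features aggregated per target position
    return out.reshape(c, h, w)


def cross_domain_channel_attention(feat_tgt: np.ndarray, feat_src: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: every target channel map attends over all source channel maps.

    Both inputs must have shape (C, H, W). Returns (C, H, W).
    """
    if feat_tgt.shape != feat_src.shape:
        raise ValueError("channel attention sketch assumes equal (C, H, W) shapes")
    c = feat_tgt.shape[0]
    x_t = feat_tgt.reshape(c, -1)      # (C, N) target channel maps
    x_s = feat_src.reshape(c, -1)      # (C, N) source channel maps
    energy = x_t @ x_s.T               # (C, C) similarity of each cross-domain channel pair
    attn = softmax(energy, axis=-1)    # each row sums to 1 over source channels
    out = attn @ x_s                   # (C, N) weighted mix of source channel maps
    return out.reshape(feat_tgt.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_s = rng.standard_normal((8, 4, 4))  # hypothetical source feature map
    f_t = rng.standard_normal((8, 4, 4))  # hypothetical target feature map
    print(cross_domain_spatial_attention(f_t, f_s).shape)  # (8, 4, 4)
    print(cross_domain_channel_attention(f_t, f_s).shape)  # (8, 4, 4)
```

The spatial view builds an N_t × N_s matrix over position pairs, while the channel view builds a C × C matrix over channel pairs. The two views therefore aggregate cross-domain context along different axes and at very different memory costs.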


Keywords

Transfer learning, Unsupervised domain adaptation, Deep neural networks


Disciplines

Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington