Sheng Wang

Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Junzhou Huang


In the current era of big data, deep learning has become the state-of-the-art approach for a wide range of applications. Image-based applications such as image classification, object detection, and image segmentation benefit the most from deep neural networks. One reason for this success is the availability of large numbers of labeled training samples for models to learn from. To reduce the cost of obtaining labeled samples, researchers are actively studying unsupervised, semi-supervised, and self-supervised deep learning. For health-related data, the labeling cost is even higher. Labeling surgical videos with the tools in use and the surgical phase requires surgical domain knowledge, so general cloud-based labeling services are not a feasible option. Obtaining molecular properties costs even more, since it usually requires expensive laboratory experiments. How to use unlabeled data to improve model performance has therefore attracted increasing research interest. In this thesis, we propose semi-supervised deep learning models that incorporate unlabeled data into training to achieve better performance. Specifically, this thesis focuses on developing semi-supervised deep models for two problems: surgical tool presence detection and molecular property prediction.
Surgical tool presence detection is one of the key problems in automatic surgical video content analysis. Solving it benefits many applications, such as the evaluation of surgical instrument usage and automatic surgical report generation. Because each video is only sparsely labeled at the frame level, meaning that only a small portion of its frames are labeled, existing approaches model this problem purely as image (frame) classification and ignore the temporal information in surgical videos. In this thesis, we progress from a supervised deep neural network to a semi-supervised framework that uses information from both labeled and unlabeled frames, with different components capturing the spatial and temporal information of surgical videos.
With the rapid progress of AI in both academia and industry, deep learning has been widely introduced into many areas of drug discovery to accelerate its pace and cut R&D costs. Among all the problems in drug discovery, molecular property prediction is one of the most important. Unlike in general deep learning applications, labeled data for molecular property prediction is limited in scale. To address this, deep learning methods have begun to focus on using the vast amount of available unlabeled data to improve prediction performance on small-scale labeled data. In this thesis, we discuss a semi-supervised model named SMILES-BERT, which takes SMILES (Simplified Molecular Input Line Entry System) strings as input and is built from attention-based Transformer layers. A large-scale unlabeled dataset is used to pre-train the model through a masked SMILES recovery task. The pre-trained model can then be readily generalized to different molecular property prediction tasks via fine-tuning.
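The masked SMILES recovery pre-training task appears to be modeled on BERT's masked language modeling, as the model's name suggests. The sketch below shows how a single pre-training example might be built from a SMILES string; it is an illustration under stated assumptions, not SMILES-BERT's actual implementation. The regex tokenizer (`SMILES_TOKEN_RE`), the helper names `tokenize` and `mask_tokens`, the 15% masking rate, and the 80/10/10 replacement rule are all hypothetical choices borrowed from the original BERT setup.

```python
import random
import re

# Hypothetical tokenizer: bracket atoms, two-letter halogens, two-digit ring
# closures, then any single character. SMILES-BERT's real vocabulary may differ.
SMILES_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")
MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN_RE.findall(smiles)


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one masked-recovery example from a token list.

    Returns (inputs, targets). targets holds the original token at each
    selected position and None elsewhere, so a recovery loss would only be
    computed at selected positions. Selected positions follow the BERT
    80/10/10 rule (assumed here, not confirmed for SMILES-BERT):
    80% -> [MASK], 10% -> random vocabulary token, 10% -> left unchanged.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    targets = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for recovery
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise the original token is kept as-is
    return inputs, targets


if __name__ == "__main__":
    tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    vocab = sorted(set(tokens))
    inputs, targets = mask_tokens(tokens, vocab, rng=random.Random(0))
    print(inputs)
    print(targets)
```

For example, `tokenize("CC(=O)Oc1ccccc1C(=O)O")` (aspirin) yields 21 single-character tokens. `mask_tokens` then returns the corrupted input sequence together with the recovery targets; during pre-training, the model would be asked to predict the original token only at positions whose target is not `None`, and the same network would later be fine-tuned on labeled property data.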


Deep learning, Semi-supervised learning, Surgical tool detection, Molecular property prediction


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington