Zheng Xu

ORCID Identifier(s)


Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Junzhou Huang


Recent advances in deep learning have drastically changed people's lives. However, the success of deep learning relies mostly on large-scale, high-quality datasets, and the growing complexity of deeper models and larger datasets poses significant challenges. Motivated by this trend, this dissertation focuses on developing efficient and effective large-scale deep learning techniques for real-world problems, such as cell detection in hyper-resolution medical images and drug screening from millions of compound candidates. For cell detection in hyper-resolution medical images, the main challenge is the extremely large volume of pixel information. In addition, the cell density in the regions of interest is usually very high, so cells clump and crowd together in small areas. These challenges demand high-quality, efficient modeling to address cell detection at scale. In this dissertation, we discuss the large-scale cell detection problem from both the mathematical/statistical modeling perspective and the architectural system perspective, and arrive at a comprehensive solution that is both efficient and effective.
For drug discovery, drug companies with R&D departments have carried out numerous initiatives to speed up their drug discovery processes. Drug discovery is the process through which potential new medicines are identified. Modern drug discovery is usually implemented as drug compound selection: for every candidate chemical compound, the chemical drug properties, e.g., affinity, selectivity, and metabolic stability, are biologically tested in a lab environment. Once a compound passes all the drug requirement tests, it is selected as a potential new drug candidate. However, this process is excessively expensive and labor-intensive, costing hundreds of millions of dollars each year. The major challenge for deep learning is to take the sequence representation of a drug compound, i.e., its SMILES representation, as input and infer chemical properties from limited high-quality datasets. Within this context, we propose several effective unsupervised/semi-supervised techniques for generating powerful chemical representations and models that provide strong inference.


Deep learning, Machine learning, Medical imaging, Bio-informatics


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington