Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Jean Gao

Second Advisor

Dajiang Zhu

Third Advisor

Qilian Liang

Fourth Advisor

Junzhou Huang

Abstract

The rapid growth of single-cell RNA sequencing and transcriptomic datasets has created major computational challenges in causal discovery, representation learning, and biologically faithful data generation. To address these challenges, this dissertation presents three complementary deep learning frameworks for the analysis and modeling of transcriptomic data. Together, these methods form an integrative computational toolkit for understanding complex biological systems from high-dimensional and heterogeneous gene expression data.

First, this dissertation introduces DAG-VAERL, a causal discovery framework that integrates variational autoencoders, graph neural networks, reinforcement learning, and attention mechanisms to infer directed acyclic graphs for gene regulatory network analysis. DAG-VAERL improves causal structure learning in nonlinear and high-dimensional settings and demonstrates strong performance on both synthetic datasets and Alzheimer’s disease transcriptomic data, enabling more accurate discovery of causal relationships among lncRNAs and disease-related genes.
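For readers unfamiliar with how acyclicity can be enforced during gradient-based graph learning, the following is a minimal sketch of one widely used differentiable penalty, h(A) = tr[(I + (A∘A)/d)^d] − d, which equals zero exactly when the weighted adjacency matrix A describes a directed acyclic graph. The abstract does not state which constraint DAG-VAERL uses, so this is an illustrative assumption and not the dissertation's method; the function name, gene labels, and edge weights are hypothetical.

```python
import numpy as np

def acyclicity_penalty(adj: np.ndarray) -> float:
    """Return tr[(I + (A*A)/d)^d] - d, which is 0 iff `adj` is acyclic."""
    d = adj.shape[0]
    squared = adj * adj  # element-wise square keeps every entry non-negative
    powered = np.linalg.matrix_power(np.eye(d) + squared / d, d)
    return float(np.trace(powered) - d)

# Toy 3-gene chain g0 -> g1 -> g2 (hypothetical weights).
chain = np.array([[0.0, 0.8, 0.0],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])

# Adding the edge g2 -> g0 closes a cycle.
cycle = chain.copy()
cycle[2, 0] = 0.7

print(acyclicity_penalty(chain))  # zero up to floating-point error
print(acyclicity_penalty(cycle))  # strictly positive
```

Because the penalty is a smooth function of the edge weights, it can be added to a training loss so that an optimizer is pushed toward acyclic structures.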

Second, this dissertation proposes the Transcriptome Graph Transformer (TGT), an unsupervised graph Transformer framework for transcriptomic representation learning. By modeling heterogeneous biological graphs composed of gene, pathway, and virtual nodes, TGT learns generalizable transcriptomic representations through pretraining. The model demonstrates strong performance across multiple downstream tasks, including Alzheimer’s disease classification, tumor transcriptomic classification, biomarker and pathway discovery, and zero-shot clustering of both transcriptomic and spatial transcriptomic data, while also providing improved interpretability and cross-dataset generalization.

Finally, this dissertation presents TransFlow, a Transformer-enhanced flow matching framework for in silico generation of single-cell RNA expression profiles. By learning biologically meaningful continuous-time transport dynamics under sparsity and non-negativity constraints, TransFlow generates realistic and cell-type-specific synthetic transcriptomic data. Experimental results on PBMC and Alzheimer’s disease datasets show that the framework better preserves manifold structure, correlation patterns, sparsity characteristics, and biologically relevant lncRNA-associated functional programs, supporting applications in data augmentation, benchmarking, and simulation of cellular states.
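As background on flow matching in general, the sketch below shows the regression target used in its simplest form: a straight-line path between a source sample x0 and a data sample x1, whose velocity x1 − x0 a model is trained to predict. The abstract does not specify TransFlow's probability path, its Transformer architecture, or how sparsity and non-negativity are imposed, so the linear path, the function names, and the toy data here are assumptions for illustration only.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Linear conditional path: return the point x_t and its target velocity."""
    t = t[:, None]                    # broadcast one time value per sample
    x_t = (1.0 - t) * x0 + t * x1     # interpolate from source to data
    target_velocity = x1 - x0         # constant along a straight path
    return x_t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
n_cells, n_genes = 4, 6                                  # toy sizes, hypothetical
x0 = rng.standard_normal((n_cells, n_genes))             # source (noise) samples
x1 = rng.poisson(1.0, (n_cells, n_genes)).astype(float)  # stand-in for sparse counts
t = rng.uniform(0.0, 1.0, n_cells)

x_t, v_target = flow_matching_pair(x0, x1, t)
# A real model would predict the velocity from (x_t, t); zeros stand in here.
loss = flow_matching_loss(np.zeros_like(x_t), v_target)
```

At generation time, a trained velocity model would be integrated from t = 0 to t = 1 starting from a source sample to produce a synthetic expression profile.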

Overall, the methods presented in this dissertation advance transcriptomic analysis across the three core tasks of causal inference, unsupervised representation learning, and generative modeling. These contributions provide practical computational approaches for uncovering disease mechanisms, identifying biomarkers, and modeling complex cellular states, thereby supporting future advances in systems biology and precision medicine.

Keywords

Single-Cell Multi-Omics, Transcriptomics, Gene Regulatory Networks, Causal Discovery, Directed Acyclic Graphs, Graph Neural Networks, Graph Transformer, Deep Generative Models, Alzheimer’s Disease, Data Integration, Computational Biology

Disciplines

Biomedical Informatics | Other Computer Sciences

License

Creative Commons Attribution 4.0 International License
