Graduation Semester and Year
Spring 2026
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Jean Gao
Second Advisor
Dajiang Zhu
Third Advisor
Qilian Liang
Fourth Advisor
Junzhou Huang
Abstract
The rapid growth of single-cell RNA sequencing and other transcriptomic datasets has created major computational challenges in causal discovery, representation learning, and biologically faithful data generation. To address these challenges, this dissertation presents three complementary deep learning frameworks for the analysis and modeling of transcriptomic data. Together, these methods form an integrative computational toolkit for understanding complex biological systems from high-dimensional and heterogeneous gene expression data.
First, this dissertation introduces DAG-VAERL, a causal discovery framework that integrates variational autoencoders, graph neural networks, reinforcement learning, and attention mechanisms to infer directed acyclic graphs for gene regulatory network analysis. DAG-VAERL improves causal structure learning in nonlinear and high-dimensional settings and demonstrates strong performance on both synthetic datasets and Alzheimer’s disease transcriptomic data, enabling more accurate discovery of causal relationships among lncRNAs and disease-related genes.
Second, this dissertation proposes the Transcriptome Graph Transformer (TGT), an unsupervised graph Transformer framework for transcriptomic representation learning. By modeling heterogeneous biological graphs composed of gene, pathway, and virtual nodes, TGT learns generalizable transcriptomic representations through pretraining. The model demonstrates strong performance across multiple downstream tasks, including Alzheimer’s disease classification, tumor transcriptomic classification, biomarker and pathway discovery, and zero-shot clustering of both transcriptomic and spatial transcriptomic data, while also providing improved interpretability and cross-dataset generalization.
Finally, this dissertation presents TransFlow, a Transformer-enhanced flow matching framework for in silico generation of single-cell RNA expression profiles. By learning biologically meaningful continuous-time transport dynamics under sparsity and non-negativity constraints, TransFlow generates realistic and cell-type-specific synthetic transcriptomic data. Experimental results on PBMC and Alzheimer’s disease datasets show that the framework better preserves manifold structure, correlation patterns, sparsity characteristics, and biologically relevant lncRNA-associated functional programs, supporting applications in data augmentation, benchmarking, and simulation of cellular states.
Overall, the methods presented in this dissertation advance transcriptomic analysis across the three core tasks of causal inference, unsupervised representation learning, and generative modeling. These contributions provide practical computational approaches for uncovering disease mechanisms, identifying biomarkers, and modeling complex cellular states, thereby supporting future advances in systems biology and precision medicine.
Keywords
Single-Cell Multi-Omics, Transcriptomics, Gene Regulatory Networks, Causal Discovery, Directed Acyclic Graphs, Graph Neural Networks, Graph Transformer, Deep Generative Models, Alzheimer’s Disease, Data Integration, Computational Biology
Disciplines
Biomedical Informatics | Other Computer Sciences
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Long, Teng, "INTEGRATIVE APPROACHES AND DATA ANALYSIS FOR SINGLE-CELL RNA SEQUENCING DATA" (2026). Computer Science and Engineering Dissertations. 1.
https://mavmatrix.uta.edu/cse_dissertations2/1