Graduation Semester and Year
Spring 2026
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Junzhou Huang
Second Advisor
Jia Rao
Third Advisor
Meng Ye
Fourth Advisor
Dajiang Zhu
Abstract
This dissertation presents three contributions to multimodal deep learning for biological data understanding, addressing the fundamental challenge of cross-modal alignment from two complementary perspectives: designing effective multimodal fusion methods for specific biomedical applications, and proposing a general framework for higher-order multimodal alignment that captures hierarchical structure in data.
First, we develop Cmai, a deep learning framework for predicting binding between B cell receptors (BCRs) and antigens. It aligns BCR sequence information with antigen three-dimensional structures using contrastive learning. Cmai achieves an average AUROC of 0.907 across 17 antigens and 5 independent cohorts, and demonstrates clinical utility in predicting immune checkpoint inhibitor treatment response and immune-related adverse events. This work was published in Nature Cancer.
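The abstract describes Cmai's cross-modal alignment only as contrastive learning between BCR sequences and antigen structures. As a rough illustration of what such an objective can look like, the sketch below implements a generic symmetric InfoNCE (CLIP-style) loss over a batch of paired embeddings. It assumes the two embeddings come from separate sequence and structure encoders that are not shown. The function name `symmetric_info_nce` and the temperature of 0.07 are illustrative choices, not Cmai's published configuration.

```python
import math


def _normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax_at(row, idx):
    # Numerically stable log-softmax of `row`, evaluated at position `idx`.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def symmetric_info_nce(seq_emb, struct_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss (illustrative, not Cmai's exact loss).

    seq_emb[i] and struct_emb[i] are assumed to be a matching (positive) pair;
    every other pairing in the batch is treated as a negative.
    """
    a = [_normalize(v) for v in seq_emb]
    b = [_normalize(v) for v in struct_emb]
    n = len(a)
    # Temperature-scaled cosine-similarity logits: rows = sequences, cols = structures.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Sequence -> structure direction: classify the correct column for each row.
    loss_ab = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Structure -> sequence direction: classify the correct row for each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)
```

Matched pairs sit on the diagonal of the similarity matrix, so the loss falls as each BCR embedding becomes more similar to its own antigen embedding than to the other antigens in the batch. Shuffling the pairing, for example with `struct_emb[::-1]`, raises the loss.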
Second, we propose SAC (Segment Any Cell), an auto-prompting fine-tuning framework that adapts the Segment Anything Model (SAM), a vision foundation model, to nuclei segmentation in pathology images. SAC combines Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning with an automatic prompt generator that eliminates manual annotations. SAC achieves state-of-the-art performance, with Dice scores ranging from 84% to 93% across different tissue types, while training only about 1% of the model parameters. This work was published in IEEE Transactions on Neural Networks and Learning Systems.
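LoRA keeps a pretrained weight W frozen and learns only a low-rank update BA, which is why so few parameters are trained. The sketch below shows this idea for a single linear layer in plain Python. The class name `LoRALinear`, the rank, the `alpha` scaling, and the initialization are assumptions for illustration; which SAM layers SAC adapts, and with what rank, are not stated in the abstract, and the automatic prompt generator is not shown.

```python
import random


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer (illustration only).

    Forward pass: y = W x + (alpha / r) * B (A x), where W is frozen and only
    A (r x d_in) and B (d_out x r) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A gets small random values; B starts at zero, so the adapted layer
        # initially produces exactly the same output as the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, x):
        # Plain matrix-vector product.
        return [sum(w * xi for w, xi in zip(row, x)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)                  # frozen path
        delta = self._matvec(self.B, self._matvec(self.A, x))  # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_fraction(self):
        # Trainable LoRA parameters relative to the frozen layer's parameters.
        d_out, d_in = len(self.weight), len(self.weight[0])
        rank = len(self.A)
        return rank * (d_in + d_out) / (d_in * d_out)
```

For a single d_out × d_in layer the trainable share is r(d_in + d_out) / (d_in · d_out); for example, rank 4 on a 768 × 768 projection gives 6144 / 589824, roughly 1.04%. This per-layer figure is only illustrative and is not a reconstruction of SAC's reported figure for the whole model.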
Third, building upon the insights from these pairwise alignment methods, we propose HyperGRAM (Hyperbolic Gramian Volumes for Multimodal Alignment), a novel framework that extends multimodal alignment to higher-order relationships using hyperbolic geometry. HyperGRAM addresses the volume collapse problem inherent in Euclidean Gramian volumes by leveraging hyperbolic geometry's exponential volume growth, combined with a data-driven hybrid mixing approach that balances Euclidean discriminative stability with hyperbolic semantic variance. HyperGRAM achieves state-of-the-art results on all four standard video-text retrieval benchmarks. This work was accepted at CVPR 2026.
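The Gramian volume in HyperGRAM's name is the volume of the parallelotope spanned by k embedding vectors, sqrt(det G) with G_ij = ⟨v_i, v_j⟩. The sketch below computes it for L2-normalized Euclidean vectors and, separately, compares the area of a disk of radius r in the Euclidean plane (πr²) with the area in the hyperbolic plane of curvature −1 (2π(cosh r − 1)), which grows exponentially in r. It is a plain-Python illustration under those assumptions: HyperGRAM's hyperbolic volume, its curvature, and its hybrid Euclidean/hyperbolic mixing are not reproduced, and the function names are hypothetical.

```python
import math


def _det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular: vectors are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def gram_volume(vectors):
    """Euclidean Gramian volume sqrt(det G) of k L2-normalized vectors."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    gram = [[sum(x * y for x, y in zip(u, w)) for w in unit] for u in unit]
    return math.sqrt(max(_det(gram), 0.0))  # clamp tiny negative round-off


def disk_areas(radius):
    """Area of a disk of the given radius: Euclidean plane vs. hyperbolic plane (curvature -1)."""
    euclidean = math.pi * radius ** 2
    hyperbolic = 2 * math.pi * (math.cosh(radius) - 1)
    return euclidean, hyperbolic
```

Mutually orthogonal unit vectors span volume 1, and vectors pointing in the same direction span volume 0. `disk_areas(5.0)` shows the hyperbolic disk already several times larger than the Euclidean one, which is the kind of growth the abstract attributes to hyperbolic geometry.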
Together, these works demonstrate the central thesis that moving from pairwise to higher-order alignment, with geometry-aware representations, enables more effective multimodal learning for biological data understanding and beyond.
Keywords
multimodal deep learning, biological data understanding, antigen binding, cell segmentation, multimodal alignment
Disciplines
Artificial Intelligence and Robotics
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Na, Saiyang, "Multimodal Deep Learning for Biological Data Understanding" (2026). Computer Science and Engineering Dissertations. 5.
https://mavmatrix.uta.edu/cse_dissertations2/5