ORCID Identifier(s)

0000-0002-2662-5063

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Junzhou Huang

Second Advisor

Jia Rao

Third Advisor

Meng Ye

Fourth Advisor

Dajiang Zhu

Abstract

This dissertation presents three contributions to multimodal deep learning for biological data understanding, addressing the fundamental challenge of cross-modal alignment from two complementary perspectives: designing effective multimodal fusion methods for specific biomedical applications, and proposing a general framework for higher-order multimodal alignment that captures hierarchical structure in data.

First, we develop Cmai, a deep learning framework for B cell receptor (BCR)-antigen binding prediction that aligns BCR sequence information with antigen three-dimensional structures using contrastive learning. Cmai achieves an average AUROC of 0.907 across 17 antigens and 5 independent cohorts, and demonstrates clinical utility in predicting immune checkpoint inhibitor treatment response and immune-related adverse events. This work was published in Nature Cancer.

Second, we propose SAC (Segment Any Cell), a SAM-based auto-prompting fine-tuning framework for adapting foundation models to nuclei segmentation in pathology images. SAC integrates Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning with an automatic prompt generator that eliminates manual annotations. SAC achieves state-of-the-art performance with Dice scores ranging from 84% to 93% across different tissue types while training only about 1% of the model parameters. This work was published in IEEE Transactions on Neural Networks and Learning Systems.

Third, building upon the insights from these pairwise alignment methods, we propose HyperGRAM (Hyperbolic Gramian Volumes for Multimodal Alignment), a novel framework that extends multimodal alignment to higher-order relationships using hyperbolic geometry. HyperGRAM addresses the volume collapse problem inherent in Euclidean Gramian volumes by leveraging hyperbolic geometry's exponential volume growth, combined with a data-driven hybrid mixing approach that balances Euclidean discriminative stability with hyperbolic semantic variance. HyperGRAM achieves state-of-the-art results on all four standard video-text retrieval benchmarks. This work was accepted at CVPR 2026.

Together, these works demonstrate the central thesis that moving from pairwise to higher-order alignment, with geometry-aware representations, enables more effective multimodal learning for biological data understanding and beyond.

Keywords

multimodal deep learning, biological data understanding, antigen binding, cell segmentation, multimodal alignment

Disciplines

Artificial Intelligence and Robotics

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Na, Saiyang, "Multimodal Deep Learning for Biological Data Understanding" (2026). Computer Science and Engineering Dissertations. 5.
https://mavmatrix.uta.edu/cse_dissertations2/5

Included in

Artificial Intelligence and Robotics Commons

Computer Science and Engineering Dissertations

Multimodal Deep Learning for Biological Data Understanding
