ORCID Identifier(s)

0000-0002-2662-5063

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Junzhou Huang

Second Advisor

Jia Rao

Third Advisor

Meng Ye

Fourth Advisor

Dajiang Zhu

Abstract

This dissertation presents three contributions to multimodal deep learning for biological data understanding, addressing the fundamental challenge of cross-modal alignment from two complementary perspectives: designing effective multimodal fusion methods for specific biomedical applications, and proposing a general framework for higher-order multimodal alignment that captures hierarchical structure in data.

First, we develop Cmai, a deep learning framework for predicting B cell receptor (BCR)-antigen binding that aligns BCR sequence information with antigen three-dimensional structures using contrastive learning. Cmai achieves an average AUROC of 0.907 across 17 antigens and 5 independent cohorts, and demonstrates clinical utility in predicting immune checkpoint inhibitor treatment response and immune-related adverse events. This work was published in Nature Cancer.

Second, we propose SAC (Segment Any Cell), a SAM-based auto-prompting fine-tuning framework for adapting foundation models to nuclei segmentation in pathology images. SAC integrates Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning with an automatic prompt generator that eliminates manual annotations. SAC achieves state-of-the-art performance with Dice scores ranging from 84% to 93% across different tissue types while training only about 1% of the model parameters. This work was published in IEEE Transactions on Neural Networks and Learning Systems.

Third, building upon the insights from these pairwise alignment methods, we propose HyperGRAM (Hyperbolic Gramian Volumes for Multimodal Alignment), a novel framework that extends multimodal alignment to higher-order relationships using hyperbolic geometry. HyperGRAM addresses the volume collapse problem inherent in Euclidean Gramian volumes by leveraging hyperbolic geometry's exponential volume growth, combined with a data-driven hybrid mixing approach that balances Euclidean discriminative stability with hyperbolic semantic variance. HyperGRAM achieves state-of-the-art results on all four standard video-text retrieval benchmarks. This work was accepted at CVPR 2026.

Together, these works demonstrate the central thesis that moving from pairwise to higher-order alignment, with geometry-aware representations, enables more effective multimodal learning for biological data understanding and beyond.

Keywords

multimodal deep learning, biological data understanding, antigen binding, cell segmentation, multimodal alignment

Disciplines

Artificial Intelligence and Robotics

License

Creative Commons Attribution 4.0 International License
