Graduation Semester and Year
Spring 2026
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Jacob M. Luber
Second Advisor
Cesar Torres
Third Advisor
VP Nguyen
Fourth Advisor
Jiayi Meng
Abstract
The transition from traditional microscopy to digital pathology has digitized diagnostic data, yet clinical workflows remain constrained by two-dimensional screens and passive, opaque analysis tools that fail to capture the spatial complexity of biological systems. While Foundation Models now promise to reason across histology and genomics, a critical disconnect persists between the richness of this data and the limited cognitive bandwidth of clinicians, who currently lack the immersive interfaces and trustworthy agents necessary to utilize it effectively. This dissertation presents a unified framework for "Embodied Agentic AI," establishing a pipeline that augments physician capabilities through immersive visualization, robust security, and active reasoning.
I begin by examining the biological ground truth of multi-omics data, identifying "transcriptional bursting" as a driver of discordance between the proteome and transcriptome. This analysis exposed the inability of traditional 2D workflows to capture spatial biological realities, motivating the development of SpatialVisVR. By using extended reality (XR) to visualize massive multiplexed datasets and integrating deep learning for contextual similar-patient search, I demonstrate that spatial computing significantly expands the cognitive bandwidth available to clinicians for data exploration.
Addressing the security implications of this digital transformation, I investigate the vulnerability landscape of the Vision Language Models (VLMs) that underpin modern pathology. I present the first successful adversarial attack against the Pathology Language-Image Pretraining (PLIP) model, achieving a 100% misclassification rate with imperceptible perturbations. This investigation establishes that trustworthiness is a prerequisite for deployment, necessitating robust defense mechanisms based on attention map interpretability before AI can be safely integrated into clinical loops.
Building upon these visualization and security methodologies, I engineered PathVis, a mixed-reality platform that embeds multimodal conversational agents directly into the pathologist’s spatial environment. To demonstrate the future capabilities of such Foundation Models within this ecosystem, I explored two distinct avenues: AI for Ophthalmology and PaliGemma for Ocular Segmentation. Through the Ophthalmology initiative, I show how fine-tuned models move beyond passive analysis to active reasoning, improving diagnostic accuracy from 10% to 80% across nine retinal disease categories and demonstrating the capacity to diagnose rare pathologies. In parallel, I applied PaliGemma to the segmentation and high-fidelity parsing of ocular features such as the iris, pupil, and sclera.
Overall, this work establishes a new paradigm of human-agent symbiosis, transforming the diagnostic process from simple observation to an interactive, secure, and AI-assisted reasoning loop. My contributions provide a practical framework that equips clinicians with the perceptual and reasoning tools necessary to harness the full potential of Foundation Models in critical real-world applications.
Keywords
Embodied Agentic AI, Spatial Computing, Digital Pathology, Vision Language Models, Adversarial Machine Learning, Explainable AI, Human-Computer Interaction, Generative AI in Medicine, Extended Reality, Multi-Omics
Disciplines
Artificial Intelligence and Robotics | Bioinformatics | Graphics and Human Computer Interfaces | Ophthalmology | Pathology
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Veerla, Jai Prakash, "Trustworthy Multimodal AI for Medical Imaging: Enhancing Diagnosis, Reasoning, and Human-Agent Interaction in Extended Reality" (2026). Computer Science and Engineering Dissertations. 8.
https://mavmatrix.uta.edu/cse_dissertations2/8
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Graphics and Human Computer Interfaces Commons, Ophthalmology Commons, Pathology Commons