Computer Science and Engineering Dissertations

Trustworthy Multimodal AI for Medical Imaging: Enhancing Diagnosis, Reasoning, and Human-Agent Interaction in Extended Reality

Jai Prakash Veerla, University of Texas at Arlington

ORCID Identifier(s)

0009-0000-5023-0769

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Jacob M. Luber

Second Advisor

Cesar Torres

Third Advisor

VP Nguyen

Fourth Advisor

Jiayi Meng

Abstract

The transition from traditional microscopy to digital pathology has digitized diagnostic data, yet clinical workflows remain constrained by two-dimensional screens and passive, opaque analysis tools that fail to capture the spatial complexity of biological systems. While Foundation Models now promise to reason across histology and genomics, a critical disconnect persists between the richness of this data and the limited cognitive bandwidth of clinicians, who currently lack the immersive interfaces and trustworthy agents necessary to utilize it effectively. This dissertation presents a unified framework for "Embodied Agentic AI," establishing a pipeline that augments physician capabilities through immersive visualization, robust security, and active reasoning.

I begin by examining the biological ground truth of multi-omics data, identifying "transcriptional bursting" as a driver of discordance between the proteome and transcriptome. This analysis exposed the inherent limitations of traditional 2D workflows in capturing spatial biological realities, motivating the development of SpatialVisVR. By using extended reality (XR) to visualize massive multiplexed datasets and integrating deep learning for contextual similar-patient search, I demonstrate that spatial computing significantly expands the cognitive bandwidth available to clinicians for data exploration.

Addressing the security implications of this digital transformation, I investigate the vulnerability landscape of the Vision Language Models (VLMs) that underpin modern pathology. I present the first successful adversarial attack against the Pathology Language-Image Pretraining (PLIP) model, achieving a 100% misclassification rate with imperceptible perturbations. This investigation establishes that trustworthiness is a prerequisite for deployment, necessitating robust defense mechanisms based on attention map interpretability before AI can be safely integrated into clinical loops.

Building upon these visualization and security methodologies, I engineered PathVis, a mixed-reality platform that embeds multimodal conversational agents directly into the pathologist’s spatial environment. To demonstrate the future capabilities of Foundation Models within this ecosystem, I explored two distinct avenues: AI for Ophthalmology and PaliGemma for Ocular Segmentation. Through the Ophthalmology initiative, I show how fine-tuned models move beyond passive analysis to active reasoning, improving diagnostic accuracy from 10% to 80% across nine retinal disease categories and demonstrating the capacity to diagnose rare pathologies. In parallel, I used PaliGemma for segmentation and high-fidelity parsing of ocular features such as the iris, pupil, and sclera.

Overall, this work establishes a new paradigm of human-agent symbiosis, transforming the diagnostic process from simple observation to an interactive, secure, and AI-assisted reasoning loop. My contributions provide a practical framework that equips clinicians with the perceptual and reasoning tools necessary to harness the full potential of Foundation Models in critical real-world applications.

Keywords

Embodied Agentic AI, Spatial Computing, Digital Pathology, Vision Language Models, Adversarial Machine Learning, Explainable AI, Human-Computer Interaction, Generative AI in Medicine, Extended Reality, Multi-Omics

Disciplines

Artificial Intelligence and Robotics | Bioinformatics | Graphics and Human Computer Interfaces | Ophthalmology | Pathology

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Veerla, Jai Prakash, "Trustworthy Multimodal AI for Medical Imaging: Enhancing Diagnosis, Reasoning, and Human-Agent Interaction in Extended Reality" (2026). Computer Science and Engineering Dissertations. 8.
https://mavmatrix.uta.edu/cse_dissertations2/8

Included in

Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Graphics and Human Computer Interfaces Commons, Ophthalmology Commons, Pathology Commons
