ORCID Identifier(s)

0009-0000-3352-2495

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Mathematics

Department

Mathematics

First Advisor

Dr. Keaton Hamm

Second Advisor

Dr. Gaik Ambartsoumian

Third Advisor

Dr. Andrzej Korzeniowski

Fourth Advisor

Dr. Wei Jiang

Abstract

Dimensionality reduction techniques are developed from the assumption that high-dimensional data often arises from low-dimensional structures embedded into the high-dimensional ambient space. Classical dimensionality reduction methods rely on the Euclidean distance, which may fail to capture the geometric structures of the datasets. This dissertation includes alternative metrics for dimensionality reduction techniques and challenges in applying these techniques.

First, we investigate the Wasserstein distance based neighbor embeddings for dimensionality reduction methods and compare the classification and clustering performance with the classical Euclidean based methods. The Wasserstein distance models the data as probability distributions and compares two distributions applying optimal transport (OT) through mass transportation. In this work, we show that the Wasserstein distance based methods provide a more meaningful representation and often demonstrates better performance in downstream machine learning tasks such as classification and clustering.

Second, though the Wasserstein distance based neighbor embeddings often shows improved machine learning tasks performance over classical Euclidean methods, the limitations of the Wasserstein distance based methods are that the data are considered as probability distributions with equal mass 1. We propose neighbor em- beddings using unbalanced optimal transport (UOT) that relax the constraint of mass conservation. This relaxation enables robust comparisons between distributions with uneven mass. The study shows that on the MedMNIST datasets, UOT outperforms classical OT in classification and clustering 87.5% and 83.33% of the time and UOT outperforms both OT and Euclidean based methods 55.5% of the time.

Finally, we study the computational challenges associated with the Wasser- stein distance in large scale settings. We propose two algorithms that estimate the Wasserstein distance from a sample of a few entries. The first algorithm uniformly samples entries from the upper triangular Wasserstein distance matrix and uses iterative methods (Barzilai-Borwein descent) to estimate the matrices. The second algorithm samples O(d log d) random columns of the Wasserstein distance matrices and applies the Nystr¨om method for matrix completion. We show that the Nystr¨om method (Algorithm 2) can outperform the iterative method (Algorithm 1) and the Nystr¨om method is stable for the classification task even for only 10% of the sample columns.

Keywords

Unbalanced optimal transport, Hellinger– Kantorovich distance, manifold learning, measure-valued data, image processing, machine learning

Disciplines

Data Science

License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Comments

I would like to express my deepest gratitude to my supervisor Dr. Keaton Hamm for his continuous support and guidance throughout my doctoral studies. Without his patience and exceptional mentorship, this thesis would not have been possible. I am thankful for his kindness, genuine care during my PhD journey. I am also grateful to Dr. Andrzej Korzeniowski, Dr. Gaik Ambartsoumian and Dr. Wei Jiang for their invaluable time and constructive suggestions.

I would like to thank Dr. Hristo V. Kojouharov, former graduate advisor and current chair of Mathematics department, for his availability and support from the beginning of my PhD studies. I am also grateful to the Mathematics department of The University of Texas at Arlington for funding my studies.

Special thanks to my collaborators for their collaboration and contribution to the results presented in this thesis. Thanks to Dr. Mohammad Atiqul Islam of the Computer Science and Engineering department of The University of Texas at Arlington and Dr. Zahidur Rahim Talukder, then a doctoral student of the Computer Science and Engineering of The University of Texas at Arlington and now an assistant professor of Texas Lutheran University, for the opportunity to collaborate on their project and develop the theoretical analysis of the project. Thanks to Dr. Abiy Tasissa of the department of Mathematical Sciences of Tufts University and Dr. Hanqin Cai of the School of Data, Mathematical, and Statistical Sciences of University of Central Florida for their invaluable guidance and collaboration. I also thank Phuong Trinh, Ryan Bui and Yakov Gavriyelov for their contribution in different research projects during my PhD studies through undergraduate capstone project. 

The Army Research Office is acknowledged for their financial support and accomplished this work under Grant Number W911NF-23-1-0213.

Included in

Data Science Commons

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.