ORCID Identifier(s)

0009-0009-6222-6746

Graduation Semester and Year

Spring 2024

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Dr. Cesar Torres

Second Advisor

David Levine

Third Advisor

Dr. Manfred Huber

Fourth Advisor

Dr. Ming Li

Abstract

Benchmark datasets are critical to the evolution of AI efforts, yet they often embed unintended biases that influence the models driving human-AI interactions. A deeper inspection and awareness of data is needed to understand the biases that datasets may contain. In this dissertation, I introduce the Tag-and-Release method, inspired by wildlife research, which treats data as an organism and examines how different environments (i.e., CNNs) select for unique traits or characteristics that ultimately affect the data's survival. Using the canonical MNIST handwritten digit dataset as a case study, I describe how the Tag-and-Release method can be used to analyze how dataset imbalance biases propagate into different neural architectures. I demonstrate how the technique can be scaled to coordinate data inspection efforts with crowd workers to annotate the dataset. Using the tagged data, I develop explainable AI interventions through a user study with machine learning students. I present findings for developing balanced and fair datasets, stimulate discussion about models as ecosystems, and advocate for a data conservatory to coordinate efforts in support of explainable AI initiatives within intelligent systems.

Keywords

explainable AI, model analysis, data labeling, tagging, MNIST, ML education, ML intervention

Disciplines

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Available for download on Wednesday, May 14, 2025
