ORCID Identifier(s)

0009-0009-6222-6746

Graduation Semester and Year

Spring 2024

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Dr. Cesar Torres

Second Advisor

David Levine

Third Advisor

Dr. Manfred Huber

Fourth Advisor

Dr. Ming Li

Abstract

Benchmark datasets are critical to the evolution of AI efforts, yet they often embed unintended biases that influence the models that drive human-AI interactions. A deeper inspection and awareness of data is needed to understand the biases datasets may contain. In this dissertation, I introduce the Tag-and-Release method, inspired by wildlife research, which treats data as an organism and examines how different environments (i.e., CNNs) select for unique traits or characteristics that ultimately affect the data's survival. Using the canonical MNIST handwritten digit dataset as a case study, I describe how the Tag-and-Release method can be used to analyze how dataset imbalance biases propagate into different neural architectures. I demonstrate how the technique can be scaled to coordinate data inspection efforts, with crowd workers annotating the dataset. Using the tagged data, I developed explainable AI interventions through a user study with machine learning students. I present findings for developing balanced and fair datasets, stimulate discussion about models as ecosystems, and advocate for a data conservatory that coordinates efforts to support explainable AI initiatives within intelligent systems.

Keywords

explainable AI, model analysis, data labeling, tagging, MNIST, ML education, ML intervention

Disciplines

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Zaman, Akib, "Living Datasets: Towards Data-Centric AI Explainability and Bias Mitigation" (2024). Computer Science and Engineering Dissertations. 3.
https://mavmatrix.uta.edu/cse_dissertations/3

Download

Available for download on Wednesday, May 14, 2025

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons

Computer Science and Engineering Dissertations

Living Datasets: Towards Data-Centric AI Explainability and Bias Mitigation
