Graduation Semester and Year
2021
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Ramez Elmasri
Second Advisor
Manfred Huber
Abstract
Machine learning-based decision support systems bring relief to decision-makers in many domains, such as loan application acceptance, dating, hiring, granting parole, insurance coverage, and medical diagnosis. These systems facilitate processing tremendous amounts of data to decipher the embedded patterns. However, the resulting decisions can also absorb and amplify bias embedded in the data. The increasing number of applications of machine learning-based decision support systems in a growing number of domains has directed the attention of stakeholders to the accuracy, transparency, interpretability, cost effectiveness, and fairness encompassed in the ensuing decisions. In this dissertation, we focus on the fairness and accuracy embodied in such predictions. When making machine learning-based forecasts, the overarching problem of addressing bias and accuracy in decisions comprises a series of sub-problems that we address in this work: 1) detecting bias in the predictions, 2) increasing accuracy in predictions, 3) increasing prediction accuracy without tampering with the class labels and while excluding sensitive attributes that trigger bias, 4) quantifying bias in a model, and finally 5) reducing a model's bias during the training phase. We develop machine learning methods to address these problems and improve fairness and prediction accuracy on three large, socially relevant datasets in two different domains. One of the two Department of Justice recidivism datasets, as well as the Census-based adult income dataset, holds significant demographic information. The second recidivism dataset is more feature rich, holding information on criminal history, substance abuse, and treatments taken during incarceration, and thus provides a rich contrast to the largely demographic datasets when comparing fairness in predicted results.
Our approach focuses on data preparation, feature enrichment in activity- and personal-history-based datasets, model design, and the inclusion of a loss function regularization term alongside the traditional binary cross-entropy loss to increase both fairness and accuracy. We achieve this without tampering with the class labels and without balancing the datasets. To stay squarely focused on fairness, we exclude the sensitive attributes from the input features while training the models. Our experiments show that we can increase accuracy and fairness in the predictions on the three datasets beyond what has been achieved in the published literature. The results demonstrate that our loss-function-based fairness improvement approach is applicable in different domains with different sensitive attributes and can be applied without manipulating class labels or balancing skewed datasets.
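The loss design described above can be illustrated with a minimal sketch: a standard binary cross-entropy term plus a weighted fairness penalty computed from the sensitive attribute, which is used only inside the loss and never as an input feature. The function names and the simple group-mean-gap penalty here are illustrative assumptions, not the dissertation's exact regularizer.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-7):
    """Standard binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def parity_penalty(y_pred, group):
    """Gap between the mean predicted scores of the two sensitive groups
    (a simple demographic-parity-style penalty; illustrative only)."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

def fairness_regularized_loss(y_true, y_pred, group, lam=1.0):
    """BCE plus a weighted fairness penalty; `group` holds the sensitive
    attribute, which enters the loss but not the model's input features."""
    return bce_loss(y_true, y_pred) + lam * parity_penalty(y_pred, group)
```

With `lam=0` this reduces to plain binary cross-entropy; increasing `lam` trades some fitting pressure for smaller between-group disparity, without relabeling examples or resampling the dataset.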
Keywords
Accuracy, Bias, Fairness, Machine learning, Deep learning, Artificial intelligence, Recidivism, Income prediction, FPR, FNR, TPR, TNR, Loss functions, Prediction, Race-based bias, Gender-based bias, BPS, Bias parity score
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Jain, Bhanu Chaturvedi, "MACHINE LEARNING METHODS TO IMPROVE FAIRNESS AND PREDICTION ACCURACY ON LARGE SOCIALLY RELEVANT DATASETS" (2021). Computer Science and Engineering Dissertations. 311.
https://mavmatrix.uta.edu/cse_dissertations/311
Comments
Degree granted by The University of Texas at Arlington