Graduation Semester and Year
2021
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Ramez Elmasri
Second Advisor
Manfred Huber
Abstract
Machine learning-based decision support systems bring relief to decision-makers in many domains, such as loan application acceptance, dating, hiring, granting parole, insurance coverage, and medical diagnosis. These systems facilitate processing tremendous amounts of data to decipher the embedded patterns. However, the resulting decisions can also absorb and amplify bias embedded in the data. The increasing number of applications of machine learning-based decision support systems in a growing number of domains has directed the attention of stakeholders to the accuracy, transparency, interpretability, cost effectiveness, and fairness encompassed in the ensuing decisions. In this dissertation, we focus on the fairness and accuracy embodied in such predictions. When making machine learning-based forecasts, the overarching problem of addressing bias and accuracy in decisions comprises a series of sub-problems that we address in this work: 1) detecting bias in the predictions, 2) increasing accuracy in predictions, 3) increasing prediction accuracy without tampering with the class labels and while excluding sensitive attributes that trigger bias, 4) quantifying bias in a model, and finally 5) reducing a model's bias during the training phase. We develop machine learning methods to address these problems and improve fairness and prediction accuracy on three large, socially relevant datasets in two different domains. One of the two Department of Justice recidivism datasets, as well as the Census-based adult income dataset, holds significant demographic information. The second recidivism dataset is more feature rich, holding information on criminal history, substance abuse, and treatments taken during incarceration, and thus provides a rich contrast to the largely demographic datasets when comparing fairness in predicted results.
Our approach focuses on data preparation, feature enrichment in activity- and personal-history-based datasets, model design, and the inclusion of a loss function regularization term alongside the traditional binary cross-entropy loss to increase both fairness and accuracy. We achieve this without tampering with the class labels and without balancing the datasets. To stay squarely focused on fairness, we exclude the sensitive attributes from the input features while training the models. Our experiments show that we can increase accuracy and fairness in the predictions on the three datasets beyond what has been achieved in the published literature. The results demonstrate that our loss-function-based fairness improvement approach is applicable in different domains with different sensitive attributes and can be applied without manipulating class labels or balancing skewed datasets.
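The loss design described above can be illustrated with a minimal sketch: a standard binary cross-entropy term plus a weighted fairness penalty computed from the sensitive attribute, which is used only inside the loss and never as an input feature. The function names and the simple group-mean-gap penalty here are illustrative assumptions, not the dissertation's exact regularizer.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-7):
    """Standard binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def parity_penalty(y_pred, group):
    """Gap between the mean predicted scores of the two sensitive groups
    (a simple demographic-parity-style penalty; illustrative only)."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

def fairness_regularized_loss(y_true, y_pred, group, lam=1.0):
    """BCE plus a weighted fairness penalty; `group` holds the sensitive
    attribute, which enters the loss but not the model's input features."""
    return bce_loss(y_true, y_pred) + lam * parity_penalty(y_pred, group)
```

With `lam=0` this reduces to plain binary cross-entropy; increasing `lam` trades some fitting pressure for smaller between-group disparity, without relabeling examples or resampling the dataset.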
Keywords
Accuracy, Bias, Fairness, Machine learning, Deep learning, Artificial intelligence, Recidivism, Income prediction, FPR, FNR, TPR, TNR, Loss functions, Prediction, Race-based bias, Gender-based bias, BPS, Bias parity score
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Jain, Bhanu Chaturvedi, "MACHINE LEARNING METHODS TO IMPROVE FAIRNESS AND PREDICTION ACCURACY ON LARGE SOCIALLY RELEVANT DATASETS" (2021). Computer Science and Engineering Dissertations. 311.
https://mavmatrix.uta.edu/cse_dissertations/311
Comments
Degree granted by The University of Texas at Arlington