University of Illinois Chicago

Fairness and Missing Data in Machine Learning: Challenges and Solutions

thesis
posted on 2024-12-01, 00:00 authored by Francesco Vaina
As automated decision-making systems become increasingly prevalent in critical domains like education, ensuring fairness in these systems is paramount. Missing data presents a unique challenge to fairness in machine learning (ML), particularly in high-stakes applications such as predicting student outcomes. This research investigates the effects of missing data and various preprocessing methods on the fairness and accuracy of ML models within educational datasets. Using data from the 2012 Education Longitudinal Study, the study aims to predict bachelor’s degree attainment through models such as Random Forest, Logistic Regression, and Support Vector Classifier. By examining multiple imputation techniques, especially in contexts where data is not Missing Completely at Random (MCAR), this research evaluates the influence of these methods on model fairness and performance, with a focus on mitigating bias against vulnerable student groups. The study underscores the importance of feature handling in data preprocessing, highlighting how improper treatment during imputation can introduce or exacerbate biases that affect model predictions. Through an analysis of feature importance and its impact on fairness, this work identifies the features most likely to contribute to bias, supporting the design of more equitable predictive models. Findings reveal trade-offs between accuracy and fairness, illustrating the critical role of appropriate fairness metrics, such as Equalized Odds, in accounting for contextual nuances over simpler metrics like Statistical Parity. This research contributes to the field by addressing gaps in existing literature, providing insights into the relationship between missing data handling, fairness, and accuracy in educational ML applications, and offering practical recommendations for developing fairer, more reliable models in educational contexts.
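
To make the contrast between the two fairness metrics named in the abstract concrete, the following is a minimal sketch, not code from the thesis. It assumes binary labels, binary predictions, and a binary group attribute, and it assumes one common convention for Equalized Odds (the larger of the true-positive-rate gap and the false-positive-rate gap between groups). The function names and the toy data are hypothetical.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def statistical_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1."""
    r0 = rate([p for p, g in zip(y_pred, group) if g == 0])
    r1 = rate([p for p, g in zip(y_pred, group) if g == 1])
    return abs(r0 - r1)


def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap between groups 0 and 1.

    Assumed convention: other definitions (e.g. summing the gaps) exist.
    """
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        r0 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if g == 0 and t == label])
        r1 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if g == 1 and t == label])
        gaps.append(abs(r0 - r1))
    return max(gaps)


# Hypothetical toy data: the groups have different base rates of the
# positive outcome, and the predictor is perfectly accurate.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = list(y_true)

spd = statistical_parity_difference(y_pred, group)
eod = equalized_odds_difference(y_true, y_pred, group)
print(f"statistical parity difference: {spd}")
print(f"equalized odds difference: {eod}")
```

In this toy case a perfectly accurate predictor still shows a Statistical Parity gap, because the groups differ in their base rates, while its Equalized Odds gap is zero. This is one illustration of how the choice of metric can change the fairness conclusion.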

Advisor

Hadis Anahideh

Department

Mechanical and Industrial Engineering

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Abolfazl Asudeh, Roberto Cigolini, Rita Difrancesco

Thesis type

application/pdf

Language

  • en
