Posted on 2022-05-01. Authored by Ashkan Rezaei.
Machine learning algorithms have traditionally focused on optimizing prediction performance. In recent years, however, there has been rising awareness of the unintended biases in these algorithms and their potential adverse social and economic impact. With the growing adoption of machine learning in data-driven decision-making, fair treatment of demographic groups (e.g., by gender or race) has become a priority and is sometimes legally required. Naive use of learning algorithms that are oblivious to fairness can reproduce the many biases present in data sources for historical or procedural reasons, leading to discrimination and unequal opportunities for minorities and underrepresented groups.
Other categories of bias stem from distributional mismatch between training and test data. These include general noise in the dataset, when we cannot rely on gathering clean and well-labeled data, and systematic bias in the sample selection procedure used to build the training set. Under these circumstances, the sample mean error minimized by standard empirical risk minimization (ERM) methods can be a poor estimate of the true error on the underlying data distribution. Sample selection bias is common in practice and occurs when an example does not have an equal chance of appearing in the training and test data. A specific type of sample selection bias called "covariate shift" arises when the chance of an example appearing in the training sample depends only on its input features. This happens, for example, in medical trials, where patients are sampled for treatment based on having symptoms. Ignoring such conditions in learning algorithms can introduce even more bias in the fair treatment of certain groups or individuals.
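For context, covariate shift is commonly formalized as a change in the input marginal between training and test distributions while the conditional label distribution stays fixed; this sketch and its notation are illustrative rather than taken from the thesis:

\[
P_{\mathrm{train}}(x) \neq P_{\mathrm{test}}(x),
\qquad
P_{\mathrm{train}}(y \mid x) = P_{\mathrm{test}}(y \mid x).
\]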
In this thesis, we investigate methods that are robust to both fairness biases and distributional biases, based on a distributionally robust approach to classification. Distributionally robust learning methods seek a predictor that is robust against the worst-case expected error over an uncertainty set of distributions bounded by empirical data statistics. This is in contrast to ERM methods, which minimize the average error on training samples. We base our approach on an adversarial robust formulation, in which we seek the optimal predictor against an adversary that chooses the worst-case label distribution by maximizing the expected loss, subject to matching the statistics of the training data. This framework yields learning methods that are robust against label distribution noise and covariate shift.
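The general minimax structure of such adversarial formulations can be sketched as follows; the notation here, with predictor \(\hat{P}\), adversary \(\check{P}\), feature function \(\phi\), and empirical distribution \(\tilde{P}\), is chosen for illustration and may differ from the thesis:

\[
\min_{\hat{P}(\hat{y} \mid x)} \;\; \max_{\check{P}(\check{y} \mid x)} \;\;
\mathbb{E}_{x \sim \tilde{P},\; \hat{y} \sim \hat{P}(\cdot \mid x),\; \check{y} \sim \check{P}(\cdot \mid x)}
\big[ \ell(\hat{y}, \check{y}) \big]
\quad \text{subject to} \quad
\mathbb{E}_{x \sim \tilde{P},\; \check{y} \sim \check{P}(\cdot \mid x)} \big[ \phi(x, \check{y}) \big] = \tilde{\phi},
\]

where \(\ell\) is the loss, \(\phi(x, y)\) collects the feature statistics the adversary must match, and \(\tilde{\phi}\) denotes their averages over the training sample.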
This thesis aims to lay the groundwork for adding fairness guarantees to this framework. As a first step, we focus on fairness under the adversarial robust formulation for log loss and derive a closed-form parametric fair predictor under the IID assumption. As a next step, we focus on covariate shift, a principal variant of sample selection bias, and derive a fair log-loss predictor under this assumption. We demonstrate the benefits of our models in both theory and practice, evaluate them against existing baselines, and outline future research directions based on this line of work.
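For context only: in the unconstrained case (log loss with moment-matching constraints and no fairness requirements), adversarial formulations of this kind are known to recover predictors of the standard exponential-family (logistic) form. The parametrization below illustrates what "closed-form parametric" means in this setting; it is not the fair predictor derived in the thesis:

\[
\hat{P}_{\theta}(y \mid x) = \frac{\exp\!\big(\theta^{\top} \phi(x, y)\big)}{\sum_{y'} \exp\!\big(\theta^{\top} \phi(x, y')\big)}.
\]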
History
Advisor
Ziebart, Brian
Chair
Ziebart, Brian
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Zhang, Xinhua
Kanich, Chris
Ohannessian, Mesrob
Dudik, Miro