
Uncovering Bias and Fairness in Crowdsourced Labeling

Thesis, posted on 2023-12-01, authored by Simone Lazier
In recent times, the confluence of the internet's vast connectivity and the pervasive influence of social media has dramatically transformed how information and perspectives are exchanged. This transformation has catalyzed the evolution of crowdsourcing, a paradigm in which a vast array of information and viewpoints can be harnessed efficiently and cost-effectively from a diverse, geographically dispersed populace. While the rise of the internet and social media has undoubtedly accelerated crowdsourcing's momentum, the phenomenon is shaped by an interplay of historical, technological, and sociocultural factors that extends beyond the digital realm. Crowdsourcing has thus emerged as a dynamic approach that leverages collective intelligence, fostering innovation and efficiency in problem-solving and decision-making. Businesses have embraced it as a judicious, resource-effective avenue to amass information, facilitate decision-making, and address intricate challenges, and it has been deployed successfully across a diverse spectrum of real-world applications as well as in advanced machine learning research.

Despite these benefits, the fairness of crowdsourced information is often questionable due to the presence of false information and biased viewpoints. Notably, there has been limited effort to ensure that the labels assigned to collected data, which are crucial for downstream machine learning tasks, reflect the perspectives of underrepresented groups in the population. Moreover, despite growing interest in fair machine learning, most academic research has focused on developing algorithms that produce unbiased results; paradoxically, the integrity of the training data and the pivotal role of the labels used to fit these algorithms have been largely disregarded. The consequences of this oversight are substantial: if the training data contains biases or inaccuracies, models developed from it will not only perpetuate existing biases but may amplify them.

In response to these gaps, this dissertation examines the implications and complexities of crowdsourcing's ascent, particularly with respect to data reliability, fairness, and equitable representation, and presents a new way to determine the reliability of crowdsourced information, focusing on improving the fairness of the assigned labels while maintaining their accuracy. We first conduct an exploratory analysis to quantify the extent of unfairness in crowdsourcing and the efficacy and limitations of current approaches for aggregating labels and mitigating worker bias. We then propose a new fairness metric tailored to crowdsourcing scenarios that avoids common pitfalls of conventional fairness evaluation; our approach is rooted in the idea that similar samples should be treated equally. Finally, we put forth two distinct strategies to mitigate bias and evaluate their impact on both real-world and synthetic datasets, building a comprehensive understanding of how our approach compares with prevailing state-of-the-art methodologies.
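To make these ideas concrete, the sketch below illustrates the two building blocks named above: aggregating workers' votes (here with simple majority voting, the standard baseline) and checking the individual-fairness intuition that similar samples should receive similar labels. This is a minimal sketch under our own assumptions (binary labels, Euclidean feature similarity), not the dissertation's actual metric or mitigation method; the helper names and toy data are hypothetical.

    # Minimal sketch: majority-vote aggregation plus an individual-fairness-style
    # consistency check. Function names and toy data are illustrative assumptions,
    # not the thesis's implementation.
    import numpy as np

    def aggregate_majority(worker_labels):
        """Majority-vote aggregation over a (n_samples, n_workers) array
        of {0, 1} votes; ties are broken toward 1."""
        votes_for_one = worker_labels.sum(axis=1)
        return (2 * votes_for_one >= worker_labels.shape[1]).astype(int)

    def pairwise_consistency(features, labels, radius):
        """Fraction of similar sample pairs (Euclidean distance < radius)
        that received the same aggregated label. A value of 1.0 means
        similar samples are always treated equally; lower values flag
        potential labeling bias."""
        n = len(labels)
        similar = agree = 0
        for i in range(n):
            for j in range(i + 1, n):
                if np.linalg.norm(features[i] - features[j]) < radius:
                    similar += 1
                    agree += int(labels[i] == labels[j])
        return agree / similar if similar else 1.0

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 3))          # toy feature vectors
        W = rng.integers(0, 2, size=(50, 7))  # binary votes from 7 workers
        y_hat = aggregate_majority(W)
        print("consistency of similar pairs:",
              pairwise_consistency(X, y_hat, radius=1.0))

A metric of this form evaluates fairness at the level of individual samples rather than predefined demographic groups, which is one way to sidestep the group-definition pitfalls of conventional fairness evaluation alluded to above.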
Given the limited prior work in this area, this research pioneers a fresh approach to assessing the accuracy of crowdsourced information while enhancing the fairness of label assignments. The dissertation closes with insights into potential paths for future research, advancing the ongoing effort to refine, expand, and push the boundaries of this approach.

History

Advisor

Hadis Anahideh

Department

Mechanical and Industrial Engineering

Degree Grantor

University of Illinois Chicago

Degree Level

  • Master's

Degree name

MS, Master of Science

Committee Member

Houshang Darabi, Roberto Cigolini

Thesis type

application/pdf

Language

  • en
