University of Illinois Chicago

Evaluating and Mitigating Bias in Large Language Models and Retrieval-Augmented Generation Systems

Thesis
Posted on 2025-05-01, authored by Shweta Parihar
Social biases in large language models (LLMs) raise critical fairness concerns; this work addresses them through two complementary projects. The first project addresses the challenge of mitigating gender bias without compromising the language modeling capabilities of LLMs. Counterfactual Data Augmentation (CDA), a widely used fine-tuning method, generates synthetic data that may align poorly with real-world distributions or ignore the social context of altered sensitive attributes (e.g., gender) in the pretraining corpus. To overcome this, we propose Context-CDA, which leverages LLMs to produce contextually relevant and diverse counterfactual data for fine-tuning. By minimizing distributional discrepancies between the debiasing corpus and the pretraining data, this approach improves alignment with real-world usage. To further improve data quality, we apply semantic entropy filtering to remove uncertain text samples. Evaluations on bias benchmarks demonstrate that Context-CDA significantly reduces bias while preserving model performance.

The second project focuses on bias propagation in Retrieval-Augmented Generation (RAG) systems. We investigate how the addition of retrieved contexts influences the bias behavior of LLMs. Our findings reveal a reduction in bias after incorporating the RAG pipeline, indicating that the inclusion of external context often helps counteract stereotype-driven predictions. We also examine the model's reasoning process by integrating Chain-of-Thought (CoT) prompting into the RAG system while assessing the faithfulness of the model's CoT. Our experiments reveal that the model's bias inclination shifts between stereotype and anti-stereotype responses as more contextual information is incorporated. Moreover, contrary to the bias reduction observed with standard RAG, applying CoT with RAG increases overall bias across datasets. This counterintuitive result can be attributed to a bias-accuracy trade-off: while CoT improves accuracy by encouraging more deliberate reasoning, this often comes at the expense of fairness, motivating the design of bias-aware reasoning frameworks that mitigate the trade-off.

Public datasets used: Statistical and neural machine translation news commentary, StereoSet, CrowS-Pairs, Multi-Genre Natural Language Inference, Natural Language Inference Bias, Semantic Textual Similarity Benchmark, BiasBios, Question-answering Natural Language Inference, Recognizing Textual Entailment, Stanford Sentiment Treebank v2, Winogender, WinoBias, WikiText-103, Colossal Clean Crawled Corpus, Bias in Open-ended Language Generation, Holistic Bias.
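To make the first project's starting point concrete, the following is a minimal sketch of the surface-level word-swap step used by standard Counterfactual Data Augmentation, the baseline that Context-CDA replaces with LLM-generated, contextually grounded rewrites. The swap dictionary and the example sentence are illustrative assumptions, not the thesis' actual configuration.

import re

# Bidirectional swap list for gendered terms (illustrative subset; a full CDA
# dictionary is much larger and handles ambiguous forms such as "her").
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def cda_counterfactual(sentence: str) -> str:
    """Swap gendered words to produce a counterfactual sentence.

    Classic CDA stops at this surface-level swap; Context-CDA instead prompts
    an LLM to rewrite the text so the altered attribute fits its social
    context and better matches the pretraining distribution.
    """
    def swap_token(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP.get(word.lower(), word)
        # Preserve the capitalization of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\b\w+\b", swap_token, sentence)

if __name__ == "__main__":
    print(cda_counterfactual("The father told his son a story."))
    # -> "The mother told her daughter a story."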
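In the same spirit, the sketch below illustrates how a RAG-plus-CoT bias probe of the kind studied in the second project might be assembled and tallied. The prompt template, the item format, and the always-pick-option-0 stand-in model are assumptions for illustration, not the evaluation harness used in the thesis.

from collections import Counter
from typing import Callable

def build_prompt(question: str, options: list[str], contexts: list[str]) -> str:
    """Concatenate retrieved passages, the question, and a CoT instruction."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    option_block = "\n".join(f"({i}) {o}" for i, o in enumerate(options))
    return (
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n{option_block}\n"
        "Think step by step, then answer with the option number."
    )

def bias_tally(items: list[dict], model: Callable[[str], int]) -> Counter:
    """Count stereotype vs. anti-stereotype choices across probe items.

    `model` maps a prompt to the index of the chosen option; in practice it
    would wrap an LLM call fed with the RAG pipeline's retrieved contexts.
    """
    counts = Counter()
    for item in items:
        prompt = build_prompt(item["question"], item["options"], item["contexts"])
        choice = model(prompt)
        counts[item["labels"][choice]] += 1  # e.g. "stereotype" / "anti-stereotype"
    return counts

if __name__ == "__main__":
    # Toy probe item; a real evaluation would draw items from a bias benchmark.
    items = [{
        "question": "Who is more likely to be the nurse?",
        "options": ["the woman", "the man"],
        "labels": ["stereotype", "anti-stereotype"],
        "contexts": ["Both the man and the woman work at the hospital."],
    }]
    always_first = lambda prompt: 0  # stand-in for an actual LLM call
    print(bias_tally(items, always_first))  # Counter({'stereotype': 1})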

History

Advisor

Lu Cheng

Department

Department of Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Sourav Medya, Man Luo, Natalie Parde

Thesis type

application/pdf

Language

  • en
