University of Illinois at Chicago
File(s) under embargo: 1 year(s), 8 month(s), 12 day(s) until file(s) become available

From Performance to Trust: Improving the Reliability of Pre-trained Language Models

thesis
posted on 2024-05-01, 00:00 authored by Seo Yeon Park
Deep learning models can be black boxes, making it difficult to assess their learning progress. To address this challenge, we monitor training dynamics, which provide essential insights into how well a model learns from its training data. By understanding training dynamics, we can optimize training strategies to improve both performance and model calibration (i.e., ensuring that predicted confidence accurately reflects actual accuracy). In this thesis, we therefore explore leveraging training dynamics to enhance performance and model calibration for natural language understanding (NLU) tasks. First, we propose to leverage training dynamics, specifically the Area Under the Margin and saliency maps, to enhance MixUp for improved NLU model calibration without accuracy drops. We further propose TDMixUp, which utilizes training dynamics (i.e., confidence and variability of gold-label predictions) to identify informative sample pairs for MixUp. We show that TDMixUp improves not only NLU model calibration but also performance. Next, we introduce supervised contrastive learning guided by sample selection (SupCL-GSS), which utilizes training dynamics to construct hard positive and hard negative sets, ultimately enhancing NLU model performance and calibration. Finally, we propose to utilize training dynamics to tackle real-world challenges in NLU models, focusing on improving performance by increasing the diversity of training data. We achieve this by exploring pseudo-labeling and data augmentation approaches coupled with LLM-generated data and predictions. By leveraging training dynamics, we identify potentially mislabeled LLM-generated or augmented samples and combine them with clean original samples using MixUp.
We evaluate our proposed approaches on various publicly available NLU datasets, including sentence-pair classification datasets (e.g., natural language inference, paraphrase detection, commonsense reasoning) and single-sentence classification datasets (e.g., sentiment analysis, opinion mining), and show performance improvements compared to competitive baselines. We provide detailed analyses to understand the benefits of our proposed approaches.
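The two core ingredients the abstract mentions can be sketched briefly. The following is a minimal illustration, not the thesis's implementation: `training_dynamics` computes per-sample confidence (mean) and variability (standard deviation) of the probability assigned to the gold label across training epochs, and `mixup` linearly interpolates a pair of inputs and their labels with a Beta-distributed coefficient. All function names, shapes, and the `alpha` value are illustrative assumptions.

```python
import numpy as np

def training_dynamics(gold_probs):
    """Compute per-sample training dynamics.

    gold_probs: array of shape (n_epochs, n_samples), where entry (e, i) is
    the probability the model assigned to sample i's gold label at epoch e.
    Returns (confidence, variability), each of shape (n_samples,).
    """
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # spread across epochs
    return confidence, variability

def mixup(x_a, x_b, y_a, y_b, alpha=0.4, rng=None):
    """Interpolate one sample pair (inputs and one-hot labels) with MixUp."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    x_mix = lam * x_a + (1 - lam) * x_b
    y_mix = lam * y_a + (1 - lam) * y_b
    return x_mix, y_mix
```

In a TDMixUp-style setup, confidence and variability would be used to select which sample pairs to interpolate (e.g., pairing easy-to-learn with ambiguous samples) before applying `mixup`; the selection criterion itself is specific to the thesis and not reproduced here.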

History

Advisor

Cornelia Caragea

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

Doctor of Philosophy

Committee Member

Xinhua Zhang, Barbara Di Eugenio, Wei Tang, Erdem Koyuncu, Doina Caragea

Thesis type

application/pdf

Language

  • en
