University of Illinois Chicago

Low-resource Multi-grained Natural Language Understanding: English and Beyond

thesis
posted on 2025-05-01, 00:00 authored by Huu Hoang Nguyen
Low-resource settings, in which intelligent systems continually encounter knowledge beyond what they initially learned, are inevitable in practice. When little supervision is available, these systems can succeed only by capturing the true semantics of their linguistic inputs, and this persistent challenge limits their performance on emerging tasks. Low-resource conditions can arise at different granularities of textual understanding: (1) coarse-grained, at the sentence level; (2) fine-grained, at the token level; or both. This dissertation addresses low-resource Natural Language Understanding (NLU) across these granularities. First, we tackle scarce coarse-grained annotations by introducing dynamic semantic extraction together with multi-perspective matching and aggregation networks. Second, we address the absence of fine-grained annotations and explore inducing such information without token-level supervised training. To do so, we extract and refine the knowledge preserved in general-purpose language models using additional multi-level contrastive learning objectives. Third, we overcome scarce multi-grained annotations by reinforcing the interconnections between granularities through coarse-to-fine chain-of-thought reasoning and structured knowledge from Abstract Meaning Representation (AMR) graphs. Finally, we extend low-resource NLU beyond English, focusing on cross-lingual transfer to low-resource languages through a novel integration of phonemic transcriptions alongside textual scripts. Our work uses publicly available Task-oriented Dialogue System datasets (SNIPS, NLUE, ATIS, MTOP, MASSIVE) together with open-source, comprehensive, general-purpose multilingual NLU benchmarks such as XTREME.


Advisor

Philip S. Yu

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Natalie Parde, Shweta Yadav, Chenwei Zhang, Ye Liu

Thesis type

application/pdf

Language

  • en
