Posted on 2025-05-01, 00:00. Authored by Huu Hoang Nguyen
Low-resource settings, in which intelligent systems constantly encounter emerging knowledge beyond their initial training, are inevitable in the development of such systems. Only by extracting a true semantic understanding of linguistic inputs can these systems succeed when little knowledge is provided. This persistent challenge hampers their ability to excel at emerging, essential tasks. In practice, low-resource conditions can occur at different granularities of textual understanding: (1) coarse-grained, at the sentence level; (2) fine-grained, at the token level; or both.
Throughout this manuscript, we address low-resource settings in Natural Language Understanding (NLU) across multiple granularities. First, we tackle the challenges of low-resource coarse-grained annotations by introducing dynamic semantic extraction together with multi-perspective matching and aggregation networks. Second, we address the lack of fine-grained annotations and explore the potential of inducing such information without token-level supervised training, by extracting and refining the knowledge preserved in general-purpose language models with additional multi-level contrastive learning objectives. Third, we overcome the challenges of low-resource multi-grained annotations by reinforcing the interconnections between granularities via coarse-to-fine chain-of-thought reasoning and structured knowledge from Abstract Meaning Representation graphs. Finally, we broaden the scope of low-resource NLU beyond English, focusing on cross-lingual transfer to low-resource languages through a novel integration of phonemic transcriptions in addition to textual scripts. Our work uses publicly available datasets for Task-oriented Dialogue Systems (SNIPS, NLUE, ATIS, MTOP, MASSIVE) together with open-source, comprehensive, general-purpose multilingual NLU benchmarks such as XTREME.