
Length Generalization with Recursive Neural Networks and Beyond

Thesis, posted on 2024-08-01, authored by Jishnu Ray Chowdhury
We investigate Recursive Neural Networks (RvNNs) for language processing tasks. From a generalized perspective, RvNNs repeatedly apply a neural function to an input until some termination condition is reached. Most often, however, the term "Recursive Neural Network" refers to Tree-RvNNs (or sometimes DAG-RvNNs), which iteratively apply a recursive neural function to a sequence of representations in a tree-like order. Intuitively, Tree-RvNNs can model the mereological (part-whole) structure of an input sequence: starting from a sequence of elementary "part" representations, they iteratively build a representation of the "whole" sequence. Tree-RvNN-based methods often do particularly well in structure-sensitive language processing tasks (e.g., arithmetic, semantic parsing, logical inference) where other comparable architectures tend to fail, at least without extensive pre-training. Most interestingly, in such tasks, RvNN-based methods also tend to be robust in out-of-distribution (OOD) settings, exhibiting systematic generalization and length generalization.

We introduce two novel paradigms for Tree-RvNNs that automatically induce a tree structure, removing the need for ground-truth trees. First, we introduce Continuous Recursive Neural Networks (CRvNNs), which induce a differentiable soft tree structure instead of a discrete one. Second, we introduce Beam Tree Recursive Neural Networks (BT-RvNNs), which use a beam search strategy to parse multiple discrete structures. We propose ways to improve the computational trade-offs of BT-RvNNs and a way to allow information to flow from wholes to parts for token contextualization. Furthermore, we introduce extensions of a location-based recursive attention mechanism for better length generalization in certain tasks using sequence-to-sequence models, and we empirically investigate different ways to introduce recursive inductive biases into Transformers, with and without a dynamic halt mechanism.
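To make the generalized view above concrete, the following is a minimal sketch (in Python, using PyTorch) of a bottom-up Tree-RvNN: one learned composition cell is applied recursively over a binary tree, building a "whole" representation from "part" representations. The cell, the nested-tuple tree encoding, and the dimensions are illustrative assumptions, not the thesis's CRvNN or BT-RvNN implementations; in particular, the sketch assumes the tree structure is given, whereas the thesis's contribution is to induce it automatically.

    import torch
    import torch.nn as nn

    class TreeRvNNCell(nn.Module):
        """One recursive composition step: merge two child ("part")
        representations into a single parent ("whole") representation.
        A hypothetical cell for illustration only."""
        def __init__(self, dim: int):
            super().__init__()
            self.compose = nn.Sequential(
                nn.Linear(2 * dim, dim),
                nn.Tanh(),
            )

        def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
            return self.compose(torch.cat([left, right], dim=-1))

    def encode(tree, leaf_embeddings, cell):
        """Recursively encode a binary tree given as nested tuples of
        leaf indices, e.g. ((0, 1), (2, 3)). The same cell is applied
        at every internal node, bottom-up, until the root is reached."""
        if isinstance(tree, int):          # a leaf: an elementary "part"
            return leaf_embeddings[tree]
        left, right = tree
        return cell(encode(left, leaf_embeddings, cell),
                    encode(right, leaf_embeddings, cell))

    dim = 8
    cell = TreeRvNNCell(dim)
    leaves = torch.randn(4, dim)           # four token ("part") representations
    root = encode(((0, 1), (2, 3)), leaves, cell)
    print(root.shape)                      # torch.Size([8])

Because the same cell is reused at every node, the encoder can, in principle, be applied to sequences longer than those seen in training, which is the intuition behind the length-generalization behavior discussed above.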

History

Advisor

Cornelia Caragea

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

Doctor of Philosophy

Committee Member

  • Xinhua Zhang
  • Natalie Parde
  • Doina Caragea
  • Elena Zheleva

Thesis type

Doctoral thesis

Language

  • en
