
Length Generalization with Recursive Neural Networks and Beyond

Thesis, posted on 2024-08-01, authored by Jishnu Ray Chowdhury
We investigate Recursive Neural Networks (RvNNs) for language processing tasks. From a generalized perspective, RvNNs repeatedly apply a neural function to an input until some termination condition is reached. Most often, however, the term "Recursive Neural Network" refers to Tree-RvNNs (or sometimes DAG-RvNNs), which iteratively apply a recursive neural function to a sequence of representations in a tree-like order. Intuitively, Tree-RvNNs can model the mereological (part-whole) structure of an input sequence: starting from a sequence of elementary "part" representations, they iteratively build a representation of the "whole" sequence. Tree-RvNN-based methods often do particularly well in structure-sensitive language processing tasks (e.g., arithmetic, semantic parsing, logical inference) where other comparable architectures tend to fail, at least without extensive pre-training. Most interestingly, in such tasks, RvNN-based methods also tend to be robust in out-of-distribution (OOD) settings, exhibiting systematic generalization and length generalization.

We introduce two novel paradigms for Tree-RvNNs that automatically induce a tree structure, removing the need for ground-truth trees. First, we introduce Continuous Recursive Neural Networks (CRvNNs), which induce a differentiable soft tree structure instead of a discrete one. Second, we introduce Beam Tree Recursive Neural Networks (BT-RvNNs), which use a beam search strategy to parse multiple discrete structures. We propose ways to improve the computational trade-offs of BT-RvNNs and a way to allow information to flow from wholes to parts for token contextualization. Furthermore, we introduce extensions of a location-based recursive attention mechanism for better length generalization in certain tasks using sequence-to-sequence models, and we empirically investigate different ways to introduce recursive inductive biases into Transformers, with and without a dynamic halt mechanism.
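To make the generalized view above concrete, the following is a minimal sketch (in Python, using PyTorch) of a bottom-up Tree-RvNN: one learned composition cell is applied recursively over a binary tree, building a "whole" representation from "part" representations. The cell, the nested-tuple tree encoding, and the dimensions are illustrative assumptions, not the thesis's CRvNN or BT-RvNN implementations; in particular, the sketch assumes the tree structure is given, whereas the thesis's contribution is to induce it automatically.

    import torch
    import torch.nn as nn

    class TreeRvNNCell(nn.Module):
        """One recursive composition step: merge two child ("part")
        representations into a single parent ("whole") representation.
        A hypothetical cell for illustration only."""
        def __init__(self, dim: int):
            super().__init__()
            self.compose = nn.Sequential(
                nn.Linear(2 * dim, dim),
                nn.Tanh(),
            )

        def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
            return self.compose(torch.cat([left, right], dim=-1))

    def encode(tree, leaf_embeddings, cell):
        """Recursively encode a binary tree given as nested tuples of
        leaf indices, e.g. ((0, 1), (2, 3)). The same cell is applied
        at every internal node, bottom-up, until the root is reached."""
        if isinstance(tree, int):          # a leaf: an elementary "part"
            return leaf_embeddings[tree]
        left, right = tree
        return cell(encode(left, leaf_embeddings, cell),
                    encode(right, leaf_embeddings, cell))

    dim = 8
    cell = TreeRvNNCell(dim)
    leaves = torch.randn(4, dim)           # four token ("part") representations
    root = encode(((0, 1), (2, 3)), leaves, cell)
    print(root.shape)                      # torch.Size([8])

Because the same cell is reused at every node, the encoder can, in principle, be applied to sequences longer than those seen in training, which is the intuition behind the length-generalization behavior discussed above.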

History

Advisor

Cornelia Caragea

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

Doctor of Philosophy

Committee Member

  • Xinhua Zhang
  • Natalie Parde
  • Doina Caragea
  • Elena Zheleva

Thesis type

Doctoral thesis

Language

  • en
