Posted on 2024-08-01, 00:00, authored by Jishnu Ray Chowdhury
We investigate Recursive Neural Networks (RvNNs) for language processing tasks. From a generalized perspective, an RvNN repeatedly applies some neural function to an input until a termination condition is reached. Most often, however, the term "Recursive Neural Network" refers to Tree-RvNNs (or sometimes DAG-RvNNs). Tree-RvNNs iteratively apply a recursive neural function to a sequence of representations in a tree-like order. Intuitively, Tree-RvNNs can model the mereological (part-whole) structure of a given input sequence: starting from a sequence of elementary "part" representations, they iteratively build a representation of the "whole" sequence. Tree-RvNN-based methods often do particularly well in structure-sensitive language processing tasks (e.g., arithmetic, semantic parsing, logical inference) where other comparable architectures tend to fail, at least without extensive pre-training. Most interestingly, in such tasks, RvNN-based methods also tend to be robust in out-of-distribution (OOD) settings, exhibiting systematic generalization and length generalization.
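The bottom-up, part-to-whole composition described above can be illustrated with a minimal sketch. This is not the dissertation's actual model; the composition function (a single affine layer with a tanh nonlinearity) and the tree encoding are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # embedding dimension (illustrative)

# Hypothetical composition function: one affine layer + tanh applied
# to the concatenation of two child ("part") representations.
W = rng.standard_normal((D, 2 * D)) * 0.1
b = np.zeros(D)

def compose(left, right):
    """Merge two 'part' vectors into one 'whole' vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree, leaves):
    """Recursively encode a binary tree over leaf embeddings.

    `tree` is either an int (a leaf index) or a pair (left, right).
    """
    if isinstance(tree, int):
        return leaves[tree]
    left, right = tree
    return compose(encode(left, leaves), encode(right, leaves))

# Example: a 4-token sequence with the structure ((0, 1), (2, 3)).
leaves = rng.standard_normal((4, D))
root = encode(((0, 1), (2, 3)), leaves)
print(root.shape)  # (4,) -- one vector for the whole sequence
```

Because the same `compose` function is reused at every node, the structure of the tree, rather than the depth of the network, determines how many times it is applied.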
We introduce two novel paradigms for Tree-RvNNs that automatically induce a tree structure, removing the need for ground-truth trees. First, we introduce Continuous Recursive Neural Networks (CRvNN), which induce a differentiable soft-tree structure instead of a discrete one. Second, we introduce Beam Tree Recursive Neural Networks (BT-RvNN), which use a beam search strategy to parse multiple discrete structures. We propose ways to improve the computational trade-offs of BT-RvNNs and a way to allow information to flow from wholes to parts for token contextualization. Furthermore, we introduce extensions of a location-based recursive attention mechanism for better length generalization in certain sequence-to-sequence tasks, and we empirically investigate different ways of introducing recursive inductive biases into Transformers, with and without a dynamic halting mechanism.
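The beam-search idea behind BT-RvNN can be sketched as follows: at each step, every hypothesis scores all adjacent-pair merges, and only the top-k partial parses are kept. This is a minimal sketch under stated assumptions — the `compose` and `score` functions are hypothetical stand-ins, not the model's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
W = rng.standard_normal((D, 2 * D)) * 0.1
v = rng.standard_normal(D)  # hypothetical merge-scoring vector

def compose(left, right):
    """Merge two adjacent representations into one (illustrative)."""
    return np.tanh(W @ np.concatenate([left, right]))

def score(left, right):
    """Hypothetical scorer: how good is merging these neighbours?"""
    return float(v @ compose(left, right))

def beam_parse(leaves, beam=2):
    """Bottom-up parsing keeping the top-`beam` merge sequences."""
    beams = [(0.0, list(leaves))]  # (cumulative score, sequence)
    while len(beams[0][1]) > 1:
        candidates = []
        for s, seq in beams:
            # Try every adjacent merge in every surviving hypothesis.
            for i in range(len(seq) - 1):
                merged = seq[:i] + [compose(seq[i], seq[i + 1])] + seq[i + 2:]
                candidates.append((s + score(seq[i], seq[i + 1]), merged))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam]  # prune to the beam width
    return beams  # top-`beam` root representations with scores

beams = beam_parse(rng.standard_normal((4, D)), beam=2)
print(len(beams), beams[0][1][0].shape)  # 2 (4,)
```

Keeping several discrete parses at once lets the model hedge between competing structures, at the cost of roughly a factor-of-beam increase in computation, which motivates the computational trade-offs discussed above.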
History
Advisor
Cornelia Caragea
Department
Computer Science
Degree Grantor
University of Illinois Chicago
Degree Level
Doctoral
Degree Name
Doctor of Philosophy
Committee Member
Xinhua Zhang
Natalie Parde
Doina Caragea
Elena Zheleva