Learning Structured Priors with Optimization-based Modeling
thesis
posted on 2022-05-01, authored by Mao Li

Underpinning the success of deep learning are effective structural prior modeling schemes that allow a broad range of domain-specific knowledge in data to be encoded naturally in a deep learning architecture. For example, in the computer vision community, convolutional neural networks implicitly encode transformation invariances (e.g., rotation and translation) by learning weights shared across the spatial domain of images. For sequential data, such as natural language sentences and speech utterances, recurrent neural networks are another class of architectures that perceive sequential order and capture dependences among inputs. Besides advanced network architectures, one of the most prevalent approaches to incorporating structural priors is regularization, which usually results in a complex non-convex optimization problem and creates contention between the performance of end tasks and the faithfulness of regularization.
We argue in this thesis that optimization methods provide an expressive set of primitive operations that allow us to integrate structural priors into the modeling pipeline without interfering with the learning of end tasks. We first propose inserting a proximal mapping as a hidden layer in a deep neural network, which directly and explicitly produces well-regularized hidden-layer outputs. The resulting technique is shown to be closely connected to kernel warping and dropout, and novel algorithms are developed for robust temporal learning and multiview learning. Next, we extend our framework to learn well-regularized functions that project given inputs to structured outputs. As an instantiation of this approach, we address an unsupervised domain adaptation problem in which the minimax game renders the training process unstable. A bi-level optimization-based approach is proposed to decouple the minimax optimization so that the model enjoys a much more principled and efficient training procedure. In addition, our method warps probability discrepancy measures towards the end tasks by leveraging the pseudo-labels produced by the optimal predictor.
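To make the first idea concrete, below is a minimal sketch of a proximal-mapping hidden layer, assuming a PyTorch setting and an L1-norm regularizer, whose proximal operator is soft-thresholding in closed form. The ProxL1 name, the regularizer choice, and the surrounding two-layer network are illustrative assumptions, not the thesis's actual construction.

```python
import torch
import torch.nn as nn

class ProxL1(nn.Module):
    """Illustrative proximal-mapping layer (hypothetical name).

    Applies the proximal operator of lam * ||.||_1, i.e. soft-thresholding,
    to its input, so the hidden activations are sparsity-regularized
    directly rather than via a penalty term added to the training loss.
    """
    def __init__(self, lam: float = 0.1):
        super().__init__()
        self.lam = lam  # regularization strength (assumed hyperparameter)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # prox_{lam*||.||_1}(x) = sign(x) * max(|x| - lam, 0)
        return torch.sign(x) * torch.clamp(x.abs() - self.lam, min=0.0)

# Example: insert the proximal layer between two ordinary layers.
model = nn.Sequential(nn.Linear(16, 32), ProxL1(lam=0.1), nn.Linear(32, 10))
y = model(torch.randn(4, 16))  # gradients flow through the closed-form prox
```

Because the proximal map here has a closed form, it behaves like any other differentiable layer during backpropagation; more general regularizers would require differentiating through the prox's optimality conditions.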
We validate the proposed methods through extensive experiments on image classification, speech recognition, cross-lingual word embedding, and domain adaptation. Our methods demonstrate a number of benefits over baseline methods, achieving state-of-the-art performance in various supervised and unsupervised learning tasks.
History

Advisor: Zhang, Xinhua
Chair: Zhang, Xinhua
Department: Computer Science
Degree Grantor: University of Illinois at Chicago
Degree Level: Doctoral
Degree name: PhD, Doctor of Philosophy
Committee Member: Yu, Philip S; Ziebart, Brian; Hu, Mengqi; White, Martha
Submitted date: May 2022
Thesis type: application/pdf
Language: en