
Induced Model Matching: Learning from Restricted Models

Thesis posted on 2025-05-01, authored by Usama Muneeb
Given a very good predictive model that uses a restricted feature set, what is the best way to incorporate it into a large full-featured model? There are two main scenarios where this problem arises: (a) we have ample data with restricted features, or (b) the restricted model is easier to learn from the existing data. Some relevant cases: for (a), when training a logistic regression model, most of the data may be missing many features for privacy reasons; or, when training an MDP policy with expensive full-sensing data, we may have available a past model built from plentiful, cheap partial-sensing data. For (b), it is common to augment language model training with smaller language models, such as N-grams, which are assumed to be reliably buildable from the same data. We discuss prior works that have used restricted models in the training of full-featured models through implicit or explicit regularization, and we reveal their caveats. To address these caveats, we propose our methodology, Induced Model Matching (IMM), which aligns the context-restricted, or induced, version of the large model with the restricted model. We show that correctly incorporating the restriction is crucial both for consistency in the limit (theoretically) and for better finite-sample performance (experimentally) relative to past approaches. These past approaches are (1) noising, which addresses the problem implicitly, and (2) reverse knowledge distillation from weak teachers, which addresses it explicitly. Neither exploits the fact that the restriction is the nature of the teacher's weakness, and both can be problematic in terms of consistency. We demonstrate the merits of IMM using logistic regression as a proof of concept. We then apply it to language modeling, the application that originally inspired it, and demonstrate it on both LSTM and transformer full models, using bigrams as restricted models. We lastly give a simple RL example showing that POMDP policies can help learn better MDP policies. The IMM principle is thus broadly applicable in common scenarios where restricted data is cheaper to collect or restricted models are easier to learn.
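To make the matching idea concrete, below is a minimal PyTorch sketch of IMM in the logistic-regression proof-of-concept setting. It is illustrative only, not the thesis code: the toy data, the pre-fit restricted model `w_restricted`, the matching weight `lambda_imm`, and the resampling estimate of the induced model are all assumptions made for this sketch.

```python
# Minimal IMM sketch (assumed setup, not the thesis implementation):
# a full logistic regression model is trained on features (X_r, X_e) while
# its induced, restricted-feature predictions are matched to a given
# restricted model that uses only X_r.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n, d_r, d_e = 256, 3, 5                      # samples, restricted dims, extra dims
X_r = torch.randn(n, d_r)                    # restricted features
X_e = torch.randn(n, d_e)                    # extra features (independent of X_r here)
y = (torch.rand(n) < torch.sigmoid(X_r @ torch.ones(d_r))).float()  # toy labels

# Pretend this restricted model was already fit on abundant restricted-only data.
w_restricted = torch.ones(d_r)
p_restricted = torch.sigmoid(X_r @ w_restricted)

w_full = torch.zeros(d_r + d_e, requires_grad=True)
opt = torch.optim.SGD([w_full], lr=0.1)
lambda_imm = 1.0                             # matching weight (assumed hyperparameter)
K = 8                                        # Monte Carlo resamples for the induced model

for step in range(200):
    opt.zero_grad()
    X = torch.cat([X_r, X_e], dim=1)
    data_loss = F.binary_cross_entropy_with_logits(X @ w_full, y)

    # Induced model: the full model's prediction with the extra features
    # averaged out, holding the restricted features fixed. We approximate the
    # average by resampling extra features within the batch (valid here because
    # X_e is independent of X_r in this toy; the thesis handles the general case).
    p_induced = torch.stack([
        torch.sigmoid(torch.cat([X_r, X_e[torch.randperm(n)]], dim=1) @ w_full)
        for _ in range(K)
    ]).mean(dim=0)

    # Cross-entropy matching of the induced model to the restricted model.
    imm_loss = F.binary_cross_entropy(p_induced, p_restricted)

    (data_loss + lambda_imm * imm_loss).backward()
    opt.step()
```

The design point the sketch tries to reflect is that the full model is never pulled directly toward the restricted model's parameters or raw outputs; only its induced, restricted-context predictions are matched to the restricted model, which is what distinguishes IMM from plain distillation from a weak teacher.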

History

Advisor

Mesrob Ohannessian

Department

Electrical and Computer Engineering

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Brian Ziebart, Natalie Parde, Shuo Han, Ahmet Enis Cetin

