
Scaling Approximate Linear Programming: Langevin Based Sampling for Constraint Violation Learning

Thesis posted on 2024-05-01, authored by Virginia Marcante
Approximate Linear Programs (ALPs) are fundamental models for computing value function approximations and bounds for high-dimensional Markov decision processes (MDPs) arising in a wide range of applications. An ALP has a manageable number of variables but a large number of constraints. Constraint generation and constraint sampling are the two traditional approaches for handling these constraints. The former has limited applicability; the latter, while broadly applicable, does not guarantee a valid bound on the optimal policy value. Constraint violation learning is a recent approach for solving ALPs that combines first-order methods with Metropolis-Hastings sampling, yielding a general-purpose method that retains the bounding property of ALP and comes with convergence guarantees. Its reliance on Metropolis-Hastings sampling, however, limits its scalability.

This thesis introduces a novel adaptation of constraint violation learning based on Langevin dynamics (LD) that scales the solution of ALPs to higher-dimensional applications. The primary contribution of the thesis is a comprehensive examination of LD for constraint violation learning. This is a challenging setting for LD because the sampling distributions are not log-concave in general, whereas LD has well-understood theoretical properties only in the log-concave case. Nevertheless, we find that a log-concave regularization term used within constraint violation learning plays an important role in ensuring the effectiveness of LD. We also argue that the core computational cost of LD is dimension-independent.

Through a comprehensive numerical study on a perishable inventory control application, this thesis demonstrates that LD maintains consistent running times even as the MDP dimension increases, making it a scalable option for solving ALPs. In contrast, Metropolis-Hastings does not scale as effectively, as expected.

This research extends the scalability of ALP using an LD-based sampling approach within constraint violation learning. The exploration of LD for solving ALPs is novel, bringing together tools from machine learning and operations research. We are also not aware of prior use of LD to address the large-scale nature of mathematical programs (e.g., linear programs). This thesis takes a first step at exploring this interface, and its findings motivate the study of LD for large-scale optimization beyond ALPs.
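To make the sampling idea concrete, below is a minimal illustrative sketch of unadjusted Langevin dynamics (ULA) in Python. It is not the implementation from the thesis, and the names (langevin_sample, grad_log_p, step_size) are hypothetical. ULA draws approximate samples from an unnormalized density p(x) by iterating x_{k+1} = x_k + (h/2) grad log p(x_k) + sqrt(h) * noise, where each step costs a single gradient evaluation regardless of the dimension of x, which is the source of the dimension-independence argument.

    # Minimal sketch of unadjusted Langevin dynamics (ULA); illustrative only,
    # not the thesis's implementation.
    import numpy as np

    def langevin_sample(grad_log_p, x0, step_size=1e-2, n_steps=1000, rng=None):
        """Iterate x <- x + (h/2) * grad log p(x) + sqrt(h) * Gaussian noise."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float)
        samples = np.empty((n_steps, x.size))
        for k in range(n_steps):
            noise = rng.standard_normal(x.size)
            x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
            samples[k] = x
        return samples

    # Example: sample a 50-dimensional standard Gaussian, for which
    # grad log p(x) = -x. Raising the dimension changes only the cost of
    # the gradient evaluation, not the number of sampler operations.
    draws = langevin_sample(lambda x: -x, x0=np.zeros(50),
                            step_size=0.05, n_steps=5000)

Note that ULA omits the Metropolis-Hastings accept/reject correction, trading exactness of the stationary distribution for a cheaper, gradient-only update; this trade-off is the scalability contrast the abstract describes.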

History

Advisor

Ian Kash

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree Name

MS, Master of Science

Committee Members

Xinhua Zhang, Sandra Pieraccini, Selvaprabu Nadarajah

Format

application/pdf

Language

  • en
