Reinforcement Learning for Real Options: Interpretable Planning Under Uncertainty and Limited Data
Thesis, posted on 01.08.2021, authored by Seyed Danial Mohseni-Taheri
Reinforcement learning (RL) has achieved major milestones in planning under uncertainty, especially in applications such as game playing and robotics. However, deploying RL algorithms in real-world business problems remains challenging. In this thesis, we address several of these implementation challenges: the complexity of algorithms and policies, which makes them hard for RL non-experts to understand; computational difficulties related to the curse of dimensionality; and limited data for accurately modeling the operating environment in operations-finance applications.

In the second chapter, we study and model a new application in sustainable operations related to renewable power procurement. We focus on companies committed to procuring a specified percentage of their annual electricity demand from a renewable power source by a future date. We design a new rolling-horizon policy based on information relaxation and duality theory with interpretable approximations, and account for uncertainty directly when computing decisions to mitigate the curse of dimensionality.

In the third chapter, we focus on the problem faced by a firm providing ethanol storage services and analyze the behavior of users interacting with this storage provider. We address the challenge of limited user-interaction data by building a sample-efficient model of user behavior that applies Gaussian processes in a non-standard manner, leveraging interpretable characterizations of the optimal policy from the operations management literature.

In the final chapter, we investigate the long-term capacity investment problem faced by a hydropower plant and tackle the challenges of the curse of dimensionality and limited long-term market data. We design an RL algorithm that hedges against long-term model misspecification risk while mitigating financial losses from being overly conservative, and we extend interpretable reoptimization techniques to this setting.