Posted on 2022-08-01, 00:00; authored by Christopher Tran
Estimating causal effects is an important problem in the empirical sciences. Understanding and encoding how and why a variable can cause different outcomes is also important for developing autonomous systems that exhibit human-level intelligence. Randomized controlled trials are the gold standard for causal inference, but it is often infeasible to conduct such trials due to ethical or financial concerns. Observational studies have therefore risen in importance as data has accumulated in many domains, such as healthcare, social sciences, and e-commerce. Estimating how a treatment affects a population as a whole is a well-studied problem. However, individuals differ in characteristics and responses to treatment, and population-level average treatment effects cannot capture this heterogeneity. Estimating how a treatment affects different individuals, known as heterogeneous treatment effect (HTE) estimation, is important in deciding which actions to take. In the last few years, there has been increasing interest in adapting machine learning algorithms to estimate heterogeneous effects from experimental and observational data. However, these data-driven methods often rely on strong assumptions that do not take into consideration the underlying structure of the causal model, and they can therefore produce incorrect estimates by taking spurious correlations into account.
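As a toy illustration of why a population-level average can hide heterogeneity, the sketch below computes a difference in means for a small randomized experiment, first over everyone and then within two subgroups. The records, subgroup labels, and function names are invented for illustration and do not come from the thesis; the difference in means is only a sensible effect estimate here because treatment is assumed to be randomized.

```python
from statistics import mean

# Hypothetical randomized-trial records: (subgroup, treated, outcome).
# The numbers are invented purely for illustration.
records = [
    ("young", 1, 5.0), ("young", 1, 7.0), ("young", 0, 2.0), ("young", 0, 4.0),
    ("old", 1, 1.0), ("old", 1, 3.0), ("old", 0, 4.0), ("old", 0, 6.0),
]

def difference_in_means(rows):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def effects_by_subgroup(rows):
    """Difference in means computed separately within each subgroup."""
    groups = sorted({g for g, _, _ in rows})
    return {g: difference_in_means([r for r in rows if r[0] == g]) for g in groups}

average_effect = difference_in_means(records)
subgroup_effects = effects_by_subgroup(records)
print(average_effect, subgroup_effects)
```

In this made-up data the overall difference in means is zero even though each subgroup has a nonzero effect, with opposite signs, which is the kind of heterogeneity an average treatment effect cannot express.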
This thesis seeks to address the deficiencies of model-free approaches for HTE estimation by developing practical and accurate algorithms. I begin by exploring state-of-the-art, data-driven HTE estimators, investigating how they perform under various types of underlying structural causal models and how to deal with their model-free assumptions. I then introduce a new algorithm, HTE-FS, which combines greedy feature selection with local structure learning to pick out important variables in the underlying model and thereby improve data-driven estimation. I also develop a novel causal tree algorithm, CT-L, which leverages a validation set to learn heterogeneous effects. In addition, I introduce the novel problem of estimating triggers for heterogeneous effects, where a trigger is a threshold that leads to the most change in effect. I use CT-L to deepen the understanding of how readers feel when reading written narratives. Finally, I introduce the problem of heterogeneous peer effect estimation and show how this, in conjunction with the problem of estimating triggers, can be used to predict thresholds in the Linear Threshold Model.
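To make the idea of a trigger concrete, here is a minimal brute-force sketch, assuming a single numeric treatment intensity (called `doses` here) and defining the effect at a threshold as the difference in mean outcome between units at or above it and units below it. This is not the CT-L algorithm and does not model heterogeneity across subgroups; the function name `estimate_trigger`, the `min_group_size` parameter, and the data are all hypothetical.

```python
from statistics import mean

def estimate_trigger(doses, outcomes, min_group_size=2):
    """Return (threshold, effect) with the largest absolute difference in
    mean outcome between units with dose >= threshold and dose < threshold.

    Returns None if no candidate threshold leaves enough units on both sides.
    """
    best = None
    for t in sorted(set(doses)):
        above = [y for d, y in zip(doses, outcomes) if d >= t]
        below = [y for d, y in zip(doses, outcomes) if d < t]
        # Skip thresholds that leave too few units on either side.
        if len(above) < min_group_size or len(below) < min_group_size:
            continue
        effect = mean(above) - mean(below)
        if best is None or abs(effect) > abs(best[1]):
            best = (t, effect)
    return best

# Invented data: outcomes jump once the dose reaches 4.
doses = [1, 2, 3, 4, 5, 6]
outcomes = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
trigger = estimate_trigger(doses, outcomes)
print(trigger)  # (4, 4.0)
```

A search like this will overfit on real data if the threshold is chosen and evaluated on the same sample, which is one reason to hold out a validation set when selecting it.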
Advisor
Zheleva, Elena
Chair
Zheleva, Elena
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Caragea, Cornelia
Ziebart, Brian
Tafti, Ali
Lerman, Kristina