Statistical and Causal Inference for Complex High Dimensional Data
thesis
posted on 2023-08-01, 00:00 authored by Nurlan Abdukadyrov
This thesis addresses the challenges posed by high dimensional data, which has gained prominence due to the abundance of information available in recent years. We develop methods for statistical inference and causal inference specifically designed for high dimensional scenarios.
For statistical inference, we focus on two-sample hypothesis tests with missing values. Existing methods for high dimensional mean tests either do not consider missing values or rely on the assumption that data are missing completely at random (MCAR), which can be restrictive. We propose a method that works under the less restrictive assumption of missing at random (MAR). We derive the limiting distribution of the test statistic and develop a weighted $\chi^2$ approximation for it. Simulation studies and analysis of gene data demonstrate the advantage of our method.
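As a rough illustration of what a weighted $\chi^2$ approximation can look like, the sketch below uses classical two-moment matching: a weighted sum $\sum_i w_i \chi^2_1$ is replaced by a scaled variable $a\,\chi^2_d$ with the same mean and variance. This is an assumption made for illustration only. The abstract does not state which approximation the thesis uses, and the function name and the example weights are hypothetical.

```python
def match_scaled_chi2(weights):
    """Two-moment match of sum_i w_i * chi2_1 by a * chi2_d.

    Returns (a, d) such that a * chi2_d has the same mean and
    variance as the weighted sum of independent chi2_1 variables.
    """
    s1 = sum(weights)                 # mean of the weighted sum
    s2 = sum(w * w for w in weights)  # half of its variance
    if s1 <= 0 or s2 <= 0:
        raise ValueError("weights must have positive sum and sum of squares")
    a = s2 / s1        # scale factor
    d = s1 * s1 / s2   # effective degrees of freedom
    return a, d


# Hypothetical weights; in practice they would be estimated from data.
weights = [3.0, 2.0, 1.0, 0.5]
scale, dof = match_scaled_chi2(weights)
```

A p-value would then come from the upper tail of a $\chi^2_d$ distribution evaluated at the observed statistic divided by `scale`, for example with a chi-square survival function from a statistics library.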
For causal inference, we propose a doubly robust test statistic for high dimensional outcome variables. Existing methods consider only the case of a low dimensional outcome variable and are not applicable to the high dimensional scenario. The proposed method is doubly robust, which gives more flexibility in model specification. We show that, under some mild conditions, the test statistic can be approximated by a normal distribution. Through simulations and empirical studies using gene data and Parkinson's disease data, we highlight the advantages of our proposed method.
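To make the doubly robust property concrete, the sketch below computes the standard augmented inverse probability weighting (AIPW) estimate of an average treatment effect for a single outcome coordinate. It is an illustrative assumption, not the test statistic proposed in the thesis, which the abstract does not specify. The function name and all inputs are hypothetical, and with a high dimensional outcome the same computation would be repeated for each coordinate.

```python
def aipw_mean_difference(y, a, e, m1, m0):
    """AIPW estimate of E[Y(1)] - E[Y(0)] for one outcome coordinate.

    y  : observed outcomes
    a  : treatment indicators (0 or 1)
    e  : estimated propensity scores, strictly between 0 and 1
    m1 : outcome-model predictions under treatment
    m0 : outcome-model predictions under control
    """
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1i + ai * (yi - m1i) / ei
        control = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += treated - control
    return total / len(y)


# Toy data (hypothetical): the outcome model is exact and the propensity
# scores are arbitrary, yet the estimate still equals the true effect of 2.
y = [3.0, 2.0, 5.0, 4.0]
a = [1, 0, 1, 0]
m0 = [1.0, 2.0, 3.0, 4.0]
m1 = [3.0, 4.0, 5.0, 6.0]
bad_e = [0.3, 0.9, 0.5, 0.2]
effect = aipw_mean_difference(y, a, bad_e, m1, m0)
```

The estimate remains consistent if either the propensity model or the outcome model is correctly specified, which is the flexibility in model specification that double robustness refers to.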
Overall, our thesis contributes novel methods for statistical and causal inference in high dimensional data, with empirical validation and efficient computational approaches.