posted on 2022-05-01, 00:00, authored by Yu-Che Chung
This dissertation aims to provide flexible missing data models and robust methods for handling missing data. Two methods are proposed, developed, and studied.
First, a mixture of missing data mechanisms model is developed in which flexible missing data assumptions are made at the individual level. The proposed model can greatly reduce bias when the proportions of the missing mechanisms are correctly specified, and it performs well even when those proportions are misspecified. Extensive simulation studies are conducted to assess the proposed method for both cross-sectional and longitudinal data. We also consider non-parametric regression to accommodate potentially nonlinear relationships between the outcome and covariates. We find that mixture missing mechanisms can cause unstable estimates when missing not at random (MNAR) is assumed, whereas stable, low-bias estimates are obtained when proper mixture mechanisms are assumed. Other scenarios, including under-fitting, over-fitting, and the intensity of missingness in the data-generating model, are also assessed in the simulations.
Second, a missing data analysis often utilizes an outcome model and a missingness model. The well-known double robustness property provides asymptotic protection against misspecification of one of these two models. Little and An (2004), Zhang and Little (2009), and others proposed the penalized spline of propensity prediction (PSPP) method, which provides double robustness for predicting marginal and conditional means under missing at random (MAR). We develop the multiple penalized spline of propensity prediction (mPSPP) method, which incorporates multiple propensity score models. We establish that mPSPP is robust against propensity score misspecification. We also develop a stratified mPSPP that provides multiple robustness for marginal and conditional means. This approach uses regularization when a large number of propensity score models are included in the prediction model. We assess the performance of the proposed mPSPP methodology in both non-regularized and regularized settings in extensive numerical studies.
The PArTNER study is presented as a real data example and both proposed methods are applied to the study to show their applicability. Other standard missing data methods are also included and compared in the real data example.
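As a rough illustration of the single-propensity PSPP idea that the abstract builds on, the sketch below estimates a marginal mean under MAR: it fits a logistic propensity model, regresses the observed outcomes on a penalized spline of the logit propensity, and averages observed and predicted outcomes. It is not the dissertation's mPSPP implementation. The function names (`fit_logistic`, `spline_basis`, `pspp_mean`), the truncated linear basis, the fixed ridge penalty, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (r - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def spline_basis(z, knots):
    """Truncated linear spline basis: [1, z, (z - k_1)+, ..., (z - k_K)+]."""
    cols = [np.ones_like(z), z] + [np.maximum(z - k, 0.0) for k in knots]
    return np.column_stack(cols)

def pspp_mean(x, y, observed, n_knots=15, penalty=1.0):
    """Hypothetical PSPP-style estimate of E[y] with a single propensity model."""
    X = np.column_stack([np.ones_like(x), x])
    beta = fit_logistic(X, observed.astype(float))
    z = X @ beta  # estimated logit propensity for every unit
    knots = np.quantile(z, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    B = spline_basis(z, knots)
    # Ridge penalty on the knot coefficients only (intercept and slope unpenalized).
    D = np.diag([0.0, 0.0] + [penalty] * n_knots)
    Bo = B[observed]
    coef = np.linalg.solve(Bo.T @ Bo + D, Bo.T @ y[observed])
    y_hat = B @ coef
    # Use observed outcomes where available, spline predictions otherwise.
    return np.where(observed, y, y_hat).mean()

# Simulated MAR data (assumed for illustration): missingness depends only on x.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + x + x**2 + rng.normal(size=n)  # true marginal mean is 2
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))

estimate = pspp_mean(x, y, observed)
cc_mean = y[observed].mean()  # complete-case mean, biased under this mechanism
print(f"PSPP-style estimate: {estimate:.3f}, complete-case mean: {cc_mean:.3f}")
```

In this simulation the complete-case mean over-represents units with large `x`, while the spline on the estimated propensity corrects for that selection. A fuller treatment would choose the penalty by a mixed-model or cross-validation criterion rather than fixing it.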
Advisor
Basu, Sanjib
Chair
Basu, Sanjib
Department
Public Health Sciences-Biostatistics
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Berbaum, Michael
Bhaumik, Dulal
Demirtas, Hakan
Krishnan, Jerry