Multiple Imputation via a Semi-Parametric Probability Integral Transformation
thesis posted on 2012-09-07, 00:00 authored by Irene B. Helenowski
In real data scenarios, the distribution of the data is often unknown. Imputation methods that relax distributional or model assumptions may therefore be of great interest to investigators. Here, we propose semi-parametric approaches that allow us to relax distributional assumptions when imputing continuous data, multinomial or loglinear model assumptions when imputing binary data, and general location model assumptions when imputing mixed continuous and binary data. The nonparametric portion of our methods maps the data to normally distributed values via empirical cumulative distribution function (eCDF) or quantile computation, and the parametric portion performs multiple imputation under the normality assumption via joint modeling. Applying our approaches to data generated under the missing completely at random (MCAR) mechanism and to real data from databases of the Northwestern University SPORE in Prostate Cancer (Grant #: P50 Ca 090386) and the New York City Health and Nutrition Survey gave promising results.
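The nonparametric mapping described above can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names `to_normal_scores` and `from_normal_scores`, the `n + 1` rescaling of eCDF ranks, and the use of linear-interpolated empirical quantiles for the back-transformation are all assumptions made for illustration. The joint-model imputation step itself is omitted; it would operate on the normal-scale values between the two calls.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def to_normal_scores(x):
    """Map the observed entries of x to normal scores via the eCDF.

    Missing entries (NaN) stay NaN so that a normal-theory imputation
    routine can fill them in on the transformed scale.
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    obs = x[observed]
    n = obs.size
    # eCDF count: how many observed values are <= each observed value.
    ranks = np.searchsorted(np.sort(obs), obs, side="right")
    # Assumption: divide by n + 1 so probabilities stay strictly in (0, 1).
    u = ranks / (n + 1)
    z = np.full(x.shape, np.nan)
    z[observed] = [_STD_NORMAL.inv_cdf(p) for p in u]
    return z


def from_normal_scores(z_new, x_observed):
    """Map normal-scale values back to the data scale.

    Each value is converted to a probability with the normal CDF and then
    to an empirical quantile of the observed data (assumed back-transform).
    """
    x_observed = np.asarray(x_observed, dtype=float)
    obs = x_observed[~np.isnan(x_observed)]
    u = [_STD_NORMAL.cdf(v) for v in np.atleast_1d(z_new)]
    return np.quantile(obs, u)
```

In this sketch, a value imputed on the normal scale always maps back into the range of the observed data, which is one consequence of choosing empirical quantiles for the back-transformation.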