Analysis of Survey Data with Non-Ignorable Missing Covariates

Langi, Fima Lanra Fredrik Gerarld

thesis

posted on 2018-02-08, 00:00 authored by Fima Lanra Fredrik Gerarld Langi

Missing data are common in survey sampling and create a spectrum of inferential problems. This thesis proposes a method to analyze survey data with potentially non-ignorable missing covariates. The approach is developed specifically to address limitations in the current routines of standard statistical packages when, simultaneously, the model of interest has a mixture of categorical and continuous missing covariates, the analysis needs to incorporate the sampling design under different assumptions about its functional form, and computation time must remain manageable in practice. The proposed method proceeds as a full likelihood procedure if the sampling probability function is known for all observations, but it becomes a quasi-likelihood approach when the survey weight is the only available information about sample selection. Three classes of survey data are considered during the development: those in which none (Case 1), all (Case 2), or some (Case 3) of the covariates are observable outside the sample. Two situations are further defined for each class, namely whether the functional form of sample selection is known (Situation 1) or unknown (Situation 2). Given its construction, the proposed method, termed the augmentation assisted EM algorithm or simply the augmentation method, retains the desirable properties of maximum likelihood estimates while being flexible enough to handle both continuous and categorical missing covariates, and it can adapt the use of survey weights to improve inference. The simulation studies indicate that the proposed method performs reliably across all classes of survey data. In terms of unbiasedness, it is competitive with, and may occasionally outperform, multiple imputation by chained equations (MICE), a well-known multiple imputation technique. The efficiency of its estimates is also comparable to that of MICE.
In the real data application using the dataset from the Indonesia Demographic and Health Survey of 2012, the proposed method successfully estimates the demographic, health, and birth-related factors associated with infant mortality. Most importantly, it improves on the results of complete case analyses by both correcting the magnitude of the effect sizes and increasing the power of the analysis to detect variable significance.

History

Advisor

Chen, Hua Yun

Chair

Chen, Hua Yun

Department

Public Health Sciences-Biostatistics

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Committee Member

Freels, Sally; Demirtas, Hakan; Liu, Li; Johnson, Timothy

Submitted date

December 2017

Issue date

2017-09-12

Keywords

missing data; survey sampling; maximum likelihood; EM algorithm

Licence

In Copyright
