Posted on 2020-12-01, 00:00; authored by Jiyeong Jang
Variable selection has become an essential element of high-dimensional statistical modeling, yielding parsimonious models while maintaining high prediction accuracy. High dimensionality often induces collinearity problems; for instance, studies of environmental mixtures include a large number of pollutants that are strongly inter-correlated. Regularized variable selection methods such as LASSO are popular; however, in the presence of strong collinearity these methods often perform poorly in terms of both selection and prediction. To address these challenges, a novel method, COrrelation LeaRNing for variable Selection (COLRNS), is developed, based on iterative correlation learning for cluster detection and variable selection. COLRNS is further extended to the COLRNS Generalized Linear Model (COLRNS-GLM) for use in generalized linear regression settings. The performance of both methods is evaluated through an extensive set of simulations and real-world applications to environmental mixtures data. The results show that, under strong collinearity in high-dimensional data, the methods effectively identify a set of influential predictors, improve prediction accuracy, and reduce parameter estimation error in most simulation scenarios and data applications.
Advisor
Basu, Sanjib
Chair
Basu, Sanjib
Department
Public Health Sciences-Biostatistics
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Chen, Hua Yun
Turyk, Mary
Bhaumik, Runa
Awadalla, Saria