University of Illinois at Chicago

Variable Selection in Presence of Strong Collinearity with Application to Environmental Mixtures

thesis
posted on 2020-12-01, 00:00 authored by Jiyeong Jang
Variable selection has become an essential element of high-dimensional statistical modeling, yielding parsimonious models while maintaining high prediction accuracy. High dimensionality often induces collinearity problems. For instance, studies of environmental mixtures include a large number of pollutants that are strongly inter-correlated. Regularized variable selection methods such as LASSO are popular for statistical variable selection; however, in terms of both selection and prediction, these methods often do not perform well in the presence of strong collinearity. To address these challenges, a novel method, COrrelation LeaRNing for variable Selection (COLRNS), is developed, based on iterative correlation learning for cluster detection and variable selection. COLRNS is further extended to the COLRNS Generalized Linear Model (COLRNS-GLM) so that it is applicable in a generalized linear regression setting. The performance of the methods is evaluated through an extensive set of simulations and real-world applications to environmental mixtures data. The results show that, under strong collinearity in high-dimensional data, the methods effectively identify a set of influential predictors, improve prediction accuracy, and reduce error in parameter estimation in most simulation scenarios and data applications.

History

Advisor

Basu, Sanjib

Chair

Basu, Sanjib

Department

Public Health Sciences-Biostatistics

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Chen, Hua Yun; Turyk, Mary; Bhaumik, Runa; Awadalla, Saria

Submitted date

December 2020

Thesis type

application/pdf

Language

  • en
