University of Illinois Chicago

Response-Aided Score-Matching Approaches for Big Data Analysis under Generalized Linear Models

thesis
posted on 2021-05-01, 00:00 authored by Duo Zheng
In this thesis, we propose an efficient method, the Response-Aided Score-Matching Representative (RASMR) approach, to facilitate big data analysis under generalized linear models. The method uses representatives of natural or algorithmically generated data blocks to estimate model parameters. RASMR improves the performance and stability of its predecessor, the Score-Matching Representative (SMR) approach proposed by Li and Yang (2018), by imposing an additional response-aided split on the data blocks, which resolves the multiple-solution issue of the score-matching function. This modification ensures that each data block has a unique representative and improves the quality of the representatives. We propose further modifications to address problems that arise in practical implementations of RASMR, including a delta-ratio split for approximating likelihood function values, exponential learning rate decay to force convergence, and K-means clustering and correlation-based quantile split for fast data blocking. The accurate estimates and high-quality representatives make RASMR promising for a variety of statistical analysis problems. We develop RASMR estimates of AIC and BIC for model comparison on big data, create a compound strategy for variable selection using RASMR, and extend RASMR to the V-fold cross-validation framework to evaluate more general metrics. As an illustration, we apply RASMR and its extensions to a benchmark big data set, the Airline On-time Performance data.
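The general representative pipeline summarized above can be sketched for logistic regression. The steps are: block the data, split each block by response value, keep one weighted representative per sub-block, and fit a weighted model with an exponentially decaying learning rate. This is a minimal illustration under stated assumptions, not the thesis's implementation. The helper names are hypothetical. Blocking uses plain K-means on the predictors only. Each sub-block's representative is simply its predictor mean, whereas RASMR chooses representatives by solving score-matching equations, which this sketch does not do.

```python
import numpy as np

def kmeans_blocks(X, k, n_iter=20, seed=0):
    """Hypothetical helper: assign each row of X to one of k blocks (Lloyd's K-means)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k random rows
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # empty blocks keep their previous center
                centers[j] = members.mean(axis=0)
    return labels

def response_aided_representatives(X, y, labels):
    """Hypothetical helper: split every block by response value and return one
    (representative x, response, weight) triple per non-empty sub-block.
    Simplification: the representative is the sub-block mean, NOT a
    score-matching solution as in SMR/RASMR."""
    reps, ys, ws = [], [], []
    for b in np.unique(labels):
        for r in np.unique(y[labels == b]):
            mask = (labels == b) & (y == r)
            reps.append(X[mask].mean(axis=0))
            ys.append(r)
            ws.append(mask.sum())
    return np.array(reps), np.array(ys, dtype=float), np.array(ws, dtype=float)

def fit_weighted_logistic(Z, y, w, lr0=1.0, decay=0.995, n_steps=3000):
    """Gradient ascent on the weighted logistic log-likelihood, divided by the
    total weight, with exponential learning-rate decay lr_t = lr0 * decay**t."""
    beta = np.zeros(Z.shape[1])
    for t in range(n_steps):
        mu = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (w * (y - mu)) / w.sum()  # weighted score / total weight
        beta += lr0 * decay**t * grad
    return beta

# Simulated data (assumed for illustration): intercept plus two predictors
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))
beta_true = np.array([-0.5, 1.0, -1.0])
p = 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(n), X]) @ beta_true))
y = (rng.uniform(size=n) < p).astype(float)

labels = kmeans_blocks(X, k=50)
R, ry, w = response_aided_representatives(X, y, labels)
Z = np.column_stack([np.ones(len(R)), R])
beta_hat = fit_weighted_logistic(Z, ry, w)
print(len(R), "representatives; estimate:", beta_hat)
```

The fit touches at most 2k representatives instead of n rows, which is where the savings come from. The decayed step sizes sum to at most `lr0 / (1 - decay)`, so the iterates are forced to settle even if the gradient never vanishes exactly. This is one reading of "exponential learning rate decay for forced convergence," assumed here rather than taken from the thesis.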

History

Advisor

Yang, Jie

Chair

Yang, Jie

Department

Mathematics, Statistics and Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Chen, Hua Yun; Wang, Jing; Wu, Yichao; Yang, Min

Submitted date

May 2021

Thesis type

application/pdf

Language

  • en
