LI-DISSERTATION-2018.pdf (728.46 kB)

Score-Matching Representative Approach for Big Data Analysis with Generalized Linear Models

thesis
posted on 28.11.2018, 00:00 by Keren Li
We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models (GLMs), in particular for distributed datasets. Given a partition of a big dataset, this approach constructs a representative data point for each data block and fits the target model on the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. As for efficiency, the accuracy of its estimated parameters appears to be better than that of the divide-and-conquer method. Additionally, the representative approach is especially useful when analyzing massive data stored in a distributed manner on different nodes, since the generation of representatives is conditionally independent. Overall, we recommend two representative approaches, mean representative (MR) and score-matching representative (SMR), along with theoretical justifications, for big data analysis with generalized linear models. Comprehensive simulation studies confirm that MR is a good solution for linear models and for pre-analysis of GLMs, while SMR outperforms the subsampling and divide-and-conquer methods for general GLMs, even with moderate block sizes. With a properly chosen data partition, the SMR estimate appears to be comparable even with the full data estimate. Using the Airline on-time performance data as an illustrative real big data example, we show that MR and SMR are as good as the full data estimate when it is available. For GLMs with flat inverse link functions and moderate coefficients of the continuous variables, we recommend MR. Otherwise, we recommend the SMR solution, with MR as an initial step with a finer partition.
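The mean representative idea described in the abstract can be illustrated for the linear-model case with a minimal sketch. This is not the thesis's implementation: the function names `mean_representatives` and `fit_weighted_ols`, the use of block sizes as regression weights, and the assumption that a block label is already available for every row are all illustrative choices made here. The score-matching representative is not sketched, since the abstract does not give its construction.

```python
import numpy as np

def mean_representatives(X, y, block_ids):
    """Collapse each data block to one point: the mean of its rows.

    X is an (n, p) array, y an (n,) array, block_ids an (n,) array of
    block labels (hypothetical: any partition of the rows will do).
    Returns the block means of X and y plus the block sizes.
    """
    _, inverse, counts = np.unique(
        block_ids, return_inverse=True, return_counts=True
    )
    k = len(counts)
    X_rep = np.zeros((k, X.shape[1]))
    y_rep = np.zeros(k)
    # Accumulate per-block sums, then divide by block size to get means.
    np.add.at(X_rep, inverse, X)
    np.add.at(y_rep, inverse, y)
    X_rep /= counts[:, None]
    y_rep /= counts
    return X_rep, y_rep, counts

def fit_weighted_ols(X_rep, y_rep, weights):
    """Weighted least squares with an intercept on the representatives.

    Weighting each representative by its block size is an assumption of
    this sketch, not a detail stated in the abstract.
    """
    A = np.column_stack([np.ones(len(y_rep)), X_rep])
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y_rep * sw, rcond=None)
    return beta  # [intercept, slope_1, ..., slope_p]
```

Because each block is summarized independently of the others, `mean_representatives` could in principle be run separately on each node and only the small representative dataset gathered for fitting, which matches the distributed setting the abstract describes.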

History

Advisor

Yang, Jie

Chair

Yang, Jie

Department

Department of Mathematics, Statistics, and Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Committee Member

Hedayat, Samad; Yang, Min; Wang, Jing; Chen, Hua Yun

Submitted date

August 2018

Issue date

10/07/2018
