File(s) under embargo: 1 year(s) and 25 day(s) until file(s) become available

Information-based Optimal Subdata Selection for Clusterwise Linear Regression Model

thesis
posted on 01.12.2021, 00:00 authored by Yanxi Liu
As data sets grow rapidly in size, the relationship between input and output variables may no longer be homogeneous. Conventional statistical models such as generalized linear models (GLMs) may not be well suited to heterogeneous relationships. Mixture of Experts models offer a good solution: they combine different statistical models to detect heterogeneous patterns while retaining the benefits of conventional statistical modeling techniques. However, fitting them requires considerable computing resources, particularly with big data. To address this issue, an attractive idea is to analyze a subsample that retains the rich information in the full data. Information-Based Optimal Subdata Selection (IBOSS), proposed by Wang et al. (2019), is such a strategy. IBOSS captures most of the relevant information in the full data through a judicious selection of subdata that "maximizes" the Fisher information matrix. This project aims to develop an IBOSS-based subdata selection algorithm for the Clusterwise Linear Regression model, a type of Mixture of Experts. A major challenge is that the Fisher information matrix of this model has no explicit form. To overcome it, we propose a surrogate matrix, prove that it is asymptotically equivalent to the Fisher information matrix, and use it to construct the IBOSS subdata. Further, we prove that the proposed subdata selection is asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.
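For readers unfamiliar with IBOSS, the following is a minimal sketch of the selection rule usually associated with it for ordinary linear regression: for each covariate in turn, keep the rows with the most extreme (smallest and largest) values among those not yet chosen. It is not the clusterwise algorithm developed in the thesis, and the surrogate matrix is not reproduced here. The function name `iboss_select`, the list-of-rows data layout, and the toy data are assumptions made for illustration, and a full sort is used for simplicity where a faster partial selection would normally be preferred.

```python
def iboss_select(X, k):
    """Return sorted row indices of a subdata set of size 2 * p * (k // (2 * p)).

    X is a list of rows (each a list of p covariate values); k is the
    target subdata size. Illustrative sketch only, for plain linear regression.
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)  # rows taken from each tail of each covariate
    if r < 1 or 2 * p * r > n:
        raise ValueError("k is too small or too large for this data")
    remaining = set(range(n))
    selected = []
    for j in range(p):
        # Order the not-yet-selected rows by covariate j (index breaks ties).
        order = sorted(remaining, key=lambda i: (X[i][j], i))
        picked = order[:r] + order[-r:]  # r smallest and r largest values
        selected.extend(picked)
        remaining.difference_update(picked)
    return sorted(selected)


# Toy data (hypothetical): 8 rows, 2 covariates.
X_demo = [[0.0, 5.0], [1.0, 1.0], [2.0, 9.0], [3.0, 4.0],
          [4.0, 0.0], [5.0, 6.0], [6.0, 3.0], [7.0, 2.0]]
print(iboss_select(X_demo, 4))  # prints [0, 2, 4, 7]
```

In the toy example, rows 0 and 7 hold the extreme values of the first covariate, and rows 4 and 2 hold the extremes of the second among the rows that remain. Extreme covariate values tend to enlarge the information matrix of a linear model, which is the intuition behind "maximizing" the Fisher information.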

History

Advisor

Yang, Min

Chair

Yang, Min

Department

Mathematics, Statistics, and Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Wang, Jing; Yang, Jie; Karabatsos, George; Chen, Huayun

Submitted date

December 2021

Thesis type

application/pdf

Language

en
