Novel Algorithm for Constrained Optimal Design and Information-based Subdata Selection for Logistic Model
Thesis posted on 01.11.2017 by Qianshun Cheng
My thesis consists of two major parts. The first part develops a powerful new algorithm for multiple-constrained optimal design problems. Experiments with multiple objectives are a staple of modern scientific research. Deriving optimal designs with multiple objectives is a long-standing and challenging problem with only a few tools available. The few existing approaches cannot provide a fully satisfactory solution in general: either the computation is very expensive, or a satisfactory solution is not guaranteed. A novel algorithm is proposed to address this gap in the literature. We prove the convergence of this algorithm and show in various examples that it derives the true solutions quickly. The second part develops an information-based optimal subdata selection strategy, which efficiently selects a subsample of fixed size from a massive data set under the logistic regression model. Advances in computing technology have enabled exponential growth in data collection and in the size of data sets. For extraordinarily large data sets, proven statistical methods are no longer applicable because of computational limitations. A critical step in Big Data analysis is data reduction. In this thesis, we investigate the sampling approach of selecting subsets under the logistic regression model. For random sampling approaches, it is shown that the information contained in the subdata is limited by the size of the subset. A novel framework for selecting subsets is proposed. The information contained in the subdata selected under the new framework increases as the size of the full data increases. The performance of the proposed approaches, along with some widely applied existing methods, is compared under various criteria in extensive simulation studies.
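The abstract gives no formulas, so the following is only a minimal sketch of what "information contained in the subdata" can mean for a logistic model. It is not the selection strategy proposed in the thesis. It assumes a single covariate plus an intercept, a known parameter vector `beta`, simulated standard normal covariates, and the standard Fisher information matrix, the sum over observations of p(1 - p) x xᵀ. All names (`logistic_information`, `det2`, `full`, `sub`) are hypothetical and chosen for illustration.

```python
import math
import random


def logistic_information(rows, beta):
    """Return the 2x2 Fisher information sum_i p_i (1 - p_i) x_i x_i^T,
    where x_i = (1, z_i) and p_i = 1 / (1 + exp(-x_i . beta))."""
    a = b = d = 0.0
    for z in rows:
        eta = beta[0] + beta[1] * z
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)  # logistic weight, never larger than 1/4
        a += w
        b += w * z
        d += w * z * z
    return [[a, b], [b, d]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


rng = random.Random(0)
beta = (0.5, 1.0)  # assumed known here, purely for illustration
full = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # simulated full data
k = 200
sub = rng.sample(full, k)  # uniform random subsample of size k

info_full = logistic_information(full, beta)
info_sub = logistic_information(sub, beta)
det_full = det2(info_full)
det_sub = det2(info_sub)
```

Because each weight p(1 - p) is at most 1/4, the intercept entry `info_sub[0][0]` can never exceed k/4 however large the full data set is. This is one elementary way to see that the information in a uniformly drawn subsample is tied to k rather than to the full data size.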