Semi-Parametric Mixture Gaussian Model to Detect Breast Cancer Intra-Tumor Heterogeneity
Thesis posted on 01.12.2019, authored by Dan Zhao
Breast cancer intratumor heterogeneity challenges our ability to predict patients’ outcomes and responses to targeted therapy, yet methods for measuring intratumor heterogeneity quantitatively remain limited. The goal of this research is to develop statistical methodologies for high-dimensional PAM50 gene expression data that characterize intratumor heterogeneity and thereby inform treatment options. In this dissertation, I propose two approaches for classifying intratumor heterogeneity: non-parametric clustering methods and a finite Gaussian mixture method. For the non-parametric clustering methods, I use the Mahalanobis distance for classification. For the finite Gaussian mixture method, the parameters cannot be estimated in closed form, so estimates are typically obtained through an iterative procedure such as the expectation-maximization (EM) algorithm. However, finite mixture models can converge to locally optimal solutions when the initial starting values are poor. I improve EM for the Gaussian mixture model by applying a simple and efficient initialization strategy based on the Mahalanobis distance. This initialization allows the model to borrow information from the data without any distributional assumption. The proposed model is illustrated with two real datasets from breast cancer patients and is also evaluated on simulated datasets.
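The abstract does not spell out the exact initialization rule, so the following Python/NumPy sketch is an illustrative assumption, not the thesis's method. It shows the general idea. Each sample is hard-assigned to the nearest of a set of candidate centers under a shared Mahalanobis metric. The resulting groups supply starting weights, means, and covariances. A standard EM algorithm for a full-covariance Gaussian mixture then runs from those values. The function names `mahalanobis_init` and `em_gmm`, the `centers` argument (for PAM50 data these might be rough subtype centroids, which is our assumption), the small `ridge` regularizer, and the simulated two-cluster data are all hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, mu, cov_inv):
    """Squared Mahalanobis distance of every row of X from mu."""
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)


def mahalanobis_init(X, centers, ridge=1e-6):
    """Hypothetical initialization: assign each sample to the nearest candidate
    center under a shared Mahalanobis metric, then use group proportions,
    means and covariances as EM starting values."""
    n, p = X.shape
    shared = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(p)
    shared_inv = np.linalg.inv(shared)
    D = np.column_stack([mahalanobis_sq(X, c, shared_inv) for c in centers])
    labels = D.argmin(axis=1)
    k = len(centers)
    counts = np.bincount(labels, minlength=k)
    weights = np.maximum(counts, 1) / np.maximum(counts, 1).sum()
    means, covs = [], []
    for j in range(k):
        Xj = X[labels == j]
        # Fall back to the candidate center / shared covariance for tiny groups.
        means.append(Xj.mean(axis=0) if len(Xj) > 0 else np.asarray(centers[j], float))
        covs.append(np.atleast_2d(np.cov(Xj, rowvar=False)) + ridge * np.eye(p)
                    if len(Xj) > p else shared.copy())
    return weights, np.array(means), np.array(covs)


def log_gauss(X, mu, cov):
    """Log density of N(mu, cov) evaluated at every row of X."""
    p = X.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (p * np.log(2 * np.pi) + logdet
                   + mahalanobis_sq(X, mu, np.linalg.inv(cov)))


def em_gmm(X, centers, n_iter=500, tol=1e-8, ridge=1e-6):
    """EM for a full-covariance Gaussian mixture started from mahalanobis_init.
    Returns weights, means, covariances, hard labels and log-likelihood."""
    n, p = X.shape
    w, mu, cov = mahalanobis_init(X, centers, ridge)
    k = len(w)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log responsibilities, normalized with log-sum-exp for stability.
        L = np.column_stack([np.log(w[j]) + log_gauss(X, mu[j], cov[j])
                             for j in range(k)])
        m = L.max(axis=1, keepdims=True)
        lse = m + np.log(np.exp(L - m).sum(axis=1, keepdims=True))
        R = np.exp(L - lse)
        ll = float(lse.sum())
        if ll - prev < tol:  # log-likelihood has stopped improving
            break
        prev = ll
        # M-step: responsibility-weighted proportions, means and covariances.
        Nk = R.sum(axis=0)
        w = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for j in range(k):
            d = X - mu[j]
            cov[j] = (R[:, j, None] * d).T @ d / Nk[j] + ridge * np.eye(p)
    return w, mu, cov, R.argmax(axis=1), ll


# Usage on simulated data: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(8.0, 1.0, size=(100, 2))])
w, mu, cov, labels, ll = em_gmm(X, centers=[[0.0, 0.0], [8.0, 8.0]])
print(np.round(w, 2), np.round(mu, 1))
```

The motivation for this design is that the Mahalanobis hard assignment gives EM a data-driven starting partition instead of random starting values, which are the source of the poor local optima mentioned above. The log-sum-exp normalization in the E-step avoids numerical underflow when densities are evaluated over many genes. The ridge term is a further assumption that keeps covariance estimates invertible when a group has few samples relative to the number of genes.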