Intratumor heterogeneity in breast cancer limits our ability to predict patient outcomes and responses to targeted therapy, yet few methods are available to measure it quantitatively. The goal of this research is to develop statistical methodologies for high-dimensional PAM50 gene expression data that characterize intratumor heterogeneity and thereby inform treatment selection. In this dissertation, I propose two approaches for classifying intratumor heterogeneity: non-parametric clustering methods and a finite Gaussian mixture method. The non-parametric clustering methods use Mahalanobis distance for classification. For the finite Gaussian mixture method, the parameters cannot be estimated in closed form, so estimates are typically obtained through an iterative procedure such as the EM algorithm. However, finite mixture modeling can converge to locally optimal solutions when the starting values are poor. I improve EM for the Gaussian mixture model by applying a simple and efficient initialization strategy based on Mahalanobis distance. This improved method allows the model to borrow information from the data without any distributional assumption. The proposed model is illustrated with two real datasets from breast cancer patients and evaluated using simulated datasets.
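The idea of seeding EM from a Mahalanobis-distance clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names (`mahalanobis_init`, `em_gmm`), the Euclidean farthest-point seeding, the ridge term, and the synthetic two-cluster data are all hypothetical choices made for the example, and the abstract does not specify any of them. The sketch assumes at least two features per observation.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov_inv):
    """Squared Mahalanobis distance from each row of X to `mean`."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def mahalanobis_init(X, k, n_iter=10, ridge=1e-6):
    """Hypothetical initializer: hard clustering by per-cluster Mahalanobis distance."""
    n, d = X.shape
    # Assumption of this sketch: farthest-point seeding in Euclidean distance.
    seeds = [X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]]
    while len(seeds) < k:
        gap = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(X[np.argmax(gap)])
    means = np.array(seeds, dtype=float)
    covs = np.array([np.eye(d)] * k)  # identity, so the first pass is Euclidean
    for _ in range(n_iter):
        dist = np.stack(
            [mahalanobis_sq(X, means[j], np.linalg.inv(covs[j])) for j in range(k)],
            axis=1,
        )
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > d:  # enough points for a usable covariance estimate
                means[j] = members.mean(axis=0)
                covs[j] = np.cov(members, rowvar=False) + ridge * np.eye(d)
    counts = np.bincount(labels, minlength=k)
    weights = (counts + 1.0) / (n + k)  # smoothed so that no weight is exactly zero
    return weights, means, covs

def log_gauss(X, mean, cov):
    """Log density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    maha = mahalanobis_sq(X, mean, np.linalg.inv(cov))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def em_gmm(X, weights, means, covs, n_iter=100, tol=1e-8, ridge=1e-6):
    """Standard EM for a Gaussian mixture, started from the supplied parameters."""
    n, d = X.shape
    k = len(weights)
    history = []  # observed-data log-likelihood at each E-step
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability.
        log_p = np.stack(
            [np.log(weights[j]) + log_gauss(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        top = log_p.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_p - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        history.append(float(log_norm.sum()))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: weighted updates of mixing proportions, means, and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([
            (resp[:, j, None] * (X - means[j])).T @ (X - means[j]) / nk[j]
            + ridge * np.eye(d)
            for j in range(k)
        ])
    return weights, means, covs, history

# Synthetic demonstration data (not the PAM50 data): two separated 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(6.0, 1.0, size=(200, 2))])
w0, m0, c0 = mahalanobis_init(X, k=2)
weights, means, covs, history = em_gmm(X, w0, m0, c0)
print(np.round(means, 2), np.round(weights, 2), len(history))
```

The initializer only supplies starting values; the EM loop itself is the textbook algorithm, so the sketch shows where a distance-based start plugs in rather than how the dissertation's method performs.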
Advisor
Bhaumik, Dulal
Chair
Bhaumik, Dulal
Department
Public Health Sciences-Biostatistics
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Basu, Sanjib
Bhaumik, Runa
Gann, Peter
Mehta, Supriya