
Learning Patterns of Somatic Mutations and Medical Images in Human Cancer With Machine Learning

posted on 01.12.2020, 00:00 by Rahul Mehta
Datasets generated at different stages of cancer diagnosis and prognosis have led to an era of precision medicine, in which clinicians and researchers can potentially create a tailored treatment plan for an individual patient. Whether the datasets arise from magnetic resonance images, DNA microarrays, or epigenetic markers, the core challenge is to create a treatment plan that comprehensively characterizes a patient's cancer. The prelude to this challenge is identifying the underlying structure (the hidden patterns) of these datasets, and from there understanding how to use these patterns to accurately infer a patient's diagnosis and prognosis. Here we explore machine learning solutions that provide a powerful and elegant framework for improved characterization of patients by assessing their cancerous lesions. We begin with an analysis of cancerous lesions identified with fluorodeoxyglucose positron emission tomography–computed tomography (FDG-PET-CT) and diffusion-weighted magnetic resonance imaging (DWI), used for predicting a lesion's treatment response and classifying a lesion's histology, respectively. We treat metastatic liver cancer lesions obtained from FDG-PET-CT as 3D shapes and infer from 3D shape features whether a patient will or will not respond to radiotherapy. Breast cancer lesions obtained from DWI are used to generate distinct parameters that represent tissue microstructure, in order to differentiate benign from malignant lesions. Within this framework, we illustrate the value of the computational and statistical information available within a lesion image for other downstream tasks. We then move on to the intersection of medical imaging and genomics with a deep latent variable model that predicts somatic mutations from medical images.
Unlike traditional computational models that focus on a specific cancer or a specific set of mutations, we created a model that scales to incorporate image and mutation data from all cancerous lesion types for which data are publicly available. A unique property of this model is that the lesion images are modeled as point clouds instead of three-dimensional images. Our approach treats the two different yet related datasets as two distinct latent probability distributions unified by one shared distribution. The shared distribution is implicitly encouraged to create a connection between the two domains, so that we can faithfully transfer information from one domain to the other. This learning paradigm allows us to predict all possible somatic mutations within a patient, thereby potentially helping clinicians assess effective treatment options during the initial diagnosis. Because predicting the occurrence of somatic mutations captures only a small part of the complexity of cancer biology, we also propose a generative probabilistic latent variable model to determine co-occurrence patterns of somatic mutations. Whereas standard methodology relies on heuristics and frequency to model somatic mutations, we created a data-driven dependent prior that lets us specify a notion of similarity based on both positive and negative correlations between somatic mutations. Our results showed that biological processes, the total number of somatic mutations, non-linear mutation-mutation interactions, and cancer type are all latent confounders that play an important role in influencing the co-occurrence patterns of somatic mutations. Together, our research demonstrates the value of correctly characterizing a cancerous lesion to generate patterns that provide diagnostic and prognostic insights into a patient's cancer.
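To make the point-cloud idea mentioned in the abstract concrete, the following is a minimal sketch of one way a segmented lesion volume could be turned into a point cloud. It is an illustration only: the abstract does not describe the thesis's actual preprocessing, and the function name `mask_to_point_cloud`, the `[z][y][x]` nested-list layout, the `spacing` parameter, and the centroid centering are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_point_cloud(
    mask: Sequence[Sequence[Sequence[int]]],
    spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[Point]:
    """Convert a binary 3D lesion mask into a centered point cloud.

    Hypothetical helper, not the thesis's method.
    mask    -- nested sequences indexed as mask[z][y][x]; nonzero = lesion voxel
    spacing -- assumed physical voxel size along (x, y, z)
    """
    points: List[Point] = []
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value:
                    # Scale voxel indices to physical coordinates.
                    points.append((x * spacing[0], y * spacing[1], z * spacing[2]))

    if not points:
        return []

    # Center the cloud on its centroid so position in the scan is discarded
    # and only the shape of the lesion remains.
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [
        (p[0] - centroid[0], p[1] - centroid[1], p[2] - centroid[2])
        for p in points
    ]


# Toy 2x2x2 volume with two lesion voxels stacked along z.
toy_mask = [
    [[0, 1], [0, 0]],
    [[0, 1], [0, 0]],
]
cloud = mask_to_point_cloud(toy_mask, spacing=(1.0, 1.0, 2.0))
```

Unlike a dense voxel grid, a representation of this kind stores only the occupied locations, so lesions of very different sizes and scan resolutions can share one input format; whether this is the motivation in the thesis is not stated in the abstract.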



Karaman, Muge


Dai, Yang



Degree Grantor

University of Illinois at Chicago

Degree Level


Degree name

PhD, Doctor of Philosophy

Committee Member

Lu, Yang; Liang, Jie; Gaitonde, Sujata

Submitted date

December 2020

Thesis type



