The datasets generated at different stages of cancer diagnosis and prognosis have led to an era of precision medicine, in which clinicians and researchers can potentially create a tailored treatment plan for an individual patient. Whether the datasets arise from magnetic resonance images, DNA microarrays, or epigenetic markers, the thrust of the challenge is to create a treatment plan based on a comprehensive characterization of a patient's cancer. The prelude to this challenge is identifying the underlying structure (the hidden patterns) of these datasets and then understanding how to use these patterns to accurately infer a patient's cancer diagnosis and prognosis. Here we explore machine learning solutions that provide a powerful and elegant framework for better characterizing patients by assessing their cancerous lesions.
We begin with an analysis of cancerous lesions identified with fluorodeoxyglucose positron emission tomography–computed tomography (FDG-PET-CT) and diffusion-weighted magnetic resonance imaging (DWI), used to predict a lesion's treatment response and to classify a lesion's histology, respectively. We treat metastatic liver cancer lesions obtained from FDG-PET-CT as 3D shapes and infer from 3D shape features whether a patient will or will not respond to radiotherapy. Breast cancer lesions obtained from DWI are used to generate distinct parameters that represent tissue microstructure and that distinguish benign from malignant lesions. Within this framework, we illustrate the value of the computational and statistical information available within a lesion image for other downstream tasks.
We then move on to the intersection of medical imaging and genomics with a deep latent variable model that predicts somatic mutations from medical images. Unlike traditional computational models that focus on a specific cancer or a specific set of mutations, our model scales to incorporate image and mutation data from all cancerous lesion types for which data are publicly available. A unique property of this model is that the lesion images are modeled as point clouds instead of three-dimensional images. Our approach treats the two different yet related datasets as two distinct latent probability distributions unified by one shared distribution. The shared distribution is implicitly encouraged to create a connection between the two domains, so that we can faithfully transfer information from one domain to the other. This learning paradigm allows us to predict all possible somatic mutations within a patient, thereby potentially helping clinicians assess effective treatment options for a patient during the initial diagnosis.
Because predicting the occurrence of somatic mutations captures only a small part of the complexity of cancer biology, we propose a generative probabilistic latent variable model to determine co-occurrence patterns of somatic mutations. Whereas standard learning methods rely on heuristics and mutation frequency to model somatic mutations, we created a data-driven dependent prior that lets us specify a notion of similarity covering both positive and negative correlations between somatic mutations. Our results show that biological processes, the total number of somatic mutations, non-linear mutation-mutation interactions, and cancer type are all latent confounders that play an important role in influencing the co-occurrence patterns of somatic mutations.
Together, our research demonstrates the value of correctly characterizing a cancerous lesion to generate patterns that provide diagnostic and prognostic insights into a patient's cancer.