Computational Methods for Longitudinal Microbiome Analysis: Identification, Modeling, and Classification
Thesis posted on 2018-11-27, authored by Ahmed Metwally
The microbiome plays a vital role in host immune responses, with significant effects on host health. Dysbiosis of the microbiome has been linked to diseases including asthma, obesity, diabetes, and inflammatory bowel disease. Over the past decade, culture-independent sequencing methods have revolutionized microbiome studies by identifying the genetic content of microbial communities in the form of millions to billions of short DNA sequences. These sequences originate from thousands of different species that need to be identified, quantified, and compared over time among disease phenotypes. Such analyses can detect biomarkers that may guide microbial reconstitution through bacteriotherapy, probiotics, or antibiotics. Current taxonomic identification methods that achieve high precision can lack sensitivity in some applications. Conversely, methods with high sensitivity can suffer from low precision and require long computation times. Thus, taxonomic identification methods that are both precise and sensitive are needed. Furthermore, in longitudinal studies, sample collection is subject to several forms of variability, such as different numbers of subjects per phenotypic group, different numbers of samples per subject, and samples collected at inconsistent time points. These inconsistencies make many current analysis methods unsuitable and create a need for new ones. In addition, given the strong association between the microbiome and disease, computational models can be built to predict disease status or prognosis from longitudinal microbial profiles. In this thesis, we discuss the computational methods and tools we have developed to improve both the characterization and the longitudinal analysis of the microbiome. The first method, WEVOTE, classifies microbial sequences into taxonomic units with both high precision and high sensitivity.
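The precision/sensitivity trade-off described above can be made concrete with read-level metrics. The sketch below assumes one common convention: precision is correctly classified reads divided by classified reads, and sensitivity is correctly classified reads divided by all reads. `None` marks an unclassified read. The function name `read_level_metrics` and the toy labels are hypothetical and are not taken from WEVOTE.

```python
from typing import Optional, Sequence, Tuple


def read_level_metrics(truth: Sequence[str],
                       predicted: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Return (precision, sensitivity) for read-level taxonomic assignments.

    Assumed convention: precision = correct / classified reads and
    sensitivity = correct / all reads; None marks an unclassified read.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    classified = sum(1 for p in predicted if p is not None)
    correct = sum(1 for t, p in zip(truth, predicted) if p is not None and p == t)
    precision = correct / classified if classified else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical toy labels for ten reads.
truth = ["taxon_A", "taxon_A", "taxon_B", "taxon_B", "taxon_C",
         "taxon_C", "taxon_A", "taxon_B", "taxon_C", "taxon_A"]
# A conservative classifier: assigns only the 4 reads it is sure about.
conservative = truth[:4] + [None] * 6
# An aggressive classifier: assigns every read, but 3 of them wrongly.
aggressive = truth[:7] + ["taxon_C", "taxon_A", "taxon_B"]

print(read_level_metrics(truth, conservative))  # (1.0, 0.4)
print(read_level_metrics(truth, aggressive))    # (0.7, 0.7)
```

Neither toy classifier scores well on both axes at once. That is the gap a method aiming for both high precision and high sensitivity has to close.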
The second method, MetaLonDA, identifies time intervals in which microbial features are differentially abundant in longitudinal studies. The third method is a computational framework that predicts host clinical phenotype from longitudinal microbiome profiles using a deep learning approach. Finally, using these methods and tools, we identified microbiome dynamics suggestive of the development of bronchiolitis obliterans syndrome in pediatric lung transplant recipients, insights that can be leveraged to improve lung transplant outcomes across the life span.
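The sampling irregularities listed above can be illustrated with a simplified sketch of interval-wise comparison: different numbers of subjects per group, different numbers of samples per subject, and inconsistent time points. This is not MetaLonDA's algorithm. It substitutes simpler steps:

- linear interpolation of each subject onto a shared time grid;
- group-mean curves;
- the approximate area between those curves in each interval;
- a permutation test that shuffles group labels across subjects.

All function names, the grid, and the toy data are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# One subject's samples as (time, abundance) pairs; times may differ per subject.
Series = List[Tuple[float, float]]


def interpolate(series: Series, t: float) -> float:
    """Linearly interpolate a subject's abundance at time t.

    Assumption: values outside the subject's observed range are held
    constant at the nearest observation (no extrapolation).
    """
    pts = sorted(series)
    if t <= pts[0][0]:
        return pts[0][1]
    if t >= pts[-1][0]:
        return pts[-1][1]
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return y0
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("time point not bracketed")  # not reached for sorted input


def group_curve(subjects: Sequence[Series], grid: Sequence[float]) -> List[float]:
    """Mean interpolated abundance of a group at each grid time."""
    return [mean(interpolate(s, t) for s in subjects) for t in grid]


def interval_areas(a: Sequence[Series], b: Sequence[Series],
                   grid: Sequence[float]) -> List[float]:
    """Approximate area between the two group curves in each grid interval.

    Uses the trapezoid rule on |difference| at the interval endpoints.
    """
    diffs = [abs(x - y) for x, y in zip(group_curve(a, grid), group_curve(b, grid))]
    return [(grid[i + 1] - grid[i]) * (diffs[i] + diffs[i + 1]) / 2
            for i in range(len(grid) - 1)]


def interval_pvalues(a: Sequence[Series], b: Sequence[Series], grid: Sequence[float],
                     n_perm: int = 999, seed: int = 0) -> List[float]:
    """Permutation p-value per interval, shuffling group labels across subjects."""
    rng = random.Random(seed)
    observed = interval_areas(a, b, grid)
    pooled = list(a) + list(b)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm = interval_areas(pooled[:len(a)], pooled[len(a):], grid)
        for i, (p, o) in enumerate(zip(perm, observed)):
            if p >= o - 1e-12:  # tolerance absorbs floating-point reordering
                exceed[i] += 1
    # Add-one correction so that no p-value is exactly zero.
    return [(k + 1) / (n_perm + 1) for k in exceed]


# Hypothetical toy data: unequal sample counts and inconsistent time points.
group_a = [[(0, 1.0), (3, 1.1), (9, 0.9)],
           [(1, 1.2), (5, 1.0)],
           [(0, 0.8), (4, 1.0), (8, 1.1), (10, 1.0)]]
group_b = [[(0, 1.0), (6, 4.0), (10, 5.0)],
           [(2, 1.1), (7, 4.5)],
           [(0, 0.9), (5, 2.0), (9, 4.8)]]
grid = [0, 2, 4, 6, 8, 10]
for (lo, hi), p in zip(zip(grid, grid[1:]), interval_pvalues(group_a, group_b, grid)):
    print(f"[{lo}, {hi}]: p = {p:.3f}")
```

The sketch shuffles labels at the subject level rather than the sample level. That keeps each subject's repeated measurements together, so unequal sample counts per subject need no special handling here. With only three subjects per group there are just 20 distinct label splits, so no interval can reach a p-value much below 0.1. A real analysis would also need more subjects and a correction for testing many intervals and features.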