Integrative Analysis Strategies for Discovering Genetic Associations with Common Diseases

2013-06-28T00:00:00Z (GMT) by Joel Fontanarosa
Genetic association studies have proven successful at identifying reliable associations with complex diseases. However, the majority of these results are uninformative with respect to any functional basis, and more research is necessary to understand the mechanisms by which these associations are related to pathogenic molecular alterations. In this project, we propose a number of computational approaches to address current challenges in genome-wide association studies: detection of gene-gene interactions, utilization of high-performance computing resources, development of genomic risk prediction tools, and investigation into miRNA-associated variations that may lead to problematic modulations in transcriptional activity. First, we present an adaptive evolutionary optimization algorithm that utilizes local linkage disequilibrium patterns to improve the search for gene-gene interactions associated with a phenotype of interest. Our method was applied to several simulated disease models and to a real genome-wide association study. The results indicate that our method has improved power and computational efficiency for uncovering gene-gene interactions relative to one of the most powerful competing methods. This optimization strategy was extended into a parallel algorithm that uses state-of-the-art computing methods involving graphics processing units to explore genome-wide association study data sets with maximal computational efficiency and minimal cost. Next, we present an improved penalized lasso regression strategy to build more accurate predictions of disease risk based on genomic and phenotypic information for case-control studies. Using this approach on a simulated data set from the 1000 Genomes Project, we were able to model disease risk using common and rare genetic variation in combination with quantitative trait information. Lastly, we present a framework for the determination of genomic variation associated with miRNA dysregulation.
We applied our analysis method to several genome-wide association studies of common diseases to determine candidate targets for disease-associated dysfunctions in miRNA-related gene expression changes. The research in this thesis represents a set of computational tools and integrative analysis strategies that can be used to provide a detailed description of the genetic risk associated with a potentially complex inherited phenotype. Code developed in this project will be made available to the research community for further development and application to other genome-wide association studies.
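As a rough illustration of the general idea behind lasso-penalized risk prediction for case-control data mentioned above, the following is a minimal Python sketch. It is not the improved strategy developed in the thesis: it is plain L1-penalized logistic regression fitted by proximal gradient descent, and every name in it (`fit_lasso_logistic`, `predict_risk`, `penalty`, `step`) as well as the tiny made-up data set is a hypothetical choice for illustration only.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, penalty=0.05, step=0.1, iterations=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X: list of rows (e.g. genotype counts 0/1/2 plus other covariates).
    y: list of 0/1 labels (control/case).
    Returns (intercept, weights); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, label in zip(X, y):
            linear = intercept + sum(w * x for w, x in zip(weights, row))
            err = sigmoid(linear) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        # Gradient step on the logistic loss, then L1 shrinkage on the weights.
        intercept -= step * grad_b
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return intercept, weights


def predict_risk(intercept, weights, row):
    """Predicted probability of being a case for one individual."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))


# Toy, made-up data: column 0 is a risk-allele count, column 1 is an
# unrelated binary covariate. These values are illustrative only.
X = [[0, 1], [0, 0], [0, 1], [0, 0],
     [1, 1], [1, 0], [1, 1], [1, 0],
     [2, 1], [2, 0], [2, 1], [2, 0]]
y = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

intercept, weights = fit_lasso_logistic(X, y)
print("weights:", weights)
print("risk for 2 risk alleles:", predict_risk(intercept, weights, [2, 0]))
```

The L1 penalty shrinks coefficients toward zero and can set uninformative ones exactly to zero, which is why lasso-type models are a common choice when the number of candidate variants is large relative to the number of samples. A sufficiently large `penalty` removes every predictor from the model.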