University of Illinois Chicago

Novel Statistical Methods Through Data Integration for Disease-related Gene and Pathway Detection

thesis
posted on 2018-07-25, 00:00 authored by Wenyi Qin
High-throughput technologies such as microarrays and next-generation sequencing (NGS) measure the expression of thousands of genes in a sample simultaneously. Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses of genome-wide transcriptomic data. DE genes are potential disease-related genes and are used to generate biological hypotheses about disease mechanisms, develop potential clinical diagnostic tools, and investigate potential drug targets. Pathway enrichment analysis is another widely accepted expression analysis tool; it aims to detect coordinated expression changes within pre-defined gene sets rather than in individual genes. The benefits of gene set analysis over individual gene analysis include more reproducible and interpretable results and the ability to detect small but consistent changes across a gene set that DE gene analysis cannot detect. There have been many successful applications of DE gene detection and gene set analysis in human diseases. However, when the sample size of a study is small, the analysis lacks the power to detect genes and pathways of importance to the disease. Integrating public data can alleviate this problem. In this thesis, we first conducted a novel meta-analysis of schizophrenia patients aimed at identifying sex-related genes responsible for gender differences in schizophrenia. A total of 46 genes were identified in the male group, while none were identified in the female group because of a lack of samples. Motivated by this, we then proposed a novel empirical Bayes-based mixture model method that identifies DE genes by borrowing shared information across multiple expression data sets from similar diseases, based on the assumption that similar diseases tend to share similar DE genes. Through a simulation study and a real data application, we demonstrated the improved identification power of the proposed method over single data set analysis and other popular meta-analysis methods.
We further extended the proposed method to identify altered gene sets. A simulation test and a real data application demonstrated that more enriched pathways could be identified with our proposed methods than with single data set analysis alone. Overall, we expect that the methods presented in this thesis can provide researchers with a new approach to reusing public data sets when the sample size is limited.

History

Advisor

Lu, Hui

Chair

Lu, Hui

Department

Bioengineering

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Committee Member

  • Dai, Yang
  • Royston, Thomas
  • Sodhi, Monsheel
  • Zhang, Wei

Submitted date

May 2018

Issue date

2018-03-13
