High-throughput technologies such as microarrays and next-generation sequencing (NGS) measure the expression of thousands of genes in a sample simultaneously. Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses of genome-wide transcriptomic data. DE genes are potential disease-related genes and are used to generate biological hypotheses about disease mechanisms, develop potential clinical diagnostic tools, and investigate potential drug targets. Pathway enrichment analysis is another widely accepted expression analysis tool; it aims to detect coordinated expression changes within pre-defined gene sets rather than in individual genes. The benefits of gene set analysis over individual gene analysis include more reproducible and interpretable results and the ability to detect small but consistent changes across a gene set that DE gene analysis would miss. There have been many successful applications of DE gene detection and gene set analysis in human diseases. However, when the sample size of a study is small, the analysis lacks the power to detect genes and pathways important to the disease. Integrating public data can alleviate this problem.
In this thesis, we first conducted a novel meta-analysis of schizophrenia patients, aiming to identify sex-related genes responsible for gender differences in schizophrenia. Forty-six genes were identified in the male group, while none were identified in the female group because of the small number of samples. Motivated by this, we then proposed a novel empirical Bayes mixture model method that identifies DE genes by borrowing shared information across multiple expression data sets from similar diseases, based on the assumption that similar diseases tend to share similar DE genes. Through a simulation study and a real data application, we demonstrated the improved identification power of the proposed method over single data set analysis and other popular meta-analysis methods. We further extended the proposed method to identify altered gene sets. Simulation tests and a real data application demonstrated that the proposed method identifies more enriched pathways than single data set analysis alone. Overall, we expect that the methods presented in this thesis will provide researchers with a new approach to reusing public data sets when sample size is limited.
Advisor
Lu, Hui
Chair
Lu, Hui
Department
Bioengineering
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Committee Member
Dai, Yang
Royston, Thomas
Sodhi, Monsheel
Zhang, Wei