University of Illinois Chicago

Machine Learning Approaches for the Integrative Analysis of Multi-omics Data

thesis
posted on 2022-12-01, 00:00 authored by Shang Gao
“Multi-omics data” is a collective term for sequencing datasets that span multiple “-omic” levels. Such data can provide a comprehensive understanding of the regulatory mechanisms that link genotype to phenotype. A large number of “omics” datasets have been generated from studies in various biological scenarios, but there is a need for advanced analytical tools that derive meaningful biological insights from these datasets. In this thesis, three projects integrate multiple “omics” datasets to answer distinct biological questions. In the first project, dimension reduction and clustering methods are applied to single-cell RNA-sequencing data from lung endothelial cells to characterize the cellular heterogeneity within the lung endothelium during lung injury and regeneration. We identified three major subpopulations of lung endothelial cells at baseline and at time points after injury. One subpopulation is enriched for the expression of immune-related genes, while another is enriched for the expression of developmental genes. In the second project, a Bayesian inference model, BITFAM, was developed to infer transcription factor activities in single cells by integrating single-cell RNA-seq data and bulk ChIP-seq data. We validated the transcription factor activities inferred by BITFAM against known biological functions and against a publicly available dataset in which transcription factors were deleted by CRISPR/Cas9 targeting. In the third project, we assessed the relative effects of DNA sequence and epigenetic modifications on gene expression using a deep learning framework, iSEGnet. I investigated which regions yield the best prediction of gene expression. I also explored the regions and epigenetic modifications that most strongly affect gene expression, as well as the regulatory mechanisms in these regions.
These projects highlight the value of using machine learning and deep learning approaches to analyze multi-omic datasets and thereby identify regulatory mechanisms underlying gene expression.
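
As a rough illustration of the kind of workflow the first project describes (dimension reduction followed by clustering of single-cell expression profiles), the sketch below runs on synthetic data. It is not the thesis pipeline: the abstract does not name the specific methods used, so the log-normalization, PCA, and k-means steps here are assumptions, and every function name, parameter, and simulated count is hypothetical.

```python
import numpy as np


def simulate_counts(n_per_group=40, n_genes=60, seed=1):
    """Synthetic cells-by-genes counts with 3 groups (illustrative only)."""
    rng = np.random.default_rng(seed)
    rates = np.full((3, n_genes), 5.0)           # background expression rate
    for g in range(3):
        rates[g, g * 20:(g + 1) * 20] = 30.0     # group-specific marker block
    truth = np.repeat(np.arange(3), n_per_group)
    return rng.poisson(rates[truth]), truth


def log_normalize(counts, scale=1e4):
    """Scale each cell to a common library size, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale)


def pca(x, n_components):
    """Project rows of x onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


counts, truth = simulate_counts()
embedding = pca(log_normalize(counts), n_components=2)
labels = kmeans(embedding, k=3)
print("cells per cluster:", np.bincount(labels, minlength=3))
```

On real single-cell data, the number of components and clusters would need to be chosen and justified from the data rather than fixed in advance as they are in this toy example.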

History

Advisor

Rehman, Jalees; Dai, Yang

Chair

Rehman, Jalees

Department

Biomedical Engineering

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Khetani, Salman; Bernabe, Beatriz Penalver; Wang, Xiaowei

Submitted date

December 2022

Thesis type

application/pdf

Language

  • en
