University of Illinois Chicago

Computational Analysis of DNA Methylation and Gene Regulation

thesis
posted on 2019-12-01, 00:00 authored by Jingting Xu
DNA methylation of CpG dinucleotides plays key roles in cellular differentiation, development, disease pathogenesis, cancer, and aging. In this thesis, we propose computational methods to study three topics related to DNA methylation.

First, various machine learning models have been proposed to predict enhancers from histone modification profiles that are related to enhancer activity. However, low-methylated regions (LMRs), which are highly enriched for enhancers, have not been explored for enhancer prediction. Our framework, LMethyR-wSVM (Low Methylated Region-weighted Support Vector Machine), learns DNA sequence features derived from whole-genome bisulfite sequencing (WGBS) DNA methylation profiles to predict cell type-specific human enhancers. We have shown that our framework predicts enhancers with accuracy comparable to that of existing methods and also identifies enhancers that only it predicts. Moreover, these uniquely predicted enhancers may reveal new regulatory mechanisms.

Second, we develop a publicly available R package, MeDEStrand, that transforms enrichment-based DNA methylation profiles (i.e., MeDIP-Seq data) into absolute DNA methylation levels. Knowing the absolute DNA methylation level is important because it provides insight into biological processes and enables comparison of methylation levels between any loci genome-wide, which enrichment-based signals do not allow. We have shown that strand-specific processing of sequencing reads, together with a sigmoidal CpG bias estimation improved from previous work, significantly improves the accuracy of the inferred absolute methylation levels. Our tool will facilitate the use of the large number of MeDIP-Seq datasets in public repositories.

Third, we develop a novel framework that uses convolutional neural networks (CNNs) to integrate DNA methylation and genomic DNA sequences to study their regulatory relationship with gene expression. Previous work used correlation coefficients to unravel the association between regional DNA methylation levels and gene expression. We have shown that our CNN models explain a higher proportion of the variation in gene expression. The CNN models uncover transcription factor (TF) motifs that are validated by ChIP-Seq TF binding datasets. Further, the motifs associated with highly methylated binding sites are fully supported by the newly published MeDReaders database.

In summary, the work in this thesis demonstrates the usefulness of these computational methods and the insights they provide for research related to DNA methylation.
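The abstract does not say how DNA methylation and genomic sequence are combined as CNN input. As a purely illustrative sketch, and not the thesis's actual encoding, one common way to present both signals to a convolutional layer is to append a per-base methylation channel to a one-hot sequence encoding. The function name `encode_region`, the five-channel layout, and the default of 0.0 for unmeasured positions are all assumptions made for this example.

```python
# Hypothetical input encoding (an assumption, not taken from the thesis):
# each base becomes a row of 5 values: one-hot A/C/G/T plus a methylation level.
BASES = "ACGT"


def encode_region(sequence, methylation):
    """Encode a DNA region as a list of 5-element rows.

    sequence    -- DNA string; bases outside A/C/G/T (e.g. N) get an all-zero one-hot.
    methylation -- dict mapping 0-based positions to methylation levels in [0, 1];
                   positions without a measurement are assigned 0.0.
    """
    rows = []
    for pos, base in enumerate(sequence.upper()):
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        level = methylation.get(pos, 0.0)
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"methylation level at position {pos} is outside [0, 1]")
        rows.append(one_hot + [level])
    return rows


# A 5-base region in which the cytosine at position 1 is 80% methylated.
encoded = encode_region("ACGTN", {1: 0.8})
```

A matrix of this shape (region length by 5) could then be passed to a one-dimensional convolution, whose learned filters would be able to respond to sequence motifs and their methylation state together.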

History

Advisor

Dai, Yang

Chair

Dai, Yang

Department

Bioengineering

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Ma, Ao; Royston, Thomas; Jeffery, Constance; Benevolenskaya, Elizaveta

Submitted date

December 2019

Thesis type

application/pdf

Language

  • en

Issue date

2019-11-06
