University of Illinois Chicago

Improving the Human Genome Annotation Using Integrative Analysis and Deep Learning Methods.

thesis
posted on 2023-05-01, 00:00 authored by Amira Kefi
The human genome annotation (HGA) file is a database that stores features describing elements of the genome (genes, transcripts, etc.). HGA itself is the process of identifying those elements, characterizing them, and elucidating their roles. HGA remains incomplete because it suffers from missed annotation and mis-annotation. Mis-annotation occurs when elements are wrongly annotated or labeled. Missed annotation occurs when elements are absent from the annotation files because of the limitations of analytical and experimental procedures. On the one hand, improved identification of novel genome elements is required to help solve the problem of missed annotation. On the other hand, addressing mis-annotation requires better classification methods to characterize and validate the novel elements. This thesis addresses the problem of incomplete human genome annotation and proposes an improved identification and validation approach that integrates second- and third-generation sequencing. We apply this integrative approach to detect novel mono-exonic genes (MNEGs) and confirm their transcription and translation. Until recently, MNEGs were considered artifacts and were discarded. However, our integrative analysis provided additional evidence that these genes are genuine. In the second part of this project, we used computational methods based on a deep learning framework to validate these findings by characterizing MNEG types and classifying them as either protein-coding RNAs (pcRNAs) or long non-coding RNAs (lncRNAs). Our results showed that the majority of MNEGs are classified as lncRNAs, and further investigation suggested that some of them are circular RNAs (circRNAs). Finally, this work provides an innovative approach and a unique computational framework to address the problem of incomplete HGA, one that could be adopted by annotators in their pipelines. This study is an important step towards the completion and improvement of the human genome annotation.
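As a reader's aid, the notion of a mono-exonic transcript can be illustrated with a minimal sketch. It is not the thesis's pipeline, which this record does not describe. It assumes the annotation is available in the common 9-column, tab-separated GTF layout with a transcript_id attribute; the function name mono_exonic_transcripts and the example records are hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Assumption: 9-column, tab-separated GTF records whose attribute column
# contains an entry of the form: transcript_id "...";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mono_exonic_transcripts(gtf_lines):
    """Return the sorted IDs of transcripts that have exactly one exon record."""
    exon_counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon features contribute to the per-transcript exon count.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exon_counts[match.group(1)] += 1
    return sorted(tid for tid, n in exon_counts.items() if n == 1)


# Made-up records for illustration only (not real annotation data).
EXAMPLE_GTF = [
    "# hypothetical example annotation",
    'chr1\tdemo\tgene\t100\t200\t.\t+\t.\tgene_id "gA";',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "gA"; transcript_id "tA1";',
    'chr1\tdemo\texon\t500\t600\t.\t-\t.\tgene_id "gB"; transcript_id "tB1";',
    'chr1\tdemo\texon\t700\t800\t.\t-\t.\tgene_id "gB"; transcript_id "tB1";',
    'chr2\tdemo\texon\t50\t900\t.\t+\t.\tgene_id "gC"; transcript_id "tC1";',
]

if __name__ == "__main__":
    print(mono_exonic_transcripts(EXAMPLE_GTF))  # ['tA1', 'tC1']
```

Counting exon records per transcript only flags candidates. Confirming that a candidate is transcribed and translated, as the abstract describes, requires the sequencing evidence and classification steps that lie outside this sketch.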

History

Advisor

Liu, Chunyu

Chair

Dai, Yang

Department

Biomedical Engineering

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Liang, Jie; Yang, Jie; Mankin, Alexander; Glatt, Stephen J.

Submitted date

May 2023

Thesis type

application/pdf

Language

  • en
