Human genome annotation (HGA) is the process of identifying the elements of the genome (genes, transcripts, etc.), characterizing them, and elucidating their roles. Its output, the annotation file, is a database that stores the features describing these elements. HGA remains incomplete because it suffers from both missed annotation and mis-annotation. Mis-annotation occurs when elements are wrongly annotated or labeled. Missed annotation occurs when elements are absent from the annotation files because of the limitations of analytical and experimental procedures. On the one hand, improved identification of novel genome elements is required to solve the problem of missed annotation. On the other hand, to address the problem of mis-annotation, better classification methods are needed to characterize and validate the novel elements. This thesis addresses the problem of incomplete human genome annotation and proposes an improved identification and validation approach that integrates second- and third-generation sequencing. We apply this integrative approach to detect novel mono-exonic genes (MNEGs) and to confirm their transcription and translation. Until recent studies, MNEGs were thought to be artifacts and were discarded. However, our integrative analysis provided additional evidence that these genes genuinely exist. In the second part of this project, we used computational methods based on a deep learning framework to validate these findings by characterizing MNEG types and classifying them as either protein-coding RNAs (pcRNAs) or long non-coding RNAs (lncRNAs). Our results showed that the majority of MNEGs are classified as lncRNAs, and further investigation suggested that some of them are circular RNAs (circRNAs). Finally, this work provides an innovative approach and a unique computational framework for addressing incomplete HGA, which annotators could adopt in their pipelines.
This study is an important step toward the completion and improvement of human genome annotation.
History
Advisor
Liu, Chunyu
Chair
Dai, Yang
Department
Biomedical Engineering
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree name
PhD, Doctor of Philosophy
Committee Member
Liang, Jie
Yang, Jie
Mankin, Alexander
Glatt, Stephen J.