Posted on 2019-08-01, 00:00. Authored by Guido Walter Di Donato.
The advent of Next Generation Sequencing (NGS) produced an explosion in the amount of genomic data generated, which led to the birth and early development of personalized medicine. To boost research in this field, new bioinformatic tools are needed that can keep pace with NGS technologies.
In this scenario, the aim of this thesis is the design and implementation of an efficient, easy-to-use short sequence mapper for use in various bioinformatic applications. At the core of the proposed tool is an efficient implementation of a succinct data structure, which compresses the genomic data while still supporting efficient queries on them. This work presents a comprehensive description of the data encoding scheme, together with a characterization of the proposed data structure in terms of memory utilization and execution time.
The resulting sequence mapper is made available through an intuitive web application that guarantees high usability and provides a great user experience. Moreover, this thesis presents the design of an easily accessible hybrid sequence aligner, which leverages the compression capability of the proposed data structure to fully exploit the highly parallel architecture of FPGAs.
A validation of the software is presented to test the reliability of the results it produces. Finally, some considerations about future developments of this project are proposed.